Upstream Analysis


What is Upstream Analysis?

GeneXplain’s proprietary approach to analyzing gene expression data is called Upstream Analysis. The term indicates that the analysis is causal: it provides clues about why a certain set of genes has been up- or down-regulated in the system under study. In contrast, conventional analyses usually reveal the effects of the differentially expressed genes, e.g. by mapping them onto ontological categories.


How does it work?

GeneXplain’s Upstream Analysis is an integrated promoter – pathway analysis. It starts from any list of differentially expressed genes (DEGs), which you may have extracted from your raw data with the aid of the geneXplain platform, and comprises two main steps:
  • First, the promoters of the differentially expressed genes are retrieved and analyzed for potential transcription factor (TF) binding sites and their combinations. From this, a set of TFs is identified that may have regulated the DEGs.
  • Second, the pathways known to activate these hypothesized TFs are reconstructed. Molecules at which these pathways converge are considered potential master regulators of the process under study.

Step 1: Promoter analysis

First, potential transcription factor binding sites (TFBSs) are identified in all promoters of the DEGs of your experiment (Yes set) as well as in a negative control set (No set). This is usually done with a library of position-specific scoring matrices or positional weight matrices (PSSMs or PWMs).
We recommend applying the most comprehensive matrix library available, the TRANSFAC® database, and using the MATCH™ algorithm for the sequence analysis.
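To illustrate what scanning a promoter with a weight matrix involves, here is a minimal Python sketch. It uses a plain log-odds score over a sliding window on the forward strand only. It is not the MATCH™ algorithm, and the count matrix, the example sequence and the threshold are invented for illustration; they are not taken from TRANSFAC®.

```python
import math

# Hypothetical count matrix for a 4-bp motif (one dict per position).
# The numbers are invented for illustration only.
COUNTS = [
    {"A": 0, "C": 0, "G": 0, "T": 20},
    {"A": 18, "C": 1, "G": 1, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 19},
    {"A": 17, "C": 1, "G": 1, "T": 1},
]


def to_log_odds(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds weights."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, window, score) for every window scoring >= threshold."""
    sequence = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


PWM = to_log_odds(COUNTS)
promoter = "GGCTATAAGCGC"  # made-up sequence
hits = scan(promoter, PWM, threshold=4.0)
for start, window, score in hits:
    print(f"site at {start}: {window} (score {score:.2f})")
```

A production scan would also cover the reverse strand and use matrix-specific cut-offs rather than one fixed threshold.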
Next, out of all these potential TFBSs, those that are characteristic of the DEG set under study are identified. This is done by rigorously determining their enrichment in the Yes set compared to the No set.
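The text does not specify which statistic is used to assess enrichment. As a stand-in, the following Python sketch applies a one-sided Fisher exact (hypergeometric) test to the number of promoters that carry at least one predicted site in each set. The motif names and all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(yes_with, yes_total, no_with, no_total):
    """One-sided Fisher exact test.

    Probability of seeing at least `yes_with` site-carrying promoters in the
    Yes set if sites were distributed independently of set membership.
    """
    total = yes_total + no_total
    with_site = yes_with + no_with
    upper = min(yes_total, with_site)
    numerator = sum(
        comb(with_site, k) * comb(total - with_site, yes_total - k)
        for k in range(yes_with, upper + 1)
    )
    return numerator / comb(total, yes_total)


def fold_enrichment(yes_with, yes_total, no_with, no_total):
    """Ratio of site frequencies in the Yes set versus the No set."""
    return (yes_with / yes_total) / (no_with / no_total)


# Hypothetical counts: promoters with a site / promoters in the set.
motifs = {
    "motif_A": (30, 100, 40, 400),  # 30% of Yes vs 10% of No
    "motif_B": (10, 100, 40, 400),  # 10% of Yes vs 10% of No
}

results = {
    name: (fold_enrichment(*counts), enrichment_p_value(*counts))
    for name, counts in motifs.items()
}
for name, (fold, p) in results.items():
    print(f"{name}: fold enrichment {fold:.1f}, p = {p:.3g}")
```

When many matrices are tested at once, the resulting p-values would normally be corrected for multiple testing before any TF is shortlisted.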
Learn more about promoter analysis with TRANSFAC® in the geneXplain platform.

Step 2: Pathway analysis

Step 1 results in a set of transcription factors (TFs) that are likely responsible for the differential regulation of the observed set of DEGs. From available pathway data, we have extracted information about all relevant signaling cascades that regulate the activity of TFs; ideally, the TRANSPATH® database is used for this and for the subsequent analysis.
As a large number of use cases have shown, these pathways usually converge on a few key nodes, which qualify as candidate master regulators of the process under study.

Activities of transcription factors (TFs, blue circles) are regulated by upstream signaling cascades (components shown as green circles). These converge in certain nodes, representing molecules that are potential master regulators of the process under study.
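The convergence idea can be sketched in Python as a search that walks upstream from each candidate TF and ranks network nodes by how many TFs they can reach. This is a toy illustration under stated assumptions: the network, the node names and the step limit are hypothetical, and the ranking by simple TF count is not the scoring used by the geneXplain platform or TRANSPATH®.

```python
from collections import deque

# Hypothetical signaling network: each edge points from a regulator to its target.
EDGES = [
    ("Receptor", "KinaseA"),
    ("KinaseA", "KinaseB"),
    ("KinaseA", "KinaseC"),
    ("KinaseB", "TF1"),
    ("KinaseB", "TF2"),
    ("KinaseC", "TF3"),
    ("KinaseD", "TF3"),
]


def build_parents(edges):
    """Map every node to the set of its direct upstream regulators."""
    parents = {}
    for source, target in edges:
        parents.setdefault(target, set()).add(source)
    return parents


def upstream_nodes(start, parents, max_steps):
    """Breadth-first search against edge direction, limited to max_steps."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_steps:
            continue  # do not expand beyond the step limit
        for parent in parents.get(node, ()):
            if parent not in distance:
                distance[parent] = distance[node] + 1
                queue.append(parent)
    del distance[start]
    return distance


def rank_master_regulators(tfs, edges, max_steps=3):
    """Rank upstream nodes by the number of input TFs they can reach."""
    parents = build_parents(edges)
    reached = {}
    for tf in tfs:
        for node in upstream_nodes(tf, parents, max_steps):
            reached.setdefault(node, set()).add(tf)
    # Most TFs first; ties broken alphabetically for a stable order.
    return sorted(reached.items(), key=lambda item: (-len(item[1]), item[0]))


ranking = rank_master_regulators(["TF1", "TF2", "TF3"], EDGES)
for node, tfs in ranking:
    print(f"{node}: reaches {len(tfs)} TF(s): {sorted(tfs)}")
```

In this toy network, KinaseA and Receptor sit upstream of all three TFs and therefore rank first, which mirrors the notion of pathways converging on a few key nodes.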


Deyneko, I. V., Kel, A. E., Kel-Margoulis, O. V., Deineko, E. V., Wingender, E. and Weiss, S. (2013) MatrixCatch – a novel tool for the recognition of composite regulatory elements in promoters. BMC Bioinformatics 14, 241.

Stegmaier, P., Kel, A., Wingender, E. and Borlak, J. (2013) A discriminative approach for unsupervised clustering of DNA sequence motifs. PLoS Comput. Biol. 9, e1002958.

Stegmaier, P., Voss, N., Meier, T., Kel, A., Wingender, E. and Borlak, J. (2011) Advanced computational biology methods identify molecular switches for malignancy in an EGF mouse model of liver cancer. PLoS ONE 6, e17738.

Wingender, E. (2008) The TRANSFAC project as an example of framework technology that supports the analysis of genomic regulation. Brief Bioinform. 9, 326-332.

Kel, A., Konovalova, T., Waleev, T., Cheremushkin, E., Kel-Margoulis, O. and Wingender, E. (2006) Composite Module Analyst: a fitness-based tool for identification of transcription factor binding site combinations. Bioinformatics 22, 1190-1197.

Waleev, T., Shtokalo, D., Konovalova, T., Voss, N., Cheremushkin, E., Stegmaier, P., Kel-Margoulis, O., Wingender, E. and Kel, A. (2006) Composite Module Analyst: identification of transcription factor binding site combinations using genetic algorithm. Nucleic Acids Res. 34, W541-W545.

Kel, A. E., Gössling, E., Reuter, I., Cheremushkin, E., Kel-Margoulis, O. V. and Wingender, E. (2003) MATCH: A tool for searching transcription factor binding sites in DNA sequences. Nucleic Acids Res. 31, 3576-3579.


Get the most comprehensive overview of potential regulatory elements in your sequences.
Fast and immediate analysis with a small library of positional weight matrices.
Make use of the full potential of the gold standard in the field, the TRANSFAC® database.
Automatically generate complex promoter models.
Generate high-quality hypotheses about the transcription factors acting on a given set of co-regulated promoters.
Search for additional genes that fit your promoter model.

Use cases

In a recently published study, geneXplain’s approach to a causative upstream analysis was applied to toxicological datasets, revealing master regulators of the toxic effect of naphthalene on liver and lung tissue. These regulators were clearly associated with both tumorigenic and apoptotic processes.

Promoter analysis for matches with a model comprising a set of transcription factor binding sites (TFBSs).

How to apply the Upstream Analysis?

In the geneXplain platform, a number of workflows make use of the concept of Upstream Analysis.