What is Site Analysis?

Protein binding sites in the genome strongly influence the activity of nearby genes. Because these interactions are highly dynamic and depend on the cell’s status, the actual occupancy has been determined experimentally for only a very small percentage of these sites. Predictive tools are thus essential for deciphering the full regulatory potential of gene control regions such as promoters and enhancers.

Approaches to site analysis

Among the most popular methods to identify potential transcription factor binding sites (TFBSs) is the use of position-specific scoring or positional weight matrices (PSSM or PWM). The TRANSFAC® database, the gold standard in the field, harbors the largest collection of PWMs. They are used to predict TFBSs either by MATCH Suite, which is part of the TRANSFAC®2.0 online resource, or by a number of programs that are included in the geneXplain platform.
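To make the idea of matrix-based site prediction concrete, here is a minimal sketch of a generic log-odds PWM scan. It is not the scoring scheme used by MATCH or any geneXplain tool; the count matrix, the pseudocount, the uniform background and the threshold are all invented for illustration.

```python
import math

BASES = "ACGT"

def build_pwm(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2-odds weights.

    Assumes a uniform background; the pseudocount avoids log(0).
    """
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def scan(sequence, pwm, threshold):
    """Return (position, site, score) for each window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, round(score, 2)))
    return hits

# Invented toy counts for a 4-bp TATA-like motif (not a TRANSFAC matrix).
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]
pwm = build_pwm(toy_counts)
hits = scan("GGCTATAAGC", pwm, threshold=4.0)
print(hits)
```

In this toy sequence only the window starting at position 3 (TATA) passes the threshold. In practice the cutoff controls the trade-off between missed sites and false positives.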

Site analysis in the geneXplain platform

The sequence patterns individual TRANSFAC matrices represent and recognize are visualized as logo plots.


These matrices can be individually selected and combined into “profiles”; a number of predefined profiles are available for subsequent sequence analysis. Matrix matches are visualized along the gene sequences in a customizable manner.
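As background on the logo plots mentioned above: in the standard sequence-logo convention, the height of each column reflects its information content in bits, and each letter is scaled by its frequency. The sketch below illustrates that convention with invented frequencies. It omits the small-sample correction and is not taken from the platform's own rendering code.

```python
import math

def column_information(freqs):
    """Information content of one DNA column in bits: 2 - Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return 2.0 - entropy

def letter_heights(freqs):
    """Height of each letter in a logo column: frequency times column information."""
    info = column_information(freqs)
    return {base: p * info for base, p in freqs.items()}

# Invented example columns: one fully conserved, one completely uninformative.
conserved = {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0}
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

print(column_information(conserved), column_information(uniform))
print(letter_heights(conserved))
```

A fully conserved position reaches the maximum of 2 bits, while a position where all four bases are equally likely contributes nothing to the logo.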

Visualization of transcription factor binding sites (TFBSs) with the geneXplain platform’s genome browser.

The built-in genome browser lets you zoom out to the chromosomal level or zoom in to the nucleotide level. Clicking an individual site brings up detailed information.


Advanced promoter analysis

State-of-the-art analysis of regulatory regions has to go beyond the recognition of single sites. Functional promoters, and presumably enhancers and other regulatory regions, are characterized by specific arrays of individual sites. As variable as their compositions may be, the syntax of sites in each regulatory region has to follow defined rules, which are still largely unknown.

Therefore, the geneXplain platform provides an empirical way to identify the specific combination of sites that characterizes a given set of co-regulated promoters.

Promoter analysis for composite modules. Shown is the visualization of transcription factor binding sites (TFBSs) constituting a "promoter model", which has been computationally generated with CMA (Composite Module Analyst).


These specific combinations, also called “promoter models”, can be further used for screening genomic sequences or promoter databases. A comprehensive collection of mammalian promoters comes along with the TRANSFAC® database in its TRANSPRO section. The density of model matches is visualized by graded shading.
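The screening step can be pictured with a deliberately simplified sketch. Here a "promoter model" is assumed to be nothing more than a set of factors that must all have a predicted site within a fixed window, and density is a count of model matches per bin. The real CMA models and scoring are more elaborate; the function names, factor labels, positions and window size below are hypothetical.

```python
def model_matches(sites, model, window):
    """Return start positions where every factor in `model` has a site.

    sites:  list of (position, factor) tuples from a prior site search
    model:  set of factor names that must co-occur
    window: maximum span in bp, measured from the first site
    """
    ordered = sorted(sites)
    matches = []
    for i, (start, _) in enumerate(ordered):
        present = {factor for pos, factor in ordered[i:] if pos - start < window}
        if model <= present:
            matches.append(start)
    return matches

def match_density(matches, seq_length, bin_size):
    """Count model matches per bin; higher counts would map to darker shading."""
    bins = [0] * ((seq_length + bin_size - 1) // bin_size)
    for pos in matches:
        bins[pos // bin_size] += 1
    return bins

# Invented site predictions on a 1000-bp sequence (labels are placeholders).
toy_sites = [
    (10, "SP1"), (35, "NFKB"), (60, "AP1"),
    (400, "SP1"),
    (900, "AP1"), (930, "NFKB"), (950, "SP1"),
]
toy_model = {"SP1", "NFKB", "AP1"}

matches = model_matches(toy_sites, toy_model, window=100)
density = match_density(matches, seq_length=1000, bin_size=500)
print(matches, density)
```

With these toy inputs the model matches twice, once near each end of the sequence, while the isolated site at position 400 does not qualify on its own.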

Promoter analysis for matches with a "promoter model", which comprises a set of transcription factor binding sites (TFBSs). The visualization highlights matches of those TFBSs that belong to a promoter model. The intensity of the background shading indicates the density of model matches.


Key publications for promoter analysis

Deyneko, I. V., Kel, A. E., Kel-Margoulis, O. V., Deineko, E. V., Wingender, E. and Weiss, S. (2013) MatrixCatch – a novel tool for the recognition of composite regulatory elements in promoters. BMC Bioinformatics 14, 241.

Stegmaier, P., Kel, A., Wingender, E., Borlak, J. (2013) A discriminative approach for unsupervised clustering of DNA sequence motifs. PLoS Comput. Biol. 9, e1002958.

Stegmaier P., Voss N., Meier T., Kel A., Wingender E., Borlak J. (2011) Advanced computational biology methods identify molecular switches for malignancy in an EGF mouse model of liver cancer. PLoS ONE 6, e17738.

Wingender E. (2008) The TRANSFAC project as an example of framework technology that supports the analysis of genomic regulation. Brief Bioinform. 9, 326-332.

Kel A., Konovalova T., Waleev T., Cheremushkin E., Kel-Margoulis O., Wingender E. (2006) Composite Module Analyst: a fitness-based tool for identification of transcription factor binding site combinations. Bioinformatics 22, 1190-1197.

Waleev T., Shtokalo D., Konovalova T., Voss N., Cheremushkin E., Stegmaier P., Kel-Margoulis O., Wingender E., Kel A. (2006) Composite Module Analyst: identification of transcription factor binding site combinations using genetic algorithm. Nucleic Acids Res. 34, W541-W545.

Kel A.E., Gössling E., Reuter I., Cheremushkin E., Kel-Margoulis O.V., Wingender E. (2003) MATCH: A tool for searching transcription factor binding sites in DNA sequences. Nucleic Acids Res. 31, 3576-3579.

Benefits

Get the most comprehensive overview of potential regulatory elements in your sequences.

Fast and immediate analysis with a small library of positional weight matrices.

Make use of the full potential of the gold standard in the field, the TRANSFAC® database.

Automatically generate complex promoter models.

Generate high-quality hypotheses about the transcription factors acting on a given set of co-regulated promoters.

Search for additional genes that fit your promoter model.
