Ensembl version is updated to Ensembl release 96 (April 2019).
Rsubread – a Bioconductor software package that provides high-performance alignment and read
counting functions for RNA-seq reads (NAR February 2019). Rsubread integrates read mapping and
quantification in a single package.
Subread is a general-purpose read aligner which can align both genomic DNA-seq and RNA-seq
reads, based on its unique seed-and-vote design, by which a large number of 16mer subreads
from each read are mapped to the reference genome. The subread function accepts raw reads
in the form of FASTQ, SAM or BAM files and outputs read alignments in either SAM or BAM
format. The output contains the total number of reads, the number of uniquely mapped reads,
the number of multi-mapping reads and other mapping statistics. The align function is
exceptionally flexible. It performs local read alignment and reports the largest mappable region
for each read.
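The seed-and-vote idea described above can be illustrated with a toy sketch. This is not the Subread implementation: it assumes an exact-match hash index over a tiny in-memory reference, and the names `build_index` and `seed_and_vote` are hypothetical. Every 16mer subread that hits the index votes for the read start position it implies, and the position with the most votes wins.

```python
import random
from collections import Counter, defaultdict

SEED_LEN = 16  # subread length named in the text


def build_index(reference, k=SEED_LEN):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def seed_and_vote(read, index, k=SEED_LEN):
    """Return (start, votes) for the most-voted read start, or None."""
    votes = Counter()
    for offset in range(len(read) - k + 1):
        for hit in index.get(read[offset:offset + k], ()):
            votes[hit - offset] += 1  # each hit votes for an implied read start
    return votes.most_common(1)[0] if votes else None


# Toy data (hypothetical): a random 200-base reference and a 40-base read
# copied from position 50 with one substituted base.
rng = random.Random(0)
reference = "".join(rng.choice("ACGT") for _ in range(200))
swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
read = reference[50:90]
read = read[:20] + swap[read[20]] + read[21:]

result = seed_and_vote(read, build_index(reference))
print(result)
```

Subreads that cover the substituted base find no match, but the remaining subreads still agree on the same start position, which is why voting tolerates mismatches.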
The align function automatically detects insertions and deletions (indels). The first step of indel
identification maps 16mer subreads from each read to the genome and determines
the major mapping location of the read. The second step undertakes a detailed local realignment
of each read with the aid of the collected indels. The align function also writes VCF files
containing the detected indels.
The align function can align read pairs arbitrarily far apart if the alignment is sufficiently good
and no more canonical alignment is available. A weighting strategy is used to give preference
to alignments within the expected fragment length bounds. Gene fusions are now supported
by allowing different subreads from the same read to map to different chromosomes.
Subjunc is an RNA-seq read aligner that provides comprehensive detection of exon–exon junctions and reports full alignments of junction-spanning reads in BED file format.
The first part maps a large number of 16mer subreads from each read to the genome. This
step detects exon–exon junctions and determines the major mapping location of the read. The
second part undertakes a detailed local re-alignment of each read with the aid of collected
The Subjunc function can be used to detect exon–exon junctions de novo and to quantify expression at the
level of genes, exons or exon junctions.
The featureCounts function counts the number of reads or read-pairs that overlap any specified set
of genomic features. It can assign reads to any type of genomic region. Regions may be specified as
simple genomic intervals (promoter regions) or can be collections of genomic intervals (genes
comprising multiple exons). Any set of genomic features can be specified in GTF, GFF or SAF file
format. SAF is a Simplified Annotation Format with columns GeneID, Chr, Start, End and Strand.
FeatureCounts produces a matrix of gene-wise counts that can be used as input for gene
expression analysis with limma, edgeR or DESeq2. Alternatively, a matrix of exon-level counts
can be produced suitable for differential exon usage analyses using limma, edgeR or DEXSeq.
FeatureCounts outputs the genomic length and position of each feature as well as the read
count, making it straightforward to calculate summary measures such as RPKM (reads per
kilobase per million reads).
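As a worked example of the RPKM calculation mentioned above, the following minimal sketch divides a read count by the feature length in kilobases and by the library size in millions of reads. The function name `rpkm`, the gene names and the numbers are hypothetical, and the choice of which reads make up the library total (for example, all mapped reads) is an assumption left to the analyst.

```python
def rpkm(count, length_bp, total_reads):
    """Reads per kilobase of feature per million reads in the library."""
    return count / (length_bp / 1_000) / (total_reads / 1_000_000)


# Hypothetical features: (read count, feature length in bp)
features = {"geneA": (500, 2_000), "geneB": (1_500, 3_000)}
total_reads = 2_000_000  # assumed library size

values = {name: rpkm(count, length, total_reads)
          for name, (count, length) in features.items()}
print(values)  # {'geneA': 125.0, 'geneB': 250.0}
```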
The exactSNP function calls SNPs for individual samples, without requiring control samples to
be provided. It tests the statistical significance of SNPs by comparing SNP signals to their
The Limma-voom function performs differential expression analysis of pre-processed RNA-seq
data (single-channel experiments) with sample-specific quality weights, which is useful when library sizes
vary considerably between samples or outlier samples are present. The output
reports the top 100 differentially expressed genes and includes a PDF document with density plots
of raw and filtered counts, a plot of the mean-variance trend and a visualization
of sample clustering.
Guided linear model analysis
GeneXplain’s in-house implementation of a linear model analysis using Limma with experimental
design specified through an annotation table. This tool performs linear model analysis on the given
input table guided by selected experimental factors defined in a sample table. The analysis aims at
finding significant differences between pairs of levels of a main factor. Furthermore, an ANOVA is
carried out for all contrasts together. The assignment of main factor levels to columns of the input
table is specified in a column of a sample table. Additional variables can be controlled by providing
their column names in the sample table. Moreover, Surrogate Variable Analysis can be included to
infer unspecified factors.
HISAT is a very fast and sensitive alignment tool for mapping next-generation sequencing reads (DNA and RNA) to a population of human genomes (as well as to a single reference genome). HISAT2 uses
a large set of small GFM indexes that collectively cover the whole genome (each index representing
a genomic region of 56 Kbp, with 55,000 indexes needed to cover the human population). These
small indexes (called local indexes), combined with several alignment strategies, enable rapid and
accurate alignment of sequencing reads. This new indexing scheme is called a Hierarchical Graph
FM index (HGFM).
HISAT provides several alignment strategies specifically designed for mapping different types of
RNA-seq reads. Together, these strategies enable extremely fast and sensitive alignment of reads,
in particular those spanning two or more exons. As a result, HISAT is much faster (over 50 times)
than TopHat2 with better alignment quality. HISAT uses the Bowtie2 implementation to handle
most of the operations on the FM index. In addition to spliced alignment, HISAT handles reads
involving indels and supports a paired-end alignment mode. HISAT outputs alignments in SAM
This tool takes an alignment file in SAM or BAM format and feature file in GFF format and calculates
the number of reads mapping to each feature. It uses the htseq-count script that is part of the HTSeq
A feature is an interval (i.e., a range of positions) on a chromosome or a union of such intervals. In
the case of RNA-Seq, the features are typically genes, where each gene is considered here as the
union of all its exons. One may also consider each exon as a feature, e.g., in order to check for
alternative splicing. For comparative ChIP-Seq, the features might be binding regions from a predetermined list.
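The notion of a feature as a union of intervals can be sketched as follows. This is a simplified illustration, not the htseq-count implementation: it ignores strand, alignment quality and paired-end reads, and it assumes closed (inclusive) coordinates. A read is counted for a gene only if that gene is the single feature it overlaps. The names `count_reads`, `__no_feature` and `__ambiguous` are chosen here for illustration.

```python
from collections import Counter


def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def count_reads(reads, genes):
    """Count reads per gene, where each gene is the union of its exons."""
    counts = Counter({gene: 0 for gene in genes})
    for chrom, start, end in reads:
        hit = {gene for gene, exons in genes.items()
               if any(c == chrom and overlaps(start, end, s, e)
                      for c, s, e in exons)}
        if len(hit) == 1:
            counts[hit.pop()] += 1
        elif not hit:
            counts["__no_feature"] += 1
        else:
            counts["__ambiguous"] += 1  # read overlaps more than one gene
    return counts


# Hypothetical annotation and aligned reads: (chromosome, start, end)
genes = {
    "geneA": [("chr1", 100, 200), ("chr1", 300, 400)],
    "geneB": [("chr1", 380, 500)],
}
reads = [("chr1", 150, 180), ("chr1", 190, 310), ("chr1", 390, 420),
         ("chr1", 600, 650), ("chr2", 150, 180)]

counts = count_reads(reads, genes)
print(dict(counts))
```

In this toy data the second read spans both exons of geneA and is still counted once for the gene, while the third read falls where geneA and geneB overlap and is reported as ambiguous.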
This tool takes a file with high-throughput sequencing reads (either raw or aligned reads) and
performs a simple quality assessment by producing plots showing the distribution of called bases
and base-call quality scores by position within the reads. Output is a PDF file with all quality plots.
Empirical Analysis of Digital Gene Expression Data in R (EdgeR)
Differential expression analysis of RNA-seq expression profiles with biological replication.
Implements a range of statistical methodology based on the negative binomial distributions,
including empirical Bayes estimation, exact tests, generalized linear models and quasi-likelihood
tests. As well as RNA-seq, it can be applied to differential signal analysis of other types of genomic
data that produce counts, including ChIP-seq, Bisulfite-seq, SAGE and CAGE.
The tool can be applied to any technology that produces read counts for genomic features. Of
interest are summaries of short reads from massively parallel sequencing technologies such as
Illumina™, 454 or ABI SOLiD applied to RNA-Seq, SAGE-Seq or ChIP-Seq experiments, pooled shRNAseq or CRISPR-Cas9 genetic screens and bisulfite sequencing for DNA methylation studies. EdgeR
provides statistical routines for assessing differential expression in RNA-Seq experiments or
differential marking in ChIP-Seq experiments.
EdgeR can be applied to differential expression at the gene, exon, transcript or tag level. In fact,
read counts can be summarized by any genomic feature. EdgeR analyses at the exon level are easily
extended to detect differential splicing or isoform-specific differential expression.
This tool uses the edgeR quasi-likelihood pipeline (edgeR-quasi) for differential expression analysis.
This statistical methodology uses negative binomial generalized linear models, but with F-tests
instead of likelihood ratio tests. This method provides stricter error rate control than other negative
binomial based pipelines, including the traditional edgeR pipelines or DESeq2. While the limma
pipelines are recommended for large-scale datasets, because of their speed and flexibility, the
edgeR-quasi pipeline gives better performance in low-count situations.
miRNA feed forward loops
A miRNA feed forward loop is a combination of a miRNA, a transcription factor and a target gene.
The miRNA regulates expression of the target gene. The transcription factor regulates transcription
of both the target gene and the gene encoding the miRNA. This analysis accepts a list of miRNAs, a list of
target genes or both and finds the corresponding miRNA feed forward loops. Results can be visualized
using the ‘create miRNA diagram’ function.
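The loop structure described above can be sketched as a search over two regulation maps. This is only an illustration under simplifying assumptions: the platform's data sources and identifiers are not shown in the text, the gene encoding a miRNA is represented by the miRNA name itself, and `find_ffls` and all identifiers below are hypothetical.

```python
def find_ffls(mirna_targets, tf_targets):
    """Return (TF, miRNA, target gene) triples forming feed forward loops.

    mirna_targets: miRNA -> set of target genes it regulates
    tf_targets:    TF -> set of genes and miRNAs whose transcription it regulates
    """
    loops = []
    for tf, regulated in sorted(tf_targets.items()):
        # miRNAs whose encoding gene is regulated by this TF
        for mirna in sorted(regulated & set(mirna_targets)):
            # targets shared by the miRNA and the TF close the loop
            for target in sorted(mirna_targets[mirna] & regulated):
                loops.append((tf, mirna, target))
    return loops


# Hypothetical regulation data
mirna_targets = {"miR-A": {"GENE1", "GENE2"}, "miR-B": {"GENE3"}}
tf_targets = {"TF1": {"miR-A", "GENE1", "GENE3"}, "TF2": {"miR-B"}}

loops = find_ffls(mirna_targets, tf_targets)
print(loops)  # [('TF1', 'miR-A', 'GENE1')]
```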
Analyze miRNA target enrichment
This tool takes a set of human genes as input and searches for miRNA target enrichment with the
help of our in-house TargetScan database. The resource comprises miRNA/target gene links for
human based on conserved miRNA site prediction from TargetScan.
Get miRNA targets
For a given list of human miRNAs, this tool provides their potential target genes with the
help of our in-house TargetScan database.
Create miRNA promoters
This tool creates a track (promoter sequence collection) from a list of given miRNAs.
CR cluster selector
This method uses the result of a CRC (Chinese Restaurant Clustering) analysis and extracts the most
central cluster items into a separate table. In detail, it sorts clusters by size, keeps up to
‘Maximum number of clusters to use’ clusters whose size is greater than ‘Min items per cluster’ and extracts from each
the ‘Max items per cluster’ items closest to the cluster center.
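The selection steps can be sketched as follows, assuming that each cluster item already carries a distance to its cluster center. How the platform defines and computes that distance is not stated in the text, so the input layout and the function name `select_centered_items` are hypothetical.

```python
def select_centered_items(clusters, max_clusters, min_items, max_items):
    """Pick the items closest to the center from the largest clusters.

    clusters: cluster id -> list of (item, distance_to_center) pairs
    """
    # Keep clusters with more than min_items members, largest first
    eligible = [(cid, items) for cid, items in clusters.items()
                if len(items) > min_items]
    eligible.sort(key=lambda pair: len(pair[1]), reverse=True)

    selected = {}
    for cid, items in eligible[:max_clusters]:
        closest = sorted(items, key=lambda item: item[1])[:max_items]
        selected[cid] = [name for name, _ in closest]
    return selected


# Hypothetical clustering result
clusters = {
    "c1": [("g1", 0.9), ("g2", 0.1), ("g3", 0.4), ("g4", 0.2)],
    "c2": [("g5", 0.3), ("g6", 0.2), ("g7", 0.5)],
    "c3": [("g8", 0.1)],
}

selected = select_centered_items(clusters, max_clusters=2, min_items=2, max_items=2)
print(selected)  # {'c1': ['g2', 'g4'], 'c2': ['g6', 'g5']}
```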
This method checks project disk space usage and corrects it by deleting temporarily used disk space.
Filter duplicate rows
This method filters duplicate rows from the input table and gives a non-redundant output.
Group table rows
This method groups selected rows from the input table. The output contains a table with the
counts of the selected rows.
Merge Table columns
This method merges several columns of a table into one column using a selected aggregator.
Plot pie chart
This method creates a pie chart using a category column and a leading column for a visual evaluation
and comparison of the results. Colors for the different segments of the pie chart can be customized
using the color palette.
Plot bar chart
This method creates a bar chart using a category column and a leading column for a visual evaluation
and comparison of the results. Colors for the different bars of the chart can be
customized using the color palette.
Select random rows
This method selects random rows from the input table to create a randomized output table based
on the selected random number of rows and percentage.
Select top rows
This method selects top rows based on the selected parameter from the input table. The rows can
be selected as top, middle or bottom types from the input table.
Super annotate table
This method annotates columns in the input table from one or more other user-defined tables.
More than one table can serve as the source for annotating different columns in the input table.
This method transforms the input table based on the selected operation. It can apply the
selected operation (log2, log10, Pow2, Pow10, exp) to the input table.
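A minimal sketch of such a table transformation is shown below. It assumes that Pow2 and Pow10 mean raising 2 and 10 to the power of each value (the inverses of log2 and log10), which the text does not state explicitly, and that the operation is applied to every numeric cell. The names `OPERATIONS` and `transform_table` are hypothetical.

```python
import math

# Assumption: pow2/pow10 are the inverses of log2/log10
OPERATIONS = {
    "log2": math.log2,
    "log10": math.log10,
    "pow2": lambda x: 2.0 ** x,
    "pow10": lambda x: 10.0 ** x,
    "exp": math.exp,
}


def transform_table(rows, operation):
    """Apply the named operation to every value of a numeric table."""
    fn = OPERATIONS[operation]
    return [[fn(value) for value in row] for row in rows]


table = [[1, 2], [4, 8]]
logged = transform_table(table, "log2")
restored = transform_table(logged, "pow2")
print(logged)    # [[0.0, 1.0], [2.0, 3.0]]
print(restored)  # [[1.0, 2.0], [4.0, 8.0]]
```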
Tree map on functional classification
This analysis, adapted from REVIGO, provides a tree map visualization of functional classification
results, reducing Gene Ontology terms based on the p-value cut-off.
This method compares the structures of two diagrams based on the input parameters and gives
different entries within the resulting diagram.
Get molecules from diagrams
This analysis creates clones of selected nodes (protein names) for each of their edges. It can be used to focus on
the main regulatory proteins of interest and visualize results in the output clone diagram.
Find regulatory regions
This method creates promoter regions based on an input gene list (and on ChIP-seq peaks
located near the TSS, if peaks are present). In addition, information from the CAGE database (FANTOM5)
about the most active TSSs in specific tissues or cell types can be used, and the promoter length to be
extracted can be defined. The result of this analysis can be used as input track for Site search
Split VCF by regulation
This analysis separates an input VCF file into two sets: the Yes-set, sites that overlap with the
regulation track (= Out yes), and the No-set, sites that do not. If the No-set contains fewer sites than defined
in the Min no size parameter (default = 100), random tracks from Default VCF will be added to the
No-set (= Out no).
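The splitting logic can be sketched as follows, with sites reduced to (chromosome, position) pairs rather than full VCF records. This is an illustration only: it assumes inclusive region coordinates and assumes that padding sites drawn from the default set must not overlap the regulation track, neither of which is stated in the text. The function name `split_by_regulation` is hypothetical.

```python
import random


def split_by_regulation(sites, regions, default_sites, min_no_size=100, seed=0):
    """Split (chrom, pos) sites into a Yes-set and a No-set.

    regions holds (chrom, start, end) intervals of the regulation track,
    with inclusive coordinates (an assumption of this sketch).
    """
    def in_region(chrom, pos):
        return any(c == chrom and s <= pos <= e for c, s, e in regions)

    yes_set = [site for site in sites if in_region(*site)]
    no_set = [site for site in sites if not in_region(*site)]

    missing = min_no_size - len(no_set)
    if missing > 0:
        # Assumption: padding sites must not overlap the regulation track
        # and must not already be in the No-set.
        pool = [s for s in default_sites
                if s not in no_set and not in_region(*s)]
        no_set += random.Random(seed).sample(pool, min(missing, len(pool)))
    return yes_set, no_set


# Hypothetical input
sites = [("chr1", 150), ("chr1", 900), ("chr2", 50)]
regions = [("chr1", 100, 200)]
default_sites = [("chr1", 120), ("chr3", 10), ("chr3", 20), ("chr3", 30)]

yes_set, no_set = split_by_regulation(sites, regions, default_sites, min_no_size=4)
print(yes_set)
print(no_set)
```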
Motif Score Association Test (MSAT) is a tool to test for association between a motif score and a
quantity of interest using generalized linear models (glms).
For each TFBS motif of a specified profile, scores are calculated for promoter sequences of input
genes. R’s glm methodology is then applied to model the quantity of interest given with the input
genes as dependent variable or response (y) and motif scores as independent variable or predictors
For each motif the tool reports the estimated coefficient (slope) as well as its unadjusted and
adjusted p-values, where it is assumed that the results are most suitably ranked by (unadjusted or
Construct composite module on tracks with keynodes
Predicts composite modules in the same way as the ‘Construct composite modules on tracks’ method and additionally uses Regulator
search analysis when calculating the score of the composite module (the score is used to select the best
module). Regulator search finds master-regulators which affect transcription factors included in the
Construct composite modules with keynodes
Predicts composite modules in the same way as the ‘Construct composite modules’ method and additionally uses Regulator search
analysis when calculating the score of the composite module (the score is used to select the best module).
Regulator search finds master-regulators which affect transcription factors included in the modules.
- New start page icons in any of the platform’s research categories
- Pre-release of TRANSFAC®, TRANSPATH® and HumanPSD databases (version 2018.2)
- Update to Ensembl 91 database
- Update to Reactome 63 database
Construct composite modules on track (correlation) – method predicts composite module using the result of the “Site search on gene set” analysis.
Cluster track – method clusters sites in a track, which is useful for merging closely spaced sites into one big cluster.
Compute profile thresholds – method computes profile thresholds minimizing either false negatives (minFN) or false positives (minFP) on a random DNA sequence.
Create miRNA promoters – method extracts miRNA promoters from the mirprom database for a given list of miRNAs.
Get transcripts track – method extracts a track from a database by transcript ID.
Recalculate composite module score on new track – method takes best composite model from the given CMA result and calculates its scores on all sites of a given track.
Continue CMA – method continues prediction of composite module using results of the previous prediction as a start point. Prediction parameters are customizable.
Table Imputation – method replaces missing data in the given input table with row means.
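The row-mean replacement performed by Table Imputation can be sketched as below. Missing values are represented by `None`, and rows without any observed value are left unchanged; both choices are assumptions of this sketch, as is the function name `impute_row_means`.

```python
def impute_row_means(table):
    """Replace missing values (None) in each row with that row's mean."""
    imputed = []
    for row in table:
        observed = [value for value in row if value is not None]
        # Assumption: a row with no observed values cannot be imputed
        mean = sum(observed) / len(observed) if observed else None
        imputed.append([mean if value is None else value for value in row])
    return imputed


table = [
    [1.0, None, 3.0],
    [None, None, 4.0],
    [2.0, 2.0, 2.0],
]

imputed = impute_row_means(table)
print(imputed)  # [[1.0, 2.0, 3.0], [4.0, 4.0, 4.0], [2.0, 2.0, 2.0]]
```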
- New HTML report for site search analysis
You can now create a summary of your site search analysis including visualization of input promoters together with identified enriched transcription factor binding sites (TFBSs) in HTML format, which can be exported to your local computer.
- Integration with updated TRANSFAC®, TRANSPATH® and HumanPSD™ databases in release 2018.1
Enhancement of the method LRPath.
Installation of TRANSFAC 2017.3 (information download)
– Annotation of transcription factor binding sites based on sequence conservation
– ChIP-Seq experiment browse pages
– Reorganization of the in vivo transcription factor bound fragment section on a Locus Report
– HOCOMOCO v10 matrix library integration
– Enhanced human SNP content
– Ensembl version update
Installation of TRANSPATH & HumanPSD 2017.3 (information download)
– Integration of new clinical trial (CT) data sources
– Improved user data management
– Quick search for disease and drug entries
– Link-out to BRENDA professional – the comprehensive enzyme information system
– New phosphorylation targets content
LRPath is a Gene Set Enrichment Analysis (GSEA) method that uses logistic regression models to discover categories that are significantly correlated with a predictor.
New protein category (TRANSPATH® isogroups) to enhance identification of master regulators. This enhancement has been applied to all workflows that include the ‘Regulator search’ or ‘Effector search’ method.
Installation of TRANSFAC public:
– Available for everyone
– 219 profiles (matrices) for site search tools
– Search function implemented
– Bug fixed that prevented analysis from completing correctly
– Added option to run DESeq or DESeq2
– New versions of PROTEOME™ data now named HumanPSD™ database
– Latest release 2017.2 available in the geneXplain platform
– Platform Java API available from github.com/genexplain/genexplain-api
– Executable jar can be configured with JSON config files to invoke platform processes from the command line