geneXplain platform news
- HumanPSD™ is updated to version 2021.2 (September 2021).
- TRANSFAC® is updated to version 2021.2 (September 2021).
- TRANSPATH® is updated to version 2021.2 (September 2021).
- The TRANSFAC® ‘Matrix to Ensembl’ matching was extended with links from matrices to orthologous and paralogous Ensembl identifiers. These links are used by the hub that converts matrices to Ensembl genes in the Convert table analysis.
- New proteomics example with Gene Expression Omnibus data (GSE66789): Study of Myc-induced regulation of protein translation. Find it at Data –> Examples –> Proteome profiling upon Myc activation, GSE66789.
- Extended descriptions were added for the following examples:
· COVID-19 suppress innate immune responses GSE156063, Illumina high throughput sequencing
· Case study for RNA-seq data analysis
· Chronic Myeloid Leukemia Patient Genotyping
· Cytokine-triggered gene expression in cell cycle stages, GSE52465, Agilent-014850 microarray
· E2F1 binding regions in HeLa cells, ChIP-Seq
· HCV infection in liver GSE31193, Affymetrix U133 Plus 2.0 microarray
· Proteome profiling upon Myc activation, GSE66789
· TNF-stimulation of HUVECs GSE2639, Affymetrix HG-U133A microarray
· miRNA regulation by Myocardin and ERalpha, GSE44139, Affymetrix Multispecies miRNA-2
- HumanPSD™ is updated to version 2021.1 (January 2021).
- TRANSFAC® is updated to version 2021.1 (January 2021).
- TRANSPATH® is updated to version 2021.1 (January 2021).
- Identify enriched motifs in cell specific promoters (TRANSFAC®)
This workflow searches for enriched transcription factor binding sites (TFBSs) in a set of gene promoters versus a random promoter set. The input gene set is used to extract promoter regions by mapping it against the TSS locations defined in the Fantom5 (Nature 507:462–470) database for one selected cell-type among 172 available cell-types. The over-represented sites identified with the MEALR method are converted into a profile, which is used for a second round of site search analysis and ends up with the identification of potential transcription factors.
- MEALR classifier (tracks)
MEALR searches for a combination of transcription factor binding motifs that discriminate between a positive (Yes) and a negative (No) sequence set. This tool takes a sparse logistic regression model derived with MEALR and applies it to new sequences to predict whether they can be bound by TF complexes or contribute to gene regulation in the same way as the Yes sequences used to train the MEALR model.
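Applying a sparse logistic regression model to new sequences can be sketched as follows. This is only an illustration of the general technique, not the platform's implementation: the motif names, weights, intercept, feature values, and the 0.5 cut-off are all hypothetical.

```python
import math

def predict_probability(weights, intercept, features):
    """Logistic regression probability for one sequence.

    The model is sparse: motifs that are not listed in `weights`
    have an implicit weight of zero and are ignored.
    """
    z = intercept + sum(w * features.get(motif, 0.0) for motif, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical trained model: only two motifs kept non-zero weights.
model_weights = {"motif_A": 1.2, "motif_B": -0.8}
model_intercept = -0.5

# Hypothetical per-sequence motif scores for two new sequences.
sequences = {
    "seq1": {"motif_A": 2.0, "motif_C": 5.0},  # motif_C is not in the model
    "seq2": {"motif_B": 3.0},
}

predictions = {
    name: predict_probability(model_weights, model_intercept, feats)
    for name, feats in sequences.items()
}
# Assumed decision rule: call a sequence "Yes" at probability >= 0.5.
labels = {name: ("Yes" if p >= 0.5 else "No") for name, p in predictions.items()}
```

In this toy example seq1 is classified like the Yes set and seq2 like the No set; motif_C does not influence the result because the sparse model assigns it no weight.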
- Random forest prediction
Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. This statistical method performs a classification or regression with a random forest model, based on Breiman (published in Machine Learning). Please refer to the documentation of the R randomForest package for computational details. Random forests can be trained using the Train random forest tool (see below).
- Train random forest
This method can be used to train a random forest model for classification, regression, or clustering. Please refer to documentation of the R randomForest package for computational details. Except for unsupervised models, the trained random forests can be used for further classification or regression analysis with the Random forest prediction tool.
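The idea behind random forest classification (bootstrap sampling, randomized trees, majority vote) can be illustrated with a toy sketch. It is not the R randomForest algorithm used by the platform: each "tree" here is a single-split decision stump on one randomly chosen feature, and the data, function names, and parameters are hypothetical.

```python
import random
from collections import Counter

def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels, feature):
    """Find the threshold on one feature that misclassifies the fewest samples."""
    values = sorted({row[feature] for row in rows})
    best = None
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
        right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
        left_label, right_label = majority(left), majority(right)
        errors = (sum(lab != left_label for lab in left)
                  + sum(lab != right_label for lab in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:  # only one distinct value: fall back to a constant prediction
        label = majority(labels)
        return (feature, float("inf"), label, label)
    return (feature,) + best[1:]

def train_forest(rows, labels, n_trees=25, seed=0):
    """Train stumps on bootstrap samples, each using one random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]   # bootstrap sample
        feature = rng.randrange(len(rows[0]))            # random feature choice
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feature))
    return forest

def predict(forest, row):
    """Majority vote over all stumps."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return majority(votes)

# Hypothetical two-feature data set with two well separated classes.
X = [[0, 5], [1, 4], [2, 6], [10, 20], [11, 22], [12, 21]]
y = [0, 0, 0, 1, 1, 1]
forest = train_forest(X, y)
low_call = predict(forest, [1, 5])
high_call = predict(forest, [11, 21])
```

A model trained this way plays the role of the trained forest that the Random forest prediction tool would then apply to new data; real forests grow full trees and sample features at every split.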
The new tool contains an R wrapper around the fast t-distributed Stochastic Neighbor Embedding (t-SNE) implementation by Van der Maaten. The tool can be used for data visualization with the t-SNE algorithm.
- Fantom5 workflows are now available for both the hg19 and hg38 genome versions.
- The workflow for analyzing a SNP list with the TRANSFAC® database is now available for the hg38 genome version.
- New example with Gene Expression Omnibus data (GSE156063): Upper airway gene expression differentiates COVID-19 from other acute respiratory illnesses and reveals suppression of innate immune responses by SARS-CoV-2, Expression profiling by high throughput sequencing, Illumina NovaSeq 6000 Homo sapiens.
- TRANSFAC®, TRANSPATH® and HumanPSD™ databases update to release 2020.3
- Ensembl version update to release 100
- Reactome database is updated to version 74
Compute differentially expressed genes using Limma and Metadata
This workflow performs a linear model analysis to identify differentially expressed genes from multiple samples using Limma statistics and a metadata table for the samples. The given input table contains expression values from several samples and a corresponding sample table (metadata) for guiding the limma analysis by selected experimental factors. The workflow aims at finding significant differences between pairs of levels of a main factor (Treatment). Furthermore, an ANOVA is carried out for all contrasts together. The primary result of the linear model analysis is further filtered to identify significant up- and down-regulated genes for each sample comparison.
Join full tables
This method joins two tables into a new one containing the selected columns. Different join types can be applied, depending on how the IDs of the two input tables match. The following join types are available: inner join, outer join, left join, right join, left subtraction, right subtraction, and symmetric difference.
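The join types listed above correspond to set operations on the row IDs of the two tables. The sketch below illustrates this under stated assumptions: tables are modeled as dictionaries keyed by ID, "left subtraction" is taken to mean rows present only in the left table (and analogously for the right), and the function name, option names, and example data are hypothetical.

```python
def join_tables(left, right, how="inner"):
    """Join two ID-keyed tables; `how` selects which row IDs are kept."""
    left_ids, right_ids = set(left), set(right)
    selectors = {
        "inner": left_ids & right_ids,                 # IDs in both tables
        "outer": left_ids | right_ids,                 # IDs in either table
        "left": left_ids,                              # all IDs of the left table
        "right": right_ids,                            # all IDs of the right table
        "left_subtraction": left_ids - right_ids,      # only in the left table
        "right_subtraction": right_ids - left_ids,     # only in the right table
        "symmetric_difference": left_ids ^ right_ids,  # in exactly one table
    }
    ids = selectors[how]
    left_cols = sorted({c for row in left.values() for c in row})
    right_cols = sorted({c for row in right.values() for c in row})
    joined = {}
    for row_id in sorted(ids):
        # Columns without a matching row are filled with None.
        row = {c: left.get(row_id, {}).get(c) for c in left_cols}
        row.update({c: right.get(row_id, {}).get(c) for c in right_cols})
        joined[row_id] = row
    return joined

# Hypothetical input tables with distinct column names.
expression = {"G1": {"logFC": 1.5}, "G2": {"logFC": -0.7}}
statistics = {"G2": {"pvalue": 0.01}, "G3": {"pvalue": 0.2}}

inner = join_tables(expression, statistics, "inner")
only_left = join_tables(expression, statistics, "left_subtraction")
either_not_both = join_tables(expression, statistics, "symmetric_difference")
```

Here the inner join keeps only G2, the left subtraction keeps only G1, and the symmetric difference keeps G1 and G3. The sketch assumes the two tables do not share column names.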
Calculate keynodes ranks
This method adds new score-specific ranks for each identified master regulator molecule to identify the best corresponding master regulator from the input list.
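As a rough illustration of score-specific ranking, the sketch below adds one rank column per score column and sums the ranks to pick a best candidate. The column names, the "higher score is better" convention, and the rank-sum combination are assumptions made for this example; the platform's actual scoring may differ.

```python
def add_score_ranks(table, score_columns):
    """Return a copy of `table` with a rank column for each score column.

    Rank 1 is assigned to the highest score (assumed to be the best).
    """
    ranked = {name: dict(row) for name, row in table.items()}
    for col in score_columns:
        ordered = sorted(table, key=lambda name: table[name][col], reverse=True)
        for rank, name in enumerate(ordered, start=1):
            ranked[name][col + "_rank"] = rank
    for row in ranked.values():
        # Hypothetical combined criterion: the sum of all score-specific ranks.
        row["rank_sum"] = sum(row[col + "_rank"] for col in score_columns)
    return ranked

# Hypothetical master regulator table with two score columns.
regulators = {
    "MR1": {"Score": 0.9, "Z_score": 1.1},
    "MR2": {"Score": 0.7, "Z_score": 2.5},
    "MR3": {"Score": 0.8, "Z_score": 0.3},
}

ranked = add_score_ranks(regulators, ["Score", "Z_score"])
best = min(ranked, key=lambda name: ranked[name]["rank_sum"])
```

In this example MR1 ranks first by Score and second by Z_score, which gives it the lowest rank sum.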
Select keynodes with top targets
This method selects the top master regulators (keynodes) by score and by linkage to top targets, which can be potential drugs from the HumanPSD database.
- Update MTB report to version 2.0.0
- Removal of redundant Ensembl versions for the same build
- Add Ensembl annotation source to all workflows
- Bug fix for the normalization of Affymetrix miRNA chips
- Easy selection of current data project
- TRANSFAC®, TRANSPATH® and HumanPSD™ databases update to release 2020.2
- Ensembl version update to release 99
- Gene Ontology update to version 2020-03-25
- New Import – import now supports the Drag and Drop function.
- EdgeR for two tables – method now supports two separate read counts tables for samples and control samples.
- MPT report – searches the curated databases GKDB and CIViC for predictive biomarkers, according to their clinical evidence, for the somatic variants (mutations, amplifications, deletions, rearrangements) of a patient and outputs an automatically generated PDF report.
- Calculate CMA regulation – calculates regulatory scores for transcription factors from a CMA (composite modules analysis) and Master regulator (MR) analysis result and creates a visualization of top MRs with CMA modules, underlying genes and potential feed-forward loop regulation.
- Create profile from CMA model – generates a matrix collection of transcription factors found in a user’s CMA model.
- Find regulatory regions with mutations – calculates mutation scores using information about mutation locations.
- PSD pharmaceutical compounds analysis – generates a table with drugs known to be acting against the corresponding targets of an input gene list.
Workflows allow a fully automatic analysis from raw RNAseq reads with pre-processing, alignment summaries, quality control, plots and estimation of differentially expressed genes (DEGs) for unlimited FASTQ files stored simply in a data folder:
- Full RNAseq analysis with HISAT2, featureCounts and limma
- Full RNAseq analysis with HISAT2, htseq-counts and limma
- Full RNAseq analysis with subread, featureCounts and limma
- Transparent genome version control – easy selection of a genome version in RNAseq pre-processing pipelines, which is then applied by the state-of-the-art alignment methods.
For more details please explore the full new features list of geneXplain® platform release 6.0.
- New start page icons in any of the platform’s research categories
- Pre-release of TRANSFAC®, TRANSPATH® and HumanPSD databases (version 2018.2)
- Update to Ensembl 91 database
- Update to Reactome 63 database
The geneXplain platform toolbox for bioinformatics data analysis contains these new functional features in the current release:
- New analysis methods
Construct composite modules on track (correlation) – method predicts composite module using the result of the “Site search on gene set” analysis.
Cluster track – method clusters sites in a track, which is useful for merging closely spaced sites into one large cluster.
Compute profile thresholds – method computes profile thresholds minimizing either false negatives (minFN) or false positives (minFP) on a random DNA sequence.
Create miRNA promoters – method extracts miRNA promoters from the mirprom database for a given list of miRNAs.
Get transcripts track – method extracts track from a database by a transcript ID
Recalculate composite module score on new track – method takes best composite model from the given CMA result and calculates its scores on all sites of a given track.
Continue CMA – method continues prediction of composite module using results of the previous prediction as a start point. Prediction parameters are customizable.
Table Imputation – method replaces missing data in the given input table with row means.
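Row-mean imputation, as described for the Table Imputation method, can be sketched in a few lines. This is a minimal illustration, assuming a table stored as a list of numeric rows with None marking missing values; the function name and the handling of rows without any observed value are choices made for this example.

```python
def impute_row_means(table):
    """Replace missing values (None) in each row with the mean of that row."""
    imputed = []
    for row in table:
        observed = [value for value in row if value is not None]
        # A row with no observed values has no mean and is left unchanged.
        mean = sum(observed) / len(observed) if observed else None
        imputed.append([mean if value is None else value for value in row])
    return imputed

# Hypothetical expression table: rows are genes, columns are samples.
table = [
    [1.0, None, 3.0],
    [None, None, 4.0],
    [2.0, 2.5, 3.0],
]
filled = impute_row_means(table)
```

The first row receives 2.0 for its missing value, the second row is filled with 4.0 twice, and the complete third row is left as it is.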
- New HTML report for site search analysis
You can now create a summary of your site search analysis including visualization of input promoters together with identified enriched transcription factor binding sites (TFBSs) in HTML format, which can be exported to your local computer. These results can be easily used for presentations or publications.
- New toolbar buttons
Check out our new toolbar icons, which lead you to results in your research with just a couple of clicks.
- Integration with updated TRANSFAC®, TRANSPATH® and HumanPSD™ databases in release 2018.1
The TRANSFAC® database of transcription factors, their genomic binding sites and DNA-binding motifs (PWMs), the TRANSPATH® database of mammalian signal transduction and metabolic pathways, and the Human Proteome Survey Database (HumanPSD™), with its focus on human proteins as disease biomarkers and drug targets, are currently integrated with the geneXplain platform in their 2018.1 release versions.
Installation of TRANSFAC 2017.3 (information download)
– Annotation of transcription factor binding sites based on sequence conservation
– ChIP-Seq experiment browse pages
– Reorganization of the in vivo transcription factor bound fragment section on a Locus Report
– HOCOMOCO v10 matrix library integration
– Enhanced human SNP content
– Ensembl version update
Installation of TRANSPATH & HumanPSD 2017.3 (information download)
– Integration of new clinical trial (CT) data sources
– Improved user data management
– Quick search for disease and drug entries
– Link-out to BRENDA professional – the comprehensive enzyme information system
– New phosphorylation targets content
New method: LRPath is a Gene Set Enrichment Analysis (GSEA) method that uses logistic regression models to discover categories that are significantly correlated with a predictor.
New protein category (TRANSPATH® isogroups) to enhance identification of master regulators.
Installation of TRANSFAC public:
– Available for everyone
– 219 profiles (matrices) for site search tools
– Search function implemented
– Bug fixed that prevented analysis from completing correctly
– Added option to run DESeq or DESeq2
– New versions of PROTEOME™ data now named HumanPSD™ database
– Latest release 2017.2 available in the geneXplain platform
– Platform Java API available from github.com/genexplain/genexplain-api
– Executable jar can be configured with JSON config files to invoke platform processes from the command line