Bioinformatics services

Our bioinformatics services in detail

Below you will find a detailed description of a selection of our bioinformatics services. Please note that our expertise is not limited to these tasks; they are listed here as examples of what you can request from us.

DNA sequence analysis

We can analyze any DNA sequence by searching for transcription factor binding sites (TFBSs) within it. Sequences from any eukaryotic genome can be processed when provided in EMBL, FASTA or GenBank format. The matrix library of the TRANSFAC database will be used for the site search. We will perform promoter analysis, and enriched motifs can be identified when a comparison to (random or custom) background sets is included.

Composite modules in promoters can be identified for selected species, including Human (Homo sapiens), Mouse (Mus musculus), Rat (Rattus norvegicus), Arabidopsis (Arabidopsis thaliana), Nematode (Caenorhabditis elegans), Zebrafish (Danio rerio), Fruit fly (Drosophila melanogaster), Baker’s yeast (Saccharomyces cerevisiae), Fission yeast (Schizosaccharomyces pombe).

The search for enriched transcription factor binding sites (TFBSs) in a set of genomic sequences will provide you with a list of the transcription factors that can bind to the identified TFBSs. We can apply either the MATCH or the MEALR site search method.
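To illustrate what "enriched" means when a foreground set is compared to a background set, here is a minimal Python sketch. It uses a generic one-sided hypergeometric test on the number of sequences that contain at least one predicted site. This is an assumption made for illustration only; it is not the MATCH or MEALR procedure, and the function name and counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(fg_hits, fg_total, bg_hits, bg_total):
    """One-sided hypergeometric p-value for motif enrichment.

    fg_hits / fg_total: foreground sequences with a site / all foreground sequences.
    bg_hits / bg_total: the same counts for the background set.
    Returns the probability of seeing at least `fg_hits` site-containing
    sequences in the foreground if sites were spread at random over all sequences.
    """
    population = fg_total + bg_total   # all sequences
    successes = fg_hits + bg_hits      # all sequences that contain a site
    max_k = min(fg_total, successes)
    denominator = comb(population, fg_total)
    tail = sum(
        comb(successes, k) * comb(population - successes, fg_total - k)
        for k in range(fg_hits, max_k + 1)
    )
    return tail / denominator


# Hypothetical counts: 3 of 3 foreground promoters carry the site, 0 of 3 background ones.
p_value = hypergeom_enrichment_p(fg_hits=3, fg_total=3, bg_hits=0, bg_total=3)
print(p_value)  # 0.05
```

A small p-value suggests that the motif occurs more often in the foreground than expected by chance; in practice the p-values of many matrices would also need multiple-testing correction.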

As an option, we can perform the search for enriched TFBSs using tissue-specific TSS information from the FANTOM5 database.

If variant analysis is performed, we can additionally identify which TFBSs are enriched around the variations and which TFs are responsible for the regulation of the corresponding promoters. We can predict the effect of mutations on TF binding sites and identify the mutations that actually change gene regulation in the studied condition (see details below on this page).

Genome browser visualization of the identified transcription factor binding sites will be provided.

ChIP-seq data analysis

Peak calling

We can perform peak calling using the MACS or SICER tools:

MACS is a tool that identifies peaks, i.e. regions likely bound by the targeted protein, in ChIP-seq data. It empirically models the length of the sequenced ChIP fragments, which tends to be shorter than sonication or library construction size estimates, and uses it to improve the spatial resolution of predicted binding sites.

SICER delineates significantly ChIP-enriched regions, which can be associated with other genomic landmarks, and identifies the reads that fall on these regions, which can be used for profiling and other quantitative analyses.

Identification of target genes from ChIP-seq peaks

Within this service we will identify genes located near the ChIP-seq peaks or near other genomic intervals. The output will contain a table of target genes overlapping with the fragments of the input track. The obtained gene list can be further analyzed for functional classification, or cluster analysis can be performed (see below).
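The idea of assigning target genes to peaks can be sketched in a few lines of Python. The example below is only an illustration under simple assumptions: the tuple layouts, the function name `genes_near_peaks` and the `max_distance` parameter are hypothetical, and a real analysis would use an indexed interval structure rather than a nested loop.

```python
def genes_near_peaks(peaks, genes, max_distance=0):
    """Return the sorted names of genes lying within `max_distance` bp of any peak.

    peaks: list of (chrom, start, end)
    genes: list of (name, chrom, start, end)
    With max_distance=0 only genes that overlap or touch a peak are reported.
    """
    hits = set()
    for p_chrom, p_start, p_end in peaks:
        for name, g_chrom, g_start, g_end in genes:
            if g_chrom != p_chrom:
                continue
            # The gap is 0 when the intervals overlap or touch, otherwise it is
            # the distance between their closest edges.
            gap = max(g_start - p_end, p_start - g_end, 0)
            if gap <= max_distance:
                hits.add(name)
    return sorted(hits)


# Hypothetical coordinates, for illustration only.
peaks = [("chr1", 100, 200)]
genes = [
    ("geneA", "chr1", 150, 400),    # overlaps the peak
    ("geneB", "chr1", 1200, 1500),  # 1000 bp downstream of the peak
    ("geneC", "chr2", 100, 200),    # different chromosome
]
print(genes_near_peaks(peaks, genes))                     # ['geneA']
print(genes_near_peaks(peaks, genes, max_distance=1000))  # ['geneA', 'geneB']
```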

This service is available for the following species: Human (Homo sapiens), Mouse (Mus musculus), Rat (Rattus norvegicus), Arabidopsis (Arabidopsis thaliana), Nematode (Caenorhabditis elegans), Zebrafish (Danio rerio), Fruit fly (Drosophila melanogaster), Baker’s yeast (Saccharomyces cerevisiae), Fission yeast (Schizosaccharomyces pombe).

Identification of transcription factor binding sites on ChIP-seq peaks

We will map putative TFBSs on peaks calculated from your ChIP-seq data using the TRANSFAC® library of positional weight matrices.

We can also search for composite modules, i.e. pairs of TFBSs that discriminate between two tracks, the Yes track and the No track. The Yes track can consist of ChIP-seq peaks, intervals identified in analyses of histone modifications, or any other genomic fragments. The No track often consists of other non-coding genomic regions that do not overlap with the peaks.

This service is available for the following species: Human (Homo sapiens), Mouse (Mus musculus), Rat (Rattus norvegicus), Arabidopsis (Arabidopsis thaliana), Nematode (Caenorhabditis elegans), Zebrafish (Danio rerio), Fruit fly (Drosophila melanogaster), Baker’s yeast (Saccharomyces cerevisiae), Fission yeast (Schizosaccharomyces pombe).

RNA-seq data preprocessing

We can perform FASTQ quality control, read alignment, and read quantification
Find gene fusions from RNA-seq
Find genome variations and indels from RNA-seq (including initial read mapping, local realignment around indels, base quality score recalibration, SNP discovery and genotyping to find all potential variants)
Detect differentially expressed genes (DEGs)

Identification of differentially expressed genes (DEGs)

Based on comparison of two datasets – experiment vs. control

We can calculate DEGs from your raw or pre-processed transcriptomics data coming in the following formats:

Raw RNA-seq data (FASTQ files)
Raw counts table with Illumina genes derived from RNA-seq
Normalized data (logFC values)
Affymetrix probeset IDs (microarray data)
Illumina probes (microarray data)
Agilent probes (microarray data)

Identification of genome variants and indels from NGS data

From RNA-seq data or exome / whole genome NGS data

This service includes discovery of genotype variations in the input dataset. The analysis process includes initial read mapping to the reference genome, local realignment around indels, base quality score recalibration, and SNP discovery and genotyping to find all potential variants. The identified variants can be analyzed further: we can point out the TFBSs that are enriched around the variations and provide the list of transcription factors that are responsible for the regulation of the corresponding promoters.

SNP lists analysis

Within this service SNPs will be matched at the transcriptional level: variant effects on the transcripts and exons will be predicted, and a search for transcription factor binding sites (TFBSs) that may be affected by the genomic variations (SNPs) will be performed. Variant effects on protein function will be predicted, and potential pathway alterations will be reported.

This service is provided for human SNPs coming from hg19 or hg38 genome builds.

Prediction of miRNA binding sites in transcript regions of genes

miRNAs post-transcriptionally affect (mostly repress) the expression of protein-coding genes. The human genome encodes over 1000 miRNA genes that collectively target the vast majority of messenger RNAs (mRNAs).

We can predict miRNA binding sites starting from a gene list: we will create a collection of 3’ untranslated regions (3’ UTRs) and map it against a miRmap library derived from the miRBase database. We will then apply the miRmap method and provide you with a table of all predicted miRNA binding sites and a genome browser visualization of the created sequence collection.

Optionally, the analysis can be restricted to transcript regions of genes that are expressed in a specific tissue (expression information will be taken from the Human Protein Atlas).

The following 61 tissues are currently supported:

B-lymphocytes, dendritic cells, granulocytes, monocytes, NK cells, PBMC, T-lymphocytes, appendix, bone marrow, lymph node, spleen, thymus, tonsil, amygdala, basal ganglia, cerebellum, cerebral cortex, corpus callosum, hippocampus, hypothalamus, pons and medulla, midbrain, olfactory bulb, spinal cord, thalamus, cervix, endometrium, ovary, oviducts, placenta, vagina, adipose tissue/fat, mammary gland/breast, adrenal, parathyroid, pituitary gland, thyroid, retina, colon/large intestine, duodenum, rectum, small intestine, stomach, bladder, kidney, gallbladder, liver, epididymis, prostate, seminal vesicle, testis, vas deferens, heart, skeletal muscle, smooth muscle, pancreas, esophagus, salivary gland, tongue, lung, skin.

miRNA gene promoter analysis

Identification of enriched motifs in cell line-specific or tissue-specific miRNA genes

We will identify enriched transcription factor binding sites in miRNA gene promoters from your list of miRNAs.

The miRNA promoter information will be taken from the MiRProm database. Different human cell lines and tissues are supported (see below). The constructed promoter track will be compared to a background set of randomly selected miRNAs from the miRBase database with the same tissue or cell line specification. Both tracks will be analyzed for enriched transcription factor binding sites (TFBSs) using the MATCH tool of the TRANSFAC database.

The service supports a number of different human cell lines and tissues.

The following 54 cell lines are supported:

GM12864, hESCT0, HUVEC, MCF7, AG10803, GM06990, Hela, AG04449, AG09309, CACO2, CD14, HCT116, HEEpiC, HFF_MyC, A549, AG04450, AG09319, AoAF, BJ, CD20, H7_hESC_T14, H7_hESC_T5, HAc, HFF, HPAF, Jurkat, K562, NHDF_Neo, NHLF, RPTEC, SAEC, SKNSH, SK_N_MC, WERI_Rb1, WI_38, WI_38_TAM, GM12865, NB4, HRE, HRPEpiC, HL60, HBMEC, HPF, PANC1, HMEC, HVMF, HMF, HCPEpiC, NHEK, HAsp, HCM, GM12878.

The following 20 tissue types are supported:

blood, blood vessels, brain, cartilage, cervix, choroid plexus, colon, embryonic cells, esophagus, eye, foreskin, gingiva, heart, kidney, lung, mammary gland/breast, pancreas, placenta, skin, spinal cord.

Master regulators identification

Upstream Analysis – integrated promoter and pathway analysis

We can perform integrated promoter and pathway analysis to identify the molecular mechanism of the studied process. Upstream analysis comprises two steps. First, the promoters of the genes characterizing the studied process (target genes) are retrieved and analyzed for potential transcription factor binding sites using positional weight matrices from the TRANSFAC database. Next, the site search result is converted into a table of transcription factors (TFs) that potentially regulate the target genes, and pathways are reconstructed from the TRANSPATH database using information about all relevant signaling cascades that are known to activate these hypothesized TFs. Molecules where these pathways converge are considered potential master regulators of the biological process under study. Depending on the task, master regulators can serve as a basis for the identification of prospective biomarkers and drug targets.

This service is available for the following species: Human (Homo sapiens), Mouse (Mus musculus), Rat (Rattus norvegicus), with possible extension to other mammalian species upon request.

Identification of TFBS affected by genomic variations

Enriched TF sites around regulatory SNPs and SIFT analysis

Within this service we can tell you which genes and gene products are affected, and might even be damaged, by a given list of variations.

We will map the SNPs to genomic positions and will perform the SIFT analysis to check whether a particular variation is synonymous or non-synonymous, and in case of a non-synonymous variation whether it is damaging or tolerated.

This service is provided for human data coming from hg19 or hg38 genome builds.

Mutation effect on sites analysis

Within this service we will find transcription factor binding sites (TFBSs) affected by variations or mutations. We will calculate the arithmetic difference between the TFBS score in the reference genome and the TFBS score at the same position in the alternative sequence (with the variation), and estimate the effect of the mutation on the binding site from that difference. The score difference can be positive or negative. A positive difference indicates a disrupted site (site loss): the TFBS had a better score in the reference sequence, and the variation decreased it. A negative difference predicts a new site (site gain): the TFBS has a better score in the sequence with the variation than in the reference sequence.
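The sign convention described above can be illustrated with a short Python sketch. The scoring function is a plain sum of per-position weights, and the three-column matrix is invented for the example. Both are assumptions for illustration and do not reproduce the scoring scheme or any matrix of the TRANSFAC library.

```python
def pwm_score(site, pwm):
    """Sum the weight of each base at its position (simplified scoring)."""
    return sum(column[base] for base, column in zip(site, pwm))


def mutation_effect(ref_site, alt_site, pwm):
    """Return (reference score minus alternative score, interpretation)."""
    difference = pwm_score(ref_site, pwm) - pwm_score(alt_site, pwm)
    if difference > 0:
        label = "site loss"   # the reference sequence scored better
    elif difference < 0:
        label = "site gain"   # the sequence with the variation scored better
    else:
        label = "no change"
    return difference, label


# Hypothetical 3-position weight matrix that prefers the site "ACG".
toy_pwm = [
    {"A": 2, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 2, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 2, "T": 0},
]
print(mutation_effect("ACG", "ATG", toy_pwm))  # (2, 'site loss')
print(mutation_effect("ATG", "ACG", toy_pwm))  # (-2, 'site gain')
```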

Gene set enrichment analysis (GSEA)

Gene set enrichment analysis (GSEA) can be performed on a ranked set of genes. The ranking is based on a numerical column that you specify or that we pre-calculate from your data (e.g. fold-change values).

The analysis result will include the identified groups of genes with calculated p-values and the respective hits (genes from the input set that matched the group), as well as a plot showing how the Kolmogorov-Smirnov score depends on gene rank.
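To show what such a score-versus-rank plot displays, the following Python sketch computes an unweighted Kolmogorov-Smirnov running sum over a ranked gene list. It is a simplified illustration under our own assumptions: there is no weighting by the ranking metric and no p-value estimation, and the function name and gene names are hypothetical.

```python
def running_ks_score(ranked_genes, gene_set):
    """Walk down the ranked list, stepping up at genes in the set and down otherwise.

    Returns (profile, enrichment_score), where profile[i] is the running sum
    after rank i and enrichment_score is the value with the largest deviation from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_misses = len(hits) - n_hits
    if n_hits == 0 or n_misses == 0:
        raise ValueError("the ranked list needs both set members and non-members")
    step_up = 1.0 / n_hits
    step_down = 1.0 / n_misses
    running = 0.0
    profile = []
    for is_hit in hits:
        running += step_up if is_hit else -step_down
        profile.append(running)
    enrichment_score = max(profile, key=abs)
    return profile, enrichment_score


# Hypothetical ranking in which both members of the set sit at the top.
profile, score = running_ks_score(["g1", "g2", "g3", "g4"], {"g1", "g2"})
print(profile)  # [0.5, 1.0, 0.5, 0.0]
print(score)    # 1.0
```

Plotting `profile` against rank gives the kind of curve mentioned above: a sharp rise near the top of the list indicates that the gene set is concentrated among the highest-ranked genes.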

The following classifications can be selected for this service:

Full gene ontology classification
GO (biological process)
GO (molecular function)
GO (cellular component)
HumanCyc pathways
HumanPSD™ disease
Reactome pathways (metabolic pathway annotation)
TFs classification
TRANSPATH Pathways
Custom classification

Functional classification of target genes or proteins

This service will map your target genes list to any of the following ontologies:

Gene Ontology

biological process
cellular component
molecular function

Transcription factor classification (TFclass)

TRANSPATH® pathways

Reactome pathways

HumanCyc pathways

HumanPSD™ disease

Identification of functional clusters of target genes or proteins

Within this service we will apply the Cluster by shortest path method to your input gene or protein set and will report the identified gene or protein clusters with associated visualizations. Clusters can be searched for upstream or downstream of the input genes or proteins, taking reactions and all intermediate molecules along shortest paths from a specified search collection. The supported collections include TRANSPATH and Reactome pathways; a custom search collection containing the needed reactions can also be applied.
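The general idea of grouping input molecules that are connected by short paths in a reaction network can be sketched as follows. This is a generic breadth-first-search illustration, not the actual Cluster by shortest path implementation: the network, the `max_len` parameter and both function names are hypothetical.

```python
from collections import deque


def shortest_path(graph, source, target, max_len):
    """Breadth-first search for a shortest directed path of at most `max_len` edges."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        if len(path) - 1 >= max_len:
            continue  # extending this path would exceed the length limit
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


def cluster_by_paths(graph, inputs, max_len=2):
    """Group input molecules that are linked by a path of at most `max_len` edges."""
    parent = {molecule: molecule for molecule in inputs}

    def find(x):
        # Union-find lookup with path compression.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a in inputs:
        for b in inputs:
            if a != b and shortest_path(graph, a, b, max_len) is not None:
                parent[find(a)] = find(b)

    clusters = {}
    for molecule in inputs:
        clusters.setdefault(find(molecule), set()).add(molecule)
    return sorted(sorted(cluster) for cluster in clusters.values())


# Hypothetical directed network: A -> X -> B and C -> Y -> Z -> D.
network = {"A": ["X"], "X": ["B"], "C": ["Y"], "Y": ["Z"], "Z": ["D"]}
print(cluster_by_paths(network, ["A", "B", "C", "D"], max_len=2))
# [['A', 'B'], ['C'], ['D']]
print(cluster_by_paths(network, ["A", "B", "C", "D"], max_len=3))
# [['A', 'B'], ['C', 'D']]
```

The path returned by `shortest_path` also contains the intermediate molecules (X, or Y and Z, in this toy network), which is what a cluster visualization would draw between the input molecules.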

Principal Component Analysis (PCA)

PCA is a statistical method that transforms data so that a maximum amount of the variance within the data can be expressed in fewer or, at most, as many dimensions as the original data. The new dimensions onto which the data are projected are the principal components. They capture the original variance in decreasing order, so that the first principal component captures the largest share of the variance. PCA is often used to reduce the complexity of (to compress) high-dimensional data or to identify groups in it.

We can perform PCA on your numerical data (e.g. normalized microarray measurements) and provide you with a PCA report that includes a scatter plot showing the items of the specified groups at their transformed coordinates along the first two principal components. The entire set of coordinates will also be provided. The relative importance of each principal component will be reported as its proportion of explained variance.
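For readers who want to see how the transformed coordinates and the proportion of explained variance are obtained, here is a minimal Python sketch based on the singular value decomposition. It assumes that NumPy is available and that rows are items (e.g. samples) and columns are measurements. The function name `pca` and the toy matrix are illustrative and do not describe the tool used for the service.

```python
import numpy as np


def pca(data, n_components=2):
    """Project the rows of `data` onto the first principal components.

    Returns (coordinates, explained_variance_ratio) for the requested components.
    """
    x = np.asarray(data, dtype=float)
    centered = x - x.mean(axis=0)  # center each column at zero
    # Rows of vt are the principal axes; the singular values measure the
    # spread of the data along each axis.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    coordinates = centered @ vt[:n_components].T
    return coordinates, explained[:n_components]


# Toy data: four items measured on two perfectly correlated variables,
# so the first principal component carries essentially all of the variance.
coords, ratio = pca([[0, 0], [1, 1], [2, 2], [3, 3]])
print(coords.shape)  # (4, 2)
print(ratio)
```

The two columns of `coords` are what a PCA scatter plot shows on its axes, and `ratio` gives the relative importance of each component.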

Construction of heatmaps

We can create heatmap visualizations for your data. Heatmaps are graphical representations of data where the individual values contained in a matrix are represented as colors. Heatmaps can be used for extraction of subsets of correlated rows or columns revealed by the hierarchical clustering and/or the heatmap presentation (e.g. gene expression levels from RNA-seq data can be visualized as a heatmap).
