Protein Genome Map

In the geneXplain platform, a comprehensive set of methods and workflows elucidates the functional effects of genomic variants. The Protein Genome Map is a database of genomic coordinates of protein functional features, derived from high-quality manual curation in Transpath® [1] and from additional sources. This novel resource makes curated information available to functional genomics investigations such as variant effect analysis.
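How the genomic locations in the database were computed is not described here. As an illustration only, the following Python sketch shows one common way to project a protein residue onto the genome when the coding exon intervals of an isoform are known. The function name and input layout are hypothetical. Coordinates follow the BED convention (0-based start, exclusive end), and the sketch assumes the CDS starts exactly at the first listed coding base.

```python
from typing import List, Tuple

# A genomic interval in BED convention: 0-based start, exclusive end.
Interval = Tuple[int, int]


def residue_to_genomic_intervals(
    residue: int, cds_exons: List[Interval], strand: str
) -> List[Interval]:
    """Return the genomic interval(s) covering the codon of a 1-based residue.

    cds_exons holds the coding part of each exon, sorted by ascending genomic
    start; strand is '+' or '-'. A codon split by an intron yields several
    intervals, returned in ascending genomic order.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if residue < 1:
        raise ValueError("residue numbers start at 1")
    # Walk the exons in transcript order (5' to 3' on the mRNA).
    exons = cds_exons if strand == "+" else list(reversed(cds_exons))
    codon_start = 3 * (residue - 1)  # 0-based codon offset within the CDS
    codon_end = codon_start + 3
    pieces = []
    offset = 0  # CDS offset of the first coding base of the current exon
    for start, end in exons:
        length = end - start
        # Part of the codon that falls inside this exon, in CDS offsets.
        lo = max(codon_start, offset)
        hi = min(codon_end, offset + length)
        if lo < hi:
            if strand == "+":
                pieces.append((start + (lo - offset), start + (hi - offset)))
            else:
                # On the minus strand, CDS offsets count down from the exon end.
                pieces.append((end - (hi - offset), end - (lo - offset)))
        offset += length
    if codon_end > offset:
        raise ValueError("residue lies beyond the end of the CDS")
    return sorted(pieces)


# Hypothetical two-exon coding sequence on the plus strand.
print(residue_to_genomic_intervals(4, [(100, 110), (200, 230)], "+"))
# → [(109, 110), (200, 202)]
```

In the usage example the fourth codon spans the exon junction, so it maps to two genomic intervals rather than one.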

The database can be queried with the accompanying method “Create protein feature track”. The method extracts the genomic intervals that encode protein functional features and overlap the genome coordinates of interest, and stores them in a platform track. This track can then be exported or analyzed further with the platform's portfolio of tools and workflows.
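The overlap logic behind such a query can be illustrated with a minimal Python sketch. It assumes BED-style half-open coordinates and uses made-up feature records (site_A, site_B, site_C); the platform's actual implementation and data model are not described here, and a database of this size would more plausibly use an interval index than the linear scan shown.

```python
from typing import Iterable, List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive
    name: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def extract_features(features: Iterable[Interval], regions: List[Interval]) -> List[Interval]:
    """Keep every feature that overlaps at least one region of interest."""
    return [f for f in features if any(overlaps(f, r) for r in regions)]


regions = [Interval("chr7", 55018819, 55211628, "EGFR")]
features = [  # hypothetical feature records, not database entries
    Interval("chr7", 55100000, 55100003, "site_A"),  # inside the region
    Interval("chr7", 55211628, 55211631, "site_B"),  # starts at the exclusive region end
    Interval("chr9", 55100000, 55100003, "site_C"),  # other chromosome
]
print([f.name for f in extract_features(features, regions)])  # → ['site_A']
```

With half-open intervals, a feature starting exactly at a region's end coordinate does not overlap it, which is why site_B is not reported.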

The first version of the database focuses on post-translational modifications (PTMs). It contains 234192 PTM sites of 18 different modification types in 973 Drosophila, 23630 human, 5414 mouse and 39 rat protein isoforms from Transpath® and BioGRID [2, 3], together with their genomic locations in the assemblies BDGP6.54, GRCh38, GRCm39 and GRCr8. The table below lists the modification types with their counts per source, for all organisms and for human.

PTM type                   Transpath®  BioGRID   Human – Transpath®  Human – BioGRID
Acetylation                165         –         146                 –
Acylation                  2           –         2                   –
Glycosylation              30          –         27                  –
Hydroxylation              7           –         7                   –
ISGylation (ISG15)         3           –         3                   –
Methylation                59          –         55                  –
Myristoylation             5           –         4                   –
Neddylation (Nedd8)        2           1750      2                   1750
Nitrosylation              5           –         4                   –
Palmitoylation             5           –         1                   –
Serine phosphorylation     1612        19536     1373                19536
Sulfation                  1           –         1                   –
Sumoylation (any)          217         –         –                   –
Sumoylation (SUMO1)        119         –         84                  –
Sumoylation (SUMO2)        33          –         24                  –
Sumoylation (SUMO3)        20          –         11                  –
Threonine phosphorylation  450         3813      356                 3813
Tyrosine phosphorylation   315         520       257                 520
Ubiquitination             134         205666    109                 191351

References

1. Krull M, Voss N, Choi C, Pistor S, Potapov A, Wingender E. TRANSPATH: an integrated database on signal transduction and a tool for array analysis. Nucleic Acids Res. 2003 Jan 1;31(1):97-100. doi: 10.1093/nar/gkg089. PMID: 12519957; PMCID: PMC165536.

2. Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M. BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 2006 Jan 1;34(Database issue):D535-9. doi: 10.1093/nar/gkj109. PMID: 16381927; PMCID: PMC1347471.

3. Oughtred R, Rust J, Chang C, Breitkreutz BJ, Stark C, Willems A, Boucher L, Leung G, Kolas N, Zhang F, Dolma S, Coulombe-Huntington J, Chatr-Aryamontri A, Dolinski K, Tyers M. The BioGRID database: A comprehensive biomedical resource of curated protein, genetic, and chemical interactions. Protein Sci. 2021 Jan;30(1):187-200. doi: 10.1002/pro.3978. Epub 2020 Nov 23. PMID: 33070389; PMCID: PMC7737760.

Example analysis

The method “Create protein feature track” extracts protein functional features encoded in genomic regions of interest from a Protein Genome Map database. To demonstrate a possible use case, we prepare a BED file with the human genome regions of the signal transduction receptor genes EGFR, TGFBR1 and TGFBR2. The contents of the BED file are shown below. We name the file grch38_receptor.bed and import it into a data folder of the platform.

grch38_receptor.bed

chr7	55018819	55211628	EGFR	1000
chr9	99104037	99154192	TGFBR1	1000
chr3	30606477	30694249	TGFBR2	1000
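The file can be written with any text editor as long as the columns are separated by tabs. A minimal Python sketch that produces the same file is shown below; the helper names format_bed and write_bed are illustrative only.

```python
# Receptor gene regions from the example (GRCh38; BED: 0-based start, exclusive end).
REGIONS = [
    ("chr7", 55018819, 55211628, "EGFR", 1000),
    ("chr9", 99104037, 99154192, "TGFBR1", 1000),
    ("chr3", 30606477, 30694249, "TGFBR2", 1000),
]


def format_bed(rows) -> str:
    """Render BED5 rows (chrom, start, end, name, score) as tab-separated lines."""
    return "".join("\t".join(str(value) for value in row) + "\n" for row in rows)


def write_bed(path: str, rows) -> None:
    """Write the rendered BED text to a file."""
    with open(path, "w", encoding="ascii") as handle:
        handle.write(format_bed(rows))


if __name__ == "__main__":
    write_bed("grch38_receptor.bed", REGIONS)
```

Keeping the formatting in a separate function makes it easy to check the file contents before anything is written to disk.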

Data import

We use the following steps to import the data into the platform.

  1. Select target data folder
  2. Click the import button
  3. Select the file and proceed with import
  4. Select the reference genome (Ensembl 112.38)
  5. Click Start to import
Figure: Steps to import the data into the platform
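BED columns are normally tab-separated, so a local check before uploading can catch problems such as space-separated fields or swapped coordinates. The following Python helper is a hypothetical pre-check written for this example, not a platform feature; it assumes BED coordinates (0-based start, exclusive end) and requires non-empty intervals, as expected for gene regions.

```python
from typing import List, Tuple

Record = Tuple[str, int, int, str]  # chrom, start, end, name


def parse_bed(text: str) -> List[Record]:
    """Parse tab-separated BED text and reject obviously malformed lines."""
    records = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Skip blank lines as well as comment, track and browser header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"line {lineno}: expected at least 3 tab-separated fields")
        chrom = fields[0]
        start, end = int(fields[1]), int(fields[2])  # raises ValueError if not integers
        if start < 0 or start >= end:
            raise ValueError(f"line {lineno}: expected 0 <= start < end for a gene region")
        name = fields[3] if len(fields) > 3 else f"{chrom}:{start}-{end}"
        records.append((chrom, start, end, name))
    return records


example = "chr7\t55018819\t55211628\tEGFR\t1000\nchr9\t99104037\t99154192\tTGFBR1\t1000\n"
print(parse_bed(example))
# → [('chr7', 55018819, 55211628, 'EGFR'), ('chr9', 99104037, 99154192, 'TGFBR1')]
```

A line such as "chr7 55018819 55211628 EGFR" (spaces instead of tabs) is rejected with a message naming the offending line.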

Protein feature track extraction

To create the track of protein features encoded in the receptor regions, we use the following steps.

  1. Open the “Create protein feature track” tool
  2. Select the input track
  3. Adjust the Protein Genome Map database, the mapped genome and the feature types to extract
  4. Adjust the output path if necessary
  5. Click Run to start the analysis
Figure: Protein feature track extraction
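Step 3 restricts the output to selected feature types. The effect of that selection can be sketched in Python as a type filter followed by conversion to BED lines. The record layout, the made-up positions and the use of the PTM type as the BED name are assumptions for illustration; the tool's actual output columns are not specified here.

```python
from typing import Iterable, List, NamedTuple, Set


class Feature(NamedTuple):
    chrom: str
    start: int      # 0-based, inclusive
    end: int        # exclusive
    ptm_type: str   # e.g. "Serine phosphorylation"


def select_types(features: Iterable[Feature], wanted: Set[str]) -> List[Feature]:
    """Keep features of the requested types, sorted by (lexicographic) chromosome and position."""
    kept = [f for f in features if f.ptm_type in wanted]
    return sorted(kept, key=lambda f: (f.chrom, f.start, f.end))


def to_bed(features: Iterable[Feature]) -> str:
    """Render features as BED4 lines, using the PTM type as the feature name."""
    return "".join(f"{f.chrom}\t{f.start}\t{f.end}\t{f.ptm_type}\n" for f in features)


features = [  # made-up positions, not database entries
    Feature("chr9", 99140000, 99140003, "Ubiquitination"),
    Feature("chr7", 55190000, 55190003, "Tyrosine phosphorylation"),
    Feature("chr3", 30680000, 30680003, "Serine phosphorylation"),
]
print(to_bed(select_types(features, {"Serine phosphorylation", "Tyrosine phosphorylation"})), end="")
```

Here the ubiquitination record is dropped and the two phosphorylation sites are written in chromosome order.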

When the analysis has finished, the resulting output track opens in the genome viewer.

Figure: Resulting output track

Via the context menu of the output track, one can also open the corresponding data table to inspect details of the extracted features.

Figure: Context menu of the output track
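Once the track or its data table has been exported, simple summaries are easy to compute outside the platform. The Python sketch below counts features per receptor region and PTM type; the tuple layout and the feature positions are invented for illustration, since the exported column layout is not described here.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Region = Tuple[str, int, int, str]   # chrom, start, end, gene name (BED-style)
Feature = Tuple[str, int, int, str]  # chrom, start, end, PTM type


def count_by_region_and_type(features: Iterable[Feature], regions: List[Region]) -> Counter:
    """Count features per (region name, PTM type); a feature overlapping two regions counts in both."""
    counts: Counter = Counter()
    for chrom, start, end, ptm_type in features:
        for r_chrom, r_start, r_end, r_name in regions:
            # Half-open overlap test on the same chromosome.
            if chrom == r_chrom and start < r_end and r_start < end:
                counts[(r_name, ptm_type)] += 1
    return counts


regions = [
    ("chr7", 55018819, 55211628, "EGFR"),
    ("chr9", 99104037, 99154192, "TGFBR1"),
]
features = [  # made-up positions for illustration, not database entries
    ("chr7", 55190000, 55190003, "Tyrosine phosphorylation"),
    ("chr7", 55200000, 55200003, "Tyrosine phosphorylation"),
    ("chr9", 99140000, 99140003, "Serine phosphorylation"),
]
for (gene, ptm_type), n in sorted(count_by_region_and_type(features, regions).items()):
    print(f"{gene}\t{ptm_type}\t{n}")
```

For the made-up input this prints one line per gene and modification type, with two tyrosine phosphorylation sites for EGFR and one serine phosphorylation site for TGFBR1.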