Protein Genome Map
In the geneXplain platform, a comprehensive set of methods and workflows elucidates the functional effects of genomic variants. The Protein Genome Map is a database of genomic coordinates of protein functional features derived from high-quality manual curation in Transpath® [1] as well as additional sources. With this novel resource, curated information becomes available for functional genomics investigations such as variant effect analysis.
The database can be queried using the accompanying method “Create protein feature track”. The method extracts the genomic intervals that encode protein functional features and overlap genome coordinates of interest, and writes them to a platform track, which can then be exported or further analyzed with the platform's tools and workflows.
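The overlap criterion described above can be illustrated with a minimal Python sketch. This is not the platform's implementation: the `Interval` type, the function names and the feature entries are hypothetical, and the sketch assumes 0-based, half-open coordinates as used in BED files.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (assumed BED-style convention)
    end: int    # exclusive
    name: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True if both intervals lie on the same chromosome and share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def features_in_regions(features, regions):
    """Keep the features that overlap at least one region of interest."""
    return [f for f in features if any(overlaps(f, r) for r in regions)]


# Region of interest taken from the example analysis (EGFR on chr7).
regions = [Interval("chr7", 55018819, 55211628, "EGFR")]

# Hypothetical feature intervals, invented purely for illustration.
features = [
    Interval("chr7", 55019000, 55019003, "hypothetical_site_A"),  # inside the region
    Interval("chr7", 55300000, 55300003, "hypothetical_site_B"),  # same chromosome, outside
    Interval("chr9", 55019000, 55019003, "hypothetical_site_C"),  # different chromosome
]

selected = features_in_regions(features, regions)
print([f.name for f in selected])
```

The linear scan is adequate for a handful of regions; for genome-wide queries an interval index would be the more usual choice.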
The first version of the database focuses on post-translational modifications (PTMs) and contains 234192 PTM sites for 18 different modification types in 973 Drosophila, 23630 human, 5414 mouse and 39 rat protein isoforms from Transpath® and BioGRID [2, 3] as well as their genomic locations in assemblies BDGP6.54, GRCh38, GRCm39 and GRCr8. The modification types and their presentation in the database are described in the table below.
| PTM type | Transpath® | BioGRID | Human – Transpath® | Human – BioGRID |
|---|---|---|---|---|
| Acetylation | 165 | | 146 | |
| Acylation | 2 | | 2 | |
| Glycosylation | 30 | | 27 | |
| Hydroxylation | 7 | | 7 | |
| ISGylation (ISG15) | 3 | | 3 | |
| Methylation | 59 | | 55 | |
| Myristoylation | 5 | | 4 | |
| Neddylation (Nedd8) | 2 | 1750 | 2 | 1750 |
| Nitrosylation | 5 | | 4 | |
| Palmitoylation | 5 | | 1 | |
| Serine phosphorylation | 1612 | 19536 | 1373 | 19536 |
| Sulfation | 1 | | 1 | |
| Sumoylation (any) | | 217 | | |
| Sumoylation (SUMO1) | 119 | | 84 | |
| Sumoylation (SUMO2) | 33 | | 24 | |
| Sumoylation (SUMO3) | 20 | | 11 | |
| Threonine phosphorylation | 450 | 3813 | 356 | 3813 |
| Tyrosine phosphorylation | 315 | 520 | 257 | 520 |
| Ubiquitination | 134 | 205666 | 109 | 191351 |
References
1. Krull M, Voss N, Choi C, Pistor S, Potapov A, Wingender E. TRANSPATH: an integrated database on signal transduction and a tool for array analysis. Nucleic Acids Res. 2003 Jan 1;31(1):97-100. doi: 10.1093/nar/gkg089. PMID: 12519957; PMCID: PMC165536.
2. Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M. BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 2006 Jan 1;34(Database issue):D535-9. doi: 10.1093/nar/gkj109. PMID: 16381927; PMCID: PMC1347471.
3. Oughtred R, Rust J, Chang C, Breitkreutz BJ, Stark C, Willems A, Boucher L, Leung G, Kolas N, Zhang F, Dolma S, Coulombe-Huntington J, Chatr-Aryamontri A, Dolinski K, Tyers M. The BioGRID database: A comprehensive biomedical resource of curated protein, genetic, and chemical interactions. Protein Sci. 2021 Jan;30(1):187-200. doi: 10.1002/pro.3978. Epub 2020 Nov 23. PMID: 33070389; PMCID: PMC7737760.
Example analysis
The method “Create protein feature track” extracts protein functional features encoded in genomic regions of interest from a Protein Genome Map database. To demonstrate a possible use case, we prepare a BED file with human genome regions containing the signal transduction receptor genes EGFR, TGFBR1 and TGFBR2. The contents of the BED file are shown below. We name the file grch38_receptor.bed and import it into a data folder of the platform.
grch38_receptor.bed

| chr7 | 55018819 | 55211628 | EGFR | 1000 |
| chr9 | 99104037 | 99154192 | TGFBR1 | 1000 |
| chr3 | 30606477 | 30694249 | TGFBR2 | 1000 |
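As a convenience, the BED file shown above could be generated with a short script instead of being typed by hand. The following is a minimal sketch using only the Python standard library; it assumes the usual tab-separated BED layout without a header line and writes the file to the current working directory. The variable names are illustrative.

```python
import csv
from pathlib import Path

# Regions from the example: chromosome, start, end, name, score.
regions = [
    ("chr7", 55018819, 55211628, "EGFR", 1000),
    ("chr9", 99104037, 99154192, "TGFBR1", 1000),
    ("chr3", 30606477, 30694249, "TGFBR2", 1000),
]

bed_path = Path("grch38_receptor.bed")

# BED files are plain text with tab-separated columns and no header line.
with bed_path.open("w", newline="") as handle:
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(regions)

# Read the file back to confirm what was written.
bed_lines = bed_path.read_text().splitlines()
for line in bed_lines:
    print(line)
```

The resulting file can then be imported into the platform as described in the next section.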
Data import
We use the following steps to import the data into the platform.
- Select target data folder
- Click the import button
- Select the file and proceed with import
- Select the reference genome (Ensembl 112.38)
- Click Start to import

Protein feature track extraction
To create the track of protein features encoded in the receptor regions, we use the following steps.
- Open the “Create protein feature track” tool
- Select the input track
- Select the Protein Genome Map database, the mapped genome and the feature types to extract
- Adjust the output path if necessary
- Click Run to start the analysis

When the analysis is finished, the resulting output track is opened in the genome viewer.

Via the context menu of the output track, one can also open the corresponding data table to inspect details of the extracted features.

