Using Galaxy-P to leverage RNA-Seq for the discovery of novel protein variations

Gloria M. Sheynkman; James E. Johnson; Pratik D. Jagtap; Michael R. Shortreed; Getiria Onsongo; Brian L. Frey; Timothy J. Griffin; Lloyd M. Smith

doi:10.1186/1471-2164-15-703

Research output: Contribution to journal › Article › peer-review

71 Scopus citations

Abstract

Background: Current practice in mass spectrometry (MS)-based proteomics is to identify peptides by comparison of experimental mass spectra with theoretical mass spectra derived from a reference protein database; however, this strategy necessarily fails to detect peptide and protein sequences that are absent from the database. We and others have recently shown that customized proteomic databases derived from RNA-Seq data can be employed for MS-searching to both improve MS analysis and identify novel peptides. While this general strategy constitutes a significant advance for the discovery of novel protein variations, it has not been readily transferable to other laboratories due to the need for many specialized software tools. To address this problem, we have implemented readily accessible, modifiable, and extensible workflows within Galaxy-P, short for Galaxy for Proteomics, a web-based bioinformatic extension of the Galaxy framework for the analysis of multi-omics (e.g. genomics, transcriptomics, proteomics) data.

Results: We present three bioinformatic workflows that allow the user to upload raw RNA sequencing reads and convert the data into high-quality customized proteomic databases suitable for MS searching. We show the utility of these workflows on human and mouse samples, identifying 544 peptides containing single amino acid polymorphisms (SAPs) and 187 peptides corresponding to unannotated splice junction peptides, correlating protein and transcript expression levels, and providing the option to incorporate transcript abundance measures within the MS database search process (reduced databases, incorporation of transcript abundance for protein identification score calculations, etc.).

Conclusions: Using RNA-Seq data to enhance MS analysis is a promising strategy to discover novel peptides specific to a sample and, more generally, to improve proteomics results. The main bottleneck for widespread adoption of this strategy has been the lack of easily used and modifiable computational tools. We provide a solution to this problem by introducing a set of workflows within the Galaxy-P framework that converts raw RNA-Seq data into customized proteomic databases.

Original language	English (US)
Article number	703
Journal	BMC Genomics
Volume	15
Issue number	1
DOIs	https://doi.org/10.1186/1471-2164-15-703
State	Published - 2014

Bibliographical note

Publisher Copyright:
© 2014 Sheynkman et al.

Cite this

@article{dbf2a2761c134094a30e27136f36bd96,

title = "Using Galaxy-P to leverage RNA-Seq for the discovery of novel protein variations",

abstract = "Background: Current practice in mass spectrometry (MS)-based proteomics is to identify peptides by comparison of experimental mass spectra with theoretical mass spectra derived from a reference protein database; however, this strategy necessarily fails to detect peptide and protein sequences that are absent from the database. We and others have recently shown that customized proteomic databases derived from RNA-Seq data can be employed for MS-searching to both improve MS analysis and identify novel peptides. While this general strategy constitutes a significant advance for the discovery of novel protein variations, it has not been readily transferable to other laboratories due to the need for many specialized software tools. To address this problem, we have implemented readily accessible, modifiable, and extensible workflows within Galaxy-P, short for Galaxy for Proteomics, a web-based bioinformatic extension of the Galaxy framework for the analysis of multi-omics (e.g. genomics, transcriptomics, proteomics) data. Results: We present three bioinformatic workflows that allow the user to upload raw RNA sequencing reads and convert the data into high-quality customized proteomic databases suitable for MS searching. We show the utility of these workflows on human and mouse samples, identifying 544 peptides containing single amino acid polymorphisms (SAPs) and 187 peptides corresponding to unannotated splice junction peptides, correlating protein and transcript expression levels, and providing the option to incorporate transcript abundance measures within the MS database search process (reduced databases, incorporation of transcript abundance for protein identification score calculations, etc.). Conclusions: Using RNA-Seq data to enhance MS analysis is a promising strategy to discover novel peptides specific to a sample and, more generally, to improve proteomics results. The main bottleneck for widespread adoption of this strategy has been the lack of easily used and modifiable computational tools. We provide a solution to this problem by introducing a set of workflows within the Galaxy-P framework that converts raw RNA-Seq data into customized proteomic databases.",

author = "Sheynkman, {Gloria M.} and Johnson, {James E.} and Jagtap, {Pratik D.} and Shortreed, {Michael R.} and Onsongo, Getiria and Frey, {Brian L.} and Griffin, {Timothy J.} and Smith, {Lloyd M.}",

note = "Publisher Copyright: {\textcopyright} 2014 Sheynkman et al.",

year = "2014",

doi = "10.1186/1471-2164-15-703",

language = "English (US)",

volume = "15",

journal = "BMC Genomics",

issn = "1471-2164",

publisher = "BioMed Central",

number = "1",

}

TY - JOUR

T1 - Using Galaxy-P to leverage RNA-Seq for the discovery of novel protein variations

AU - Sheynkman, Gloria M.

AU - Johnson, James E.

AU - Jagtap, Pratik D.

AU - Shortreed, Michael R.

AU - Onsongo, Getiria

AU - Frey, Brian L.

AU - Griffin, Timothy J.

AU - Smith, Lloyd M.

PY - 2014

Y1 - 2014

AB - Background: Current practice in mass spectrometry (MS)-based proteomics is to identify peptides by comparison of experimental mass spectra with theoretical mass spectra derived from a reference protein database; however, this strategy necessarily fails to detect peptide and protein sequences that are absent from the database. We and others have recently shown that customized proteomic databases derived from RNA-Seq data can be employed for MS-searching to both improve MS analysis and identify novel peptides. While this general strategy constitutes a significant advance for the discovery of novel protein variations, it has not been readily transferable to other laboratories due to the need for many specialized software tools. To address this problem, we have implemented readily accessible, modifiable, and extensible workflows within Galaxy-P, short for Galaxy for Proteomics, a web-based bioinformatic extension of the Galaxy framework for the analysis of multi-omics (e.g. genomics, transcriptomics, proteomics) data. Results: We present three bioinformatic workflows that allow the user to upload raw RNA sequencing reads and convert the data into high-quality customized proteomic databases suitable for MS searching. We show the utility of these workflows on human and mouse samples, identifying 544 peptides containing single amino acid polymorphisms (SAPs) and 187 peptides corresponding to unannotated splice junction peptides, correlating protein and transcript expression levels, and providing the option to incorporate transcript abundance measures within the MS database search process (reduced databases, incorporation of transcript abundance for protein identification score calculations, etc.). Conclusions: Using RNA-Seq data to enhance MS analysis is a promising strategy to discover novel peptides specific to a sample and, more generally, to improve proteomics results. The main bottleneck for widespread adoption of this strategy has been the lack of easily used and modifiable computational tools. We provide a solution to this problem by introducing a set of workflows within the Galaxy-P framework that converts raw RNA-Seq data into customized proteomic databases.

UR - http://www.scopus.com/inward/record.url?scp=84988835216&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84988835216&partnerID=8YFLogxK

U2 - 10.1186/1471-2164-15-703

DO - 10.1186/1471-2164-15-703

M3 - Article

C2 - 25149441

AN - SCOPUS:84988835216

SN - 1471-2164

VL - 15

JO - BMC Genomics

JF - BMC Genomics

IS - 1

M1 - 703

ER -
