Precursor intensity-based label-free quantification software tools for proteomic and multi-omic analysis within the Galaxy platform

Subina Mehta; Caleb W. Easterly; Ray Sajulga; Robert J. Millikin; Andrea Argentini; Ignacio Eguinoa; Lennart Martens; Michael R. Shortreed; Lloyd M. Smith; Thomas McGowan; Praveen Kumar; James E. Johnson; Timothy J. Griffin; Pratik D. Jagtap

doi:10.3390/PROTEOMES8030015

Research output: Contribution to journal › Article › peer-review

9 Scopus citations

Abstract

For mass spectrometry-based peptide and protein quantification, label-free quantification (LFQ) based on precursor mass peak (MS1) intensities is considered reliable due to its dynamic range, reproducibility, and accuracy. LFQ enables peptide-level quantitation, which is useful in proteomics (analyzing peptides carrying post-translational modifications) and multi-omics studies such as metaproteomics (analyzing taxon-specific microbial peptides) and proteogenomics (analyzing non-canonical sequences). Bioinformatics workflows accessible via the Galaxy platform have proven useful for analysis of such complex multi-omic studies. However, workflows within the Galaxy platform have lacked well-tested LFQ tools. In this study, we have evaluated moFF and FlashLFQ, two open-source LFQ tools, and implemented them within the Galaxy platform to offer access and use via established workflows. Through rigorous testing and communication with the tool developers, we have optimized the performance of each tool. Software features evaluated include: (a) match-between-runs (MBR); (b) use of multiple file formats as input for improved quantification; (c) use of containers and/or conda packages; (d) parameters needed for analyzing large datasets; and (e) optimization and validation of software performance. This work establishes a process for software implementation, optimization, and validation, and offers access to two robust software tools for LFQ-based analysis within the Galaxy platform.

Original language	English (US)
Article number	15
Journal	Proteomes
Volume	8
Issue number	3
DOIs	https://doi.org/10.3390/PROTEOMES8030015
State	Published - Sep 2020

Bibliographical note

Funding Information:
Acknowledgments: We would like to thank the European Galaxy team for their help and support during Galaxy implementation. We would also like to thank Carlo Horro (from Barnes Group, University of Bergen, Norway) and Björn A. Grüning (University of Freiburg, Germany) for helping us during the quantification tools analysis. We thank Emma Leith for proofreading the manuscript. We acknowledge funding for this work from the grant [...]. We also acknowledge the support from the Minnesota Supercomputing Institute for the maintenance and update of the Galaxy instances.

Data Availability: All the data files used for this study have been uploaded to a Zenodo repository at https://doi.org/10.5281/zenodo.3733904. We have also provided the input and output files of our data analysis. Supplementary Document 2 (https://github.com/galaxyproteomics/quant-tools-analysis) is the GitHub repository of the R scripts. The original dataset for the UPS study is available via ProteomeXchange identifier PXD000279 (spiked-in Universal Proteomic Standard).

Funding Information:
Funding: This research was funded by National Cancer Institute-Informatics Technology for Cancer Research (NCI-ITCR) grant 1U24CA199347 and National Science Foundation (U.S.) grant 1458524 to T.G. We would also like to acknowledge the Extreme Science and Engineering Discovery Environment (XSEDE) research allocation BIO170096 to P.D.J. and use of the Jetstream cloud-based computing resource for scientific computing (https://jetstream-cloud.org/) maintained at Indiana University. The European Galaxy server that was used for some calculations is in part funded by Collaborative Research Centre 992 Medical Epigenetics (DFG grant SFB 992/1 2012) and German Federal Ministry of Education and Research (BMBF grants 031 A538A/A538C RBC, 031L0101B/031L0101C de.NBI-epi, 031L0106 de.STAIR (de.NBI)). Part of the work was performed by the Belgian ELIXIR node, also hosting the tools at the Belgian Galaxy instance, which is funded by the Research Foundation, Flanders (FWO) grant I002919N.

Publisher Copyright:
© 2020 by the authors.

Keywords

Galaxy framework
Label-free quantification
Proteomics
Workflows

Cite this

Mehta, S., Easterly, C. W., Sajulga, R., Millikin, R. J., Argentini, A., Eguinoa, I., Martens, L., Shortreed, M. R., Smith, L. M., McGowan, T., Kumar, P., Johnson, J. E., Griffin, T. J., & Jagtap, P. D. (2020). Precursor intensity-based label-free quantification software tools for proteomic and multi-omic analysis within the Galaxy platform. Proteomes, 8(3), Article 15. https://doi.org/10.3390/PROTEOMES8030015
