For mass spectrometry-based peptide and protein quantification, label-free quantification (LFQ) based on precursor mass peak (MS1) intensities is considered reliable due to its dynamic range, reproducibility, and accuracy. LFQ enables peptide-level quantitation, which is useful in proteomics (analyzing peptides carrying post-translational modifications) and multi-omics studies such as metaproteomics (analyzing taxon-specific microbial peptides) and proteogenomics (analyzing non-canonical sequences). Bioinformatics workflows accessible via the Galaxy platform have proven useful for analysis of such complex multi-omics studies. However, workflows within the Galaxy platform have lacked well-tested LFQ tools. In this study, we have evaluated moFF and FlashLFQ, two open-source LFQ tools, and implemented them within the Galaxy platform to offer access and use via established workflows. Through rigorous testing and communication with the tool developers, we have optimized the performance of each tool. Software features evaluated include: (a) match-between-runs (MBR); (b) using multiple file formats as input for improved quantification; (c) use of containers and/or conda packages; (d) parameters needed for analyzing large datasets; and (e) optimization and validation of software performance. This work establishes a process for software implementation, optimization, and validation, and offers access to two robust software tools for LFQ-based analysis within the Galaxy platform.
Bibliographical note
Funding Information:
Funding: This research was funded by National Cancer Institute-Informatics Technology for Cancer Research (NCI-ITCR) grant 1U24CA199347 and National Science Foundation (U.S.) grant 1458524 to T.G. We would also like to acknowledge the Extreme Science and Engineering Discovery Environment (XSEDE) research allocation BIO170096 to P.D.J. and use of the Jetstream cloud-based computing resource for scientific computing (https://jetstream-cloud.org/) maintained at Indiana University. The European Galaxy server that was used for some calculations is in part funded by Collaborative Research Centre 992 Medical Epigenetics (DFG grant SFB 992/1 2012) and German Federal Ministry of Education and Research (BMBF grants 031 A538A/A538C RBC, 031L0101B/031L0101C de.NBI-epi, 031L0106 de.STAIR (de.NBI)). Part of the work was performed by the Belgian ELIXIR node, also hosting the tools at the Belgian Galaxy instance, which is funded by the Research Foundation, Flanders (FWO) grant I002919N.
Acknowledgments: We would like to thank the European Galaxy team for their help and support during the Galaxy implementation. We would also like to thank Carlo Horro (from Barnes Group, University of Bergen, Norway) and Björn A. Grüning (University of Freiburg, Germany) for helping us during the analysis of the quantification tools. We thank Emma Leith for proofreading the manuscript. We acknowledge funding for this work from the grant We also acknowledge the support from the Minnesota Supercomputing Institute for the maintenance and update of the Galaxy instances.
Data Availability: All the data files used for this study have been uploaded to a Zenodo repository at https://doi.org/10.5281/zenodo.3733904. We have also provided the input and output files of our data analysis. Supplementary Document 2 (https://github.com/galaxyproteomics/quant-tools-analysis) is the GitHub repository containing the R scripts. The original dataset for the UPS study (spiked-in Universal Proteomic Standard) is available via ProteomeXchange identifier PXD000279.
- Galaxy framework
- Label-free quantification