Proteomics studies based on mass spectrometry (MS) are increasingly used in biomedical research for protein identification, quantification, and biomarker discovery, especially for potential early diagnosis and prognosis of severe disease before symptoms occur. However, MS data collected with current technologies are very noisy, and appropriate data preprocessing is critical for the successful application of MS-based approaches. Among the preprocessing steps, aligning peaks across multiple spectra based on detected peak locations presents special statistical challenges when effective experimental calibration is not feasible because of relatively large variation in peak location. To avoid intensive optimization of tuning parameters, we propose a simple, novel Bayesian algorithm, "random grafting-pruning Markov chain Monte Carlo (RGPMCMC)", that can be applied to global MS peak alignment and that can follow a model-based sample classification criterion for using the aligned peaks to classify spectrum samples. The usefulness of our approach is demonstrated in a simulation study through extensive comparison with other algorithms in the literature. Applied to an ovarian cancer MALDI-MS data set, it achieves a smaller 10-fold cross-validation error rate than other current large-scale methodologies.
- Mass spectrometry
- Peak alignment
- Random grafting-pruning Markov chain Monte Carlo (RGPMCMC)
- Reversible jump Markov chain Monte Carlo (RJMCMC)
- Sample classification
- Symmetric transition