Single-cell microscopy image analysis has proved invaluable in protein subcellular localization for inferring gene/protein function. Fluorescently tagged proteins across cellular compartments are tracked and imaged in response to genetic or environmental perturbations. Because high-content microscopy generates a large number of images and manual labeling is both labor-intensive and error-prone, machine learning offers a viable alternative for automatic labeling of subcellular localizations. Meanwhile, in recent years deep learning methods have been applied with considerable success to large datasets of natural images and in other domains. An appeal of deep learning methods is that they can learn salient features from complicated data with little data preprocessing. To this end, we applied several representative types of deep convolutional neural networks (CNNs) and two popular ensemble methods, random forests and gradient boosting, to predict protein subcellular localization with a moderately large cell image data set. We show that CNNs achieve consistently better predictive performance than the two ensemble methods. We also demonstrate the use of CNNs for feature extraction. Finally, we share our computer code and pretrained models to facilitate applications of CNNs in genetics and computational biology.
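As a rough illustration of what feature extraction with a convolutional network involves, the sketch below applies a convolution, a ReLU nonlinearity, and global average pooling to a toy image, producing one number per filter. It is not the authors' shared code: the function names, the hand-written `KERNELS`, and the 4x4 `toy_image` are all hypothetical, and a trained CNN would learn its filters from data and stack many such layers.

```python
from typing import List

Image = List[List[float]]


def conv2d_valid(image: Image, kernel: Image) -> Image:
    """2D cross-correlation without padding, the core operation of a CNN layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap: Image) -> Image:
    """Set negative activations to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def global_avg_pool(fmap: Image) -> float:
    """Collapse a feature map to a single number by averaging."""
    values = [v for row in fmap for v in row]
    return sum(values) / len(values)


def extract_features(image: Image, kernels: List[Image]) -> List[float]:
    """Return one pooled activation per kernel as the image's feature vector."""
    return [global_avg_pool(relu(conv2d_valid(image, k))) for k in kernels]


# Hypothetical hand-written filters; a trained CNN would learn these instead.
KERNELS = [
    [[-1.0, 1.0], [-1.0, 1.0]],    # responds to dark-to-bright vertical edges
    [[0.25, 0.25], [0.25, 0.25]],  # local mean intensity
]

# Toy 4x4 "image" with a dark left half and a bright right half.
toy_image = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]

features = extract_features(toy_image, KERNELS)
print(features)
```

A feature vector of this kind could then be passed to a conventional classifier such as a random forest or gradient boosting model in place of hand-engineered image features.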
Bibliographical note
Funding Information:
We thank the reviewers for their helpful comments and suggestions. This study was supported by NIH grants R21AG057038, R01HL116720, R01GM113250, R01HL105397 and R01GM126002, and NSF grants.
Keywords
- deep learning
- feature extraction
- gradient boosting
- random forests
PubMed: MeSH publication types
- Journal Article
- Research Support, N.I.H., Extramural
- Research Support, U.S. Gov't, Non-P.H.S.