Clustering is the task of identifying groups of similar subjects according to certain criteria. The AJCC staging system can be thought as a clustering mechanism that groups patients based on their disease stage. This grouping drives prognosis and influences treatment. The goal of this work is to evaluate the efficacy of machine learning algorithms to cluster the patients into discriminative groups to improve prognosis for overall survival (OS) and relapse free survival (RFS) outcomes. We apply clustering over a retrospectively collected data from 644 head and neck cancer patients including both clinical and radiomic features. In order to incorporate outcome information into the clustering process and deal with the large proportion of censored samples, the feature space was scaled using the regression coefficients fitted using a proxy dependent variable, martingale residuals, instead of follow-up time. Two clusters were identified and evaluated using cross validation. The Kaplan Meier (KM) curves between the two clusters differ significantly for OS and RFS (p-value < 0.0001). Moreover, there was a relative predictive improvement when using the cluster label in addition to the clinical features compared to using only clinical features where AUC increased by 5.7% and 13.0% for OS and RFS, respectively.
Bibliographical noteFunding Information:
This work was partially supported by the NIH National Cancer Institute/Big Data to Knowledge (BD2K) Program under grant R01CA214825 and joint NSF/NIH Initiative on Quantitative Approaches to Biomedical Big Data (QuBDD) (R01) grant R01CA225190. This research is also supported by the Andrew Sabin Family Foundation; Dr. Fuller is a Sabin Family Foundation Fellow. Dr. Fuller receives funding and salary support from the National Institutes of Health (NIH), including: the National Institute for Dental and Craniofacial Research Award (1R01DE025248-01/R56DE025248–01); a National Science Foundation (NSF), Division of Mathematical Sciences, Joint NIH/NSF Initiative on Quantitative Approaches to Biomedical Big Data (QuBBD) Grant (NSF 1557679); the NIH Big Data to Knowledge (BD2K) Program of the National Cancer Institute (NCI) Early Stage Development of Technologies in Biomedical Computing, Informatics, and Big Data Science Award (1R01CA214825-01); NCI Early Phase Clinical Trials in Imaging and Image-Guided Interventions Program (1R01CA218148-01); an NIH/NCI Cancer Center Support Grant (CCSG) Pilot Research Program Award from the UT MD Anderson CCSG Radiation Oncology and Cancer Imaging Program (P30CA016672) and an NIH/ NCI Head and Neck Specialized Programs of Research Excellence (SPORE) Developmental Research Program Award (P50 CA097007-10). Dr. Fuller has received direct industry grant support and travel funding from Elekta AB. Dr. Mohamed is supported by NCI (5R01CA214825-02 and 5R01CA225190-02). Dr. Elhalawani is supported in part by the philanthropic donations from the Family of Paul W. Beach to Dr. G. Brandon Gunn, MD Dr. Elgohari is on an Egyptian American conjoint PhD program funded by the Egyptian Cultural and Educational bureau. Dr. Marai is supported by NSF and NIH, as well as the Feinberg Foundation. Dr. Vock is supported by NCI.