TY - GEN
T1 - Sparse group selection on fused lasso components for identifying group-specific DNA copy number variations
AU - Tian, Ze
AU - Zhang, Huanan
AU - Kuang, Rui
PY - 2012/12/1
Y1 - 2012/12/1
N2 - Detecting DNA copy number variations (CNVs) from arrayCGH or genotyping-array data to correlate with cancer outcomes is crucial for understanding the molecular mechanisms underlying cancer. Previous methods either focus on detecting CNVs in each individual patient sample or common CNVs across all the patient samples. These methods ignore the discrepancies introduced by the heterogeneity in the patient samples, which implies that common CNVs might only be shared within some groups of samples instead of all samples. In this paper, we propose a latent feature model that couples sparse sample group selection with fused lasso on CNV components to identify group-specific CNVs. Assuming a given group structure on patient samples by clinical information, sparse group selection on fused lasso (SGS-FL) identifies the optimal latent CNV components, each of which is specific to the samples in one or several groups. The group selection for each CNV component is determined dynamically by an adaptive algorithm to achieve a desired sparsity. Simulation results show that SGS-FL can more accurately identify the latent CNV components when there is a reliable underlying group structure in the samples. In the experiments on arrayCGH breast cancer and bladder cancer datasets, SGS-FL detected CNV regions that are more relevant to cancer, and provided latent feature weights that can be used for better sample classification.
KW - DNA copy number variations
KW - Fused lasso
KW - Group lasso
KW - Sparse group learning
UR - http://www.scopus.com/inward/record.url?scp=84874043196&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84874043196&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2012.35
DO - 10.1109/ICDM.2012.35
M3 - Conference contribution
AN - SCOPUS:84874043196
SN - 9780769549057
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 665
EP - 674
BT - Proceedings - 12th IEEE International Conference on Data Mining, ICDM 2012
T2 - 12th IEEE International Conference on Data Mining, ICDM 2012
Y2 - 10 December 2012 through 13 December 2012
ER -