TY - JOUR
T1 - Challenges in the analysis of mass-throughput data
T2 - A technical commentary from the statistical machine learning perspective
AU - Aliferis, Constantin F.
AU - Statnikov, Alexander
AU - Tsamardinos, Ioannis
PY - 2006
Y1 - 2006
N2 - Sound data analysis is critical to the success of modern molecular medicine research that involves the collection and interpretation of mass-throughput data. The novel nature and high dimensionality of such datasets pose a series of nontrivial data analysis problems. This technical commentary discusses the problems of over-fitting, error estimation, the curse of dimensionality, causal versus predictive modeling, integration of heterogeneous types of data, and the lack of standard protocols for data analysis. We attempt to shed light on the nature and causes of these problems and to outline viable methodological approaches to overcome them.
UR - http://www.scopus.com/inward/record.url?scp=33748955151&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33748955151&partnerID=8YFLogxK
U2 - 10.1177/117693510600200004
DO - 10.1177/117693510600200004
M3 - Review article
C2 - 19458765
AN - SCOPUS:33748955151
SN - 1176-9351
VL - 2
SP - 133
EP - 162
JO - Cancer Informatics
JF - Cancer Informatics
ER -