Abstract
Many scientific research programs aim to learn the causal structure of real-world phenomena. This learning problem is made more difficult when the target of study cannot be directly observed. One strategy commonly used by social scientists is to create measurable "indicator" variables that covary with the latent variables of interest. Before leveraging the indicator variables to learn about the latent variables, however, one needs a measurement model of the causal relations between the indicators and their corresponding latents. These measurement models are a special class of Bayesian networks. This paper addresses the problem of reliably inferring measurement models from measured indicators, without prior knowledge of the causal relations or the number of latent variables. We present a novel, provably correct algorithm, FindOneFactorClusters (FOFC), for solving this inference problem. Compared to other state-of-the-art algorithms, FOFC is faster, scales to larger sets of indicators, and is more reliable at small sample sizes. We also present the first correctness proofs for this problem that do not assume linearity or acyclicity among the latent variables.
Original language | English (US) |
---|---|
Title of host publication | KDD 2016 - Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining |
Publisher | Association for Computing Machinery |
Pages | 1655-1664 |
Number of pages | 10 |
ISBN (Electronic) | 9781450342322 |
State | Published - Aug 13 2016 |
Event | 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016 - San Francisco, United States. Duration: Aug 13 2016 → Aug 17 2016 |
Publication series
Name | Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining |
---|---|
Volume | 13-17-August-2016 |
Other
Other | 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016 |
---|---|
Country/Territory | United States |
City | San Francisco |
Period | 8/13/16 → 8/17/16 |
Bibliographical note
Funding Information: Research reported in this publication was supported by grant U54HG008540 awarded by the National Human Genome Research Institute through funds provided by the trans-NIH Big Data to Knowledge (BD2K) initiative. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Research reported in this publication was also supported by grant 1317428 awarded by NSF.
Publisher Copyright: © 2016 ACM.