Causal clustering for 1-factor measurement models

Erich Kummerfeld; Joseph Ramsey

doi:10.1145/2939672.2939838

Institute for Health Informatics

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

34 Scopus citations

Abstract

Many scientific research programs aim to learn the causal structure of real-world phenomena. This learning problem is made more difficult when the target of study cannot be directly observed. One strategy commonly used by social scientists is to create measurable "indicator" variables that covary with the latent variables of interest. Before leveraging the indicator variables to learn about the latent variables, however, one needs a measurement model of the causal relations between the indicators and their corresponding latents. These measurement models are a special class of Bayesian networks. This paper addresses the problem of reliably inferring measurement models from measured indicators, without prior knowledge of the causal relations or the number of latent variables. We present a provably correct novel algorithm, FindOneFactorClusters (FOFC), for solving this inference problem. Compared to other state-of-the-art algorithms, FOFC is faster, scales to larger sets of indicators, and is more reliable at small sample sizes. We also present the first correctness proofs for this problem that do not assume linearity or acyclicity among the latent variables.
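The abstract does not describe how FOFC works internally. As background only, the sketch below illustrates a standard property of linear 1-factor measurement models. This property is not taken from this record. If indicators X_i = lambda_i * L + e_i share a single latent parent L, then every tetrad difference cov(X_a,X_b)cov(X_c,X_d) - cov(X_a,X_c)cov(X_b,X_d) is zero in the population.

The sketch makes several assumptions:
- linear loadings
- unit-variance latents
- noise independent of everything else
- exact population covariances rather than estimates from data

This is narrower than the paper's setting, whose proofs do not assume linearity among the latents. The function names are hypothetical, and the code is not presented as the FOFC procedure.

```python
from itertools import combinations

def implied_cov(i, j, loadings, latent_of, latent_corr):
    """Population covariance of distinct indicators i and j.

    Assumed (illustrative) model: X_k = loadings[k] * L[latent_of[k]] + e_k,
    with unit-variance latents and noise e_k independent of everything else,
    so cov(X_i, X_j) = loadings[i] * loadings[j] * corr(L_gi, L_gj) for i != j.
    """
    return loadings[i] * loadings[j] * latent_corr[latent_of[i]][latent_of[j]]

def tetrad(a, b, c, d, cov):
    """Tetrad difference cov(a,b)*cov(c,d) - cov(a,c)*cov(b,d)."""
    return cov(a, b) * cov(c, d) - cov(a, c) * cov(b, d)

def all_tetrads_vanish(quartet, cov, tol=1e-9):
    """True if the three tetrad differences of a quartet are all within tol of 0."""
    a, b, c, d = quartet
    diffs = (
        tetrad(a, b, c, d, cov),  # ab*cd - ac*bd
        tetrad(a, b, d, c, cov),  # ab*cd - ad*bc
        tetrad(a, c, d, b, cov),  # ac*bd - ad*bc
    )
    return all(abs(t) < tol for t in diffs)

# Hypothetical example: indicators 0-3 measure latent 0, indicators 4-7
# measure latent 1, and the two latents have correlation 0.5.
loadings = [0.9, 0.8, 0.7, 0.6, 0.9, 0.8, 0.7, 0.6]
latent_of = [0, 0, 0, 0, 1, 1, 1, 1]
latent_corr = [[1.0, 0.5], [0.5, 1.0]]

def cov(i, j):
    return implied_cov(i, j, loadings, latent_of, latent_corr)

# Quartets (4-element index tuples) whose tetrad constraints all hold.
passing = [q for q in combinations(range(8), 4) if all_tetrads_vanish(q, cov)]

print(all_tetrads_vanish((0, 1, 2, 3), cov))  # True: one shared latent
print(all_tetrads_vanish((0, 1, 4, 5), cov))  # False: a 2+2 split across latents
print(len(passing))  # 34 of the 70 quartets
```

Different quartets behave differently in this example:
- The two within-cluster quartets satisfy every tetrad constraint.
- Every 2+2 quartet that straddles the two latents violates one.
- The 32 quartets that take three indicators from one latent and one from the other also satisfy every constraint.

Testing one quartet in isolation therefore cannot separate the clusters. A method built on tetrad tests has to combine evidence across many quartets.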

Original language	English (US)
Title of host publication	KDD 2016 - Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Publisher	Association for Computing Machinery
Pages	1655-1664
Number of pages	10
ISBN (Electronic)	9781450342322
DOIs	https://doi.org/10.1145/2939672.2939838
State	Published - Aug 13 2016
Event	22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016 - San Francisco, United States. Duration: Aug 13 2016 → Aug 17 2016

Publication series

Name	Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Volume	13-17-August-2016

Bibliographical note

Funding Information:
Research reported in this publication was supported by grant U54HG008540 awarded by the National Human Genome Research Institute through funds provided by the trans-NIH Big Data to Knowledge (BD2K) initiative. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Research reported in this publication was also supported by grant 1317428 awarded by NSF.

Publisher Copyright:
© 2016 ACM.

Cite this

Kummerfeld, E., & Ramsey, J. (2016). Causal clustering for 1-factor measurement models. In KDD 2016 - Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1655-1664). (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Vol. 13-17-August-2016). Association for Computing Machinery. https://doi.org/10.1145/2939672.2939838

@inproceedings{462c488d7d284b6d849d3e545a061a1d,
title = "Causal clustering for 1-factor measurement models",
abstract = "Many scientific research programs aim to learn the causal structure of real world phenomena. This learning problem is made more difficult when the target of study cannot be directly observed. One strategy commonly used by social scientists is to create measurable {"}indicator{"} variables that covary with the latent variables of interest. Before leveraging the indicator variables to learn about the latent variables, however, one needs a measurement model of the causal relations between the indicators and their corresponding latents. These measurement models are a special class of Bayesian networks. This paper addresses the problem of reliably inferring measurement models from measured indicators, without prior knowledge of the causal relations or the number of latent variables. We present a provably correct novel algorithm, FindOneFactorClusters (FOFC), for solving this inference problem. Compared to other state of the art algorithms, FOFC is faster, scales to larger sets of indicators, and is more reliable at small sample sizes. We also present the first correctness proofs for this problem that do not assume linearity or acyclicity among the latent variables.",
author = "Erich Kummerfeld and Joseph Ramsey",
note = "Funding Information: Research reported in this publication was supported by grant U54HG008540 awarded by the National Human Genome Research Institute through funds provided by the trans-NIH Big Data to Knowledge (BD2K) initiative. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Research reported in this publication was also supported by grant 1317428 awarded by NSF. Publisher Copyright: {\textcopyright} 2016 ACM.; 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016 ; Conference date: 13-08-2016 Through 17-08-2016",
year = "2016",
month = aug,
day = "13",
doi = "10.1145/2939672.2939838",
language = "English (US)",
series = "Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",
publisher = "Association for Computing Machinery",
pages = "1655--1664",
booktitle = "KDD 2016 - Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",
}

TY  - GEN
T1  - Causal clustering for 1-factor measurement models
AU  - Kummerfeld, Erich
AU  - Ramsey, Joseph
N1  - Funding Information: Research reported in this publication was supported by grant U54HG008540 awarded by the National Human Genome Research Institute through funds provided by the trans-NIH Big Data to Knowledge (BD2K) initiative. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Research reported in this publication was also supported by grant 1317428 awarded by NSF. Publisher Copyright: © 2016 ACM.
PY  - 2016/8/13
Y1  - 2016/8/13
N2  - Many scientific research programs aim to learn the causal structure of real world phenomena. This learning problem is made more difficult when the target of study cannot be directly observed. One strategy commonly used by social scientists is to create measurable "indicator" variables that covary with the latent variables of interest. Before leveraging the indicator variables to learn about the latent variables, however, one needs a measurement model of the causal relations between the indicators and their corresponding latents. These measurement models are a special class of Bayesian networks. This paper addresses the problem of reliably inferring measurement models from measured indicators, without prior knowledge of the causal relations or the number of latent variables. We present a provably correct novel algorithm, FindOneFactorClusters (FOFC), for solving this inference problem. Compared to other state of the art algorithms, FOFC is faster, scales to larger sets of indicators, and is more reliable at small sample sizes. We also present the first correctness proofs for this problem that do not assume linearity or acyclicity among the latent variables.
AB  - Many scientific research programs aim to learn the causal structure of real world phenomena. This learning problem is made more difficult when the target of study cannot be directly observed. One strategy commonly used by social scientists is to create measurable "indicator" variables that covary with the latent variables of interest. Before leveraging the indicator variables to learn about the latent variables, however, one needs a measurement model of the causal relations between the indicators and their corresponding latents. These measurement models are a special class of Bayesian networks. This paper addresses the problem of reliably inferring measurement models from measured indicators, without prior knowledge of the causal relations or the number of latent variables. We present a provably correct novel algorithm, FindOneFactorClusters (FOFC), for solving this inference problem. Compared to other state of the art algorithms, FOFC is faster, scales to larger sets of indicators, and is more reliable at small sample sizes. We also present the first correctness proofs for this problem that do not assume linearity or acyclicity among the latent variables.
UR  - http://www.scopus.com/inward/record.url?scp=84984972104&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84984972104&partnerID=8YFLogxK
U2  - 10.1145/2939672.2939838
DO  - 10.1145/2939672.2939838
M3  - Conference contribution
C2  - 27766182
AN  - SCOPUS:84984972104
T3  - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP  - 1655
EP  - 1664
BT  - KDD 2016 - Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
PB  - Association for Computing Machinery
T2  - 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016
Y2  - 13 August 2016 through 17 August 2016
ER  -