Local causal and Markov blanket induction for causal discovery and feature selection for classification part I: Algorithms and empirical evaluation

Constantin F. Aliferis; Alexander Statnikov; Ioannis Tsamardinos; Subramani Mani; Xenofon D. Koutsoukos

Institute for Health Informatics

Research output: Contribution to journal (Article, peer-reviewed)

411 Scopus citations

Abstract

We present an algorithmic framework for learning local causal structure around target variables of interest in the form of direct causes/effects and Markov blankets applicable to very large data sets with relatively small samples. The selected feature sets can be used for causal discovery and classification. The framework (Generalized Local Learning, or GLL) can be instantiated in numerous ways, giving rise to both existing state-of-the-art as well as novel algorithms. The resulting algorithms are sound under well-defined sufficient conditions. In a first set of experiments we evaluate several algorithms derived from this framework in terms of predictivity and feature set parsimony and compare to other local causal discovery methods and to state-of-the-art non-causal feature selection methods using real data. A second set of experimental evaluations compares the algorithms in terms of ability to induce local causal neighborhoods using simulated and resimulated data and examines the relation of predictivity with causal induction performance. Our experiments demonstrate, consistently with causal feature selection theory, that local causal feature selection methods (under broad assumptions encompassing appropriate family of distributions, types of classifiers, and loss functions) exhibit strong feature set parsimony, high predictivity and local causal interpretability. Although non-causal feature selection methods are often used in practice to shed light on causal relationships, we find that they cannot be interpreted causally even when they achieve excellent predictivity. Therefore we conclude that only local causal techniques should be used when insight into causal structure is sought. In a companion paper we examine in depth the behavior of GLL algorithms, provide extensions, and show how local techniques can be used for scalable and accurate global causal graph learning.
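The abstract describes GLL only at a high level. As a rough illustration, and not the authors' implementation, the Python sketch below follows the generic pattern of a GLL-style local learner: interleaved inclusion and elimination of tentative parents and children (PC), a symmetry correction, and a spouse search that extends PC(T) to the Markov blanket MB(T). Assumptions: the conditional independence (CI) test is a d-separation oracle on a known DAG rather than a statistical test on data, the inclusion heuristic is simply input order, and the names `pc_candidates`, `gll_pc`, `markov_blanket`, and the conditioning-size cap `max_k` are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Sequence, Set

# A CI test returns True when a is independent of b given the conditioning set z.
CITest = Callable[[str, str, FrozenSet[str]], bool]


def d_separated(dag: Dict[str, Set[str]], x: str, y: str, z: FrozenSet[str]) -> bool:
    """Oracle CI test (assumption): is x d-separated from y by z? `dag` maps node -> parents."""
    # Ancestral closure of {x, y} | z.
    keep = {x, y} | set(z)
    stack = list(keep)
    while stack:
        for p in dag[stack.pop()]:
            if p not in keep:
                keep.add(p)
                stack.append(p)
    # Moralize: undirected parent-child edges plus edges between co-parents.
    adj: Dict[str, Set[str]] = {n: set() for n in keep}
    for child in keep:
        parents = sorted(dag[child])
        for p in parents:
            adj[child].add(p)
            adj[p].add(child)
        for a, b in combinations(parents, 2):
            adj[a].add(b)
            adj[b].add(a)
    # x and y are d-separated iff removing z disconnects them in the moral graph.
    seen, stack = {x}, [x]
    while stack:
        node = stack.pop()
        if node == y:
            return False
        for nb in adj[node] - seen - set(z):
            seen.add(nb)
            stack.append(nb)
    return True


def subsets(pool: Sequence[str], max_k: int):
    """All subsets of pool with at most max_k elements, smallest first."""
    for k in range(min(len(pool), max_k) + 1):
        for s in combinations(pool, k):
            yield frozenset(s)


def pc_candidates(t: str, variables: Sequence[str], ci: CITest, max_k: int = 3) -> Set[str]:
    """Tentative parents/children of t via interleaved inclusion and elimination."""
    # Inclusion heuristic (assumption): input order, after dropping variables
    # that are unconditionally independent of t.
    open_vars = [v for v in variables if v != t and not ci(t, v, frozenset())]
    tpc: List[str] = []
    for v in open_vars:
        tpc.append(v)
        # Elimination: drop any member separable from t by a subset of the others.
        for x in list(tpc):
            others = [w for w in tpc if w != x]
            if any(ci(t, x, s) for s in subsets(others, max_k)):
                tpc.remove(x)
    return set(tpc)


def gll_pc(t: str, variables: Sequence[str], ci: CITest, max_k: int = 3) -> Set[str]:
    """Symmetry correction: keep x only if t is also a candidate neighbour of x."""
    return {x for x in pc_candidates(t, variables, ci, max_k)
            if t in pc_candidates(x, variables, ci, max_k)}


def markov_blanket(t: str, variables: Sequence[str], ci: CITest, max_k: int = 3) -> Set[str]:
    """MB(t) = PC(t) plus spouses found through a collider check."""
    pc = gll_pc(t, variables, ci, max_k)
    mb = set(pc)
    for y in pc:
        for x in gll_pc(y, variables, ci, max_k) - pc - {t}:
            pool = sorted(v for v in variables if v not in (t, x))
            sepset = next((s for s in subsets(pool, max_k) if ci(t, x, s)), None)
            # A spouse becomes dependent on t once the common child y is conditioned on.
            if sepset is not None and not ci(t, x, sepset | {y}):
                mb.add(x)
    return mb
```

For example, on the DAG T → C ← S, C → X, S → X (with `ci = lambda a, b, z: d_separated(dag, a, b, z)`), `pc_candidates("T", ...)` keeps the false positive X, because separating X from T requires the spouse S, which is not a candidate neighbour of T. The symmetry step removes it, so `gll_pc` returns {"C"} and `markov_blanket` returns {"C", "S"}. This is why local learners of this kind typically include a symmetry correction.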

Original language	English (US)
Pages (from-to)	171-234
Number of pages	64
Journal	Journal of Machine Learning Research
Volume	11
State	Published - 2010

Keywords

Causal structure learning
Classification
Feature selection
Learning of Bayesian networks
Local causal discovery
Markov blanket induction

Cite this

@article{fe7399658acb4060ac0f7c202c92946d,
  title = "Local causal and {Markov} blanket induction for causal discovery and feature selection for classification part {I}: Algorithms and empirical evaluation",
  abstract = "We present an algorithmic framework for learning local causal structure around target variables of interest in the form of direct causes/effects and Markov blankets applicable to very large data sets with relatively small samples. The selected feature sets can be used for causal discovery and classification. The framework (Generalized Local Learning, or GLL) can be instantiated in numerous ways, giving rise to both existing state-of-the-art as well as novel algorithms. The resulting algorithms are sound under well-defined sufficient conditions. In a first set of experiments we evaluate several algorithms derived from this framework in terms of predictivity and feature set parsimony and compare to other local causal discovery methods and to state-of-the-art non-causal feature selection methods using real data. A second set of experimental evaluations compares the algorithms in terms of ability to induce local causal neighborhoods using simulated and resimulated data and examines the relation of predictivity with causal induction performance. Our experiments demonstrate, consistently with causal feature selection theory, that local causal feature selection methods (under broad assumptions encompassing appropriate family of distributions, types of classifiers, and loss functions) exhibit strong feature set parsimony, high predictivity and local causal interpretability. Although non-causal feature selection methods are often used in practice to shed light on causal relationships, we find that they cannot be interpreted causally even when they achieve excellent predictivity. Therefore we conclude that only local causal techniques should be used when insight into causal structure is sought. In a companion paper we examine in depth the behavior of GLL algorithms, provide extensions, and show how local techniques can be used for scalable and accurate global causal graph learning.",
  keywords = "Causal structure learning, Classification, Feature selection, Learning of Bayesian networks, Local causal discovery, Markov blanket induction",
  author = "Aliferis, Constantin F. and Statnikov, Alexander and Tsamardinos, Ioannis and Mani, Subramani and Koutsoukos, Xenofon D.",
  year = "2010",
  language = "English (US)",
  volume = "11",
  pages = "171--234",
  journal = "Journal of Machine Learning Research",
  issn = "1532-4435",
  publisher = "Microtome Publishing",
}

TY  - JOUR
T1  - Local causal and Markov blanket induction for causal discovery and feature selection for classification part I: Algorithms and empirical evaluation
AU  - Aliferis, Constantin F.
AU  - Statnikov, Alexander
AU  - Tsamardinos, Ioannis
AU  - Mani, Subramani
AU  - Koutsoukos, Xenofon D.
PY  - 2010
Y1  - 2010
N2  - We present an algorithmic framework for learning local causal structure around target variables of interest in the form of direct causes/effects and Markov blankets applicable to very large data sets with relatively small samples. The selected feature sets can be used for causal discovery and classification. The framework (Generalized Local Learning, or GLL) can be instantiated in numerous ways, giving rise to both existing state-of-the-art as well as novel algorithms. The resulting algorithms are sound under well-defined sufficient conditions. In a first set of experiments we evaluate several algorithms derived from this framework in terms of predictivity and feature set parsimony and compare to other local causal discovery methods and to state-of-the-art non-causal feature selection methods using real data. A second set of experimental evaluations compares the algorithms in terms of ability to induce local causal neighborhoods using simulated and resimulated data and examines the relation of predictivity with causal induction performance. Our experiments demonstrate, consistently with causal feature selection theory, that local causal feature selection methods (under broad assumptions encompassing appropriate family of distributions, types of classifiers, and loss functions) exhibit strong feature set parsimony, high predictivity and local causal interpretability. Although non-causal feature selection methods are often used in practice to shed light on causal relationships, we find that they cannot be interpreted causally even when they achieve excellent predictivity. Therefore we conclude that only local causal techniques should be used when insight into causal structure is sought. In a companion paper we examine in depth the behavior of GLL algorithms, provide extensions, and show how local techniques can be used for scalable and accurate global causal graph learning.
AB  - We present an algorithmic framework for learning local causal structure around target variables of interest in the form of direct causes/effects and Markov blankets applicable to very large data sets with relatively small samples. The selected feature sets can be used for causal discovery and classification. The framework (Generalized Local Learning, or GLL) can be instantiated in numerous ways, giving rise to both existing state-of-the-art as well as novel algorithms. The resulting algorithms are sound under well-defined sufficient conditions. In a first set of experiments we evaluate several algorithms derived from this framework in terms of predictivity and feature set parsimony and compare to other local causal discovery methods and to state-of-the-art non-causal feature selection methods using real data. A second set of experimental evaluations compares the algorithms in terms of ability to induce local causal neighborhoods using simulated and resimulated data and examines the relation of predictivity with causal induction performance. Our experiments demonstrate, consistently with causal feature selection theory, that local causal feature selection methods (under broad assumptions encompassing appropriate family of distributions, types of classifiers, and loss functions) exhibit strong feature set parsimony, high predictivity and local causal interpretability. Although non-causal feature selection methods are often used in practice to shed light on causal relationships, we find that they cannot be interpreted causally even when they achieve excellent predictivity. Therefore we conclude that only local causal techniques should be used when insight into causal structure is sought. In a companion paper we examine in depth the behavior of GLL algorithms, provide extensions, and show how local techniques can be used for scalable and accurate global causal graph learning.
KW  - Causal structure learning
KW  - Classification
KW  - Feature selection
KW  - Learning of Bayesian networks
KW  - Local causal discovery
KW  - Markov blanket induction
UR  - http://www.scopus.com/inward/record.url?scp=76749137632&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=76749137632&partnerID=8YFLogxK
M3  - Article
AN  - SCOPUS:76749137632
SN  - 1532-4435
VL  - 11
SP  - 171
EP  - 234
JO  - Journal of Machine Learning Research
JF  - Journal of Machine Learning Research
ER  -
