Model Selection Techniques: An Overview

Jie Ding; Vahid Tarokh; Yuhong Yang

doi:10.1109/MSP.2018.2867638

Statistics (Twin Cities)

Research output: Contribution to journal › Article › peer-review

173 Scopus citations

Abstract

In the era of big data, analysts usually explore various statistical models or machine-learning methods for observed data to facilitate scientific discoveries or gain predictive power. Whatever data and fitting procedures are employed, a crucial step is to select the most appropriate model or method from a set of candidates. Model selection is a key ingredient in data analysis for reliable and reproducible statistical inference or prediction, and thus it is central to scientific studies in such fields as ecology, economics, engineering, finance, political science, biology, and epidemiology. There has been a long history of model selection techniques that arise from research in statistics, information theory, and signal processing. A considerable number of methods have been proposed, following different philosophies and exhibiting varying performance. The purpose of this article is to provide a comprehensive overview of them, in terms of their motivation, large-sample performance, and applicability. We provide integrated and practically relevant discussions on theoretical properties of state-of-the-art model selection approaches. We also share our thoughts on some controversial views on the practice of model selection.

Original language	English (US)
Article number	8498082
Pages (from-to)	16-34
Number of pages	19
Journal	IEEE Signal Processing Magazine
Volume	35
Issue number	6
DOIs	https://doi.org/10.1109/MSP.2018.2867638
State	Published - Nov 2018

Bibliographical note

Funding Information:
This research was funded in part by the Defense Advanced Research Projects Agency under grant W911NF-18-1-0134. We thank Dr. Shuguang Cui and eight anonymous reviewers for giving feedback on the initial submission of the manuscript. We are also grateful to Dr. Matthew McKay and Dr. Osvaldo Simeone for handling the full submission of the manuscript, and to three anonymous reviewers for their comprehensive comments that have greatly improved the article.

Publisher Copyright:
© 2018 IEEE.

Cite this

@article{432dacdeef234175b9dd24ac41fb3dad,

title = "Model Selection Techniques: An Overview",

abstract = "In the era of big data, analysts usually explore various statistical models or machine-learning methods for observed data to facilitate scientific discoveries or gain predictive power. Whatever data and fitting procedures are employed, a crucial step is to select the most appropriate model or method from a set of candidates. Model selection is a key ingredient in data analysis for reliable and reproducible statistical inference or prediction, and thus it is central to scientific studies in such fields as ecology, economics, engineering, finance, political science, biology, and epidemiology. There has been a long history of model selection techniques that arise from research in statistics, information theory, and signal processing. A considerable number of methods have been proposed, following different philosophies and exhibiting varying performance. The purpose of this article is to provide a comprehensive overview of them, in terms of their motivation, large-sample performance, and applicability. We provide integrated and practically relevant discussions on theoretical properties of state-of-the-art model selection approaches. We also share our thoughts on some controversial views on the practice of model selection.",

author = "Jie Ding and Vahid Tarokh and Yuhong Yang",

note = "Funding Information: This research was funded in part by the Defense Advanced Research Projects Agency under grant W911NF-18-1-0134. We thank Dr. Shuguang Cui and eight anonymous reviewers for giving feedback on the initial submission of the manuscript. We are also grateful to Dr. Matthew McKay and Dr. Osvaldo Simeone for handling the full submission of the manuscript, and to three anonymous reviewers for their comprehensive comments that have greatly improved the article. Publisher Copyright: {\textcopyright} 2018 IEEE.",

year = "2018",

month = nov,

doi = "10.1109/MSP.2018.2867638",

language = "English (US)",

volume = "35",

pages = "16--34",

journal = "IEEE Signal Processing Magazine",

issn = "1053-5888",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "6",

}

TY - JOUR

T1 - Model Selection Techniques: An Overview

AU - Ding, Jie

AU - Tarokh, Vahid

AU - Yang, Yuhong

N1 - Funding Information: This research was funded in part by the Defense Advanced Research Projects Agency under grant W911NF-18-1-0134. We thank Dr. Shuguang Cui and eight anonymous reviewers for giving feedback on the initial submission of the manuscript. We are also grateful to Dr. Matthew McKay and Dr. Osvaldo Simeone for handling the full submission of the manuscript, and to three anonymous reviewers for their comprehensive comments that have greatly improved the article. Publisher Copyright: © 2018 IEEE.

PY - 2018/11

Y1 - 2018/11

AB - In the era of big data, analysts usually explore various statistical models or machine-learning methods for observed data to facilitate scientific discoveries or gain predictive power. Whatever data and fitting procedures are employed, a crucial step is to select the most appropriate model or method from a set of candidates. Model selection is a key ingredient in data analysis for reliable and reproducible statistical inference or prediction, and thus it is central to scientific studies in such fields as ecology, economics, engineering, finance, political science, biology, and epidemiology. There has been a long history of model selection techniques that arise from research in statistics, information theory, and signal processing. A considerable number of methods have been proposed, following different philosophies and exhibiting varying performance. The purpose of this article is to provide a comprehensive overview of them, in terms of their motivation, large-sample performance, and applicability. We provide integrated and practically relevant discussions on theoretical properties of state-of-the-art model selection approaches. We also share our thoughts on some controversial views on the practice of model selection.

UR - http://www.scopus.com/inward/record.url?scp=85056783479&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85056783479&partnerID=8YFLogxK

U2 - 10.1109/MSP.2018.2867638

DO - 10.1109/MSP.2018.2867638

M3 - Article

AN - SCOPUS:85056783479

SN - 1053-5888

VL - 35

SP - 16

EP - 34

JO - IEEE Signal Processing Magazine

JF - IEEE Signal Processing Magazine

IS - 6

M1 - 8498082

ER -
