Model selection confidence sets by likelihood ratio testing

Chao Zheng, Davide Ferrari, Yuhong Yang

Research output: Contribution to journal › Article › peer-review

11 Scopus citations

Abstract

The traditional activity of model selection aims at discovering a single model superior to other candidate models. In the presence of pronounced noise, however, multiple models are often found to explain the same data equally well. To resolve this model selection ambiguity, we introduce the general approach of model selection confidence sets (MSCSs) based on likelihood ratio testing. An MSCS is defined as a list of models statistically indistinguishable from the true model at a user-specified level of confidence, which extends the familiar notion of confidence intervals to the model-selection framework. Our approach guarantees asymptotically correct coverage probability of the true model when both sample size and model dimension increase. We derive conditions under which the MSCS contains all the relevant information about the true model structure. In addition, we propose natural statistics based on the MSCS to measure the importance of variables in a principled way that accounts for the overall model uncertainty. When the space of feasible models is large, the MSCS is implemented via an adaptive stochastic search algorithm that samples MSCS models with high probability. The MSCS methodology is illustrated through numerical experiments on synthetic and real data.
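The abstract sketches the core construction: each candidate model is tested against a reference model with a likelihood ratio test, and every candidate not rejected at the chosen level joins the confidence set. Below is a minimal illustrative sketch in Python for Gaussian linear models that brute-forces all variable subsets rather than using the paper's adaptive stochastic search; the function names `mscs` and `rss`, and the use of the full model as the reference, are assumptions made for illustration, not the authors' implementation.

```python
# Illustrative sketch only (not the authors' algorithm): build a model
# selection confidence set for a Gaussian linear model by likelihood
# ratio testing each variable subset against the full model.
from itertools import combinations

import numpy as np
from scipy.stats import chi2


def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def mscs(X, y, alpha=0.05):
    """All variable subsets not rejected against the full model at level alpha.

    For nested Gaussian linear models, the likelihood ratio statistic for a
    submodel S versus the full model is n * log(RSS_S / RSS_full), which is
    asymptotically chi-squared with p - |S| degrees of freedom when S holds.
    """
    n, p = X.shape
    rss_full = rss(X, y)
    threshold = {df: chi2.ppf(1 - alpha, df) for df in range(1, p + 1)}
    confidence_set = [tuple(range(p))]   # the full model is never rejected
    for k in range(1, p):                # brute-force enumeration; small p only
        for subset in combinations(range(p), k):
            stat = n * np.log(rss(X[:, list(subset)], y) / rss_full)
            if stat <= threshold[p - k]:  # fail to reject: keep this model
                confidence_set.append(subset)
    return confidence_set


# Toy example: only the first two of five predictors matter.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = X[:, 0] - 2.0 * X[:, 1] + rng.standard_normal(200)
print(mscs(X, y, alpha=0.05))
```

With pronounced noise or correlated predictors, the returned set will typically contain several subsets alongside (0, 1), which is the model selection ambiguity the MSCS is designed to quantify.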

Original language: English (US)
Pages (from-to): 827-851
Number of pages: 25
Journal: Statistica Sinica
Volume: 29
Issue number: 2
State: Published - 2019

Bibliographical note

Publisher Copyright:
© 2019 Institute of Statistical Science. All rights reserved.

Keywords

  • Adaptive sampling
  • Likelihood ratio test
  • Model selection confidence set
  • Optimal detectability condition
