Comparing statistical approaches for selecting optimal models of stem volume in loblolly pine plantations

Brian J. Clough; Edwin J. Green

doi:10.5849/forsci.14-203


Forest Resources

Research output: Contribution to journal › Article › peer-review

Abstract

Predictive models are a key component of forest research, and although much literature has been devoted to exploring appropriate functional forms for a variety of modeling scenarios, far less has addressed the best statistical methods for selecting among multiple candidate models. In this study, we compare model-averaged predictors developed from a common set of models, using two approaches for calculating Bayesian posterior model probabilities (exact analytical calculation using reference priors and estimation based on reversible jump Monte Carlo), the deviance information criterion (DIC) and the small sample-size-corrected form of Akaike’s information criterion (AICc). Our example involves linear variable selection to develop models for predicting outside bark volume of loblolly pine (Pinus taeda), and this comparison is accomplished via cross-validation. We found that analytical calculation of the posterior probabilities, DIC, and AICc resulted in model sets with comparable predictive performance, whereas reversible jump was less consistent and on average provided less accurate results. In the latter case, poor mixing of the reversible jump algorithm may have contributed to biased estimates of posterior probabilities. In general, our results show that the choice of model selection criteria may lead to divergent results in the choice and weighting of candidate models, although in our case study, these discrepancies had only small effects on predictive performance. However, in other analytical scenarios, these differences may be more profound. Regardless of how model selection is carried out, predictive models should be carefully evaluated, preferably through rigorous assessment of predictive performance.
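The abstract compares several ways of weighting a common set of candidate linear models. As a minimal illustration of just one of them, the sketch below fits every subset of a few predictors by ordinary least squares and computes AICc = AIC + 2K(K + 1)/(n − K − 1). Here AIC = n ln(RSS/n) + 2K assumes Gaussian errors and drops additive constants, and K counts the regression coefficients plus the error variance. The sketch then converts the AICc differences Δ_i into Akaike weights w_i = exp(−Δ_i/2) / Σ_j exp(−Δ_j/2) and scores the weighted predictor by k-fold cross-validated RMSE. This is not the authors' code. The predictors, fold assignment, function names, and synthetic data are all hypothetical, and the reference-prior posterior probabilities, reversible jump, and DIC weightings are not shown.

```python
import itertools
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def design(rows, subset):
    """Design matrix: an intercept column plus the predictors indexed by `subset`."""
    return [[1.0] + [row[j] for j in subset] for row in rows]


def fit_ols(X, y):
    """Ordinary least squares via the normal equations X'X b = X'y."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(X, beta):
    return [sum(b * v for b, v in zip(beta, row)) for row in X]


def aicc(rss, n, k):
    """Gaussian-likelihood AICc (constants dropped); k = coefficients + error variance."""
    return n * math.log(rss / n) + 2 * k + 2 * k * (k + 1) / (n - k - 1)


def model_average(train_x, train_y, test_x):
    """AICc-weighted model-averaged prediction over all predictor subsets."""
    n, q = len(train_y), len(train_x[0])
    fits = []
    for size in range(q + 1):
        for subset in itertools.combinations(range(q), size):
            X = design(train_x, subset)
            beta = fit_ols(X, train_y)
            rss = sum((yi - fi) ** 2 for yi, fi in zip(train_y, predict(X, beta)))
            fits.append((aicc(rss, n, len(beta) + 1), subset, beta))
    best = min(f[0] for f in fits)
    raw = [math.exp(-(a - best) / 2) for a, _, _ in fits]
    total = sum(raw)
    weights = [(subset, r / total) for r, (_, subset, _) in zip(raw, fits)]
    preds = [0.0] * len(test_x)
    for (_, w), (_, subset, beta) in zip(weights, fits):
        for i, yhat in enumerate(predict(design(test_x, subset), beta)):
            preds[i] += w * yhat
    return preds, weights


def cv_rmse(xs, ys, folds=5):
    """k-fold cross-validated RMSE of the model-averaged predictor."""
    sq = 0.0
    for f in range(folds):
        test = [i for i in range(len(ys)) if i % folds == f]
        train = [i for i in range(len(ys)) if i % folds != f]
        preds, _ = model_average([xs[i] for i in train], [ys[i] for i in train],
                                 [xs[i] for i in test])
        sq += sum((ys[i] - p) ** 2 for i, p in zip(test, preds))
    return math.sqrt(sq / len(ys))


if __name__ == "__main__":
    # Hypothetical synthetic data (not from the study): only predictor 0 matters.
    rng = random.Random(1)
    xs = [[rng.uniform(0, 10) for _ in range(3)] for _ in range(40)]
    ys = [2.0 + 3.0 * x[0] + rng.gauss(0, 1) for x in xs]
    print("CV RMSE:", round(cv_rmse(xs, ys), 3))
```

Because only predictor 0 drives the synthetic response, nearly all of the weight should fall on subsets containing it. Substituting posterior model probabilities or DIC-based weights would change only the weighting step inside `model_average`.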

Original language	English (US)
Pages (from-to)	9-17
Number of pages	9
Journal	Forest Science
Volume	62
Issue number	1
DOIs	https://doi.org/10.5849/forsci.14-203
State	Published - Feb 8 2016

Keywords

Akaike’s information criterion
Bayesian model selection
Cross-validation
Deviance information criterion
Prediction
Variable selection

Cite this

@article{9a22047a7d9946f4a9484ee92f1ebca7,
  title = "Comparing statistical approaches for selecting optimal models of stem volume in loblolly pine plantations",
  abstract = "Predictive models are a key component of forest research, and although much literature has been devoted to exploring appropriate functional forms for a variety of modeling scenarios, there is far less addressing the best statistical methods for selecting among multiple candidate models. In this study, we compare model-averaged predictors developed from a common set of models, using two approaches for calculating Bayesian posterior model probabilities (exact analytical calculation using reference priors and estimation based on reversible jump Monte Carlo), the deviance information criterion (DIC) and the small sample-size-corrected form of Akaike{\textquoteright}s information criterion (AICc). Our example involves linear variable selection to develop models for predicting outside bark volume of loblolly pine (Pinus taeda), and this comparison is accomplished via cross-validation. We found that analytical calculation of the posterior probabilities, DIC, and AICc, resulted in model sets with comparable predictive performance, whereas reversible jump was less consistent and on average provided less accurate results. In the latter case, poor mixing of the reversible jump algorithm may have contributed to biased estimates of posterior probabilities. In general, our results show that the choice of model selection criteria may lead to divergent results in the choice and weighting of candidate models, although in our case study, these discrepancies had only small effects on predictive performance. However, in other analytical scenarios, these differences may be more profound. Regardless of how model selection is carried out, predictive models should be carefully evaluated, preferably through rigorous evaluation of predictive performance.",
  keywords = "Akaike{\textquoteright}s information criterion, Bayesian model selection, Cross-validation, Deviance information criterion, Prediction, Variable selection",
  author = "Clough, {Brian J.} and Green, {Edwin J.}",
  year = "2016",
  month = feb,
  day = "8",
  doi = "10.5849/forsci.14-203",
  language = "English (US)",
  volume = "62",
  pages = "9--17",
  journal = "Forest Science",
  issn = "0015-749X",
  publisher = "Society of American Foresters",
  number = "1",
}

TY  - JOUR
T1  - Comparing statistical approaches for selecting optimal models of stem volume in loblolly pine plantations
AU  - Clough, Brian J.
AU  - Green, Edwin J.
PY  - 2016/2/8
Y1  - 2016/2/8
N2  - Predictive models are a key component of forest research, and although much literature has been devoted to exploring appropriate functional forms for a variety of modeling scenarios, there is far less addressing the best statistical methods for selecting among multiple candidate models. In this study, we compare model-averaged predictors developed from a common set of models, using two approaches for calculating Bayesian posterior model probabilities (exact analytical calculation using reference priors and estimation based on reversible jump Monte Carlo), the deviance information criterion (DIC) and the small sample-size-corrected form of Akaike’s information criterion (AICc). Our example involves linear variable selection to develop models for predicting outside bark volume of loblolly pine (Pinus taeda), and this comparison is accomplished via cross-validation. We found that analytical calculation of the posterior probabilities, DIC, and AICc, resulted in model sets with comparable predictive performance, whereas reversible jump was less consistent and on average provided less accurate results. In the latter case, poor mixing of the reversible jump algorithm may have contributed to biased estimates of posterior probabilities. In general, our results show that the choice of model selection criteria may lead to divergent results in the choice and weighting of candidate models, although in our case study, these discrepancies had only small effects on predictive performance. However, in other analytical scenarios, these differences may be more profound. Regardless of how model selection is carried out, predictive models should be carefully evaluated, preferably through rigorous evaluation of predictive performance.
AB  - Predictive models are a key component of forest research, and although much literature has been devoted to exploring appropriate functional forms for a variety of modeling scenarios, there is far less addressing the best statistical methods for selecting among multiple candidate models. In this study, we compare model-averaged predictors developed from a common set of models, using two approaches for calculating Bayesian posterior model probabilities (exact analytical calculation using reference priors and estimation based on reversible jump Monte Carlo), the deviance information criterion (DIC) and the small sample-size-corrected form of Akaike’s information criterion (AICc). Our example involves linear variable selection to develop models for predicting outside bark volume of loblolly pine (Pinus taeda), and this comparison is accomplished via cross-validation. We found that analytical calculation of the posterior probabilities, DIC, and AICc, resulted in model sets with comparable predictive performance, whereas reversible jump was less consistent and on average provided less accurate results. In the latter case, poor mixing of the reversible jump algorithm may have contributed to biased estimates of posterior probabilities. In general, our results show that the choice of model selection criteria may lead to divergent results in the choice and weighting of candidate models, although in our case study, these discrepancies had only small effects on predictive performance. However, in other analytical scenarios, these differences may be more profound. Regardless of how model selection is carried out, predictive models should be carefully evaluated, preferably through rigorous evaluation of predictive performance.
KW  - Akaike’s information criterion
KW  - Bayesian model selection
KW  - Cross-validation
KW  - Deviance information criterion
KW  - Prediction
KW  - Variable selection
UR  - http://www.scopus.com/inward/record.url?scp=84956764842&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84956764842&partnerID=8YFLogxK
U2  - 10.5849/forsci.14-203
DO  - 10.5849/forsci.14-203
M3  - Article
AN  - SCOPUS:84956764842
SN  - 0015-749X
VL  - 62
SP  - 9
EP  - 17
JO  - Forest Science
JF  - Forest Science
IS  - 1
ER  - 
