Abstract
In many scientific and engineering problems, particularly in data mining, it is important to select the optimal model from a large pool of candidate models. Model assessment for non-normal distributions has so far received little attention in the literature. Indeed, many existing model selection criteria, such as the Bayes information criterion and C_p, may not be suitable when the conditional mean and variance of the response are dependent, as in generalized linear model regression. In this article we propose a new adaptive model selection criterion and construct an approximately unbiased estimator of the Kullback-Leibler loss for model assessment in the context of exponential family distributions. This permits comparison of arbitrarily complex modeling procedures. Our proposal uses generalized degrees of freedom, extending a concept originally proposed for the normal distribution. The proposed procedure is implemented for the binomial and Poisson distributions, and its small-sample operating characteristics are examined via simulations. The usefulness of the method is demonstrated by an application to a study of the effect of air pollution on certain respiratory diseases. Numerical analyses support the utility of the methodology.
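To make the target quantity concrete, the following is a minimal sketch of the Kullback-Leibler loss between a true mean vector and a fitted mean vector for the two distributions named in the abstract. It is not the article's estimator: the function names `poisson_kl` and `binomial_kl` are hypothetical, and the true means are assumed known here, which in practice they are not (hence the need for a loss estimator).

```python
import math

def poisson_kl(mu, mu_hat):
    """Sum over observations of KL(Poisson(mu_i) || Poisson(mu_hat_i)).

    Assumes every fitted mean mu_hat_i is strictly positive.
    """
    total = 0.0
    for m, mh in zip(mu, mu_hat):
        # By convention m * log(m / mh) is 0 when m == 0.
        log_term = m * math.log(m / mh) if m > 0 else 0.0
        total += log_term - m + mh
    return total

def binomial_kl(p, p_hat, n=1):
    """Sum over observations of KL(Bin(n, p_i) || Bin(n, p_hat_i)).

    Assumes every fitted probability p_hat_i lies strictly in (0, 1).
    """
    total = 0.0
    for q, qh in zip(p, p_hat):
        # Each term is 0 by convention when its leading factor is 0.
        a = q * math.log(q / qh) if q > 0 else 0.0
        b = (1 - q) * math.log((1 - q) / (1 - qh)) if q < 1 else 0.0
        total += n * (a + b)
    return total
```

Both functions return 0 when the fitted means equal the true means and grow as the fit moves away from them, which is the behavior a model assessment criterion based on this loss relies on.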
Original language | English (US) |
---|---|
Pages (from-to) | 306-317 |
Number of pages | 12 |
Journal | Technometrics |
Volume | 46 |
Issue number | 3 |
State | Published - Aug 2004 |
Keywords
- Adaptive penalty
- Cross-validation
- Loss estimation
- Parametric and nonparametric regression
- Trees
- Variable selection