Proper statistical modeling and validation in QSAR: A case study in the prediction of rat fat-air partitioning

Subhash C Basak, Denise Mills, Douglas M Hawkins, Jessica J. Kraker

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

11 Scopus citations

Abstract

Several multivariate regression methods and model validation techniques that are commonly used to develop predictive models run contrary to current expert opinion in statistics. Such methods yield overly optimistic models that cannot be relied upon to produce meaningful predictions for new compounds. Ridge regression is one appropriate methodology when the number of independent variables exceeds the number of observations. Although variable reduction is not a necessary component of a ridge regression analysis, descriptor thinning may be applied to eliminate variables that have no relationship to the property or activity of interest, in an effort to increase model interpretability; it is critical, however, that this process be carried out correctly. In this paper, we develop a predictive model for the rat fat:air partition coefficient using proper statistical techniques. For comparison, we also use stepwise ordinary least squares regression, which is common in QSAR studies but often results in an inflated "naïve" q². All descriptors used in this analysis are computed strictly from chemical structure, without any additional experimental input, and can therefore be applied to any chemical, real or hypothetical, to assess its pharmacokinetics and toxic potential.
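The difference between a "naïve" q² and a properly validated one can be illustrated with a minimal sketch. It is not the authors' procedure: the data are synthetic noise, the correlation-based thinning in `top_k` is only a stand-in for whatever thinning a real study would use, and all function names, the penalty `lam`, and the descriptor count `k` are hypothetical choices made for illustration. The only point shown is that thinning descriptors on all compounds before cross-validation leaks information about the held-out compound, whereas repeating the thinning inside each fold does not.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge fit on centered data; returns (coef, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Dual form: solves an n x n system, convenient when descriptors outnumber compounds.
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(len(y)), yc)
    coef = Xc.T @ alpha
    return coef, y_mean - x_mean @ coef

def top_k(X, y, k):
    """Indices of the k descriptors most correlated with y (illustrative thinning only)."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(corr)[-k:]

def loo_q2(X, y, lam, k, thin_inside):
    """Leave-one-out q2 = 1 - PRESS / SS_total."""
    n = len(y)
    keep_all = top_k(X, y, k)  # uses every compound, including each held-out one
    press = 0.0
    for i in range(n):
        train = np.arange(n) != i
        # Proper validation repeats the thinning on the training compounds only.
        keep = top_k(X[train], y[train], k) if thin_inside else keep_all
        coef, b0 = ridge_fit(X[train][:, keep], y[train], lam)
        press += (y[i] - (X[i, keep] @ coef + b0)) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

# Synthetic example (assumption): 30 "compounds", 500 descriptors, response unrelated to X.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 500))
y = rng.normal(size=30)

naive_q2 = loo_q2(X, y, lam=1.0, k=10, thin_inside=False)
proper_q2 = loo_q2(X, y, lam=1.0, k=10, thin_inside=True)
print("naive q2 :", naive_q2)
print("proper q2:", proper_q2)
```

Because the response here is pure noise, no model should predict it. The first figure would therefore be expected to look noticeably more favorable than the second, which is the kind of optimism the abstract warns about.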

Original language: English (US)
Title of host publication: Computation in Modern Science and Engineering - Proceedings of the International Conference on Computational Methods in Science and Engineering 2007 (ICCMSE 2007)
Pages: 548-551
Number of pages: 4
Edition: 2
State: Published - 2007
Event: International Conference on Computational Methods in Science and Engineering 2007, ICCMSE 2007 - Corfu, Greece
Duration: Sep 25 2007 to Sep 30 2007

Publication series

Name: AIP Conference Proceedings
Number: 2
Volume: 963
ISSN (Print): 0094-243X
ISSN (Electronic): 1551-7616

Other

Other: International Conference on Computational Methods in Science and Engineering 2007, ICCMSE 2007
Country/Territory: Greece
City: Corfu
Period: 9/25/07 to 9/30/07

Keywords

  • Descriptor thinning
  • Gram-Schmidt
  • Mathematical descriptors
  • Overfitting
  • Ridge regression
  • Stepwise regression
