TY - JOUR
T1 - A Simulation Study to Investigate the Behavior of the Log-Density Ratio Under Normality
AU - Scrucca, Luca
AU - Weisberg, Sanford
PY - 2004/2
Y1 - 2004/2
N2 - For a logistic regression model, the log-odds depend on the log of the ratio of the conditional densities of the predictors given the response variable. This suggests that relevant statistical information could be extracted by investigating the inverse problem of the predictors given the response. For binary responses, assuming certain parametric distributions, it is possible to determine which terms are needed and how they should be included in a logistic regression model. In the one-predictor case, under the normality assumption, a known result shows that a linear and a quadratic term are needed in a logistic regression model, with the quadratic term not required if the two conditional distributions have the same variance. However, the quadratic component may also be unnecessary if the linear term is sufficient to discriminate between the two groups, that is, if the two conditional distributions are far enough apart. A simulation study is presented which shows that if the ratio of variances is between 2/3 and 1.5, the quadratic term is less likely to be useful; this also happens when the mean difference scaled by the variance ratio tends to be large. Graphically, if the conditional distributions of x|y for the two groups are well separated, a linear term should contain all the relevant statistical information available in the data. Conversely, if they overlap significantly and the variances are clearly unequal, the quadratic term is likely to be needed. Minor deviations from normality should not be worrisome, particularly outside the range in which the empirical distributions overlap.
AB - For a logistic regression model, the log-odds depend on the log of the ratio of the conditional densities of the predictors given the response variable. This suggests that relevant statistical information could be extracted by investigating the inverse problem of the predictors given the response. For binary responses, assuming certain parametric distributions, it is possible to determine which terms are needed and how they should be included in a logistic regression model. In the one-predictor case, under the normality assumption, a known result shows that a linear and a quadratic term are needed in a logistic regression model, with the quadratic term not required if the two conditional distributions have the same variance. However, the quadratic component may also be unnecessary if the linear term is sufficient to discriminate between the two groups, that is, if the two conditional distributions are far enough apart. A simulation study is presented which shows that if the ratio of variances is between 2/3 and 1.5, the quadratic term is less likely to be useful; this also happens when the mean difference scaled by the variance ratio tends to be large. Graphically, if the conditional distributions of x|y for the two groups are well separated, a linear term should contain all the relevant statistical information available in the data. Conversely, if they overlap significantly and the variances are clearly unequal, the quadratic term is likely to be needed. Minor deviations from normality should not be worrisome, particularly outside the range in which the empirical distributions overlap.
KW - Binary response
KW - Log-density ratio
KW - Logistic regression
KW - Monte Carlo simulation
KW - Regression graphics
UR - http://www.scopus.com/inward/record.url?scp=1642380961&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=1642380961&partnerID=8YFLogxK
U2 - 10.1081/SAC-120028439
DO - 10.1081/SAC-120028439
M3 - Article
AN - SCOPUS:1642380961
VL - 33
SP - 159
EP - 178
JO - Communications in Statistics Part B: Simulation and Computation
JF - Communications in Statistics Part B: Simulation and Computation
SN - 0361-0918
IS - 1
ER -