Big data and partial least-squares prediction

R. Dennis Cook; Liliana Forzani

doi:10.1002/cjs.11316

Statistics (Twin Cities)

Research output: Contribution to journal › Article › peer-review

25 Scopus citations

Abstract

We give a commentary on the challenges of big data for Statistics. We then narrow our discussion to one of those challenges: dimension reduction. This leads to consideration of one particular dimension reduction method, partial least-squares (PLS) regression, for prediction in big high-dimensional regressions where the sample size and the number of predictors are both large. We show that in some regression contexts single-component PLS predictions converge at the usual root-n rate as n, p → ∞ regardless of the relationship between the sample size n and number of predictors p. Asymptotically, PLS predictions then behave as regression predictions in the usual context where p is fixed and n → ∞. These results support the conjecture that PLS regression can be an effective method for prediction in big high-dimensional regressions.
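
As a rough illustration of what a single-component PLS prediction looks like in practice, the sketch below uses the standard one-component closed form for a univariate response: the coefficient vector is the sample covariance vector s_xy between predictors and response, rescaled by (s_xy' s_xy) / (s_xy' S_xx s_xy). The function names `pls1_fit` and `pls1_predict` are hypothetical, and this is an assumption-laden sketch rather than code from the article. It never forms the p × p matrix S_xx, so it runs even when p exceeds n.

```python
def pls1_fit(X, y):
    """Fit a single-component PLS regression (univariate response).

    X is a list of n rows, each a list of p floats; y is a list of n floats.
    Returns (beta, x_mean, y_mean). Hypothetical helper for illustration.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Center predictors and response.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Sample covariance vector between each predictor and the response.
    s_xy = [sum(Xc[i][j] * yc[i] for i in range(n)) / n for j in range(p)]
    # PLS scores: projection of each centered row onto s_xy.
    scores = [sum(Xc[i][j] * s_xy[j] for j in range(p)) for i in range(n)]
    num = sum(w * w for w in s_xy)          # s_xy' s_xy
    den = sum(t * t for t in scores) / n    # s_xy' S_xx s_xy
    beta = [w * num / den for w in s_xy]
    return beta, x_mean, y_mean


def pls1_predict(x_new, beta, x_mean, y_mean):
    """Predict the response for one new predictor vector."""
    return y_mean + sum(b * (v - m) for b, v, m in zip(beta, x_new, x_mean))
```

With a single predictor the formula reduces to the ordinary least-squares slope, which gives a quick sanity check on the implementation.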

Original language	English (US)
Pages (from-to)	62-78
Number of pages	17
Journal	Canadian Journal of Statistics
Volume	46
Issue number	1
DOIs	https://doi.org/10.1002/cjs.11316
State	Published - Mar 2018

Bibliographical note

Publisher Copyright:
© 2017 Statistical Society of Canada

Keywords

Abundant regressions
data science
dimension reduction
sparse regressions
MSC 2010: Primary 62J05; secondary 62F12

Cite this

@article{e17ec4adbcf841678815b2726eed8d64,

title = "Big data and partial least-squares prediction",

abstract = "We give a commentary on the challenges of big data for Statistics. We then narrow our discussion to one of those challenges: dimension reduction. This leads to consideration of one particular dimension reduction method, partial least-squares (PLS) regression, for prediction in big high-dimensional regressions where the sample size and the number of predictors are both large. We show that in some regression contexts single-component PLS predictions converge at the usual root-n rate as n, p → ∞ regardless of the relationship between the sample size n and number of predictors p. Asymptotically, PLS predictions then behave as regression predictions in the usual context where p is fixed and n → ∞. These results support the conjecture that PLS regression can be an effective method for prediction in big high-dimensional regressions.",

keywords = "Abundant regressions, data science, dimension reduction, sparse regressions, MSC 2010: Primary 62J05; secondary 62F12",

author = "Cook, {R. Dennis} and Forzani, Liliana",

note = "Publisher Copyright: {\textcopyright} 2017 Statistical Society of Canada",

year = "2018",

month = mar,

doi = "10.1002/cjs.11316",

language = "English (US)",

volume = "46",

pages = "62--78",

journal = "Canadian Journal of Statistics",

issn = "0319-5724",

publisher = "Statistical Society of Canada",

number = "1",

}

TY - JOUR

T1 - Big data and partial least-squares prediction

AU - Cook, R. Dennis

AU - Forzani, Liliana

PY - 2018/3

Y1 - 2018/3

N2 - We give a commentary on the challenges of big data for Statistics. We then narrow our discussion to one of those challenges: dimension reduction. This leads to consideration of one particular dimension reduction method, partial least-squares (PLS) regression, for prediction in big high-dimensional regressions where the sample size and the number of predictors are both large. We show that in some regression contexts single-component PLS predictions converge at the usual root-n rate as n, p → ∞ regardless of the relationship between the sample size n and number of predictors p. Asymptotically, PLS predictions then behave as regression predictions in the usual context where p is fixed and n → ∞. These results support the conjecture that PLS regression can be an effective method for prediction in big high-dimensional regressions.

AB - We give a commentary on the challenges of big data for Statistics. We then narrow our discussion to one of those challenges: dimension reduction. This leads to consideration of one particular dimension reduction method, partial least-squares (PLS) regression, for prediction in big high-dimensional regressions where the sample size and the number of predictors are both large. We show that in some regression contexts single-component PLS predictions converge at the usual root-n rate as n, p → ∞ regardless of the relationship between the sample size n and number of predictors p. Asymptotically, PLS predictions then behave as regression predictions in the usual context where p is fixed and n → ∞. These results support the conjecture that PLS regression can be an effective method for prediction in big high-dimensional regressions.

KW - Abundant regressions

KW - data science

KW - dimension reduction

KW - sparse regressions

KW - MSC 2010: Primary 62J05; secondary 62F12

UR - http://www.scopus.com/inward/record.url?scp=85017416232&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85017416232&partnerID=8YFLogxK

U2 - 10.1002/cjs.11316

DO - 10.1002/cjs.11316

M3 - Article

AN - SCOPUS:85017416232

SN - 0319-5724

VL - 46

SP - 62

EP - 78

JO - Canadian Journal of Statistics

JF - Canadian Journal of Statistics

IS - 1

ER -
