Big data and partial least-squares prediction

R. Dennis Cook, Liliana Forzani

Research output: Contribution to journal › Article › peer-review

25 Scopus citations

Abstract

We give a commentary on the challenges of big data for Statistics. We then narrow our discussion to one of those challenges: dimension reduction. This leads to consideration of one particular dimension reduction method, partial least-squares (PLS) regression, for prediction in big high-dimensional regressions where the sample size and the number of predictors are both large. We show that in some regression contexts single-component PLS predictions converge at the usual root-n rate as n, p → ∞, regardless of the relationship between the sample size n and the number of predictors p. Asymptotically, PLS predictions then behave as regression predictions in the usual context where p is fixed and n → ∞. These results support the conjecture that PLS regression can be an effective method for prediction in big high-dimensional regressions.
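For readers unfamiliar with the method, the following is a minimal sketch of single-component PLS prediction in its common textbook form. It is not taken from the article, and the function names `fit_pls1` and `predict_pls1` are hypothetical. It assumes a scalar response and uses the sample covariance between the predictors and the response as the single weight vector.

```python
def fit_pls1(X, y):
    """Fit a single-component PLS regression (illustrative sketch, not the authors' code).

    X is a list of n rows, each a list of p predictor values; y is a list of n responses.
    """
    n, p = len(X), len(X[0])
    # Centre the predictors and the response.
    x_bar = [sum(row[j] for row in X) / n for j in range(p)]
    y_bar = sum(y) / n
    Xc = [[row[j] - x_bar[j] for j in range(p)] for row in X]
    yc = [v - y_bar for v in y]
    # Weight vector: covariance of each predictor with the response (up to a 1/n factor).
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    # Scores: projection of each centred observation onto the weight vector.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    tt = sum(v * v for v in t)
    if tt == 0:
        raise ValueError("predictors are uncorrelated with the response; no PLS component")
    # Regress the centred response on the single score.
    b = sum(t[i] * yc[i] for i in range(n)) / tt
    beta = [b * wj for wj in w]
    return x_bar, y_bar, beta


def predict_pls1(model, x_new):
    """Predict the response at a new predictor vector x_new."""
    x_bar, y_bar, beta = model
    return y_bar + sum(bj * (xj - mj) for bj, xj, mj in zip(beta, x_new, x_bar))
```

With a single predictor this reduces to ordinary least squares, which gives a quick sanity check: fitting on `X = [[0], [1], [2], [3]]` and `y = [1, 3, 5, 7]` and predicting at `[4]` recovers the line y = 1 + 2x.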

Original language: English (US)
Pages (from-to): 62-78
Number of pages: 17
Journal: Canadian Journal of Statistics
Volume: 46
Issue number: 1
State: Published - Mar 2018

Bibliographical note

Publisher Copyright:
© 2017 Statistical Society of Canada

Keywords

  • Abundant regressions
  • data science
  • dimension reduction
  • sparse regressions
  • MSC 2010: Primary 62J05; secondary 62F12
