Smoothing spline ANOVA for super-large samples: Scalable computation via rounding parameters

Nathaniel E. Helwig; Ping Ma

doi:10.4310/SII.2016.v9.n4.a3

Psychology (Twin Cities)

Research output: Contribution to journal › Article › peer-review

9 Scopus citations

Abstract

In the current era of big data, researchers routinely collect and analyze data of super-large sample sizes. Data-oriented statistical methods have been developed to extract information from super-large data. Smoothing spline ANOVA (SSANOVA) is a promising approach for extracting information from noisy data; however, the heavy computational cost of SSANOVA hinders its wide application. In this paper, we propose a new algorithm for fitting SSANOVA models to super-large sample data. In this algorithm, we introduce rounding parameters to make the computation scalable. To demonstrate the benefits of the rounding parameters, we present a simulation study and a real data example using electroencephalography data. Our results reveal that (using the rounding parameters) a researcher can fit nonparametric regression models to very large samples within a few seconds using a standard laptop or tablet computer.
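The abstract says only that rounding parameters make the computation scalable. One plausible reading, sketched below under stated assumptions, works like this. Rounding each predictor value to the nearest multiple of a rounding parameter leaves far fewer unique predictor values (u) than observations (n). The cross-product matrices of a penalized least-squares fit can then be accumulated from those unique values, with the counts serving as weights. This is not the authors' algorithm. The sketch swaps in a hypothetical cubic truncated-power basis with a ridge penalty on the knot coefficients in place of the SSANOVA reproducing-kernel basis. The names `round_to_grid`, `basis`, `compressed_crossprods`, `fit_penalized` and `rparm`, along with the knot locations and the penalty value, are illustrative choices.

```python
import numpy as np


def round_to_grid(x, rparm):
    """Round each value of x to the nearest multiple of the rounding parameter rparm."""
    return np.round(x / rparm) * rparm


def basis(x, knots):
    """Hypothetical cubic truncated-power basis (a stand-in for the SSANOVA kernel basis)."""
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.maximum(x - k, 0.0) ** 3 for k in knots]
    return np.column_stack(cols)


def compressed_crossprods(x, y, rparm, knots):
    """Return X'X, X'y, the unique rounded x values and the basis evaluated at them.

    Rows of X depend only on the rounded x, so identical rows collapse:
    X'X = Xu' diag(w) Xu and X'y = Xu' s, where w counts the observations
    at each unique value and s sums their responses.
    """
    xr = round_to_grid(x, rparm)
    xu, inv, w = np.unique(xr, return_inverse=True, return_counts=True)
    s = np.bincount(inv.ravel(), weights=y, minlength=xu.size)
    Xu = basis(xu, knots)
    XtX = Xu.T @ (Xu * w[:, None])
    Xty = Xu.T @ s
    return XtX, Xty, xu, Xu


def fit_penalized(XtX, Xty, lam, n_poly=4):
    """Solve (X'X + lam * P) beta = X'y, where P is a ridge penalty on the knot coefficients only."""
    P = np.eye(XtX.shape[0])
    P[:n_poly, :n_poly] = 0.0  # leave the cubic polynomial part unpenalized
    return np.linalg.solve(XtX + lam * P, Xty)


# Toy data: n = 100,000 noisy observations of sin(2*pi*x) on [0, 1).
rng = np.random.default_rng(0)
n = 100_000
x = rng.uniform(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.5, n)

rparm = 0.01                       # hypothetical rounding parameter
knots = np.linspace(0.1, 0.9, 9)   # hypothetical knot locations
XtX, Xty, xu, Xu = compressed_crossprods(x, y, rparm, knots)
beta = fit_penalized(XtX, Xty, lam=1e-3)
fitted = Xu @ beta                 # fitted curve at the unique rounded x values
print(f"{n} observations reduced to {xu.size} unique rounded values")
```

The one pass over the data happens in `np.unique` and `np.bincount`. After that, the size of every matrix in the fit depends on u rather than n. The compression itself is exact for the rounded data. The only approximation is replacing each x by its rounded value, so in this sketch `rparm` trades accuracy for speed.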

Original language	English (US)
Pages (from-to)	433-444
Number of pages	12
Journal	Statistics and its Interface
Volume	9
Issue number	4
DOIs	https://doi.org/10.4310/SII.2016.v9.n4.a3
State	Published - 2016

Bibliographical note

Funding Information:
This research was partially supported by NSF grants DMS 1440037 and DMS 1438957, and start-up funds from the University of Minnesota.

Keywords

Rounding parameter
Scalable algorithm
Smoothing spline ANOVA

Cite this

@article{fa22c50436ae48b4998ccf3a3a1dae5b,

title = "Smoothing spline {ANOVA} for super-large samples: Scalable computation via rounding parameters",

abstract = "In the current era of big data, researchers routinely collect and analyze data of super-large sample sizes. Data-oriented statistical methods have been developed to extract information from super-large data. Smoothing spline ANOVA (SSANOVA) is a promising approach for extracting information from noisy data; however, the heavy computational cost of SSANOVA hinders its wide application. In this paper, we propose a new algorithm for fitting SSANOVA models to super-large sample data. In this algorithm, we introduce rounding parameters to make the computation scalable. To demonstrate the benefits of the rounding parameters, we present a simulation study and a real data example using electroencephalography data. Our results reveal that (using the rounding parameters) a researcher can fit nonparametric regression models to very large samples within a few seconds using a standard laptop or tablet computer.",

keywords = "Rounding parameter, Scalable algorithm, Smoothing spline ANOVA",

author = "Helwig, {Nathaniel E.} and Ping Ma",

note = "Funding Information: This research was partially supported by NSF grants DMS 1440037 and DMS 1438957, and start-up funds from the University of Minnesota.",

year = "2016",

doi = "10.4310/SII.2016.v9.n4.a3",

language = "English (US)",

volume = "9",

pages = "433--444",

journal = "Statistics and its Interface",

issn = "1938-7989",

publisher = "International Press of Boston, Inc.",

number = "4",

}

TY  - JOUR
T1  - Smoothing spline ANOVA for super-large samples: Scalable computation via rounding parameters
AU  - Helwig, Nathaniel E.
AU  - Ma, Ping
N1  - Funding Information: This research was partially supported by NSF grants DMS 1440037 and DMS 1438957, and start-up funds from the University of Minnesota.
PY  - 2016
Y1  - 2016
N2  - In the current era of big data, researchers routinely collect and analyze data of super-large sample sizes. Data-oriented statistical methods have been developed to extract information from super-large data. Smoothing spline ANOVA (SSANOVA) is a promising approach for extracting information from noisy data; however, the heavy computational cost of SSANOVA hinders its wide application. In this paper, we propose a new algorithm for fitting SSANOVA models to super-large sample data. In this algorithm, we introduce rounding parameters to make the computation scalable. To demonstrate the benefits of the rounding parameters, we present a simulation study and a real data example using electroencephalography data. Our results reveal that (using the rounding parameters) a researcher can fit nonparametric regression models to very large samples within a few seconds using a standard laptop or tablet computer.
AB  - In the current era of big data, researchers routinely collect and analyze data of super-large sample sizes. Data-oriented statistical methods have been developed to extract information from super-large data. Smoothing spline ANOVA (SSANOVA) is a promising approach for extracting information from noisy data; however, the heavy computational cost of SSANOVA hinders its wide application. In this paper, we propose a new algorithm for fitting SSANOVA models to super-large sample data. In this algorithm, we introduce rounding parameters to make the computation scalable. To demonstrate the benefits of the rounding parameters, we present a simulation study and a real data example using electroencephalography data. Our results reveal that (using the rounding parameters) a researcher can fit nonparametric regression models to very large samples within a few seconds using a standard laptop or tablet computer.
KW  - Rounding parameter
KW  - Scalable algorithm
KW  - Smoothing spline ANOVA
UR  - http://www.scopus.com/inward/record.url?scp=84994086974&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84994086974&partnerID=8YFLogxK
U2  - 10.4310/SII.2016.v9.n4.a3
DO  - 10.4310/SII.2016.v9.n4.a3
M3  - Article
AN  - SCOPUS:84994086974
SN  - 1938-7989
VL  - 9
SP  - 433
EP  - 444
JO  - Statistics and its Interface
JF  - Statistics and its Interface
IS  - 4
ER  - 
