TY - JOUR
T1 - Online Censoring for Large-Scale Regressions with Application to Streaming Big Data
AU - Berberidis, Dimitris
AU - Kekatos, Vassilis
AU - Giannakis, Georgios B.
N1 - Publisher Copyright:
© 1991-2012 IEEE.
PY - 2016/8/1
Y1 - 2016/8/1
N2 - On par with data-intensive applications, the sheer size of modern linear regression problems creates an ever-growing demand for efficient solvers. Fortunately, a significant percentage of the data accrued can be omitted while maintaining a certain quality of statistical inference with an affordable computational budget. This work introduces means of identifying and omitting less informative observations in an online and data-adaptive fashion. Given streaming data, the related maximum-likelihood estimator is sequentially found using first- and second-order stochastic approximation algorithms. These schemes are well suited when data are inherently censored or when the aim is to save communication overhead in decentralized learning setups. In a different operational scenario, the task of joint censoring and estimation is put forth to solve large-scale linear regressions in a centralized setup. Novel online algorithms are developed enjoying simple closed-form updates and provable (non)asymptotic convergence guarantees. To attain desired censoring patterns and levels of dimensionality reduction, thresholding rules are investigated too. Numerical tests on real and synthetic datasets corroborate the efficacy of the proposed data-adaptive methods compared to data-agnostic random projection-based alternatives.
AB - On par with data-intensive applications, the sheer size of modern linear regression problems creates an ever-growing demand for efficient solvers. Fortunately, a significant percentage of the data accrued can be omitted while maintaining a certain quality of statistical inference with an affordable computational budget. This work introduces means of identifying and omitting less informative observations in an online and data-adaptive fashion. Given streaming data, the related maximum-likelihood estimator is sequentially found using first- and second-order stochastic approximation algorithms. These schemes are well suited when data are inherently censored or when the aim is to save communication overhead in decentralized learning setups. In a different operational scenario, the task of joint censoring and estimation is put forth to solve large-scale linear regressions in a centralized setup. Novel online algorithms are developed enjoying simple closed-form updates and provable (non)asymptotic convergence guarantees. To attain desired censoring patterns and levels of dimensionality reduction, thresholding rules are investigated too. Numerical tests on real and synthetic datasets corroborate the efficacy of the proposed data-adaptive methods compared to data-agnostic random projection-based alternatives.
KW - Parameter estimation
KW - big data
KW - least squares
UR - http://www.scopus.com/inward/record.url?scp=84974622652&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84974622652&partnerID=8YFLogxK
U2 - 10.1109/TSP.2016.2546225
DO - 10.1109/TSP.2016.2546225
M3 - Article
C2 - 28042229
AN - SCOPUS:84974622652
SN - 1053-587X
VL - 64
SP - 3854
EP - 3867
JO - IEEE Transactions on Signal Processing
JF - IEEE Transactions on Signal Processing
IS - 15
M1 - 7439865
ER -