Another look at statistical learning theory and regularization

Research output: Contribution to journal › Article › peer-review

31 Scopus citations

Abstract

The paper reviews and highlights the distinctions between function-approximation (FA) and VC theory and methodology, mainly within the setting of regression problems with a squared-error loss function, and empirically illustrates the differences between the two when data are sparse and/or the input distribution is non-uniform. In FA theory, the goal is to estimate an unknown true dependency (or 'target' function) in regression problems, or the posterior probability P(y|x) in classification problems. In VC theory, the goal is to 'imitate' the unknown target function, in the sense of minimizing prediction risk or achieving good 'generalization'. That is, the result of VC learning depends on the (unknown) input distribution, while that of FA does not. This distinction is important because regularization theory, originally introduced under a clearly stated FA setting [Tikhonov, A. N. (1963). On solving ill-posed problems and the method of regularization. Doklady Akademii Nauk USSR, 153, 501-504; Tikhonov, A. N., & Arsenin, V. Y. (1977). Solutions of ill-posed problems. Washington, DC: W. H. Winston], has later been used under the risk-minimization or VC setting. More recently, several authors [Evgeniou, T., Pontil, M., & Poggio, T. (2000). Regularization networks and support vector machines. Advances in Computational Mathematics, 13, 1-50; Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning: Data mining, inference and prediction. Springer; Poggio, T., & Smale, S. (2003). The mathematics of learning: Dealing with data. Notices of the AMS, 50(5), 537-544] have applied constructive methodology based on the regularization framework to learning dependencies from data (under the VC-theoretical setting). However, such regularization-based learning is usually presented as a purely constructive methodology, with no clearly stated problem setting. This paper compares the FA/regularization and VC/risk-minimization methodologies in terms of their underlying theoretical assumptions. The control of model complexity, using regularization and using the concept of margin in SVMs, is contrasted in the FA and VC formulations.
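To make the contrast concrete, here is a rough gloss of the two problem settings under squared-error loss; the notation is mine and is only a sketch of the formulations discussed in the paper.

```latex
% FA setting: approximate the target function t(x) itself, in a norm
% defined by a fixed measure \mu (e.g. uniform) that does not depend on
% how the training inputs happen to be sampled:
\min_{f}\; \|f - t\|_{L_2(\mu)}^{2}
  = \min_{f} \int \bigl(f(x) - t(x)\bigr)^{2}\, d\mu(x)

% VC setting: minimize prediction risk under the unknown joint
% distribution P(x, y) generating the data, so the solution depends on
% the input distribution:
\min_{f}\; R(f) = \min_{f} \int \bigl(y - f(x)\bigr)^{2}\, dP(x, y)

% Regularization-based learning replaces the uncomputable risk with a
% penalized empirical risk over the n training samples:
R_{\mathrm{reg}}(f) = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - f(x_i)\bigr)^{2}
  + \lambda\, \phi(f)
```

The sketch below illustrates, in the same spirit as the paper's empirical comparisons, the two styles of complexity control the abstract mentions: penalization (kernel ridge regression, where complexity is governed by the penalty weight) versus margin (epsilon-SVR, where it is governed by C and the insensitive zone). The target function, sample size, and hyperparameter grids are hypothetical choices of mine, not the paper's experimental setup.

```python
# Illustrative sketch only: contrasts penalization-based complexity control
# (kernel ridge regression) with margin-based control (epsilon-SVR) on a
# small sample drawn from a non-uniform input distribution. All settings
# here are assumptions for the demo, not the paper's.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)

def target(x):
    # Hypothetical univariate target function for the demonstration.
    return np.sin(2 * np.pi * x)

# Sparse, non-uniformly distributed inputs (clustered near x = 0).
n = 30
x_train = rng.beta(a=1.0, b=4.0, size=n)
y_train = target(x_train) + rng.normal(scale=0.2, size=n)
X_train = x_train.reshape(-1, 1)

# Penalization: complexity controlled by the ridge parameter alpha,
# i.e. the weight of the quadratic penalty phi(f) in R_reg.
ridge = GridSearchCV(
    KernelRidge(kernel="rbf", gamma=10.0),
    param_grid={"alpha": np.logspace(-4, 1, 12)},
    cv=5,
)
# Margin: complexity controlled by C and the epsilon-insensitive zone.
svr = GridSearchCV(
    SVR(kernel="rbf", gamma=10.0),
    param_grid={"C": np.logspace(-1, 3, 9), "epsilon": [0.05, 0.1, 0.2]},
    cv=5,
)

# Measure closeness to the target on a dense uniform grid (an FA-style
# criterion); risk under the actual sampling distribution (the VC
# criterion) would weight the dense region near x = 0 much more heavily.
x_test = np.linspace(0, 1, 500)
for name, model in [("kernel ridge", ridge), ("epsilon-SVR", svr)]:
    model.fit(X_train, y_train)
    mse = np.mean((model.predict(x_test.reshape(-1, 1)) - target(x_test)) ** 2)
    print(f"{name:12s} uniform-grid MSE vs target: {mse:.4f}")
```

With inputs clustered near x = 0, both models typically fit well where data are dense and drift from the target elsewhere, which is exactly the regime where the FA goal (closeness to t(x) everywhere) and the VC goal (low risk under the sampling distribution) come apart.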

Original language: English (US)
Pages (from-to): 958-969
Number of pages: 12
Journal: Neural Networks
Volume: 22
Issue number: 7
DOIs
State: Published - Sep 2009

Bibliographical note

Funding Information:
This work was supported, in part, by NSF grant EECS-0802056, and by the A. Richard Newton Breakthrough Research Award from Microsoft Corporation. The authors would also like to thank the Associate Editor for very thorough and detailed comments that helped to improve the quality of the final paper.

Copyright:
Copyright 2009 Elsevier B.V., All rights reserved.

Keywords

  • Function approximation
  • Model identification
  • Penalization
  • Predictive learning
  • Regularization
  • Ridge regression
  • SVM regression
  • Statistical model estimation
  • Structural risk minimization
  • VC-theory
