Estimation of generalization error: Random and fixed inputs

Junhui Wang, Xiaotong T Shen

Research output: Contribution to journal › Article › peer-review

12 Scopus citations

Abstract

In multicategory classification, an estimated generalization error is often used to quantify a classifier's generalization ability. As a result, the quality of the generalization-error estimate becomes crucial in tuning and combining classifiers. This article proposes an estimation methodology for the generalization error that permits a treatment of both fixed and random inputs, in contrast to the conditional classification error commonly used in the statistics literature. In particular, we derive a novel data perturbation technique that jointly perturbs both inputs and outputs to estimate the generalization error. We show that the proposed technique yields optimal tuning and combination, as measured by generalization. We also demonstrate via simulation that it outperforms cross-validation for both fixed and random designs, in the context of margin classification. The results support the utility of the proposed methodology.
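The abstract does not spell out the estimator, but the general idea of perturbation-based generalization-error estimation can be sketched as follows: start from the training error and add a covariance-style penalty estimated by jointly perturbing inputs and outputs with small noise and refitting. This is a minimal illustrative sketch using a least-squares margin classifier; the function names, the noise scale `tau`, and the specific penalty form are assumptions for illustration, not the paper's exact estimator.

```python
import numpy as np

def fit_ls(X, y):
    # Least-squares "margin" classifier: predict sign(X @ w).
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def perturbation_estimate(X, y, tau=0.5, B=200, seed=0):
    """Illustrative perturbation-based estimate of generalization error:
    training error plus a covariance penalty obtained by jointly perturbing
    inputs and outputs, refitting, and averaging (a sketch, not the
    authors' exact procedure)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    w = fit_ls(X, y)
    train_err = np.mean(np.sign(X @ w) != y)
    cov = np.zeros(n)
    for _ in range(B):
        dy = tau * rng.standard_normal(n)        # perturb outputs
        dX = tau * rng.standard_normal(X.shape)  # perturb inputs
        wb = fit_ls(X + dX, y + dy)              # refit on perturbed data
        cov += dy * ((X + dX) @ wb)              # output-fit covariance term
    cov /= B * tau**2
    penalty = 2.0 * cov.sum() / n                # optimism correction
    return train_err + penalty
```

For a linear smoother, the summed covariance term approximates the trace of the hat matrix, so the penalty behaves like the familiar 2·(model dimension)/n optimism correction; the joint input perturbation is what distinguishes the random-design treatment from output-only (fixed-design) schemes.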

Original language: English (US)
Pages (from-to): 569-588
Number of pages: 20
Journal: Statistica Sinica
Volume: 16
Issue number: 2
State: Published - Apr 1 2006

Keywords

  • Averaging
  • Logistic
  • Margins
  • Penalization
  • Support vector

