Latent supervised learning

Susan Wei, Michael R. Kosorok

Research output: Contribution to journal › Article › peer-review

20 Scopus citations

Abstract

This article introduces a new machine learning task, called latent supervised learning, where the goal is to learn a binary classifier from continuous training labels that serve as surrogates for the unobserved class labels. We investigate a specific model where the surrogate variable arises from a two-component Gaussian mixture with unknown means and variances, and the component membership is determined by a hyperplane in the covariate space. The estimation of the separating hyperplane and the Gaussian mixture parameters forms what shall be referred to as the change-line classification problem. We propose a data-driven sieve maximum likelihood estimator for the hyperplane, which in turn can be used to estimate the parameters of the Gaussian mixture. The estimator is shown to be consistent. Simulations as well as empirical data show the estimator has high classification accuracy.
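The model described in the abstract can be illustrated with a small simulation. The sketch below is not the article's sieve maximum likelihood estimator. It assumes two-dimensional standard normal covariates and swaps in a brute-force grid search over candidate lines, scoring each by the Gaussian log-likelihood profiled over the two means and variances. All function names, parameter values, and grid sizes are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_change_line(n, w, gamma, mu, sigma, rng):
    """Draw (X, y, z): z is the latent class given by the side of the line
    w.x = gamma, and y is Gaussian with class-specific mean and sd."""
    X = rng.normal(size=(n, len(w)))          # assumed covariate law
    z = (X @ w - gamma >= 0).astype(int)      # unobserved class labels
    y = rng.normal(loc=np.asarray(mu)[z], scale=np.asarray(sigma)[z])
    return X, y, z


def profile_loglik(X, y, w, gamma, min_size=10):
    """Gaussian log-likelihood of y for a candidate line, with each side's
    mean and variance replaced by its maximum likelihood estimate."""
    side = X @ w - gamma >= 0
    total = 0.0
    for mask in (side, ~side):
        m = int(mask.sum())
        if m < min_size:                      # reject degenerate splits
            return -np.inf
        var = y[mask].var()                   # MLE variance (ddof=0)
        total += -0.5 * m * (np.log(2.0 * np.pi * var) + 1.0)
    return total


def fit_grid_2d(X, y, n_angles=90, n_offsets=41):
    """Grid search over unit normals (angle in [0, pi)) and offsets taken
    at quantiles of the projected covariates. Illustration only."""
    best = (-np.inf, None, None)
    for theta in np.linspace(0.0, np.pi, n_angles, endpoint=False):
        w = np.array([np.cos(theta), np.sin(theta)])
        proj = X @ w
        for gamma in np.quantile(proj, np.linspace(0.1, 0.9, n_offsets)):
            ll = profile_loglik(X, y, w, gamma)
            if ll > best[0]:
                best = (ll, w, gamma)
    return best


rng = np.random.default_rng(0)
w_true = np.array([np.cos(0.6), np.sin(0.6)])
X, y, z = simulate_change_line(400, w_true, 0.2, mu=(0.0, 3.0),
                               sigma=(1.0, 1.0), rng=rng)
best_ll, w_hat, gamma_hat = fit_grid_2d(X, y)

# The profiled likelihood does not say which side is which class,
# so accuracy is reported up to label switching.
pred = (X @ w_hat - gamma_hat >= 0).astype(int)
agree = float(np.mean(pred == z))
acc = max(agree, 1.0 - agree)
print(f"classification accuracy (up to label switching): {acc:.3f}")
```

The grid search scales poorly beyond two covariates, which is one reason a more structured estimator, such as the one proposed in the article, is needed in practice.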

Original language: English (US)
Pages (from-to): 957-970
Number of pages: 14
Journal: Journal of the American Statistical Association
Volume: 108
Issue number: 503
State: Published - 2013
Externally published: Yes

Bibliographical note

Funding Information:
Susan Wei is Doctoral Student, Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 (Email: susanwe@live.unc.edu). Michael R. Kosorok is Professor and Chair, Department of Biostatistics and Professor, Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 (Email: kosorok@unc.edu). The first author was funded through the National Science Foundation Graduate Fellowship and the National Institutes of Health (NIH) grant T32 GM067553-05S1. The second author was funded in part by the NIH grant CA142538. We thank Editor Xuming He, the Associate Editor, and two anonymous referees for their helpful comments that led to a significantly improved article.

Keywords

  • Classification and clustering
  • Glivenko-Cantelli classes
  • Sieve maximum likelihood estimation
  • Sliced inverse regression
  • Statistical learning
