The feasible solution algorithm for the minimum covariance determinant estimator in multivariate data

Douglas M. Hawkins

Research output: Contribution to journal › Article › peer-review

54 Scopus citations

Abstract

Outliers in multivariate data present a particularly intractable problem. There are various criteria for their identification, and for the related problem of high breakdown robust estimation of multivariate location and scale, but to date no reliable algorithm has been proposed for fitting by these criteria. The most widely used multivariate high breakdown method is the minimum volume ellipsoid (MVE) approach, which seeks the ellipsoid of minimum volume containing at least half of the data. The minimum covariance determinant (MCD) approach uses instead the sample mean vector and covariance matrix of a fixed-size subset of the data, the cases in the subset being chosen to minimize the determinant. The MCD approach has better theoretical properties than the MVE, but appears not to be widely used at present, partly no doubt for the lack of an accepted algorithm for finding it. In this paper, we present an algorithm for obtaining the MCD estimator for a given data set. The algorithm is probabilistic; it involves taking random starting 'trial solutions' and refining each to a local optimum satisfying the necessary condition for the MCD optimum. The method's convergence is guaranteed only asymptotically as the number of random starting values goes to infinity. We demonstrate, however, using simulated data and examples of real data from the literature, that its practical performance is excellent: in these data sets it reaches the global optimum with high probability using a very modest number of random starting points. This gives reason to expect that it will handle other real data sets equally effectively, providing an easily computed high breakdown multivariate estimator for use both in its own right and as a starting point for other robust procedures.
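The random-start-and-refine idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not spell out the refinement rule, so the sketch assumes one: repeatedly exchange one case inside the subset for one case outside it, taking the exchange that most reduces the covariance determinant, until no single exchange helps. The function names (`subset_logdet`, `refine_by_swaps`, `mcd_random_starts`) and the parameters `h`, `n_starts`, and `seed` are hypothetical, chosen for illustration, and the code is not the paper's implementation.

```python
import numpy as np


def subset_logdet(X, idx):
    """Log-determinant of the sample covariance of the rows X[idx]."""
    cov = np.atleast_2d(np.cov(X[idx], rowvar=False))
    sign, logdet = np.linalg.slogdet(cov)
    # A singular subset covariance is treated as determinant zero.
    return logdet if sign > 0 else -np.inf


def refine_by_swaps(X, idx):
    """Assumed refinement step: best single in/out exchange until none improves."""
    n = X.shape[0]
    inside = [int(i) for i in idx]
    best = subset_logdet(X, inside)
    improved = True
    while improved:
        improved = False
        inside_set = set(inside)
        outside = [j for j in range(n) if j not in inside_set]
        best_swap = None
        for pos in range(len(inside)):
            for j in outside:
                trial = inside.copy()
                trial[pos] = j
                val = subset_logdet(X, trial)
                # Small tolerance guards against cycling on rounding noise.
                if val < best - 1e-12:
                    best, best_swap = val, (pos, j)
        if best_swap is not None:
            inside[best_swap[0]] = best_swap[1]
            improved = True
    return sorted(inside), best


def mcd_random_starts(X, h, n_starts=10, seed=None):
    """Refine several random size-h subsets and keep the best local optimum."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    best_idx, best_val = None, np.inf
    for _ in range(n_starts):
        start = rng.choice(n, size=h, replace=False)
        idx, val = refine_by_swaps(X, start)
        if best_idx is None or val < best_val:
            best_idx, best_val = idx, val
    location = X[best_idx].mean(axis=0)
    scatter = np.atleast_2d(np.cov(X[best_idx], rowvar=False))
    return best_idx, location, scatter, best_val
```

Each refinement pass in this sketch evaluates every in/out pair, so the cost per pass grows with `h * (n - h)` determinant evaluations; that is acceptable for small data sets but would need update formulas or a cheaper search for large ones. As the abstract notes, any finite number of starts gives only a local optimum with no guarantee that it is the global MCD solution.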

Original language: English (US)
Pages (from-to): 197-210
Number of pages: 14
Journal: Computational Statistics and Data Analysis
Volume: 17
Issue number: 2
State: Published - Feb 1994

Keywords

  • High breakdown estimation
  • Multivariate
  • Outliers
