Machine learning for transient discovery in Pan-STARRS1 difference imaging

D. E. Wright; S. J. Smartt; K. W. Smith; P. Miller; R. Kotak; A. Rest; W. S. Burgett; K. C. Chambers; H. Flewelling; K. W. Hodapp; M. Huber; R. Jedicke; N. Kaiser; N. Metcalfe; P. A. Price; J. L. Tonry; R. J. Wainscoat; C. Waters

doi:10.1093/mnras/stv292

Physics and Astronomy (Twin Cities)

Research output: Contribution to journal (peer-reviewed article)

54 Scopus citations

Abstract

Efficient identification and follow-up of astronomical transients is hindered by the need for humans to manually select promising candidates from data streams that contain many false positives. These artefacts arise in the difference images that are produced by most major ground-based time-domain surveys with large format CCD cameras. This dependence on humans to reject bogus detections is unsustainable for next generation all-sky surveys and significant effort is now being invested to solve the problem computationally. In this paper, we explore a simple machine learning approach to real-bogus classification by constructing a training set from the image data of ~32 000 real astrophysical transients and bogus detections from the Pan-STARRS1 Medium Deep Survey. We derive our feature representation from the pixel intensity values of a 20 × 20 pixel stamp around the centre of the candidates. This differs from previous work in that it works directly on the pixels rather than catalogued domain knowledge for feature design or selection. Three machine learning algorithms are trained (artificial neural networks, support vector machines and random forests) and their performances are tested on a held-out subset of 25 per cent of the training data. We find the best results from the random forest classifier and demonstrate that by accepting a false positive rate of 1 per cent, the classifier initially suggests a missed detection rate of around 10 per cent. However, we also find that a combination of bright star variability, nuclear transients and uncertainty in human labelling means that our best estimate of the missed detection rate is approximately 6 per cent.
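The evaluation described in the abstract can be sketched in a few lines of Python. The steps are pixel features from a 20 × 20 stamp, a 25 per cent held-out test set, and a missed detection rate read off at a fixed 1 per cent false positive rate. This is a minimal illustration under stated assumptions, not the authors' code. The function names `stamp_features`, `split_holdout` and `mdr_at_fixed_fpr` are hypothetical. The sketch also assumes the following:

* A stamp is a 2-D list of pixel values centred on the candidate.
* Labels are 1 for real and 0 for bogus.
* A candidate is called real when its classifier score is at or above the threshold.

The classifier itself is left out. One option would be scikit-learn's `RandomForestClassifier`, whose `predict_proba` returns such scores.

```python
import bisect
import math
import random


def stamp_features(stamp, size=20):
    """Flatten the central size x size pixels of a 2-D stamp into one feature vector.

    Hypothetical helper: assumes `stamp` is a list of equal-length rows centred
    on the candidate. Raw pixel values are used, with no normalisation applied.
    """
    rows, cols = len(stamp), len(stamp[0])
    if rows < size or cols < size:
        raise ValueError("stamp is smaller than the requested cutout")
    r0, c0 = (rows - size) // 2, (cols - size) // 2
    return [float(v) for row in stamp[r0:r0 + size] for v in row[c0:c0 + size]]


def split_holdout(n, test_frac=0.25, seed=0):
    """Randomly split indices 0..n-1 into (train, test), holding out test_frac."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    return sorted(idx[n_test:]), sorted(idx[:n_test])


def mdr_at_fixed_fpr(scores, labels, max_fpr=0.01):
    """Return (threshold, fpr, mdr) for the lowest threshold with FPR <= max_fpr.

    FPR = fraction of bogus candidates (label 0) scored >= threshold.
    MDR = fraction of real candidates (label 1) scored < threshold.
    The lowest admissible threshold is chosen because the MDR can only grow as
    the threshold rises. Assumes both classes are present.
    """
    bogus = sorted(s for s, y in zip(scores, labels) if y == 0)
    real = [s for s, y in zip(scores, labels) if y == 1]
    # Candidate thresholds: every distinct score, then +inf (reject everything).
    for t in sorted(set(scores)) + [math.inf]:
        n_fp = len(bogus) - bisect.bisect_left(bogus, t)  # bogus scored >= t
        fpr = n_fp / len(bogus)
        if fpr <= max_fpr:
            mdr = sum(s < t for s in real) / len(real)
            return t, fpr, mdr
```

Consider a toy set of 100 bogus candidates scored 0 to 99 and four real candidates scored 50, 95, 99 and 100. For this set, `mdr_at_fixed_fpr` picks a threshold of 99. One bogus candidate sits at or above that threshold, which gives an FPR of 1 per cent. Two of the four real candidates fall below it, which gives an MDR of 0.5. The threshold-selection rule here is an assumption; the paper's own procedure may differ in detail.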

Original language	English (US)
Pages (from-to)	451-466
Number of pages	16
Journal	Monthly Notices of the Royal Astronomical Society
Volume	449
Issue number	1
DOIs	https://doi.org/10.1093/mnras/stv292
State	Published - Feb 23 2015

Bibliographical note

Publisher Copyright:
© 2015 The Authors.

Keywords

Methods: data analysis
Methods: statistical
Supernovae: general
Surveys
Techniques: image processing

Cite this

Wright, D. E., Smartt, S. J., Smith, K. W., Miller, P., Kotak, R., Rest, A., Burgett, W. S., Chambers, K. C., Flewelling, H., Hodapp, K. W., Huber, M., Jedicke, R., Kaiser, N., Metcalfe, N., Price, P. A., Tonry, J. L., Wainscoat, R. J., & Waters, C. (2015). Machine learning for transient discovery in Pan-STARRS1 difference imaging. Monthly Notices of the Royal Astronomical Society, 449(1), 451-466. https://doi.org/10.1093/mnras/stv292

Wright, DE, Smartt, SJ, Smith, KW, Miller, P, Kotak, R, Rest, A, Burgett, WS, Chambers, KC, Flewelling, H, Hodapp, KW, Huber, M, Jedicke, R, Kaiser, N, Metcalfe, N, Price, PA, Tonry, JL, Wainscoat, RJ & Waters, C 2015, 'Machine learning for transient discovery in Pan-STARRS1 difference imaging', Monthly Notices of the Royal Astronomical Society, vol. 449, no. 1, pp. 451-466. https://doi.org/10.1093/mnras/stv292

@article{3be64caffaaf48e2b89bf1459d667c9f,

title = "Machine learning for transient discovery in Pan-STARRS1 difference imaging",

abstract = "Efficient identification and follow-up of astronomical transients is hindered by the need for humans to manually select promising candidates from data streams that contain many false positives. These artefacts arise in the difference images that are produced by most major ground-based time-domain surveys with large format CCD cameras. This dependence on humans to reject bogus detections is unsustainable for next generation all-sky surveys and significant effort is now being invested to solve the problem computationally. In this paper, we explore a simple machine learning approach to real-bogus classification by constructing a training set from the image data of {$\sim$}32 000 real astrophysical transients and bogus detections from the Pan-STARRS1 Medium Deep Survey. We derive our feature representation from the pixel intensity values of a 20 $\times$ 20 pixel stamp around the centre of the candidates. This differs from previous work in that it works directly on the pixels rather than catalogued domain knowledge for feature design or selection. Three machine learning algorithms are trained (artificial neural networks, support vector machines and random forests) and their performances are tested on a held-out subset of 25 per cent of the training data. We find the best results from the random forest classifier and demonstrate that by accepting a false positive rate of 1 per cent, the classifier initially suggests a missed detection rate of around 10 per cent. However, we also find that a combination of bright star variability, nuclear transients and uncertainty in human labelling means that our best estimate of the missed detection rate is approximately 6 per cent.",

keywords = "Methods: data analysis, Methods: statistical, Supernovae: general, Surveys, Techniques: image processing",

author = "Wright, {D. E.} and Smartt, {S. J.} and Smith, {K. W.} and P. Miller and R. Kotak and A. Rest and Burgett, {W. S.} and Chambers, {K. C.} and H. Flewelling and Hodapp, {K. W.} and M. Huber and R. Jedicke and N. Kaiser and N. Metcalfe and Price, {P. A.} and Tonry, {J. L.} and Wainscoat, {R. J.} and C. Waters",

note = "Publisher Copyright: {\textcopyright} 2015 The Authors.",

year = "2015",

month = feb,

day = "23",

doi = "10.1093/mnras/stv292",

language = "English (US)",

volume = "449",

pages = "451--466",

journal = "Monthly Notices of the Royal Astronomical Society",

issn = "0035-8711",

publisher = "Oxford University Press",

number = "1",

}

TY  - JOUR
T1  - Machine learning for transient discovery in Pan-STARRS1 difference imaging
AU  - Wright, D. E.
AU  - Smartt, S. J.
AU  - Smith, K. W.
AU  - Miller, P.
AU  - Kotak, R.
AU  - Rest, A.
AU  - Burgett, W. S.
AU  - Chambers, K. C.
AU  - Flewelling, H.
AU  - Hodapp, K. W.
AU  - Huber, M.
AU  - Jedicke, R.
AU  - Kaiser, N.
AU  - Metcalfe, N.
AU  - Price, P. A.
AU  - Tonry, J. L.
AU  - Wainscoat, R. J.
AU  - Waters, C.
PY  - 2015/2/23
Y1  - 2015/2/23
N2  - Efficient identification and follow-up of astronomical transients is hindered by the need for humans to manually select promising candidates from data streams that contain many false positives. These artefacts arise in the difference images that are produced by most major ground-based time-domain surveys with large format CCD cameras. This dependence on humans to reject bogus detections is unsustainable for next generation all-sky surveys and significant effort is now being invested to solve the problem computationally. In this paper, we explore a simple machine learning approach to real-bogus classification by constructing a training set from the image data of ~32 000 real astrophysical transients and bogus detections from the Pan-STARRS1 Medium Deep Survey. We derive our feature representation from the pixel intensity values of a 20 × 20 pixel stamp around the centre of the candidates. This differs from previous work in that it works directly on the pixels rather than catalogued domain knowledge for feature design or selection. Three machine learning algorithms are trained (artificial neural networks, support vector machines and random forests) and their performances are tested on a held-out subset of 25 per cent of the training data. We find the best results from the random forest classifier and demonstrate that by accepting a false positive rate of 1 per cent, the classifier initially suggests a missed detection rate of around 10 per cent. However, we also find that a combination of bright star variability, nuclear transients and uncertainty in human labelling means that our best estimate of the missed detection rate is approximately 6 per cent.
AB  - Efficient identification and follow-up of astronomical transients is hindered by the need for humans to manually select promising candidates from data streams that contain many false positives. These artefacts arise in the difference images that are produced by most major ground-based time-domain surveys with large format CCD cameras. This dependence on humans to reject bogus detections is unsustainable for next generation all-sky surveys and significant effort is now being invested to solve the problem computationally. In this paper, we explore a simple machine learning approach to real-bogus classification by constructing a training set from the image data of ~32 000 real astrophysical transients and bogus detections from the Pan-STARRS1 Medium Deep Survey. We derive our feature representation from the pixel intensity values of a 20 × 20 pixel stamp around the centre of the candidates. This differs from previous work in that it works directly on the pixels rather than catalogued domain knowledge for feature design or selection. Three machine learning algorithms are trained (artificial neural networks, support vector machines and random forests) and their performances are tested on a held-out subset of 25 per cent of the training data. We find the best results from the random forest classifier and demonstrate that by accepting a false positive rate of 1 per cent, the classifier initially suggests a missed detection rate of around 10 per cent. However, we also find that a combination of bright star variability, nuclear transients and uncertainty in human labelling means that our best estimate of the missed detection rate is approximately 6 per cent.
KW  - Methods: data analysis
KW  - Methods: statistical
KW  - Supernovae: general
KW  - Surveys
KW  - Techniques: image processing
UR  - http://www.scopus.com/inward/record.url?scp=84930847955&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84930847955&partnerID=8YFLogxK
U2  - 10.1093/mnras/stv292
DO  - 10.1093/mnras/stv292
M3  - Article
AN  - SCOPUS:84930847955
SN  - 0035-8711
VL  - 449
SP  - 451
EP  - 466
JO  - Monthly Notices of the Royal Astronomical Society
JF  - Monthly Notices of the Royal Astronomical Society
IS  - 1
ER  - 
