Electronic Health Record Phenotypes for Precision Medicine: Perspectives and Caveats From Treatment of Breast Cancer at a Single Institution

Matthew K. Breitenstein; Hongfang Liu; Kara N. Maxwell; Jyotishman Pathak; Rui Zhang

doi:10.1111/cts.12514

Pharmaceutical Care and Health Systems

Research output: Contribution to journal › Article › peer-review

14 Scopus citations

Abstract

Precision medicine is at the forefront of biomedical research. Cancer registries provide rich perspectives and electronic health records (EHRs) are commonly utilized to gather additional clinical data elements needed for translational research. However, manual annotation is resource-intense and not readily scalable. Informatics-based phenotyping presents an ideal solution, but perspectives obtained can be impacted by both data source and algorithm selection. We derived breast cancer (BC) receptor status phenotypes from structured and unstructured EHR data using rule-based algorithms, including natural language processing (NLP). Overall, the use of NLP increased BC receptor status coverage by 39.2% from 69.1% with structured medication information alone. Using all available EHR data, estrogen receptor-positive BC cases were ascertained with high precision (P = 0.976) and recall (R = 0.987) compared with gold standard chart-reviewed patients. However, status negation (R = 0.591) decreased 40.2% when relying on structured medications alone. Using multiple EHR data types (and thorough understanding of the perspectives offered) are necessary to derive robust EHR-based precision medicine phenotypes.

Original language	English (US)
Pages (from-to)	85-92
Number of pages	8
Journal	Clinical and Translational Science
Volume	11
Issue number	1
DOIs	https://doi.org/10.1111/cts.12514
State	Published - Jan 2018

Bibliographical note

Funding Information:
Acknowledgments. This work was supported by the National Cancer Institute-sponsored Mayo Clinic Cancer Genetic Epidemiology Training Program (R25 CA092049). The authors thank James R. Cerhan, MD, PhD, for the substantial editorial feedback provided in the development of this article. Further, the researchers thank the nurse abstraction group led by Wendy Gay for their contributions to chart review and cohort integrity assurance, and Xiaoyang Ruan, PhD, for assistance in deployment of natural language processing algorithms.

Publisher Copyright:
© 2017 The Authors. Clinical and Translational Science published by Wiley Periodicals, Inc. on behalf of American Society for Clinical Pharmacology and Therapeutics.

Access

10.1111/cts.12514

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5759745

Cite this

@article{e70c0f7b05404c84a225e815e3b145db,

title = "Electronic Health Record Phenotypes for Precision Medicine: Perspectives and Caveats From Treatment of Breast Cancer at a Single Institution",

abstract = "Precision medicine is at the forefront of biomedical research. Cancer registries provide rich perspectives and electronic health records (EHRs) are commonly utilized to gather additional clinical data elements needed for translational research. However, manual annotation is resource-intense and not readily scalable. Informatics-based phenotyping presents an ideal solution, but perspectives obtained can be impacted by both data source and algorithm selection. We derived breast cancer (BC) receptor status phenotypes from structured and unstructured EHR data using rule-based algorithms, including natural language processing (NLP). Overall, the use of NLP increased BC receptor status coverage by 39.2% from 69.1% with structured medication information alone. Using all available EHR data, estrogen receptor-positive BC cases were ascertained with high precision (P = 0.976) and recall (R = 0.987) compared with gold standard chart-reviewed patients. However, status negation (R = 0.591) decreased 40.2% when relying on structured medications alone. Using multiple EHR data types (and thorough understanding of the perspectives offered) are necessary to derive robust EHR-based precision medicine phenotypes.",

author = "Breitenstein, {Matthew K.} and Liu, Hongfang and Maxwell, {Kara N.} and Pathak, Jyotishman and Zhang, Rui",

note = "Funding Information: Acknowledgments. This work was supported by the National Cancer Institute-sponsored Mayo Clinic Cancer Genetic Epidemiology Training Program (R25 CA092049). The authors thank James R. Cerhan, MD, PhD, for the substantial editorial feedback provided in the development of this article. Further, the researchers thank the nurse abstraction group led by Wendy Gay for their contributions to chart review and cohort integrity assurance, and Xiaoyang Ruan, PhD, for assistance in deployment of natural language processing algorithms. Publisher Copyright: {\textcopyright} 2017 The Authors. Clinical and Translational Science published by Wiley Periodicals, Inc. on behalf of American Society for Clinical Pharmacology and Therapeutics.",

year = "2018",

month = jan,

doi = "10.1111/cts.12514",

language = "English (US)",

volume = "11",

pages = "85--92",

journal = "Clinical and Translational Science",

issn = "1752-8054",

publisher = "Wiley-Blackwell",

number = "1",

}

TY - JOUR

T1 - Electronic Health Record Phenotypes for Precision Medicine

T2 - Perspectives and Caveats From Treatment of Breast Cancer at a Single Institution

AU - Breitenstein, Matthew K.

AU - Liu, Hongfang

AU - Maxwell, Kara N.

AU - Pathak, Jyotishman

AU - Zhang, Rui

N1 - Funding Information: Acknowledgments. This work was supported by the National Cancer Institute-sponsored Mayo Clinic Cancer Genetic Epidemiology Training Program (R25 CA092049). The authors thank James R. Cerhan, MD, PhD, for the substantial editorial feedback provided in the development of this article. Further, the researchers thank the nurse abstraction group led by Wendy Gay for their contributions to chart review and cohort integrity assurance, and Xiaoyang Ruan, PhD, for assistance in deployment of natural language processing algorithms. Publisher Copyright: © 2017 The Authors. Clinical and Translational Science published by Wiley Periodicals, Inc. on behalf of American Society for Clinical Pharmacology and Therapeutics.

PY - 2018/1

Y1 - 2018/1

N2 - Precision medicine is at the forefront of biomedical research. Cancer registries provide rich perspectives and electronic health records (EHRs) are commonly utilized to gather additional clinical data elements needed for translational research. However, manual annotation is resource-intense and not readily scalable. Informatics-based phenotyping presents an ideal solution, but perspectives obtained can be impacted by both data source and algorithm selection. We derived breast cancer (BC) receptor status phenotypes from structured and unstructured EHR data using rule-based algorithms, including natural language processing (NLP). Overall, the use of NLP increased BC receptor status coverage by 39.2% from 69.1% with structured medication information alone. Using all available EHR data, estrogen receptor-positive BC cases were ascertained with high precision (P = 0.976) and recall (R = 0.987) compared with gold standard chart-reviewed patients. However, status negation (R = 0.591) decreased 40.2% when relying on structured medications alone. Using multiple EHR data types (and thorough understanding of the perspectives offered) are necessary to derive robust EHR-based precision medicine phenotypes.

AB - Precision medicine is at the forefront of biomedical research. Cancer registries provide rich perspectives and electronic health records (EHRs) are commonly utilized to gather additional clinical data elements needed for translational research. However, manual annotation is resource-intense and not readily scalable. Informatics-based phenotyping presents an ideal solution, but perspectives obtained can be impacted by both data source and algorithm selection. We derived breast cancer (BC) receptor status phenotypes from structured and unstructured EHR data using rule-based algorithms, including natural language processing (NLP). Overall, the use of NLP increased BC receptor status coverage by 39.2% from 69.1% with structured medication information alone. Using all available EHR data, estrogen receptor-positive BC cases were ascertained with high precision (P = 0.976) and recall (R = 0.987) compared with gold standard chart-reviewed patients. However, status negation (R = 0.591) decreased 40.2% when relying on structured medications alone. Using multiple EHR data types (and thorough understanding of the perspectives offered) are necessary to derive robust EHR-based precision medicine phenotypes.

UR - http://www.scopus.com/inward/record.url?scp=85040248565&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85040248565&partnerID=8YFLogxK

U2 - 10.1111/cts.12514

DO - 10.1111/cts.12514

M3 - Article

C2 - 29084368

AN - SCOPUS:85040248565

SN - 1752-8054

VL - 11

SP - 85

EP - 92

JO - Clinical and Translational Science

JF - Clinical and Translational Science

IS - 1

ER -
