Prospective recruitment of patients with congestive heart failure using an ad-hoc binary classifier

Serguei V. Pakhomov; James Buntrock; Christopher G. Chute

doi:10.1016/j.jbi.2004.11.016

Research output: Contribution to journal › Article › peer-review

52 Scopus citations

Abstract

This paper addresses a very specific problem of identifying patients diagnosed with a specific condition for potential recruitment in a clinical trial or an epidemiological study. We present a simple machine learning method for identifying patients diagnosed with congestive heart failure and other related conditions by automatically classifying clinical notes dictated at Mayo Clinic. This method relies on an automatic classifier trained on comparable amounts of positive and negative samples of clinical notes previously categorized by human experts. The documents are represented as feature vectors, where features are a mix of demographic information as well as single words and concept mappings to MeSH and HICDA classification systems. We compare two simple and efficient classification algorithms (Naïve Bayes and Perceptron) and a baseline term spotting method with respect to their accuracy and recall on positive samples. Depending on the test set, we find that Naïve Bayes yields better recall on positive samples (95 vs. 86%) but worse accuracy than Perceptron (57 vs. 65%). Both algorithms perform better than the baseline with recall on positive samples of 71% and accuracy of 54%.
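The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the notes below are invented toy strings, the features are single words only (no demographic features and no MeSH or HICDA concept mappings), the Perceptron is omitted for brevity, and every function name is hypothetical. It shows a multinomial Naïve Bayes classifier with add-one smoothing next to a term-spotting baseline, scored by accuracy and by recall on positive samples.

```python
import math
from collections import Counter

# Toy, invented notes (NOT data from the paper); 1 = heart-failure note, 0 = other.
TRAIN = [
    ("patient with congestive heart failure and edema", 1),
    ("heart failure exacerbation with dyspnea", 1),
    ("follow up for knee pain after injury", 0),
    ("routine visit for seasonal allergies", 0),
]
TEST = [
    ("worsening dyspnea and edema", 1),
    ("congestive heart failure follow up", 1),
    ("knee injury follow up", 0),
    ("no evidence of heart failure", 0),
]


def tokens(text):
    """Lowercase whitespace tokenizer (an assumption; the paper's tokenizer is not given)."""
    return text.lower().split()


def train_naive_bayes(samples):
    """Collect per-class word counts for a multinomial Naive Bayes model."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in samples:
        class_counts[label] += 1
        word_counts[label].update(tokens(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def predict_naive_bayes(model, text):
    """Return 1 if the positive class has the higher log-posterior, else 0."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label in (0, 1):
        denom = sum(word_counts[label].values()) + len(vocab)  # add-one smoothing
        score = math.log(class_counts[label] / total)
        for word in tokens(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return 1 if scores[1] > scores[0] else 0


def predict_term_spotting(text, terms=("heart failure",)):
    """Baseline: flag a note if it contains any of the given terms verbatim."""
    lowered = text.lower()
    return 1 if any(term in lowered for term in terms) else 0


def accuracy(gold, pred):
    """Fraction of notes whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def recall_positive(gold, pred):
    """Fraction of gold-positive notes that were predicted positive."""
    on_positives = [p for g, p in zip(gold, pred) if g == 1]
    return sum(on_positives) / len(on_positives)


if __name__ == "__main__":
    model = train_naive_bayes(TRAIN)
    gold = [label for _, label in TEST]
    nb_pred = [predict_naive_bayes(model, text) for text, _ in TEST]
    ts_pred = [predict_term_spotting(text) for text, _ in TEST]
    print("Naive Bayes  ", accuracy(gold, nb_pred), recall_positive(gold, nb_pred))
    print("Term spotting", accuracy(gold, ts_pred), recall_positive(gold, ts_pred))
```

On these four invented test notes, both methods flag the negated mention ("no evidence of heart failure"), and the term-spotting baseline also misses the positive note that never names the condition. The numbers it prints say nothing about the results reported in the paper.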

Original language	English (US)
Pages (from-to)	145-153
Number of pages	9
Journal	Journal of Biomedical Informatics
Volume	38
Issue number	2
DOIs	https://doi.org/10.1016/j.jbi.2004.11.016
State	Published - Apr 2005
Externally published	Yes

Keywords

Automatic classification
Congestive heart failure
Machine learning
Medical informatics
Natural language processing
Naïve Bayes
Perceptron

Cite this

@article{a11168c823f94266bfb5c022f6456482,

title = "Prospective recruitment of patients with congestive heart failure using an ad-hoc binary classifier",

abstract = "This paper addresses a very specific problem of identifying patients diagnosed with a specific condition for potential recruitment in a clinical trial or an epidemiological study. We present a simple machine learning method for identifying patients diagnosed with congestive heart failure and other related conditions by automatically classifying clinical notes dictated at Mayo Clinic. This method relies on an automatic classifier trained on comparable amounts of positive and negative samples of clinical notes previously categorized by human experts. The documents are represented as feature vectors, where features are a mix of demographic information as well as single words and concept mappings to MeSH and HICDA classification systems. We compare two simple and efficient classification algorithms (Na{\"i}ve Bayes and Perceptron) and a baseline term spotting method with respect to their accuracy and recall on positive samples. Depending on the test set, we find that Na{\"i}ve Bayes yields better recall on positive samples (95 vs. 86\%) but worse accuracy than Perceptron (57 vs. 65\%). Both algorithms perform better than the baseline with recall on positive samples of 71\% and accuracy of 54\%.",

keywords = "Automatic classification, Congestive heart failure, Machine learning, Medical informatics, Natural language processing, Na{\"i}ve Bayes, Perceptron",

author = "Pakhomov, {Serguei V.} and Buntrock, James and Chute, {Christopher G.}",

year = "2005",

month = apr,

doi = "10.1016/j.jbi.2004.11.016",

language = "English (US)",

volume = "38",

pages = "145--153",

journal = "Journal of Biomedical Informatics",

issn = "1532-0464",

publisher = "Academic Press Inc.",

number = "2",

}

TY - JOUR

T1 - Prospective recruitment of patients with congestive heart failure using an ad-hoc binary classifier

AU - Pakhomov, Serguei V.

AU - Buntrock, James

AU - Chute, Christopher G.

PY - 2005/4

Y1 - 2005/4

N2 - This paper addresses a very specific problem of identifying patients diagnosed with a specific condition for potential recruitment in a clinical trial or an epidemiological study. We present a simple machine learning method for identifying patients diagnosed with congestive heart failure and other related conditions by automatically classifying clinical notes dictated at Mayo Clinic. This method relies on an automatic classifier trained on comparable amounts of positive and negative samples of clinical notes previously categorized by human experts. The documents are represented as feature vectors, where features are a mix of demographic information as well as single words and concept mappings to MeSH and HICDA classification systems. We compare two simple and efficient classification algorithms (Naïve Bayes and Perceptron) and a baseline term spotting method with respect to their accuracy and recall on positive samples. Depending on the test set, we find that Naïve Bayes yields better recall on positive samples (95 vs. 86%) but worse accuracy than Perceptron (57 vs. 65%). Both algorithms perform better than the baseline with recall on positive samples of 71% and accuracy of 54%.

KW - Automatic classification

KW - Congestive heart failure

KW - Machine learning

KW - Medical informatics

KW - Natural language processing

KW - Naïve Bayes

KW - Perceptron

UR - http://www.scopus.com/inward/record.url?scp=15944378618&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=15944378618&partnerID=8YFLogxK

U2 - 10.1016/j.jbi.2004.11.016

DO - 10.1016/j.jbi.2004.11.016

M3 - Article

C2 - 15797003

AN - SCOPUS:15944378618

SN - 1532-0464

VL - 38

SP - 145

EP - 153

JO - Journal of Biomedical Informatics

JF - Journal of Biomedical Informatics

IS - 2

ER -
