TY - JOUR
T1 - Prospective recruitment of patients with congestive heart failure using an ad-hoc binary classifier
AU - Pakhomov, Serguei V.
AU - Buntrock, James
AU - Chute, Christopher G.
PY - 2005/4
Y1 - 2005/4
AB - This paper addresses a very specific problem of identifying patients diagnosed with a specific condition for potential recruitment in a clinical trial or an epidemiological study. We present a simple machine learning method for identifying patients diagnosed with congestive heart failure and other related conditions by automatically classifying clinical notes dictated at Mayo Clinic. This method relies on an automatic classifier trained on comparable amounts of positive and negative samples of clinical notes previously categorized by human experts. The documents are represented as feature vectors, where features are a mix of demographic information as well as single words and concept mappings to MeSH and HICDA classification systems. We compare two simple and efficient classification algorithms (Naïve Bayes and Perceptron) and a baseline term spotting method with respect to their accuracy and recall on positive samples. Depending on the test set, we find that Naïve Bayes yields better recall on positive samples (95 vs. 86%) but worse accuracy than Perceptron (57 vs. 65%). Both algorithms perform better than the baseline with recall on positive samples of 71% and accuracy of 54%.
KW - Automatic classification
KW - Congestive heart failure
KW - Machine learning
KW - Medical informatics
KW - Natural language processing
KW - Naïve Bayes
KW - Perceptron
UR - http://www.scopus.com/inward/record.url?scp=15944378618&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=15944378618&partnerID=8YFLogxK
U2 - 10.1016/j.jbi.2004.11.016
DO - 10.1016/j.jbi.2004.11.016
M3 - Article
C2 - 15797003
AN - SCOPUS:15944378618
SN - 1532-0464
VL - 38
SP - 145
EP - 153
JO - Journal of Biomedical Informatics
JF - Journal of Biomedical Informatics
IS - 2
ER -