Comparing NLP Systems to Extract Entities of Eligibility Criteria in Dietary Supplements Clinical Trials Using NLP-ADAPT

Anusha Bompelli; Greg Silverman; Raymond Finzel; Jake Vasilakes; Benjamin Knoll; Serguei Pakhomov; Rui Zhang

doi:10.1007/978-3-030-59137-3_7

Pharmaceutical Care and Health Systems

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Scopus citations

Abstract

Natural Language Processing (NLP) techniques have been used extensively to extract concepts from unstructured clinical trial eligibility criteria. Recruiting patients whose information in Electronic Health Records matches clinical trial eligibility criteria can potentially facilitate and accelerate the clinical trial recruitment process. However, a significant obstacle is identifying an efficient Named Entity Recognition (NER) system to parse the clinical trial eligibility criteria. In this study, we used NLP-ADAPT (Artifact Discovery and Preparation Toolkit) to compare existing biomedical NLP systems (BioMedICUS, CLAMP, cTAKES, and MetaMap) and their Boolean ensemble in identifying entities in the eligibility criteria of 150 randomly selected Dietary Supplement (DS) clinical trials. We created a custom mapping of the gold standard annotated entities to UMLS semantic types to align with annotations from each system. All systems in NLP-ADAPT used their default pipelines to extract entities based on our custom mappings. The systems performed reasonably well in extracting UMLS concepts belonging to the semantic groups Disorders and Chemicals and Drugs. Among the individual systems, cTAKES was the highest-performing system for the Chemicals and Drugs and Disorders semantic groups, and BioMedICUS was the highest-performing system for Procedures, Living Beings, Concepts and Ideas, and Devices. The Boolean ensemble, however, outperformed the individual systems. This study sets a baseline that can potentially be improved with modifications to the NLP-ADAPT pipeline.
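
The idea of a Boolean ensemble scored against a gold standard can be illustrated with a minimal sketch. It is not the NLP-ADAPT implementation, and it rests on assumptions the abstract does not state: annotations are represented as (start, end, semantic group) tuples, a prediction counts as correct only on an exact match, and the ensemble is either a set union (OR) or a set intersection (AND). The names `system_a` and `system_b` and all sample spans are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: (start_offset, end_offset, semantic_group).
# Exact-span, exact-label matching is an assumption made for this sketch.
Span = Tuple[int, int, str]


def boolean_ensemble(system_spans: Dict[str, Set[Span]], op: str = "or") -> Set[Span]:
    """Combine per-system annotation sets with Boolean OR (union) or AND (intersection)."""
    sets = list(system_spans.values())
    if not sets:
        return set()
    if op == "or":
        return set().union(*sets)
    if op == "and":
        return set.intersection(*sets)
    raise ValueError(f"unsupported operator: {op!r}")


def precision_recall_f1(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Score predicted annotations against gold annotations by exact match."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical gold standard and system outputs for one eligibility criterion.
gold = {
    (0, 8, "Disorders"),
    (20, 27, "Chemicals and Drugs"),
    (35, 42, "Procedures"),
}
systems = {
    "system_a": {(0, 8, "Disorders"), (20, 27, "Chemicals and Drugs")},
    "system_b": {(0, 8, "Disorders"), (35, 42, "Procedures"), (50, 55, "Devices")},
}

or_scores = precision_recall_f1(boolean_ensemble(systems, "or"), gold)
and_scores = precision_recall_f1(boolean_ensemble(systems, "and"), gold)
print("OR ensemble (P, R, F1):", or_scores)
print("AND ensemble (P, R, F1):", and_scores)
```

In this toy data the OR ensemble recovers every gold entity at the cost of one spurious span, while the AND ensemble keeps only the span both systems agree on. That trade-off between recall and precision is the usual reason to compare Boolean combinations rather than fix one in advance.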

Original language	English (US)
Title of host publication	Artificial Intelligence in Medicine - 18th International Conference on Artificial Intelligence in Medicine, AIME 2020, Proceedings
Editors	Martin Michalowski, Robert Moskovitch
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	67-77
Number of pages	11
ISBN (Print)	9783030591366
DOIs	https://doi.org/10.1007/978-3-030-59137-3_7
State	Published - 2020
Event	18th International Conference on Artificial Intelligence in Medicine, AIME 2020 - Minneapolis, United States; Duration: Aug 25 2020 → Aug 28 2020

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	12299 LNAI
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	18th International Conference on Artificial Intelligence in Medicine, AIME 2020
Country/Territory	United States
City	Minneapolis
Period	8/25/20 → 8/28/20

Bibliographical note

Funding Information:
This work was partially supported by the NIH’s National Center for Complementary and Integrative Health and the Office of Dietary Supplements under grant number R01AT009457 (Zhang), and by the National Center for Advancing Translational Sciences under grant numbers UL1TR002494 and U01TR002062.

Publisher Copyright:
© 2020, Springer Nature Switzerland AG.

Keywords

Clinical trial eligibility
Named Entity Recognition
Natural Language Processing

Cite this

Bompelli, A., Silverman, G., Finzel, R., Vasilakes, J., Knoll, B., Pakhomov, S., & Zhang, R. (2020). Comparing NLP Systems to Extract Entities of Eligibility Criteria in Dietary Supplements Clinical Trials Using NLP-ADAPT. In M. Michalowski, & R. Moskovitch (Eds.), Artificial Intelligence in Medicine - 18th International Conference on Artificial Intelligence in Medicine, AIME 2020, Proceedings (pp. 67-77). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12299 LNAI). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-59137-3_7

@inproceedings{718b50338004401d9a2e10a64f8f2def,

title = "Comparing NLP Systems to Extract Entities of Eligibility Criteria in Dietary Supplements Clinical Trials Using NLP-ADAPT",

abstract = "Natural Language Processing (NLP) techniques have been used extensively to extract concepts from unstructured clinical trial eligibility criteria. Recruiting patients whose information in Electronic Health Records matches clinical trial eligibility criteria can potentially facilitate and accelerate the clinical trial recruitment process. However, a significant obstacle is identifying an efficient Named Entity Recognition (NER) system to parse the clinical trial eligibility criteria. In this study, we used NLP-ADAPT (Artifact Discovery and Preparation Toolkit) to compare existing biomedical NLP systems (BioMedICUS, CLAMP, cTAKES, and MetaMap) and their Boolean ensemble in identifying entities in the eligibility criteria of 150 randomly selected Dietary Supplement (DS) clinical trials. We created a custom mapping of the gold standard annotated entities to UMLS semantic types to align with annotations from each system. All systems in NLP-ADAPT used their default pipelines to extract entities based on our custom mappings. The systems performed reasonably well in extracting UMLS concepts belonging to the semantic groups Disorders and Chemicals and Drugs. Among the individual systems, cTAKES was the highest-performing system for the Chemicals and Drugs and Disorders semantic groups, and BioMedICUS was the highest-performing system for Procedures, Living Beings, Concepts and Ideas, and Devices. The Boolean ensemble, however, outperformed the individual systems. This study sets a baseline that can potentially be improved with modifications to the NLP-ADAPT pipeline.",

keywords = "Clinical trial eligibility, Named Entity Recognition, Natural Language Processing",

author = "Anusha Bompelli and Greg Silverman and Raymond Finzel and Jake Vasilakes and Benjamin Knoll and Serguei Pakhomov and Rui Zhang",

note = "Funding Information: This work was partially supported by the NIH{\textquoteright}s National Center for Complementary and Integrative Health and the Office of Dietary Supplements under grant number R01AT009457 (Zhang), and by the National Center for Advancing Translational Sciences under grant numbers UL1TR002494 and U01TR002062. Publisher Copyright: {\textcopyright} 2020, Springer Nature Switzerland AG. 18th International Conference on Artificial Intelligence in Medicine, AIME 2020; Conference date: 25-08-2020 through 28-08-2020",

year = "2020",

doi = "10.1007/978-3-030-59137-3_7",

language = "English (US)",

isbn = "9783030591366",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "67--77",

editor = "Martin Michalowski and Robert Moskovitch",

booktitle = "Artificial Intelligence in Medicine - 18th International Conference on Artificial Intelligence in Medicine, AIME 2020, Proceedings",

address = "Germany",

}

TY - GEN

T1 - Comparing NLP Systems to Extract Entities of Eligibility Criteria in Dietary Supplements Clinical Trials Using NLP-ADAPT

AU - Bompelli, Anusha

AU - Silverman, Greg

AU - Finzel, Raymond

AU - Vasilakes, Jake

AU - Knoll, Benjamin

AU - Pakhomov, Serguei

AU - Zhang, Rui

N1 - Funding Information: This work was partially supported by the NIH’s National Center for Complementary and Integrative Health and the Office of Dietary Supplements under grant number R01AT009457 (Zhang), and by the National Center for Advancing Translational Sciences under grant numbers UL1TR002494 and U01TR002062. Publisher Copyright: © 2020, Springer Nature Switzerland AG.

PY - 2020

Y1 - 2020

N2 - Natural Language Processing (NLP) techniques have been used extensively to extract concepts from unstructured clinical trial eligibility criteria. Recruiting patients whose information in Electronic Health Records matches clinical trial eligibility criteria can potentially facilitate and accelerate the clinical trial recruitment process. However, a significant obstacle is identifying an efficient Named Entity Recognition (NER) system to parse the clinical trial eligibility criteria. In this study, we used NLP-ADAPT (Artifact Discovery and Preparation Toolkit) to compare existing biomedical NLP systems (BioMedICUS, CLAMP, cTAKES, and MetaMap) and their Boolean ensemble in identifying entities in the eligibility criteria of 150 randomly selected Dietary Supplement (DS) clinical trials. We created a custom mapping of the gold standard annotated entities to UMLS semantic types to align with annotations from each system. All systems in NLP-ADAPT used their default pipelines to extract entities based on our custom mappings. The systems performed reasonably well in extracting UMLS concepts belonging to the semantic groups Disorders and Chemicals and Drugs. Among the individual systems, cTAKES was the highest-performing system for the Chemicals and Drugs and Disorders semantic groups, and BioMedICUS was the highest-performing system for Procedures, Living Beings, Concepts and Ideas, and Devices. The Boolean ensemble, however, outperformed the individual systems. This study sets a baseline that can potentially be improved with modifications to the NLP-ADAPT pipeline.

KW - Clinical trial eligibility

KW - Named Entity Recognition

KW - Natural Language Processing

UR - http://www.scopus.com/inward/record.url?scp=85092241043&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85092241043&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-59137-3_7

DO - 10.1007/978-3-030-59137-3_7

M3 - Conference contribution

AN - SCOPUS:85092241043

SN - 9783030591366

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 67

EP - 77

BT - Artificial Intelligence in Medicine - 18th International Conference on Artificial Intelligence in Medicine, AIME 2020, Proceedings

A2 - Michalowski, Martin

A2 - Moskovitch, Robert

PB - Springer Science and Business Media Deutschland GmbH

T2 - 18th International Conference on Artificial Intelligence in Medicine, AIME 2020

Y2 - 25 August 2020 through 28 August 2020

ER -