Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA.
Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA.
J Biomed Inform. 2023 Jun;142:104370. doi: 10.1016/j.jbi.2023.104370. Epub 2023 Apr 24.
To develop a natural language processing (NLP) system to extract medications and contextual information that help understand drug changes. This project is part of the 2022 n2c2 challenge.
We developed NLP systems for medication mention extraction, event classification (indicating whether a medication change is discussed), and context classification, which categorizes the context of medication changes along 5 orthogonal dimensions. We explored 6 state-of-the-art pretrained transformer models for the three subtasks, including GatorTron, a large language model pretrained on > 90 billion words of text (including > 80 billion words from > 290 million clinical notes at University of Florida Health). We evaluated our NLP systems using the annotated data and evaluation scripts provided by the 2022 n2c2 organizers.
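Medication mention extraction of this kind is typically framed as token-level sequence labeling, where a transformer assigns a BIO tag to each token and tagged runs are decoded into entity spans. As a minimal sketch (the BIO scheme and the `B-Medication`/`I-Medication` label names are illustrative assumptions, not details stated in this abstract):

```python
def bio_to_spans(tokens, tags):
    """Decode per-token BIO tags into (start, end, text) entity spans.

    A span opens at a B- tag, extends through consecutive I- tags,
    and closes at the next O or B- tag (or end of sequence).
    """
    spans = []
    start = None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if start is not None:  # close any span still open
                spans.append((start, i, " ".join(tokens[start:i])))
            start = i
        elif tag.startswith("I-") and start is not None:
            continue  # extend the current span
        else:
            if start is not None:
                spans.append((start, i, " ".join(tokens[start:i])))
                start = None
    if start is not None:  # span runs to the end of the sentence
        spans.append((start, len(tokens), " ".join(tokens[start:])))
    return spans

# Hypothetical clinical sentence with model-predicted tags
tokens = ["Increase", "metformin", "to", "1000", "mg", "twice", "daily"]
tags = ["O", "B-Medication", "O", "O", "O", "O", "O"]
print(bio_to_spans(tokens, tags))  # [(1, 2, 'metformin')]
```

The downstream event and context classifiers would then operate on each decoded medication span together with its surrounding sentence context.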
Our GatorTron models achieved the best F1-scores of 0.9828 for medication extraction (ranked 3rd) and 0.9379 for event classification (ranked 2nd), and the best micro-average accuracy of 0.9126 for context classification. GatorTron outperformed existing transformer models pretrained on smaller corpora of general English and clinical text, indicating the advantage of large language models.
This study demonstrated the advantage of using large transformer models for contextual medication information extraction from clinical narratives.