Extracting Pulmonary Nodules and Nodule Characteristics from Radiology Reports of Lung Cancer Screening Patients Using Transformer Models.

Authors

Yang Shuang, Yang Xi, Lyu Tianchen, Huang James L, Chen Aokun, He Xing, Braithwaite Dejana, Mehta Hiren J, Wu Yonghui, Guo Yi, Bian Jiang

Affiliations

Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL USA.

Department of Pharmaceutical Outcomes and Policy, College of Pharmacy, University of Florida, Gainesville, FL USA.

Publication information

J Healthc Inform Res. 2024 May 17;8(3):463-477. doi: 10.1007/s41666-024-00166-5. eCollection 2024 Sep.

Abstract

Pulmonary nodules and nodule characteristics are important indicators of lung nodule malignancy. However, nodule information is often documented as free text in clinical narratives such as radiology reports in electronic health record systems. Natural language processing (NLP) is the key technology to extract and standardize patient information from radiology reports into structured data elements. This study aimed to develop an NLP system using state-of-the-art transformer models to extract pulmonary nodules and associated nodule characteristics from radiology reports. We identified a cohort of 3080 patients who underwent low-dose computed tomography (LDCT) at the University of Florida health system and collected their radiology reports. We manually annotated 394 reports as the gold standard. We explored eight pretrained transformer models from three transformer architectures, including bidirectional encoder representations from transformers (BERT), robustly optimized BERT approach (RoBERTa), and A Lite BERT (ALBERT), for clinical concept extraction, relation identification, and negation detection. We examined general transformer models pretrained using general English corpora, transformer models fine-tuned using a clinical corpus, and a large clinical transformer model, GatorTron, which was trained from scratch using 90 billion words of clinical text. We compared the transformer models with two baseline models: a recurrent neural network implemented using bidirectional long short-term memory with a conditional random fields layer, and support vector machines. RoBERTa-mimic achieved the best F1-score of 0.9279 for nodule concept and nodule characteristics extraction. ALBERT-base and GatorTron achieved the best F1-score of 0.9737 in linking nodule characteristics to pulmonary nodules. Seven out of eight transformers achieved the best F1-score of 1.0000 for negation detection. Our end-to-end system achieved an overall F1-score of 0.8869. This study demonstrated the advantage of state-of-the-art transformer models for pulmonary nodule information extraction from radiology reports.
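
The F1-scores reported above combine precision and recall over extracted concepts. The following is a minimal sketch of how such a score could be computed for concept extraction, assuming exact-match scoring on character offsets and labels; the abstract does not state which matching criterion was used, and the `Span` type, the `prf1` function, and the label names are hypothetical illustrations rather than the authors' code.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """A hypothetical annotated concept: character offsets plus a label."""
    start: int
    end: int
    label: str


def prf1(gold: set[Span], pred: set[Span]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) under exact-match scoring (an assumption)."""
    true_positives = len(gold & pred)  # spans that match on offsets and label
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative data only: label names are made up for this sketch.
gold = {Span(0, 6, "Nodule"), Span(10, 14, "Size"), Span(20, 36, "Location")}
pred = {Span(0, 6, "Nodule"), Span(10, 14, "Size"), Span(40, 45, "Texture")}

precision, recall, f1 = prf1(gold, pred)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

In this toy case two of the three predicted spans match the gold standard, so precision, recall, and F1 all equal 2/3. A lenient (overlap-based) criterion would give different numbers for the same predictions.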

SUPPLEMENTARY INFORMATION

The online version contains supplementary material available at 10.1007/s41666-024-00166-5.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aa47/11310180/74cbf46060ec/41666_2024_166_Fig1_HTML.jpg
