Schmid College of Science and Technology, Chapman University, 1 University Dr, Orange, CA, 92866, USA.
Children's Health of Orange County (CHOC), Orange, CA, 92868, USA.
BMC Med Res Methodol. 2022 Jul 2;22(1):181. doi: 10.1186/s12874-022-01665-y.
Discharge medical notes written by physicians contain important information about the health condition of patients. Many deep learning algorithms have been successfully applied to extract important information from unstructured medical notes, which can drive actionable results in the medical domain. This study explores how various deep learning algorithms perform on text classification tasks over medical notes under different degrees of disease class imbalance.
In this study, we employed seven artificial intelligence models, a CNN (Convolutional Neural Network), a Transformer encoder, a pre-trained BERT (Bidirectional Encoder Representations from Transformers), and four typical sequence neural network models, namely an RNN (Recurrent Neural Network), a GRU (Gated Recurrent Unit), an LSTM (Long Short-Term Memory), and a Bi-LSTM (Bi-directional Long Short-Term Memory), to classify the presence or absence of 16 disease conditions from patients' discharge summary notes. We framed this task as 16 separate binary classification problems. The performance of the seven models on each of the 16 datasets, with varying levels of imbalance between classes, was compared in terms of AUC-ROC (Area Under the Receiver Operating Characteristic Curve), AUC-PR (Area Under the Precision-Recall Curve), F1 score, and balanced accuracy, as well as training time. Model performance was also compared in combination with different word embedding approaches (GloVe, BioWordVec, and no pre-trained word embeddings).
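The four evaluation metrics above can all be computed with scikit-learn. A minimal sketch (the function name `evaluate_binary`, the 0.5 decision threshold, and the toy labels are illustrative assumptions, not details from the study):

```python
import numpy as np
from sklearn.metrics import (roc_auc_score, average_precision_score,
                             f1_score, balanced_accuracy_score)

def evaluate_binary(y_true, y_score, threshold=0.5):
    """Compute the four comparison metrics for one binary disease classifier.

    y_true  : ground-truth labels (0 = disease absent, 1 = disease present)
    y_score : predicted probabilities for the positive class
    """
    # F1 and balanced accuracy need hard labels; AUC metrics use raw scores.
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    return {
        "auc_roc": roc_auc_score(y_true, y_score),           # ranking quality
        "auc_pr": average_precision_score(y_true, y_score),  # informative under imbalance
        "f1": f1_score(y_true, y_pred),
        "balanced_accuracy": balanced_accuracy_score(y_true, y_pred),
    }

# Toy example: two positives, two negatives
metrics = evaluate_binary([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

AUC-PR (average precision) and balanced accuracy are the two metrics that remain meaningful when the disease prevalence is far from 50%, which is why they accompany AUC-ROC and F1 in the comparison.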
The analyses of these 16 binary classification problems showed that the Transformer encoder model performed best in nearly all scenarios. In addition, when the disease prevalence was close to or greater than 50%, the Convolutional Neural Network model achieved performance comparable to the Transformer encoder, and its training time was 17.6% shorter than that of the second-fastest model, 91.3% shorter than the Transformer encoder's, and 94.7% shorter than the pre-trained BERT-Base model's. The BioWordVec embeddings slightly improved the performance of the Bi-LSTM model in most disease prevalence scenarios, while the CNN model performed better without pre-trained word embeddings. In addition, the GloVe embeddings significantly reduced training time for all models.
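Pre-trained vectors such as GloVe or BioWordVec are typically injected into a model by building an embedding matrix indexed by the tokenizer's vocabulary, with out-of-vocabulary words assigned fallback vectors. A hedged sketch of this common pattern (the helper name, the tiny vocabulary, and the 4-dimensional vectors are illustrative, not the study's actual setup):

```python
import numpy as np

def build_embedding_matrix(word_index, pretrained, dim, seed=0):
    """Map each vocabulary word to its pre-trained vector.

    word_index : dict of word -> integer id (1-based; row 0 reserved for padding)
    pretrained : dict of word -> vector of shape (dim,), e.g. loaded from GloVe
    Words missing from the pre-trained vocabulary get small random vectors.
    """
    rng = np.random.default_rng(seed)
    matrix = np.zeros((len(word_index) + 1, dim))  # row 0 stays all-zero (padding)
    for word, idx in word_index.items():
        vec = pretrained.get(word)
        matrix[idx] = vec if vec is not None else rng.normal(scale=0.1, size=dim)
    return matrix

# Illustrative 4-d "GloVe-like" vectors for a two-word vocabulary
pretrained = {"fever": np.array([0.1, 0.2, 0.3, 0.4])}
word_index = {"fever": 1, "xyzzy": 2}  # "xyzzy" is out-of-vocabulary
emb = build_embedding_matrix(word_index, pretrained, dim=4)
```

The resulting matrix is then used to initialize the model's embedding layer; whether it is frozen or fine-tuned is a design choice that interacts with the embedding results reported above.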
For classification tasks on medical notes, Transformer encoders are the best choice if computational resources are not a constraint. Otherwise, when the classes are relatively balanced, CNNs are a leading candidate because of their competitive performance and computational efficiency.