Chen Kun-Hui, Lin Yi-Hui, Wu Shawn, Shih Nai-Wen, Meng Hsing-Chen, Lin Yen-Yu, Huang Chun-Rong, Huang Jing-Wen
Department of Orthopedic Surgery, Taichung Veterans General Hospital, Taichung, Taiwan.
Department of Post-Baccalaureate Medicine, College of Medicine, National Chung Hsing University, Taichung, Taiwan.
Sci Data. 2025 Aug 23;12(1):1475. doi: 10.1038/s41597-025-05742-x.
Low-dose computed tomography (LDCT) is the most effective tool for early detection of lung cancer. With advances in artificial intelligence, various computer-aided diagnosis (CAD) systems now support clinical practice, and they are helpful for radiologists who must read a large volume of CT scans. However, the development of these systems depends on precisely annotated datasets, which are currently limited. Although several lung imaging datasets exist, only a few publicly available datasets provide segmentation annotations on LDCT images. To address this problem, we developed a dataset based on National Lung Screening Trial (NLST) LDCT images with pixel-level annotations of lung lesions. The dataset includes LDCT scans from 605 patients and 715 annotated lesions, comprising 662 lung tumors and 53 lung nodules. Lesion volumes range from 0.03 cm³ to 372.21 cm³, with 500 lesions smaller than 5 cm³, mostly located in the right upper lung. A 2D U-Net model trained on the dataset achieved an intersection over union (IoU) of 0.95 on the training set. This dataset enhances the diversity and usability of lung cancer annotation resources.
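The two quantities reported in the abstract, IoU and lesion volume, can both be derived from pixel-level masks. The following is a minimal sketch only, not the authors' evaluation code: the function names `iou` and `lesion_volume_cm3`, the nested-list mask representation, and the example voxel spacing are all assumptions made for illustration.

```python
from typing import Sequence, Tuple

# Assumed representation: a 2D binary mask as nested sequences,
# where 1 marks a lesion pixel and 0 marks background.
Mask = Sequence[Sequence[int]]


def iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two equally sized 2D binary masks."""
    if len(pred) != len(truth) or any(
        len(p) != len(t) for p, t in zip(pred, truth)
    ):
        raise ValueError("masks must have the same shape")
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Convention chosen for this sketch: two empty masks are a perfect match.
    return intersection / union if union else 1.0


def lesion_volume_cm3(
    masks: Sequence[Mask], spacing_mm: Tuple[float, float, float]
) -> float:
    """Lesion volume from per-slice masks and voxel spacing given in mm."""
    voxels = sum(1 for mask in masks for row in mask for v in row if v)
    dx, dy, dz = spacing_mm
    return voxels * dx * dy * dz / 1000.0  # 1 cm^3 = 1000 mm^3


if __name__ == "__main__":
    pred = [[1, 1, 0], [0, 1, 0]]
    truth = [[1, 0, 0], [0, 1, 1]]
    print(iou(pred, truth))  # 2 shared pixels / 4 pixels in the union = 0.5
    # Hypothetical spacing of 1 mm x 1 mm in-plane and 2 mm between slices.
    print(lesion_volume_cm3([pred, truth], (1.0, 1.0, 2.0)))
```

In practice the voxel spacing would be read from each scan's metadata rather than hard-coded, and a per-lesion volume requires separating lesions within a scan before counting voxels.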