Zhao Renjia, Yuan Huangbo, Jiang Yanfeng, Liu Zhenqiu, Chen Ruilin, Wang Shuo, Lu Linyao, Yuan Ziyu, Su Zhixi, He Qiye, Xu Kelin, Zhang Tiejun, Jin Li, Lu Ming, Ye Weimin, Liu Rui, Suo Chen, Chen Xingdong
State Key Laboratory of Genetics and Development of Complex Phenotypes, Zhangjiang Fudan International Innovation Center, Human Phenome Institute, Fudan University, Songhu Road 2005, Shanghai, 200433, China.
Fudan University Taizhou Institute of Health Sciences, Taizhou, China.
Biomark Res. 2025 Aug 11;13(1):101. doi: 10.1186/s40364-025-00812-z.
Early identification of high-risk individuals is crucial for optimizing cancer screening, particularly when expensive and invasive methods such as multi-omics technologies and endoscopic procedures are under consideration. However, developing a robust, practical multi-cancer risk prediction model that integrates diverse, multi-scale data and is properly validated remains a significant challenge.
We initiated the FuSion study by recruiting 42,666 participants from Taizhou, China. After exclusion criteria were applied, participants were divided into a discovery cohort (n = 16,340) and an independent validation cohort (n = 26,308). We integrated multi-scale data from 54 blood-derived biomarkers and 26 epidemiological exposures to develop a risk prediction model for five common cancers: lung, esophageal, liver, gastric, and colorectal cancer. Across five supervised machine learning approaches, we used a LASSO-based feature selection strategy to identify the most informative predictors. The model was trained and internally validated in the discovery cohort, externally applied to the validation cohort, and further evaluated in a prospective clinical follow-up in which cancer events were ascertained through clinical examinations.
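The LASSO selection step can be illustrated as an L1-penalized logistic regression: the penalty drives uninformative coefficients to exactly zero, and predictors with non-zero coefficients are retained. This is a minimal pure-Python sketch under stated assumptions, not the study's pipeline. The abstract does not give the outcome encoding, the solver, or how the penalty strength was chosen, so the binary outcome, proximal gradient descent (ISTA), the fixed `lam`, and the helper names `lasso_logistic` and `selected_features` are all hypothetical. In practice `lam` would typically be tuned by cross-validation, with features standardized first.

```python
import math
from typing import List, Sequence


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def _soft_threshold(w: float, t: float) -> float:
    # Proximal operator of t * |w|: shrinks w toward zero, exactly zero if |w| <= t.
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def lasso_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lam: float, step: float = 0.1, n_iter: int = 2000) -> List[float]:
    """Hypothetical helper: L1-penalized logistic regression fitted by ISTA.

    Minimizes mean log-loss + lam * sum(|w_j|); the intercept is not penalized.
    Returns [intercept, w_1, ..., w_p]. Assumes features are already standardized.
    """
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Gradient of the mean log-loss with respect to the intercept and weights.
        gb, gw = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += r / n
            for j in range(p):
                gw[j] += r * xi[j] / n
        b -= step * gb
        # Gradient step followed by soft-thresholding applies the L1 penalty.
        w = [_soft_threshold(w[j] - step * gw[j], step * lam) for j in range(p)]
    return [b] + w


def selected_features(coefs: Sequence[float], names: Sequence[str]) -> List[str]:
    # Hypothetical helper: keep predictors whose penalized coefficient is non-zero.
    return [name for name, c in zip(names, coefs[1:]) if c != 0.0]


# Toy data (not from the study): x0 is associated with the outcome, x1 is not.
X = [[1, 1], [1, -1], [1, 1], [1, -1], [-1, 1], [-1, -1], [-1, 1], [-1, -1]]
y = [1, 1, 1, 0, 0, 0, 0, 1]
coefs = lasso_logistic(X, y, lam=0.05)
print(selected_features(coefs, ["x0", "x1"]))  # ['x0']
```

In this toy data, `x1` is balanced across both outcome classes, so its gradient stays at zero and soft-thresholding keeps its coefficient exactly zero, while `x0` survives the penalty with a shrunken coefficient.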
The final model, comprising four key biomarkers along with age, sex, and smoking intensity, achieved an AUROC of 0.767 (95% CI: 0.723-0.814) for five-year risk prediction. High-risk individuals (17.19% of the cohort) accounted for 50.42% of incident cancer cases and had a 15.19-fold increased risk compared with the low-risk group. During follow-up of 2,863 high-risk subjects, 9.64% were newly diagnosed with cancer or precancerous lesions. Notably, the cancer detection rate in the high-risk group was 5.02 times higher than in the low-risk group and 1.74 times higher than in the intermediate-risk group. In particular, the incidence of esophageal cancer in the high-risk group was 16.84 times that of the low-risk group.
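The kinds of metrics reported above can be computed, in form rather than in value, with a small sketch: AUROC as the Mann-Whitney probability that a case scores above a non-case, a percentile bootstrap interval for it, and a simple ratio of cumulative incidence between risk groups. These are assumptions. The abstract does not say whether the 95% CI came from bootstrapping or an analytic method such as DeLong's, nor whether the fold-risk figures are cumulative-incidence ratios or hazard ratios from a time-to-event model. The function names and toy data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random case (label 1) scores above a random non-case (label 0).

    Ties count as 1/2, i.e. the Mann-Whitney U statistic divided by n_pos * n_neg.
    The O(n_pos * n_neg) loop is fine for a sketch but not for cohort-sized data.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(scores: Sequence[float], labels: Sequence[int], n_boot: int = 1000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for the AUROC (an assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(scores)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples lacking either cases or non-cases; AUROC is undefined there.
        if 0 < sum(ys) < n:
            stats.append(auroc([scores[i] for i in idx], ys))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def rate_ratio(cases_a: int, n_a: int, cases_b: int, n_b: int) -> float:
    """Cumulative incidence in group A divided by that in group B (e.g. high vs low risk)."""
    return (cases_a / n_a) / (cases_b / n_b)


# Toy data, not from the study.
scores = [0.9, 0.7, 0.6, 0.4, 0.35, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
print(auroc(scores, labels))  # 0.75
print(bootstrap_ci(scores, labels))
print(rate_ratio(30, 1000, 2, 1000))
```

A fold-risk figure of this simple kind ignores differences in follow-up time; a cohort analysis would more likely use person-time rates or a survival model, which the abstract does not specify.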
This is the first population-based prospective study in a large Chinese cohort to leverage multi-scale data, including biomarkers, for multi-cancer risk prediction. Our effective risk stratification model not only enhances early cancer detection but also lays the foundation for the targeted application of advanced screening methods, including but not limited to multi-omics technologies and endoscopy. These findings support precision prevention strategies and the optimal allocation of healthcare resources.