Jiang Minyun, Cai Na, Hu Juan, Han Lei, Xu Fanwei, Zhu Baoli, Wang Boshen
School of Public Health, Nanjing Medical University, Nanjing, Jiangsu, China.
Jiangsu Province Center for Disease Prevention and Control, Institute of Occupational Disease Prevention, Nanjing, Jiangsu, China.
Front Public Health. 2025 Jan 21;12:1419361. doi: 10.3389/fpubh.2024.1419361. eCollection 2024.
In this research, we leveraged bioinformatics and machine learning to pinpoint key risk genes associated with occupational benzene exposure and to construct genomic and algorithm-based predictive risk assessment models.
We obtained the GSE9569 and GSE21862 microarray datasets from the Gene Expression Omnibus. Using R, we first screened for differentially expressed genes (DEGs) and then performed enrichment analyses to identify the affected functions and pathways. We next applied three machine learning algorithms to identify key genes and validated these genes in both a benzene-exposed cohort and a benzene-exposed mouse model. Finally, we conducted a functional prediction analysis of these genes using four machine learning models, complemented by GSVA enrichment analysis.
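The DEG screening step can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract states only that R was used, so the Welch t statistic, the thresholds `min_abs_lfc` and `min_abs_t`, the function names, and the toy genes `GENE_A` and `GENE_B` are all assumptions made for illustration. A real analysis would typically use a dedicated package and multiple-testing correction.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Degenerate case (no variance in either group): treat as uninformative.
        return 0.0
    return (mean(a) - mean(b)) / se


def screen_degs(expr, exposed_idx, control_idx, min_abs_lfc=1.0, min_abs_t=3.0):
    """Return genes that differ between exposed and control samples.

    expr maps a gene name to a list of log2 expression values, one per sample.
    The thresholds are hypothetical defaults, not values from the study.
    """
    hits = {}
    for gene, values in expr.items():
        exposed = [values[i] for i in exposed_idx]
        control = [values[i] for i in control_idx]
        # On log2-scale data, the difference of means is the log2 fold change.
        lfc = mean(exposed) - mean(control)
        t = welch_t(exposed, control)
        if abs(lfc) >= min_abs_lfc and abs(t) >= min_abs_t:
            hits[gene] = (lfc, t)
    return hits


# Toy data (hypothetical): samples 0-2 are exposed, samples 3-5 are controls.
toy = {
    "GENE_A": [8.1, 8.3, 7.9, 5.0, 5.2, 4.9],
    "GENE_B": [6.0, 6.2, 5.9, 6.1, 6.0, 6.1],
}
degs = screen_degs(toy, exposed_idx=[0, 1, 2], control_idx=[3, 4, 5])
```

In this toy example only `GENE_A` passes both thresholds. Combining a fold-change cutoff with a test statistic is a common screening heuristic, but the cutoffs here are arbitrary.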
We identified 40 DEGs, primarily linked to cytokine signaling, the response to lipopolysaccharide, and chemokine pathways. Machine learning identified NFKB1, PHACTR1, PTGS2, and PTX3 as key genes. Validation confirmed substantial changes in NFKB1 and PTX3 following exposure, with PTX3 showing the most prominent change, suggesting its utility as a diagnostic biomarker for benzene damage.
Risk assessment models, informed by oxidative stress markers, successfully discriminated between benzene-injured patients and controls.