Xiao Benyang, Zhou Yuxiang, Zhang Zhirui, Wang Xindi, Xiang Jiali, Lv Zhixin, Liao Miao, Luo Haibo, Song Feng
Department of Forensic Genetics, West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, 3-16 Renmin South Road, Chengdu, 610041, China.
BMC Genomics. 2025 May 30;26(1):546. doi: 10.1186/s12864-025-11713-8.
DNA methylation is a pivotal biomarker for age prediction. However, most studies focus on blood-derived data, with limited research on saliva, and the inability to directly analyze methylation data across diverse platforms constrains predictive accuracy.
Using six Illumina HumanMethylation450 BeadChip datasets, we identified 10 age-related CpG sites in saliva (cg00481951, cg07547549, cg10501210, cg13654588, cg14361627, cg15480367, cg17110586, cg17885226, cg19671120, cg21296230) and developed two multiplex SNaPshot assays to measure them. From methylation SNaPshot data on 239 saliva samples (13–69 years), we constructed an ensemble model of 17 neural network classifiers. Each classifier assigns ages to bins 17 years wide, and the bin boundaries are shifted by one year from one classifier to the next. On an independent test set of 44 samples (13–66 years), the model achieved a mean absolute error (MAE) of 4.39 years, outperforming some advanced linear and nonlinear models. The model also showed improved prediction performance when applied to other datasets, demonstrating its robustness and generalizability. Additionally, incorporating dummy variables into the model mitigated platform-specific biases, facilitating integrated analysis of multi-platform methylation data for age prediction.
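The shifted-bin scheme can be illustrated with a minimal sketch. It covers only the labelling and one possible way of combining the 17 bin predictions into an age; the neural networks themselves are omitted. The abstract does not state how the classifier outputs are merged, so the voting rule in `decode_age`, the choice of 13 years as the bin origin, and all function names are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter

BIN_WIDTH = 17      # years per bin, as stated in the abstract
N_CLASSIFIERS = 17  # one classifier per one-year shift, as stated in the abstract
MIN_AGE = 13        # assumption: youngest training age used as the bin origin


def bin_label(age, shift):
    """Class label of an integer age for the classifier shifted by `shift` years."""
    return (age - MIN_AGE + shift) // BIN_WIDTH


def bin_range(label, shift):
    """Inclusive integer age range covered by `label` under the given shift."""
    start = MIN_AGE - shift + label * BIN_WIDTH
    return start, start + BIN_WIDTH - 1


def decode_age(predicted_labels):
    """Hypothetical decoding rule (not taken from the paper).

    Each classifier votes for every age inside its predicted bin;
    the estimate is the mean of the ages that collect the most votes.
    """
    votes = Counter()
    for shift, label in enumerate(predicted_labels):
        lo, hi = bin_range(label, shift)
        votes.update(range(lo, hi + 1))
    top = max(votes.values())
    best = [age for age, n in votes.items() if n == top]
    return sum(best) / len(best)


# Simulate 17 classifiers that all predict the correct bin for a 42-year-old.
true_age = 42
labels = [bin_label(true_age, s) for s in range(N_CLASSIFIERS)]
estimate = decode_age(labels)
```

Because the bin width equals the number of one-year shifts, the 17 bins containing a given age overlap in exactly that one year, so error-free classifiers would pin the age down to a single year under this voting rule. Real classifiers make errors, which spreads the votes and yields a coarser estimate.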
In this study, we identified ten age-associated CpG sites in saliva and developed an ensemble model of 17 neural network classifiers for age prediction. Introducing dummy variables into the model mitigated platform-dependent variation. Together, these results offer a novel framework for saliva-based age prediction and cross-platform methylation data analysis.
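As a rough illustration of the dummy-variable idea, the sketch below appends platform indicator columns to a sample's methylation values before they are passed to a model. The abstract does not specify the encoding, so the platform labels, the reference-level convention, and the function name `add_platform_dummies` are all hypothetical.

```python
# Illustrative platform labels (assumption); the first entry is the reference level.
PLATFORMS = ["SNaPshot", "450K"]


def add_platform_dummies(methylation_values, platform, platforms=PLATFORMS):
    """Return the feature vector extended with k-1 platform dummy variables.

    The reference platform is encoded as all zeros; every other platform
    sets exactly one indicator to 1.0.
    """
    if platform not in platforms:
        raise ValueError(f"unknown platform: {platform!r}")
    dummies = [1.0 if platform == p else 0.0 for p in platforms[1:]]
    return list(methylation_values) + dummies


# Two samples with the same (made-up) methylation values from different platforms.
snapshot_features = add_platform_dummies([0.31, 0.72], "SNaPshot")
array_features = add_platform_dummies([0.31, 0.72], "450K")
```

With this encoding a model can learn a platform-specific offset from the indicator columns, so samples measured on different platforms can be trained on together.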
The online version contains supplementary material available at 10.1186/s12864-025-11713-8.