Forensic Genetics Unit, Institute of Forensic Sciences, University of Santiago de Compostela, Spain.
CITMAga (Center for Mathematical Research and Technology of Galicia), University of Santiago de Compostela, Spain.
Forensic Sci Int Genet. 2022 Nov;61:102770. doi: 10.1016/j.fsigen.2022.102770. Epub 2022 Aug 27.
Age estimation based on epigenetic markers is a DNA intelligence tool with the potential to provide relevant information for criminal investigations, as well as to improve the inference of age-dependent physical characteristics such as male pattern baldness or hair color. Age prediction models have been developed for different tissues, including saliva and buccal cells, which show different methylation patterns because they are composed of different cell populations. In many criminal investigations the origin of a sample, or the proportion of each tissue in it, is not known with certainty (the provenance of cigarette butts is one example), so combined models can provide lower prediction errors. In the present study, two tissue-specific and seven age-correlated CpG sites were selected from publicly available Illumina HumanMethylation 450 BeadChip data and bibliographic searches to build a tissue-prediction model and an age-prediction model, respectively. Both models were developed using a total of 184 samples (N = 91 saliva and N = 93 buccal cells) from individuals aged 21 to 86 years. The models were validated using both k-fold cross-validation and an additional set of 184 samples (N = 93 saliva and N = 91 buccal cells, 21-86 years old). The tissue-prediction model, based on logistic regression of two CpG sites (HUNK and RUNX1), correctly classified 88.59 % of saliva and buccal swab samples in the training set and 83.69 % in the testing set. Despite these high success rates, a single combined age-prediction model covering both saliva and buccal cells was developed, using seven CpG sites (cg10501210, LHFPL4, ELOVL2, PDE4C, HOXC4, OTUD7A and EDARADD) and multivariate quantile regression. It gave a median absolute error (MAE) of ±3.54 years and a correct classification rate (%CP±PI) of 76.08 % for the training set, and an MAE of ±3.66 years and a %CP±PI of 71.19 % for the testing set.
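To make the reported accuracy metrics concrete, here is a minimal Python sketch of how they could be computed. It assumes that MAE is the median of the absolute differences between predicted and chronological age, that %CP±PI is the percentage of samples whose chronological age falls inside the model's prediction interval, and that the tissue classification rate is the plain percentage of correctly labelled samples. The function names and the toy values are hypothetical illustrations, not code or data from the study.

```python
from statistics import median


def median_absolute_error(true_ages, predicted_ages):
    """Median of |predicted - chronological| age across samples, in years."""
    return median(abs(p - t) for t, p in zip(true_ages, predicted_ages))


def percent_correct_within_pi(true_ages, lower_bounds, upper_bounds):
    """Assumed %CP±PI: share of samples whose true age lies in [lower, upper]."""
    hits = sum(lo <= t <= hi for t, lo, hi in zip(true_ages, lower_bounds, upper_bounds))
    return 100.0 * hits / len(true_ages)


def correct_classification_rate(true_labels, predicted_labels):
    """Percentage of samples whose predicted tissue matches the true tissue."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * hits / len(true_labels)


if __name__ == "__main__":
    # Toy values for illustration only (hypothetical, not study data).
    ages = [25, 40, 63, 78]
    preds = [28, 38, 60, 84]
    lower = [20, 33, 55, 79]  # hypothetical lower prediction-interval bounds
    upper = [36, 43, 65, 89]  # hypothetical upper prediction-interval bounds
    print(median_absolute_error(ages, preds))             # 3.0
    print(percent_correct_within_pi(ages, lower, upper))  # 75.0
    print(correct_classification_rate(
        ["saliva", "buccal", "saliva", "buccal"],
        ["saliva", "saliva", "saliva", "buccal"]))        # 75.0
```

Using the median rather than the mean makes the error summary less sensitive to a few badly mispredicted samples, which is one plausible reason an age predictor would report it.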
Adding tissue of origin as a covariate to the model was also assessed, but it did not improve age predictions. Finally, considering the limitations usually faced in forensic DNA analysis, the robustness of the model and the minimum recommended amount of input DNA for bisulfite conversion were evaluated, considering up to 10 ng of genomic DNA for reproducible results. The final multivariate quantile regression age predictor based on the models we developed has been made available on the open-access Snipper forensic classification website.