Mayne Benjamin, Berry Oliver, Jarman Simon
Environomics Future Science Platform, Indian Ocean Marine Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Crawley, Western Australia, Australia.
Curtin University, Perth, Western Australia, Australia.
Evol Appl. 2023 Jul 26;16(8):1496-1502. doi: 10.1111/eva.13582. eCollection 2023 Aug.
Animal age data are valuable for the management of wildlife populations. Yet, for most species, there is no practical method for determining the age of individuals whose age is unknown. Epigenetic clocks, a molecular method, can predict age by sampling specific tissue types and measuring DNA methylation levels at specific loci. Developing an epigenetic clock requires a large number of samples from animals of known age. For most species, there are no individuals whose exact ages are known, which makes epigenetic clock calibration inaccurate or impossible. For many epigenetic clocks, calibration samples with inaccurate age estimates introduce a degree of error into the calibration. In this study, we investigated how much error in the training data set of an epigenetic clock can be tolerated before it results in an unacceptable increase in age prediction error. Using four publicly available data sets, we artificially increased the age error in the training data in increments of 1% and then tested each model against an independent set of samples of known age. A small increase in effect size (Cohen's d > 0.2) was detected when the error in age was higher than 22%. The effect size increased linearly with age error. This threshold was independent of sample size. The downstream application of the age data may play a more important role in deciding how much error can be tolerated for age prediction. If highly precise age estimates are required, it may be futile to embark on developing an epigenetic clock when there is no accurately aged calibration population to work with. For other problems, however, such as determining the relative age order of pairs of individuals, a lower-quality calibration data set may be adequate.
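The error-injection procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the data are synthetic, the clock is a single-site least-squares regression standing in for whatever model the study used, and the uniform error of up to ±p% is an assumed noise model, since the abstract does not say how the error was drawn. All function names (`perturb_ages`, `fit_line`, `cohens_d`, `run_sweep`) are hypothetical. The sketch only shows the shape of the experiment: perturb the training ages in 1% steps, refit, score against test samples with true ages, and compare absolute errors to the unperturbed baseline using Cohen's d.

```python
import random
import statistics


def cohens_d(a, b):
    """Cohen's d (pooled standard deviation) for the difference mean(b) - mean(a)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled_sd = (((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)) ** 0.5
    return (statistics.mean(b) - statistics.mean(a)) / pooled_sd


def perturb_ages(ages, pct, rng):
    """Assumed noise model: shift each age by a uniform error of up to +/- pct percent."""
    return [age * (1 + rng.uniform(-pct, pct) / 100) for age in ages]


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def simulate(n, rng):
    """Synthetic data: methylation at one hypothetical site rises linearly with age."""
    ages = [rng.uniform(1, 30) for _ in range(n)]
    meth = [0.2 + 0.02 * age + rng.gauss(0, 0.03) for age in ages]
    return ages, meth


def abs_errors(model, meth, ages):
    """Absolute error of predicted age versus true age for each sample."""
    slope, intercept = model
    return [abs(slope * m + intercept - a) for m, a in zip(meth, ages)]


def run_sweep(max_pct=30, n_train=200, n_test=100, seed=0):
    """Return (pct, Cohen's d versus the error-free baseline) for pct = 1..max_pct."""
    rng = random.Random(seed)
    train_ages, train_meth = simulate(n_train, rng)
    test_ages, test_meth = simulate(n_test, rng)  # test ages are never perturbed
    baseline = abs_errors(fit_line(train_meth, train_ages), test_meth, test_ages)
    results = []
    for pct in range(1, max_pct + 1):
        noisy_ages = perturb_ages(train_ages, pct, rng)
        model = fit_line(train_meth, noisy_ages)
        errors = abs_errors(model, test_meth, test_ages)
        results.append((pct, cohens_d(baseline, errors)))
    return results


if __name__ == "__main__":
    for pct, d in run_sweep():
        print(f"training age error {pct:2d}%  Cohen's d = {d:+.3f}")
```

Because the synthetic data and the noise model are assumptions, the values this sketch prints should not be expected to reproduce the 22% threshold reported above. A real analysis would replace `simulate` with measured methylation data and repeat each error level many times.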