Murray Gemma G R, Wang Fang, Harrison Ewan M, Paterson Gavin K, Mather Alison E, Harris Simon R, Holmes Mark A, Rambaut Andrew, Welch John J
Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK.
Department of Veterinary Medicine, University of Cambridge, Madingley Road, Cambridge CB3 0ES, UK.
Methods Ecol Evol. 2016 Jan;7(1):80-89. doi: 10.1111/2041-210X.12466. Epub 2015 Sep 22.
'Dated-tip' methods of molecular dating use DNA sequences sampled at different times to estimate the age of their most recent common ancestor. Several tests of 'temporal signal' are available to determine whether data sets are suitable for such analysis. However, it remains unclear whether these tests are reliable.

We investigate the performance of several tests of temporal signal, including some recently suggested modifications. We use simulated data (where the true evolutionary history is known) and whole genomes of methicillin-resistant Staphylococcus aureus (to show how particular problems arise with real-world data sets).

We show that all of the standard tests of temporal signal are seriously misleading for data where temporal and genetic structures are confounded (i.e. where closely related sequences are more likely to have been sampled at similar times). This is not an artefact of genetic structure or tree shape, and it can arise even when sequences have measurably evolved during the sampling period. More positively, we show that a 'clustered permutation' approach introduced by Duchêne et al. (2015, p. 1895) can successfully correct for this artefact in all cases, and we introduce techniques for implementing this method with real data sets.

The confounding of temporal and genetic structures may be difficult to avoid in practice, particularly for outbreaks of infectious disease or when using ancient DNA. Therefore, we recommend the use of 'clustered permutation' for all analyses. The failure of the standard tests may explain why different methods of dating pathogen origins have reached such wildly different conclusions.
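To make the difference between the standard and the clustered permutation test concrete, here is a minimal sketch. It is not the authors' implementation and makes several simplifying assumptions. First, the test statistic is a root-to-tip regression slope, used as a cheap stand-in for a clock rate that a real analysis would re-estimate with a dated-tip method for every permutation. Second, each "cluster" is simply the set of tips sharing one sampling date. Third, all function names (`rtt_slope`, `permute_dates`, `date_randomization_test`) and the toy data are hypothetical.

```python
import random
from statistics import mean


def rtt_slope(dates, distances):
    """Least-squares slope of root-to-tip distance on sampling date.

    Hypothetical stand-in for a clock-rate estimate; a full analysis would
    re-estimate the rate with a dated-tip method for each permutation.
    """
    mx, my = mean(dates), mean(distances)
    sxx = sum((x - mx) ** 2 for x in dates)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dates, distances))
    return sxy / sxx


def permute_dates(dates, rng, clustered):
    """Return one randomised assignment of sampling dates to tips."""
    if not clustered:
        # Standard test: shuffle dates independently across individual tips.
        shuffled = list(dates)
        rng.shuffle(shuffled)
        return shuffled
    # Clustered test (assumption: one cluster per distinct sampling date):
    # shuffle the distinct dates among clusters, so tips in a cluster stay together.
    distinct = sorted(set(dates))
    relabelled = distinct[:]
    rng.shuffle(relabelled)
    mapping = dict(zip(distinct, relabelled))
    return [mapping[d] for d in dates]


def date_randomization_test(dates, distances, n_perm=999, clustered=False, seed=1):
    """One-sided permutation p-value for a positive date-distance slope."""
    rng = random.Random(seed)
    observed = rtt_slope(dates, distances)
    null = [rtt_slope(permute_dates(dates, rng, clustered), distances)
            for _ in range(n_perm)]
    # Count the observed statistic as one member of the null distribution.
    p_value = (1 + sum(s >= observed for s in null)) / (n_perm + 1)
    return observed, p_value


# Toy confounded data: three genetic clusters, each sampled at a single date.
base = [0.009, 0.0095, 0.010, 0.0105, 0.011]
dates = [2000] * 5 + [2005] * 5 + [2010] * 5
distances = base + [d + 0.005 for d in base] + [d + 0.010 for d in base]

obs, p_plain = date_randomization_test(dates, distances, clustered=False)
_, p_clustered = date_randomization_test(dates, distances, clustered=True)
print(f"slope={obs:.4g}  standard p={p_plain:.3f}  clustered p={p_clustered:.3f}")
```

In this toy example the 15 tips carry only three independent time points. The standard test shuffles dates tip by tip, which breaks up the clusters, so it should report a very small p-value. The clustered test can only produce 3! = 6 distinct date assignments, so its p-value cannot fall much below 1/6. This illustrates, under the stated assumptions, why treating confounded tips as independent can overstate temporal signal.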