Institute of Applied Simulation, School of Life Sciences and Facility Management, Zürich University of Applied Sciences (ZHAW), Wädenswil, Switzerland.
Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland; Department of Computer Science, ETH Zürich, Zürich, Switzerland.
Front Bioeng Biotechnol. 2015 Mar 17;3:31. doi: 10.3389/fbioe.2015.00031. eCollection 2015.
Tandem repeats (TRs) are frequently observed in genomes across all domains of life. Evidence suggests that some TRs are crucial for proteins with fundamental biological functions and can be associated with virulence, resistance, and infectious/neurodegenerative diseases. Genome-scale systematic studies of TRs have the potential to unveil core mechanisms governing TR evolution and TR roles in shaping genomes. However, TR-related studies are often non-trivial due to heterogeneous and sometimes fast-evolving TR regions. In this review, we discuss these intricacies and their consequences. We present our recent contributions to computational and statistical approaches for TR significance testing, sequence profile-based TR annotation, TR-aware sequence alignment, phylogenetic analyses of TR unit number and order, and TR benchmarks. Importantly, all these methods explicitly rely on the evolutionary definition of a tandem repeat as a sequence of adjacent repeat units stemming from a common ancestor. The discussed work focuses on protein TRs, yet is generally applicable to nucleic acid TRs, which share similar features.