Ganapathiraju Madhavi K, Subramanian Sandeep, Chaparala Srilakshmi, Karunakaran Kalyani B
Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, 5607 Baum Blvd, Suite 401, Pittsburgh, PA, 15206, USA.
Intelligent Systems Program, School of Computing and Information, University of Pittsburgh, Pittsburgh, PA, 15213, USA.
Hum Genome Var. 2020 Nov 20;7(1):40. doi: 10.1038/s41439-020-00127-5.
A palindrome in DNA resembles a palindrome in language, except that the sequence read backwards is the complement of the forward sequence; in effect, the two halves of the sequence complement each other about its midpoint, as in double-stranded DNA. Palindromes are distributed throughout the human genome and play significant roles in gene expression and regulation. Palindromic mutations are linked to many human diseases, such as neuronal disorders, mental retardation, and various cancers. In this work, we computed and analyzed the palindromic sequences in the human genome and studied their conservation in personal genomes using 1000 Genomes data. We found that ~30% of the palindromes exhibit variation, some of which is caused by rare variants. An analysis of disease/trait-associated single-nucleotide polymorphisms in palindromic regions showed that disease-associated risk variants are 14 times more likely to be present in palindromic regions than in other regions. The catalog of palindromes in the reference genome and in the 1000 Genomes data is being made available here, with details on their variations in each individual genome, to serve as a resource for future and retrospective whole-genome studies that identify statistically significant palindrome variations associated with diseases or traits and their roles in disease mechanisms.
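The definition of a DNA palindrome given above (a sequence equal to its own reverse complement) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the window-length bounds, and the restriction to perfect palindromes with no spacer or mismatches are all assumptions made for illustration.

```python
# Minimal sketch: detect perfect DNA palindromes (no spacer, no mismatches).
# All names and default parameters here are hypothetical, chosen for illustration.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_dna_palindrome(seq: str) -> bool:
    """True if seq contains only A/C/G/T and equals its own reverse complement."""
    seq = seq.upper()
    if not seq or not set(seq) <= set("ACGT"):
        return False  # reject empty input and ambiguous bases such as N
    return seq == reverse_complement(seq)


def find_palindromes(seq: str, min_len: int = 4, max_len: int = 12):
    """Yield (start, window) for each palindromic window of even length.

    Only even lengths are scanned: in an odd-length window the middle base
    would have to be its own complement, which is impossible.
    Assumes min_len is even.
    """
    seq = seq.upper()
    for length in range(min_len, max_len + 1, 2):
        for start in range(len(seq) - length + 1):
            window = seq[start:start + length]
            if is_dna_palindrome(window):
                yield start, window
```

For example, `is_dna_palindrome("GAATTC")` holds because the reverse complement of `GAATTC` is again `GAATTC`, whereas `GATTACA` is not a palindrome. A genome-scale scan would need a more efficient approach than this brute-force window search.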