Shumate Alaina, Salzberg Steven L
Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA.
Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21211, USA.
Bioinformatics. 2021 Jul 19;37(12):1639-1643. doi: 10.1093/bioinformatics/btaa1016.
Improvements in DNA sequencing technology and computational methods have led to a substantial increase in the creation of high-quality genome assemblies of many species. To understand the biology of these genomes, annotation of gene features and other functional elements is essential; however, for most species, only the reference genome is well-annotated.
One strategy to annotate new or improved genome assemblies is to map or 'lift over' the genes from a previously annotated reference genome. Here, we describe Liftoff, a new genome annotation lift-over tool capable of mapping genes between two assemblies of the same or closely related species. Liftoff aligns genes from a reference genome to a target genome and finds the mapping that maximizes sequence identity while preserving the structure of each exon, transcript and gene. We show that Liftoff can accurately map 99.9% of genes between two versions of the human reference genome with an average sequence identity >99.9%. We also show that Liftoff can map genes across species: it successfully lifted 98.3% of human protein-coding genes over to a chimpanzee genome assembly, with the lifted-over genes showing 98.2% sequence identity.
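The selection criterion described above (maximize sequence identity while preserving gene structure) can be illustrated with a toy sketch. This is not Liftoff's implementation. It assumes exon alignments have already been computed by some aligner, and every name in it (`ExonHit`, `identity`, `preserves_order`, `best_placement`, the `locus_*` labels) is hypothetical. "Structure" is reduced here to two checks: every reference exon has a hit, and the hits appear in increasing order on the target.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record: (start on target, aligned reference exon, aligned target exon).
# The two aligned strings have equal length; '-' marks a gap.
ExonHit = Tuple[int, str, str]


def identity(aligned_ref: str, aligned_tgt: str) -> float:
    """Fraction of aligned columns in which both sequences carry the same base."""
    if len(aligned_ref) != len(aligned_tgt):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_ref:
        return 0.0
    matches = sum(1 for r, t in zip(aligned_ref, aligned_tgt) if r == t and r != "-")
    return matches / len(aligned_ref)


def preserves_order(hits: List[ExonHit]) -> bool:
    """True if the exons appear in strictly increasing order on the target."""
    starts = [start for start, _, _ in hits]
    return all(a < b for a, b in zip(starts, starts[1:]))


def best_placement(
    candidates: Dict[str, List[ExonHit]], n_exons: int
) -> Optional[Tuple[str, float]]:
    """Return (name, identity) of the best structure-preserving candidate, or None."""
    best: Optional[Tuple[str, float]] = None
    for name, hits in candidates.items():
        # Discard placements that lose an exon or scramble exon order.
        if len(hits) != n_exons or not preserves_order(hits):
            continue
        # Score the placement over all exon columns taken together.
        ref = "".join(h[1] for h in hits)
        tgt = "".join(h[2] for h in hits)
        score = identity(ref, tgt)
        if best is None or score > best[1]:
            best = (name, score)
    return best


candidates = {
    "locus_a": [(100, "ACGT", "ACGT"), (300, "GGCC", "GGCA")],  # ordered, 7/8 identical
    "locus_b": [(900, "ACGT", "ACGT"), (700, "GGCC", "GGCC")],  # identical but exons out of order
    "locus_c": [(50, "ACGT", "ACTT"), (80, "GGCC", "GGTA")],    # ordered, 5/8 identical
}
print(best_placement(candidates, n_exons=2))  # ('locus_a', 0.875)
```

In this toy input, `locus_b` has the highest raw identity but is rejected because its exons are out of order, so the structure-preserving `locus_a` is chosen.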
Liftoff can be installed via bioconda and PyPI. In addition, the source code for Liftoff is available at https://github.com/agshumate/Liftoff.
Supplementary data are available at Bioinformatics online.