Schell Tilman, Greve Carola, Podsiadlowski Lars
LOEWE Centre for Translational Biodiversity Genomics, Senckenberganlage 25, 60325, Frankfurt, Germany.
Senckenberg Research Institute, Senckenberganlage 25, 60325, Frankfurt, Germany.
Front Zool. 2025 Apr 17;22(1):7. doi: 10.1186/s12983-025-00561-7.
Reference genome assemblies are the basis for comprehensive genomic analyses and comparisons. Thanks to declining sequencing costs and growing computational power, genome projects are now feasible in smaller labs. De novo genome sequencing of non-model or emerging model organisms requires knowledge of the genome size and of techniques for extracting high molecular weight DNA. Besides DNA quality, the amount of DNA obtained from a single individual is crucial, especially when dealing with small organisms. Long-read sequencing technologies are the methods of choice for creating high-quality genome assemblies. Pure short-read assemblies may still capture most of the coding regions of a genome, but they are usually much more fragmented and resolve repeat elements and structural variants poorly. Several genome initiatives are producing a growing number of non-model organism genomes and define standards for genome sequencing and assembly. However, the organism of choice is sometimes not part of such an initiative or does not meet its standards. Therefore, if the scientific question can be answered with a genome of low contiguity in intergenic regions, falling short of the high standard of a chromosome-scale assembly should not prevent publication. This review describes how to set up an animal genome sequencing project in the lab, how to estimate costs and resources, and how to deal with suboptimal conditions. We thus aim to suggest genome sequencing strategies tailored to specific research questions, e.g. "How are species related to each other based on whole genomes?" (phylogenomics), "How do genomes of populations within a species differ?" (population genomics), "Are differences between populations relevant for conservation?" (conservation genomics), "Which selection pressure is acting on certain genes?" (identification of genes under selection), and "Did repeats expand or contract recently?" (repeat dynamics).
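Planning a project starts from the genome size and from how much DNA one individual yields. Two basic quantities follow from these: the sequencing output needed for a target coverage, and the number of genome copies in an extracted DNA sample. The sketch below computes both. It is not the authors' method. The genome size, coverage and DNA amount are hypothetical example values. The pg-to-bp factor is a widely used approximation of about 978 Mbp per pg.

```python
"""Back-of-the-envelope resource estimate for a de novo genome project.

All example inputs are hypothetical; they are not values from the review.
"""

# Widely used conversion: 1 pg of double-stranded DNA ~ 0.978 Gbp (978 Mbp).
MBP_PER_PG = 978.0


def required_yield_gbp(genome_size_mbp: float, coverage: float) -> float:
    """Sequencing output (Gbp) needed to reach a target mean coverage."""
    return genome_size_mbp * coverage / 1000.0  # Mbp -> Gbp


def genome_copies(dna_ng: float, genome_size_mbp: float) -> float:
    """Approximate number of haploid genome copies in a DNA sample."""
    haploid_mass_pg = genome_size_mbp / MBP_PER_PG  # mass of one haploid genome
    return dna_ng * 1000.0 / haploid_mass_pg  # convert ng to pg, then divide


if __name__ == "__main__":
    # Hypothetical example: a 1.5 Gbp genome, a 30x coverage target,
    # and 500 ng of DNA extracted from a single individual.
    size_mbp = 1500.0
    print(f"Yield needed: {required_yield_gbp(size_mbp, 30):.1f} Gbp")
    print(f"Genome copies in 500 ng: {genome_copies(500, size_mbp):.3g}")
```

Real projects also have to budget for DNA lost during library preparation and for platform-specific fragment-length requirements. This sketch does not model either.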