Daakour Sarah, Nelson David R, Fu Weiqi, Jaiswal Ashish, Dohai Bushra, Alzahmi Amnah Salem, Koussa Joseph, Huang Xiaoluo, Shen Yue, Twizere Jean-Claude, Salehi-Ashtiani Kourosh
Center for Genomics and Systems Biology (CGSB), New York University-Abu Dhabi, Abu Dhabi P.O. Box 129188, United Arab Emirates.
Division of Science and Math, New York University-Abu Dhabi, Abu Dhabi P.O. Box 129188, United Arab Emirates.
Microorganisms. 2024 Aug 20;12(8):1720. doi: 10.3390/microorganisms12081720.
Prochlorococcus, a cyanobacterial genus of the smallest and most abundant oceanic phototrophs, encompasses ecotype strains adapted to high-light (HL) and low-light (LL) niches. To elucidate the adaptive evolution of this genus, we analyzed 40 ORFeomes, including those of two cornerstone strains, MED4 and NATL1A. Employing deep learning with robust statistical methods, we detected new protein family distributions in the strains and identified key genes differentiating the HL and LL strains. The HL strains harbor genes (ABC-2 transporters) related to stress resistance, such as DNA repair and RNA processing, while the LL strains exhibit unique chlorophyll adaptations (ion transport proteins, HEAT repeats). Additionally, we report variable, depth-dependent endogenous viral elements in the 40 strains. To generate biological resources for experimentally studying the HL and LL adaptations, we synthetically constructed the ORFeomes of two representative strains, MED4 and NATL1A, covering 99% of the annotated protein-coding sequences of the two species and totaling 3976 cloned, sequence-verified open reading frames (ORFs). These comparative genomic analyses, paired with the MED4 and NATL1A ORFeomes, will facilitate future genotype-to-phenotype mappings and the systems biology exploration of Prochlorococcus ecology.