McEvoy Christopher R E, van Helden Paul D, Warren Robin M, Gey van Pittius Nicolaas C
DST/NRF Centre of Excellence for Biomedical Tuberculosis Research/MRC Centre for Molecular and Cellular Biology, Division of Molecular Biology and Human Genetics, Faculty of Health Sciences, Stellenbosch University, Tygerberg, South Africa.
BMC Evol Biol. 2009 Sep 21;9:237. doi: 10.1186/1471-2148-9-237.
PPE38 (Rv2352c) is a member of the large PPE gene family of Mycobacterium tuberculosis and related mycobacteria. The function of PPE proteins is unknown, but evidence suggests that many are cell-surface associated and recognised by the host immune system. Previous studies targeting other PPE gene members suggest that some display high levels of polymorphism, and it is thought that this might represent a means of providing antigenic variation. We have analysed the genetic variability of the PPE38 genomic region in a cohort of M. tuberculosis clinical isolates representing all of the major phylogenetic lineages, along with the ancestral M. tuberculosis complex (MTBC) member M. canettii, and supplemented this with analysis of publicly available whole genome sequences representing additional M. tuberculosis clinical isolates, other MTBC members and non-tuberculous mycobacteria (NTM). Where possible, we have extended this analysis to include the adjacent plcABC and PPE39/40 genomic regions.
We show that the ancestral MTBC PPE38 region comprises 2 homologous PPE genes (PPE38 and PPE71), separated by 2 esat-6 (esx)-like genes and that this structure derives from an esx/esx/PPE duplication in the common ancestor of M. tuberculosis and M. marinum. We also demonstrate that this region of the genome is hypervariable due to frequent IS6110 integration, IS6110-associated recombination, and homologous recombination and gene conversion events between PPE38 and PPE71. These mutations result in combinations of gene deletion, gene truncation and gene disruption in the majority of clinical isolates. These mutations were generally found to be IS6110 strain lineage-specific, although examples of additional within-lineage and even within-cluster mutations were observed. Furthermore, we provide evidence that the published M. tuberculosis H37Rv whole genome sequence is inaccurate regarding this region.
Our results show that this antigen-encoding region of the M. tuberculosis genome is hypervariable. The observation that numerous different mutations have become fixed within specific lineages demonstrates that this genomic region is undergoing rapid molecular evolution and that further lineage-specific evolutionary expansion and diversification have occurred subsequent to the lineage-defining mutational events. We predict that functional loss of these genes could aid immune evasion. Finally, we also show that the PPE38 region of the published M. tuberculosis H37Rv whole genome sequence is not representative of the ATCC H37Rv reference strain.