Merkin Institute of Transformative Technologies in Healthcare, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA.
Nat Biotechnol. 2021 Nov;39(11):1414-1425. doi: 10.1038/s41587-021-00938-z. Epub 2021 Jun 28.
Programmable C•G-to-G•C base editors (CGBEs) have broad scientific and therapeutic potential, but their editing outcomes have proved difficult to predict and their editing efficiency and product purity are often low. We describe a suite of engineered CGBEs paired with machine learning models to enable efficient, high-purity C•G-to-G•C base editing. We performed a CRISPR interference (CRISPRi) screen targeting DNA repair genes to identify factors that affect C•G-to-G•C editing outcomes and used these insights to develop CGBEs with diverse editing profiles. We characterized ten promising CGBEs on a library of 10,638 genomically integrated target sites in mammalian cells and trained machine learning models that accurately predict the purity and yield of editing outcomes (R = 0.90) using these data. These CGBEs enable correction to the wild-type coding sequence of 546 disease-related transversion single-nucleotide variants (SNVs) with >90% precision (mean 96%) and up to 70% efficiency (mean 14%). Computational prediction of optimal CGBE-single-guide RNA pairs enables high-purity transversion base editing at over fourfold more target sites than achieved using any single CGBE variant.
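The final step in the abstract, computationally choosing the best CGBE-single-guide RNA pair for a target, can be illustrated with a minimal sketch. This is not the authors' model or code. It assumes that some trained predictor has already produced a purity and an efficiency estimate for each candidate pair. The `Prediction` type, the `pick_best_pair` function, the editor and sgRNA labels, and all numeric values are hypothetical. The 0.9 purity cutoff mirrors the ">90% precision" figure quoted in the abstract.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Prediction:
    """One model output for a candidate editor/guide pair (hypothetical schema)."""
    editor: str        # label of a CGBE variant (made-up names below)
    sgrna: str         # identifier of a single-guide RNA
    purity: float      # predicted fraction of edited products that are the desired edit
    efficiency: float  # predicted fraction of all reads carrying the desired edit


def pick_best_pair(
    predictions: Iterable[Prediction],
    min_purity: float = 0.9,
) -> Optional[Prediction]:
    """Return the pair with the highest predicted efficiency among those whose
    predicted purity exceeds min_purity, or None if no pair qualifies."""
    eligible = [p for p in predictions if p.purity > min_purity]
    if not eligible:
        return None
    # Rank by efficiency first; break ties on purity.
    return max(eligible, key=lambda p: (p.efficiency, p.purity))


if __name__ == "__main__":
    # Illustrative numbers only, not data from the paper.
    candidates = [
        Prediction("editor_A", "sg1", purity=0.95, efficiency=0.10),
        Prediction("editor_B", "sg1", purity=0.80, efficiency=0.40),
        Prediction("editor_C", "sg2", purity=0.92, efficiency=0.25),
    ]
    best = pick_best_pair(candidates)
    print(best.editor, best.sgrna)  # prints "editor_C sg2"
```

Filtering on purity before ranking by efficiency reflects the abstract's emphasis on high-purity editing: in this example editor_B has the highest predicted efficiency but is excluded for falling below the purity cutoff. Considering several editor variants per target rather than one is also what lets more target sites clear the cutoff.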