Kurniawan Celine, Itoh Takeshi
Master Program in Global Agriculture Technology and Genomic Science, International College, National Taiwan University, Taipei, Taiwan.
Center for Computational and Systems Biology, National Taiwan University, Taipei, Taiwan.
PLoS One. 2025 Jul 18;20(7):e0328905. doi: 10.1371/journal.pone.0328905. eCollection 2025.
Genome-editing technologies hold significant potential across many biotechnological fields, yet concerns remain about possible risks, including off-target mutations. To ensure safe and effective application, these unintended mutations must be rigorously examined and minimized. Computational approaches are expected to streamline the detection of off-target mutations; however, the performance of current prediction tools is limited, likely owing to insufficient knowledge of off-target mutation characteristics. In this study, we collected experimentally validated off-target mutation data and conducted a large-scale analysis of 177 nonredundant datasets obtained from six studies. We developed a method to assess the statistical significance of sequence pattern similarity and diversity between off-target sites. The method compares ordered relative entropy values for aligned target sequences, and we evaluated it against two other methods based on Euclidean distance and the Pearson correlation coefficient. The results of the three methods were clearly correlated, supporting their validity. Applying these methods to 238 dataset pairs for the same target site revealed that off-target sequence patterns were quite similar across different experimental conditions, such as varying cell lines and independent experiments, suggesting that the intrinsic properties of the Cas-sgRNA-DNA complex play a key role in determining cleavage sites. However, newly engineered enzymes and those from different bacterial sources occasionally display unique off-target patterns, indicating that each new enzyme needs comprehensive evaluation before reliable prediction tools can be developed. The insights gained from this study are expected to contribute to a better understanding of off-target mutation characteristics and to support the development of more accurate computational prediction methods.
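The abstract names three ingredients: relative entropy of aligned sequences, Euclidean distance, and the Pearson correlation coefficient. The following is a minimal sketch of how two off-target datasets for the same target site could be compared with these measures. It is not the authors' procedure: the abstract does not specify how the relative entropy values are ordered or how significance is tested, so those steps are omitted. The uniform background of 0.25 per base, the pseudocount of 0.5, the function names, and the toy sequences are all assumptions made for illustration.

```python
import math
from collections import Counter

BASES = "ACGT"  # assumption: sequences contain only uppercase A/C/G/T


def relative_entropy_profile(seqs, background=0.25, pseudocount=0.5):
    """Per-position relative entropy (bits) of aligned, equal-length sequences.

    Assumes a uniform background and a small pseudocount so that unobserved
    bases do not produce log(0); both choices are illustrative.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    total = len(seqs) + pseudocount * len(BASES)
    profile = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        h = 0.0
        for b in BASES:
            p = (counts.get(b, 0) + pseudocount) / total
            h += p * math.log2(p / background)
        profile.append(h)
    return profile


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length numeric profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sx * sy)


# Toy aligned off-target sites for one hypothetical target (made-up data).
dataset_a = ["ACGT", "ACGA", "ACGC", "ACGG"]
dataset_b = ["ACGT", "ACGT", "ACTA", "ACGC"]

profile_a = relative_entropy_profile(dataset_a)
profile_b = relative_entropy_profile(dataset_b)

print("Euclidean distance:", euclidean_distance(profile_a, profile_b))
print("Pearson correlation:", pearson_correlation(profile_a, profile_b))
```

A small distance and a correlation near 1 would indicate that the two datasets are conserved at the same positions. Judging whether a given value is statistically significant needs a null model, which this sketch does not attempt.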