Yang Jason, Li Francesca-Zhoufan, Arnold Frances H
Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States.
Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States.
ACS Cent Sci. 2024 Feb 5;10(2):226-241. doi: 10.1021/acscentsci.3c01275. eCollection 2024 Feb 28.
Enzymes can be engineered at the level of their amino acid sequences to optimize key properties such as expression, stability, substrate range, and catalytic efficiency, or even to unlock new catalytic activities not found in nature. Because the search space of possible proteins is vast, enzyme engineering usually involves discovering an enzyme starting point that has some level of the desired activity, followed by directed evolution to improve its "fitness" for a desired application. Recently, machine learning (ML) has emerged as a powerful tool to complement this empirical process. ML models can contribute to (1) starting point discovery, by functionally annotating known protein sequences or generating novel protein sequences with desired functions, and (2) fitness optimization, by learning mappings between protein sequences and their associated fitness values and using them to navigate protein fitness landscapes. In this Outlook, we explain how ML complements enzyme engineering and discuss its future potential to unlock improved engineering outcomes.
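As a rough illustration of point (2), the sketch below learns a sequence-to-fitness mapping from "measured" variants and uses it to rank a small combinatorial library, including a variant it never saw. It is a minimal sketch, not the authors' method: it assumes a one-hot sequence encoding and closed-form ridge regression (a common simple baseline), and the names `one_hot`, `fit_ridge`, `predict`, and `TRUE_EFFECTS` are hypothetical. The fitness labels come from a made-up additive function rather than real assay data.

```python
import itertools

import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq: str) -> np.ndarray:
    """Flattened one-hot encoding: one 20-dim block per residue position."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        x[pos, AA_INDEX[aa]] = 1.0
    return x.ravel()


def fit_ridge(seqs, fitness, alpha=1e-2):
    """Closed-form ridge regression (sequence -> fitness) with an unpenalized intercept."""
    X = np.array([one_hot(s) for s in seqs])
    y = np.asarray(fitness, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # alpha * I keeps the system solvable despite collinear or unused one-hot columns.
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def predict(model, seqs):
    w, b = model
    return np.array([one_hot(s) for s in seqs]) @ w + b


# Hypothetical additive "ground truth", used only to generate toy labels;
# in practice fitness values would come from experimental assays.
TRUE_EFFECTS = [
    {"A": 0.0, "G": 0.4, "V": 1.0},
    {"A": 0.8, "G": 0.0, "V": 0.3},
    {"A": 0.2, "G": 0.9, "V": 0.0},
]


def true_fitness(seq: str) -> float:
    return sum(TRUE_EFFECTS[pos][aa] for pos, aa in enumerate(seq))


# Combinatorial library: 3 sites x {A, G, V} = 27 variants.
library = ["".join(p) for p in itertools.product("AGV", repeat=3)]

# "Measure" every variant except the true optimum, then let the model
# rank the whole library, including the unmeasured variant.
true_best = max(library, key=true_fitness)
train = [s for s in library if s != true_best]
model = fit_ridge(train, [true_fitness(s) for s in train])
preds = predict(model, library)
best_idx = int(np.argmax(preds))
best_seq, best_pred = library[best_idx], float(preds[best_idx])
print(f"top predicted variant: {best_seq} (predicted fitness {best_pred:.2f})")
```

Because the toy landscape is purely additive, a linear model can recover it almost exactly. Real enzyme landscapes often contain epistasis (non-additive interactions between mutations), which is where richer models and iterative rounds of ML-guided selection become important.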