Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA.
Division of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA, USA.
Methods Mol Biol. 2025;2867:79-104. doi: 10.1007/978-1-0716-4196-5_5.
Elucidating protein structure and function is central to understanding biological processes and to facilitating drug discovery. With the exponential growth of protein sequence data, machine learning techniques have emerged as powerful tools for predicting protein characteristics from sequence alone. This review gives an overview of the importance and application of machine learning in inferring protein structure and function. We discuss several machine learning approaches, focusing primarily on convolutional neural networks and natural language processing, and their use in predicting protein secondary and tertiary structure, residue-residue contacts, protein function, and subcellular localization. We also highlight the challenges of applying machine learning in this context, such as the availability of high-quality training datasets and the interpretability of models, and we survey recent progress in the development of complex deep learning architectures. Overall, this review underscores the significance of machine learning in advancing our understanding of protein structure and function, and its potential to revolutionize drug discovery and personalized medicine.