Mustali Jessica, Yasuda Ikki, Hirano Yoshinori, Yasuoka Kenji, Gautieri Alfonso, Arai Noriyoshi
Department of Electronics, Information and Bioengineering, Politecnico di Milano, Italy.
Department of Mechanical Engineering, Keio University, Japan.
RSC Adv. 2023 Nov 22;13(48):34249-34261. doi: 10.1039/d3ra06375e. eCollection 2023 Nov 16.
Molecular dynamics (MD) simulations, which are central to drug discovery, offer detailed insights into protein-ligand interactions. However, analyzing large MD datasets remains a challenge. Current machine-learning solutions are predominantly supervised and suffer from data labelling and standardisation issues. In this study, we adopted an unsupervised deep-learning framework, previously benchmarked for rigid proteins, to study the more flexible SARS-CoV-2 main protease (Mpro). We ran MD simulations of Mpro with various ligands and refined the data by focusing on binding-site residues and on time frames in which the protein conformation was stable. The optimal descriptor was the distance between each residue and the center of the binding pocket. Using this descriptor, a local dynamic ensemble was generated and fed into our neural network to compute Wasserstein distances across system pairs, revealing ligand-induced conformational differences in Mpro. Dimensionality reduction yielded an embedding map that correlated ligand-induced dynamics with binding affinity. Notably, the high-affinity compounds showed pronounced effects on the protein's conformations. We also identified the key residues that contributed to these differences. Our findings emphasize the potential of combining unsupervised deep learning with MD simulations to extract valuable information and accelerate drug discovery.
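The descriptor and pairwise comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract states that a neural network computes the Wasserstein distances, whereas this sketch swaps in the exact closed-form 1-D Wasserstein-1 distance between equal-size samples, applied residue by residue and then averaged. The pocket center is assumed to be the centroid of the binding-site residue coordinates, and all function names, the toy trajectory generator, and the system labels (`apo`, `ligand_A`, `ligand_B`) are hypothetical. The data are synthetic, not MD output.

```python
import math
import random
from itertools import combinations


def pocket_center(frame):
    """Centroid of the residue coordinates in one frame (assumed pocket center)."""
    n = len(frame)
    return tuple(sum(r[i] for r in frame) / n for i in range(3))


def residue_center_distances(frame):
    """Descriptor for one frame: distance of each residue to the pocket center."""
    center = pocket_center(frame)
    return [math.dist(r, center) for r in frame]


def wasserstein_1d(u, v):
    """Exact 1-D Wasserstein-1 distance between two equal-size samples."""
    if len(u) != len(v):
        raise ValueError("samples must have equal size")
    return sum(abs(a - b) for a, b in zip(sorted(u), sorted(v))) / len(u)


def system_distance(traj_a, traj_b):
    """Mean per-residue Wasserstein distance between two trajectories.

    Also returns the per-residue values, which hint at the residues that
    contribute most to the difference between the two systems.
    """
    desc_a = [residue_center_distances(f) for f in traj_a]
    desc_b = [residue_center_distances(f) for f in traj_b]
    n_res = len(desc_a[0])
    per_res = [
        wasserstein_1d([d[i] for d in desc_a], [d[i] for d in desc_b])
        for i in range(n_res)
    ]
    return sum(per_res) / n_res, per_res


def toy_trajectory(n_frames, n_res, shift=0.0, seed=0):
    """Synthetic stand-in for MD frames: residues on a ring with Gaussian jitter.

    `shift` pushes residue 0 outward to mimic a ligand-induced change.
    """
    rng = random.Random(seed)
    base = [
        (5.0 * math.cos(2 * math.pi * k / n_res),
         5.0 * math.sin(2 * math.pi * k / n_res),
         0.0)
        for k in range(n_res)
    ]
    traj = []
    for _ in range(n_frames):
        frame = []
        for k, (x, y, z) in enumerate(base):
            scale = 1.0 + (shift if k == 0 else 0.0)
            frame.append((x * scale + rng.gauss(0, 0.3),
                          y * scale + rng.gauss(0, 0.3),
                          z + rng.gauss(0, 0.3)))
        traj.append(frame)
    return traj


# Hypothetical systems: one unperturbed, two with increasing perturbation.
systems = {
    "apo": toy_trajectory(200, 6, shift=0.0, seed=1),
    "ligand_A": toy_trajectory(200, 6, shift=0.1, seed=2),
    "ligand_B": toy_trajectory(200, 6, shift=0.5, seed=3),
}

for a, b in combinations(systems, 2):
    dist, per_res = system_distance(systems[a], systems[b])
    top = per_res.index(max(per_res))
    print(f"{a} vs {b}: distance={dist:.3f}, largest contribution from residue {top}")
```

The resulting pairwise distance matrix is the kind of input a dimensionality-reduction step would take to produce an embedding map. Averaging independent 1-D distances ignores correlations between residues, which is one reason a learned, multivariate estimate such as the one the abstract describes may be preferable on real data.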