Computing and Computational Sciences Directorate, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA.
UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA.
Sci Data. 2023 Mar 28;10(1):173. doi: 10.1038/s41597-023-01984-9.
This dataset contains ligand conformations and docking scores for 1.4 billion molecules docked against six structural targets from SARS-CoV-2, spanning five unique proteins: MPro, NSP15, PLPro, RDRP, and the Spike protein. Docking was carried out with the AutoDock-GPU platform on the Summit supercomputer and on Google Cloud. The docking procedure used the Solis-Wets search method to generate 20 independent ligand binding poses per compound. Each pose was scored with the AutoDock free-energy estimate and rescored with the RFScore v3 and DUD-E machine-learned rescoring models. The input protein structures are included in a form suitable for AutoDock-GPU and other docking programs. As the product of an exceptionally large docking campaign, this dataset is a valuable resource for discovering trends across small molecules and protein binding sites, for training AI models, and for comparison with inhibitor compounds targeting SARS-CoV-2. The work also illustrates how to organize and process data from ultra-large docking screens.
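The Solis-Wets method named above is an adaptive random local search: it perturbs the current solution with Gaussian noise centred on a running "bias" vector, keeps the move (or its mirror image) only if it improves the objective, and widens or narrows the step size after runs of successes or failures. The Python sketch below is a toy illustration under stated assumptions: it minimizes a generic function of a real-valued vector, whereas in AutoDock-GPU the vector would encode the ligand's translation, orientation, and torsions and the objective would be the AutoDock scoring function, neither of which is modelled here. The function name `solis_wets`, the bias-update coefficients, and the step-size constants are hypothetical choices for illustration, not the settings used to build this dataset.

```python
import random
from typing import Callable, List, Sequence


def solis_wets(
    f: Callable[[Sequence[float]], float],
    x0: Sequence[float],
    rho: float = 1.0,        # initial step scale (assumed value)
    rho_min: float = 0.01,   # stop once the step scale falls below this (assumed)
    max_iters: int = 300,    # iteration budget (assumed)
    max_succ: int = 4,       # consecutive successes before expanding rho (assumed)
    max_fail: int = 4,       # consecutive failures before contracting rho (assumed)
    seed: int = 0,
) -> List[float]:
    """Minimize f with a Solis-Wets style adaptive random local search (sketch)."""
    rng = random.Random(seed)
    x = list(x0)
    fx = f(x)
    bias = [0.0] * len(x)
    succ = fail = 0
    for _ in range(max_iters):
        if rho < rho_min:
            break
        # Gaussian deviation centred on the current bias, scaled by rho.
        dev = [rng.gauss(b, rho) for b in bias]
        plus = [xi + di for xi, di in zip(x, dev)]
        f_plus = f(plus)
        if f_plus < fx:
            # Forward step improved: accept it and pull the bias toward it.
            x, fx = plus, f_plus
            bias = [0.6 * b + 0.4 * d for b, d in zip(bias, dev)]
            succ, fail = succ + 1, 0
        else:
            # Otherwise try the mirrored step x - dev.
            minus = [xi - di for xi, di in zip(x, dev)]
            f_minus = f(minus)
            if f_minus < fx:
                x, fx = minus, f_minus
                bias = [b - 0.4 * d for b, d in zip(bias, dev)]
                succ, fail = succ + 1, 0
            else:
                # Neither direction helped: decay the bias, count a failure.
                bias = [0.5 * b for b in bias]
                succ, fail = 0, fail + 1
        # Expand the step after a run of successes, contract after failures.
        if succ > max_succ:
            rho, succ = rho * 2.0, 0
        elif fail > max_fail:
            rho, fail = rho * 0.5, 0
    return x
```

For example, `solis_wets(lambda v: sum(t * t for t in v), [3.0, -2.0, 1.0])` searches for the minimum of a 3-D quadratic bowl. Calling it with 20 different seeds or starting points would loosely mirror the idea of generating 20 independent poses per compound, although the real docking runs embed the local search inside AutoDock-GPU's own search procedure.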
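With 20 poses per compound and more than a billion compounds, a natural first processing step is to reduce each compound to its best-scoring pose and keep only a shortlist of the most favourable compounds. The Python sketch below assumes a hypothetical input stream of `(compound_id, pose_scores)` pairs, where lower (more negative) AutoDock free-energy estimates are more favourable; the function name `top_k_by_best_pose` and the input layout are illustrative assumptions, not the dataset's actual file format or the authors' pipeline. A bounded heap keeps memory at O(k) regardless of how many compounds are streamed.

```python
import heapq
from typing import Iterable, List, Sequence, Tuple


def top_k_by_best_pose(
    compounds: Iterable[Tuple[str, Sequence[float]]], k: int
) -> List[Tuple[float, int, str]]:
    """Stream (compound_id, pose_scores) pairs and keep the k best compounds.

    Each compound is ranked by its lowest (most favourable) pose score.
    Returns (score, pose_index, compound_id) tuples, best first.
    Assumes every compound has at least one pose score.
    """
    if k <= 0:
        return []
    # Min-heap on the negated score, so heap[0] is the worst compound kept.
    heap: List[Tuple[float, int, str]] = []
    for cid, scores in compounds:
        pose = min(range(len(scores)), key=scores.__getitem__)
        entry = (-scores[pose], pose, cid)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            # New compound beats the current worst: swap it in.
            heapq.heapreplace(heap, entry)
    # Undo the negation and sort from lowest (best) score upward.
    return sorted((-neg, pose, cid) for neg, pose, cid in heap)
```

Because each worker only retains k entries, shortlists computed independently on separate shards can be merged by concatenating them and again keeping the k lowest scores.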