Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, North Dakota 58102, United States.
Biomedical Engineering Program, North Dakota State University, Fargo, North Dakota 58102, United States.
Environ Sci Technol. 2024 Jun 11;58(23):10116-10127. doi: 10.1021/acs.est.4c01017. Epub 2024 May 26.
In recent years, alternatives to animal testing, such as computational and machine learning approaches, have become increasingly important for toxicity testing. However, the complexity and scarcity of available biomedical data make predictive models difficult to develop. Combining nonlinear machine learning with multicondition descriptors (MCDs) offers a way to use data from various assays to build a single robust model. This work applies MCDs to develop a QSTR (Quantitative Structure-Toxicity Relationship) model based on a large toxicity data set comprising more than 80,000 compounds and 59 different end points (122,572 data points). The prediction capabilities of the developed single-task multi-end point machine learning models, as well as a novel data analysis approach using Convolutional Neural Networks (CNN), are discussed. The results show that using MCDs significantly improves the model and that combining them with CNN-1D yields the best result ( = 0.93, = 0.70). Several structural features contributed strongly to toxicity, including van der Waals surface area (VSA), number of nitrogen-containing fragments (nN+), presence of S-P fragments, ionization potential, and presence of C-N fragments. The developed models can serve as tools for predicting the toxicity of various compounds under different conditions, enabling quick toxicity assessment of new compounds.
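The abstract does not say how the multicondition descriptors are computed. As an illustration only, the sketch below assumes one common construction from the multi-condition QSAR literature: each descriptor is replaced by its deviation from the mean of that descriptor over all compounds measured under the same condition (here, the same end point). The function name `multicondition_descriptors`, the record layout, and the toy values are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import mean


def multicondition_descriptors(records):
    """Return copies of `records` with an added 'mcd' list.

    ASSUMPTION: an MCD is the deviation of a descriptor from its mean over
    all compounds tested under the same condition. The paper may use a
    different definition.
    """
    # Group descriptor vectors by experimental condition (e.g. end point).
    by_condition = defaultdict(list)
    for rec in records:
        by_condition[rec["condition"]].append(rec["descriptors"])

    # Mean of each descriptor column within each condition.
    condition_means = {
        cond: [mean(column) for column in zip(*rows)]
        for cond, rows in by_condition.items()
    }

    # Deviation of each record's descriptors from its condition mean.
    result = []
    for rec in records:
        means = condition_means[rec["condition"]]
        deviations = [d - m for d, m in zip(rec["descriptors"], means)]
        result.append({**rec, "mcd": deviations})
    return result


# Hypothetical toy data: two descriptors per compound, two end points.
records = [
    {"compound": "A", "condition": "endpoint_1", "descriptors": [1.0, 10.0]},
    {"compound": "B", "condition": "endpoint_1", "descriptors": [3.0, 20.0]},
    {"compound": "A", "condition": "endpoint_2", "descriptors": [1.0, 10.0]},
    {"compound": "C", "condition": "endpoint_2", "descriptors": [5.0, 30.0]},
]
mcds = multicondition_descriptors(records)
```

Under this assumption, compound A receives different descriptor values for each end point even though its structure is unchanged, which is what would let one model be trained on data pooled from many assays.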