Department of Chemistry and Biochemistry, Rowan University, 201 Mullica Hill Road, Glassboro, New Jersey 08028, United States.
Department of Toxicology, Cuyahoga County Medical Examiner's Office, 11001 Cedar Avenue, Cleveland, Ohio 44106, United States.
Environ Sci Technol. 2023 Apr 25;57(16):6573-6588. doi: 10.1021/acs.est.3c00648. Epub 2023 Apr 11.
Traditional methodologies for assessing chemical toxicity are expensive and time-consuming. Computational modeling approaches have emerged as low-cost alternatives, especially those used to develop quantitative structure-activity relationship (QSAR) models. However, conventional QSAR models are built on limited training data, which leads to low predictivity for new compounds. We developed a data-driven modeling approach for constructing carcinogenicity-related models and used these models to identify potential new human carcinogens. To this end, we used a probe carcinogen dataset from the US Environmental Protection Agency's Integrated Risk Information System (IRIS) to identify relevant PubChem bioassays. Responses in 25 PubChem assays were significantly relevant to carcinogenicity. Eight of these assays showed predictive value for carcinogenicity and were selected for QSAR model training. Using 5 machine learning algorithms and 3 types of chemical fingerprints, 15 QSAR models were developed for each PubChem assay dataset. These models showed acceptable predictivity in 5-fold cross-validation (average correct classification rate, CCR = 0.71). Using our QSAR models, we correctly predicted and ranked the carcinogenic potential of 342 IRIS compounds (positive predictive value, PPV = 0.72). The models also predicted potential new carcinogens, which were validated by a literature search. This study points toward an automated technique for prioritizing potential toxicants using validated QSAR models built on extensive training sets from public data resources.
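The two metrics and the model count quoted above can be illustrated with a minimal Python sketch. It assumes the usual definitions: CCR as the mean of sensitivity and specificity, and PPV as the fraction of predicted positives that are true positives. The algorithm and fingerprint labels and the confusion-matrix counts are hypothetical placeholders, since the abstract names neither the methods nor the underlying counts; this is not the authors' code.

```python
from itertools import product


def ccr(tp: int, tn: int, fp: int, fn: int) -> float:
    """Correct classification rate: mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)  # fraction of true actives predicted active
    specificity = tn / (tn + fp)  # fraction of true inactives predicted inactive
    return (sensitivity + specificity) / 2


def ppv(tp: int, fp: int) -> float:
    """Positive predictive value: fraction of predicted actives that are true actives."""
    return tp / (tp + fp)


# Hypothetical placeholder labels; the abstract does not name the
# 5 algorithms or the 3 fingerprint types.
ALGORITHMS = ["algo_1", "algo_2", "algo_3", "algo_4", "algo_5"]
FINGERPRINTS = ["fingerprint_1", "fingerprint_2", "fingerprint_3"]

# One model per (algorithm, fingerprint) pair gives 5 x 3 = 15 models
# for each assay dataset.
MODEL_GRID = list(product(ALGORITHMS, FINGERPRINTS))

if __name__ == "__main__":
    # Hypothetical confusion-matrix counts, for illustration only.
    print(len(MODEL_GRID))
    print(round(ccr(tp=40, tn=30, fp=10, fn=20), 3))
    print(round(ppv(tp=40, fp=10), 3))
```

Because CCR averages the per-class rates, it is less sensitive to class imbalance than raw accuracy, which matters when an assay has far more inactive than active compounds.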