Predicting the Activity of Unidentified Chemicals in Complementary Bioassays from the HRMS Data to Pinpoint Potential Endocrine Disruptors.

Authors

Rahu Ida, Kull Meelis, Kruve Anneli

Affiliations

Institute of Computer Science, University of Tartu, Narva mnt 18, Tartu 51009, Estonia.

Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius Väg 16, Stockholm SE-106 91, Sweden.

Publication information

J Chem Inf Model. 2024 Apr 22;64(8):3093-3104. doi: 10.1021/acs.jcim.3c02050. Epub 2024 Mar 24.

Abstract

The majority of chemicals detected via nontarget liquid chromatography high-resolution mass spectrometry (HRMS) in environmental samples remain unidentified, challenging the capability of existing machine learning models to pinpoint potential endocrine disruptors (EDs). Here, we predict the activity of unidentified chemicals across 12 bioassays related to EDs within the Tox21 10K dataset. Single- and multi-output models, utilizing various machine learning algorithms and molecular fingerprint features as an input, were trained for this purpose. To evaluate the models under near real-world conditions, Monte Carlo sampling was implemented for the first time. This technique enables the use of probabilistic fingerprint features derived from the experimental HRMS data with SIRIUS+CSI:FingerID as an input for models trained on true binary fingerprint features. Depending on the bioassay, the lowest false-positive rate at 90% recall ranged from 0.251 (sr.mmp, mitochondrial membrane potential) to 0.824 (nr.ar, androgen receptor), which is consistent with the trends observed in the models' performances submitted for the Tox21 Data Challenge. These findings underscore the informativeness of fingerprint features that can be compiled from HRMS in predicting the endocrine-disrupting activity. Moreover, an in-depth SHapley Additive exPlanations analysis unveiled the models' ability to pinpoint structural patterns linked to the modes of action of active chemicals. Despite the superior performance of the single-output models compared to that of the multi-output models, the latter's potential cannot be disregarded for similar tasks in the field of toxicology. This study presents a significant advancement in identifying potentially toxic chemicals within complex mixtures without unambiguous identification and effectively reducing the workload for postprocessing by up to 75% in nontarget HRMS.
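
The two quantitative ideas in the abstract, Monte Carlo sampling of probabilistic fingerprints and the false-positive rate at 90% recall, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the independent per-bit Bernoulli draw, the averaging of predictions over samples, and the toy model are all assumptions made for illustration. Only NumPy is used.

```python
import numpy as np


def monte_carlo_predict(predict_proba, fp_probs, n_samples=100, seed=0):
    """Average a model's activity score over sampled binary fingerprints.

    predict_proba: callable mapping an (n, n_bits) 0/1 array to n scores
                   (hypothetical stand-in for a model trained on binary
                   fingerprints).
    fp_probs:      per-bit probabilities, e.g. as predicted from HRMS data.
    """
    rng = np.random.default_rng(seed)
    fp_probs = np.asarray(fp_probs, dtype=float)
    # Assumption: each bit is drawn as an independent Bernoulli trial.
    draws = rng.random((n_samples, fp_probs.size)) < fp_probs
    samples = draws.astype(np.int8)
    # Assumption: the per-sample scores are combined by a plain mean.
    return float(np.mean(predict_proba(samples)))


def fpr_at_recall(y_true, scores, target_recall=0.9):
    """Lowest false-positive rate among thresholds reaching target recall.

    Assumes y_true contains at least one active and one inactive chemical.
    """
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    best = 1.0
    for threshold in np.unique(scores):
        predicted = scores >= threshold
        recall = (predicted & y_true).sum() / y_true.sum()
        if recall >= target_recall:
            fpr = (predicted & ~y_true).sum() / (~y_true).sum()
            best = min(best, float(fpr))
    return best


def toy_model(X):
    """Hypothetical model: the score is simply the value of bit 0."""
    return X[:, 0].astype(float)


if __name__ == "__main__":
    print(monte_carlo_predict(toy_model, [0.8, 0.1, 0.5], n_samples=1000))
    print(fpr_at_recall([1, 1, 0, 0], [0.9, 0.2, 0.7, 0.1]))
```

Under this reading, a false-positive rate of 0.251 at 90% recall means that roughly a quarter of the inactive chemicals would still be flagged when the threshold is set to capture 90% of the active ones.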

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cce7/11040721/34a78f21f57c/ci3c02050_0001.jpg
