Data Mining Approach for Extraction of Useful Information About Biologically Active Compounds from Publications.

Affiliations

Department of Bioinformatics, Institute of Biomedical Chemistry, 10 Building 8, Pogodinskaya Street, Moscow 119121, Russia.

Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, Frederick, Maryland 21702, United States.

Publication information

J Chem Inf Model. 2019 Sep 23;59(9):3635-3644. doi: 10.1021/acs.jcim.9b00164. Epub 2019 Sep 10.

Abstract

Large amounts of high-quality data on the biological activity of chemical compounds are required throughout the drug discovery process, from developing computational structure-activity relationship models to experimentally testing lead compounds and validating them in the clinic. Currently, a large amount of such data is available from databases, scientific publications, and patents. Biological data are characterized by incompleteness, uncertainty, and low reproducibility. Although free and commercial databases of compound biological activities exist, they usually lack unambiguous information about the specific features of the biological assays. Scientific papers, on the other hand, are the primary source through which new data are first disclosed to the scientific community. In this study, we developed and validated a data-mining approach for extracting text fragments that describe bioassays, and we used it to evaluate compounds and their biological activities reported in scientific publications. We found that papers can be categorized as relevant or irrelevant based on machine-learning analysis of their abstracts. Text fragments extracted from the full texts of publications can be further partitioned into several classes according to the specific features of the bioassays. We demonstrate the applicability of our approach by comparing the endpoint values of biological activity and cytotoxicity of reference compounds.
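The abstract does not state which machine-learning model was used to separate relevant from irrelevant papers. As a rough illustration of the general idea only, the sketch below trains a multinomial naive Bayes classifier with Laplace smoothing on word counts from abstracts. This is a swapped-in, generic technique, not necessarily the authors' method; the class name `NaiveBayesAbstractClassifier`, the tokenizer, and the four training abstracts in `TRAIN` are hypothetical, and a real application would need a large, curated labeled set.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (e.g. 'ic50')."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesAbstractClassifier:
    """Hypothetical multinomial naive Bayes over abstract word counts.

    Generic illustration only; not the classifier described in the paper.
    """

    def fit(self, abstracts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # number of abstracts per class
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(abstracts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, text):
        """Return the unnormalized log-posterior of each class for one abstract."""
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)  # log prior
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore words never seen during training
                # Laplace (add-one) smoothed log-likelihood of the token
                score += math.log((self.word_counts[c][tok] + 1) / (self.total_words[c] + v))
            scores[c] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy training abstracts (not data from the study)
TRAIN = [
    ("Compounds were tested for inhibition of HIV-1 reverse transcriptase; "
     "IC50 values were measured in a cell-based assay.", "relevant"),
    ("Cytotoxicity (CC50) of the synthesized compounds was determined in MT-4 "
     "cells alongside antiviral EC50.", "relevant"),
    ("We review the history of public health policy and funding for HIV "
     "prevention programs.", "irrelevant"),
    ("A survey of patient attitudes toward clinic visits and health policy.",
     "irrelevant"),
]

if __name__ == "__main__":
    clf = NaiveBayesAbstractClassifier().fit([t for t, _ in TRAIN], [l for _, l in TRAIN])
    print(clf.predict("The IC50 of each compound was measured in an enzyme inhibition assay."))
```

Naive Bayes is chosen here only because it is short and needs no third-party libraries; any text classifier trained on labeled abstracts could play the same role in such a pipeline.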
