Department of Academic and Institutional Resources and Technology, University of North Texas Health Science Center, Fort Worth, USA.
BMC Bioinformatics. 2012;13 Suppl 15(Suppl 15):S7. doi: 10.1186/1471-2105-13-S15-S7. Epub 2012 Sep 11.
Next-Generation Sequencing (NGS) technologies and Genome-Wide Association Studies (GWAS) generate millions of reads and hundreds of datasets, creating an urgent need for better ways to interpret and distill such large amounts of data accurately. Extensive pathway and network analyses allow the discovery of highly significant pathways from sets of disease vs. healthy samples in NGS and GWAS data. Knowledge of the activation of these processes will help elucidate the complex biological pathways affected by drug treatment, support patient stratification studies of new and existing drug treatments, and clarify the underlying effects of anti-cancer drugs. According to the Pathguide database, there were approximately 141 human biological pathway resources as of January 2012. However, most currently available resources do not contain disease, drug, or organ specificity information such as disease-pathway, drug-pathway, and organ-pathway associations. Systematically integrating pathway, disease, drug, and organ specificity is becoming increasingly crucial for understanding, from high-throughput omics data (genomics, transcriptomics, proteomics, and metabolomics), the interrelationships between signaling, metabolic, and regulatory pathways, drug action, disease susceptibility, and organ specificity.
We designed the Integrated Pathway Analysis Database for Systematic Enrichment Analysis (IPAD, http://bioinfo.hsc.unt.edu/ipad), defining inter-associations between pathway, disease, drug, and organ specificity, based on six criteria: 1) comprehensive pathway coverage; 2) gene/protein to pathway/disease/drug/organ association; 3) inter-association between pathway, disease, drug, and organ; 4) multiple quantitative measurements of enrichment and inter-association; 5) assessment of enrichment and inter-association analysis in the context of existing biological knowledge and a "gold standard" constructed from reputable and reliable sources; and 6) cross-linking of multiple available data sources. IPAD is a comprehensive database covering about 22,498 genes, 25,469 proteins, 1956 pathways, 6704 diseases, 5615 drugs, and 52 organs, integrated from databases including BioCarta, KEGG, NCI-Nature curated, Reactome, CTD, PharmGKB, DrugBank, and HOMER. The database has a web-based user interface that allows users to perform enrichment analysis starting from genes/proteins/molecules and inter-association analysis starting from a pathway, disease, drug, or organ. Moreover, the quality of the database was validated in the context of existing biological knowledge and a "gold standard" constructed from reputable and reliable sources. Two case studies are also presented to demonstrate: 1) self-validation of enrichment analysis and inter-association analysis on brain-specific markers, and 2) identification of previously undiscovered components by enrichment analysis in a prostate cancer study.
IPAD is a new resource for analyzing, identifying, and validating pathway, disease, drug, and organ specificity and their inter-associations. The statistical method we developed for enrichment and similarity measurement, and the two criteria we described for setting the threshold parameters, can be extended to other enrichment applications. Enriched pathways, diseases, drugs, organs, and their inter-associations can be searched, displayed, and downloaded from our online user interface. The current IPAD database can help users address a wide range of questions in human disease studies related to biological pathways, disease susceptibility, drug targets, and organ specificity.
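The abstract does not state which statistics IPAD uses for enrichment and similarity measurement. As a rough illustration of what such measurements can look like, the sketch below assumes two common choices that may differ from the authors' method: a one-sided hypergeometric test for enrichment of a query gene list in an annotated gene set, and the Jaccard index for similarity between two gene sets (for example, a pathway and a disease). The function names and the toy gene identifiers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, set_size, query_size, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    Assumed model (not taken from the paper): query_size genes are drawn
    without replacement from a universe of universe_size genes, of which
    set_size belong to the annotated set (pathway, disease, drug or organ).
    """
    upper = min(set_size, query_size)
    total = comb(universe_size, query_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B|; defined as 0.0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy data with placeholder identifiers, not real annotations.
universe = {f"g{i}" for i in range(1, 11)}   # 10 genes in total
pathway = {"g1", "g2", "g3", "g4"}           # hypothetical pathway gene set
disease = {"g3", "g4", "g5", "g6"}           # hypothetical disease gene set
query = {"g1", "g2", "g9"}                   # hypothetical user gene list

p_value = hypergeom_enrichment_p(
    len(universe), len(pathway), len(query), len(query & pathway)
)
similarity = jaccard(pathway, disease)
print(f"enrichment p-value: {p_value:.4f}")
print(f"pathway-disease Jaccard similarity: {similarity:.4f}")
```

In practice, p-values computed across many pathways, diseases, drugs, and organs would also need a multiple-testing correction and a significance threshold; the abstract mentions two criteria for setting threshold parameters but does not describe them, so they are not reproduced here.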