Kan Mengyuan, Shumyatcher Maya, Diwadkar Avantika, Soliman Gabriel, Himes Blanca E
Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA.
AMIA Annu Symp Proc. 2018 Dec 5;2018:1338-1347. eCollection 2018.
Over 140,000 transcriptomic studies performed in healthy and diseased cell and tissue types, at baseline and after exposure to various agents, are available in public repositories. Integrating results across transcriptomic datasets is an attractive approach for identifying gene expression signatures that are more robust than those obtained from individual datasets, especially datasets with small sample sizes. We developed Reproducible Analysis and Validation of Expression Data (RAVED), a pipeline that facilitates the creation of R Markdown reports detailing reproducible analyses of publicly available transcriptomic data, and used it to analyze asthma and glucocorticoid response microarray and RNA-Seq datasets. We then used three approaches to integrate summary statistics from these studies and identify cell/tissue-specific and global gene expression changes associated with asthma and induced by glucocorticoids. The transcriptomic integration methods were incorporated into an online app called REALGAR, where end users can specify datasets to integrate and quickly obtain results that may facilitate the design of experimental studies.
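The abstract does not name the three integration approaches, so the following is only an illustrative sketch of two common ways to combine per-study summary statistics for a single gene: an inverse-variance weighted fixed-effect estimate of the log2 fold change, and Fisher's method for combining p-values. The function names and the example numbers are hypothetical and are not taken from RAVED or REALGAR.

```python
import math


def fixed_effect_meta(log2_fold_changes, standard_errors):
    """Inverse-variance weighted fixed-effect combination of per-study effects.

    Returns the combined effect, its standard error, and a two-sided p-value
    from a normal approximation.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    combined = sum(w * b for w, b in zip(weights, log2_fold_changes)) / total_weight
    combined_se = math.sqrt(1.0 / total_weight)
    z = combined / combined_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return combined, combined_se, p_value


def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p) follows chi-square with 2k df under the null."""
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    # Closed-form chi-square survival function for even degrees of freedom (2k):
    # sf(X) = exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, tail = 1.0, 0.0
    for i in range(k):
        tail += term
        term *= half_stat / (i + 1)
    return math.exp(-half_stat) * tail


if __name__ == "__main__":
    # Hypothetical summary statistics for one gene in three studies (made-up values).
    effects = [1.2, 0.8, 1.0]         # log2 fold changes
    errors = [0.40, 0.30, 0.50]       # standard errors of those fold changes
    p_vals = [0.003, 0.008, 0.046]    # per-study p-values

    print("fixed-effect (effect, SE, p):", fixed_effect_meta(effects, errors))
    print("Fisher combined p:", fisher_combined_p(p_vals))
```

The effect-size combination takes the direction and magnitude of change into account, whereas Fisher's method uses only the p-values and so can flag a gene whose studies disagree in direction. In a real analysis, the combined p-values would also need multiple-testing correction across genes.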