Dipartimento di Bioscienze, Università di Milano, via Celoria 26, 20133 Milan, Italy.
Istituto di Biomembrane, Bioenergetica e Biotecnologie Molecolari, Consiglio Nazionale delle Ricerche, Via Amendola 165/A, 70126 Bari, Italy.
Nucleic Acids Res. 2018 May 4;46(8):e46. doi: 10.1093/nar/gky055.
RNA sequencing (RNA-Seq) has become the experimental standard in transcriptome studies. Most bioinformatic pipelines for analysing RNA-Seq data and identifying significant changes in transcript abundance are based on the comparison of two conditions. It is nevertheless common practice to perform several experiments in parallel (e.g. from different individuals, developmental stages or tissues) in order to identify genes showing a significant variation of expression across all the conditions studied. In this work we present RNentropy, a methodology based on information theory devised for this task: given expression estimates from any number of RNA-Seq samples and conditions, it identifies genes or transcripts with a significant variation of expression across all the conditions studied, together with the samples in which they are over- or under-expressed. To show the capabilities of our methodology, we applied it to four RNA-Seq datasets: 48 biological replicates of two different yeast conditions; samples extracted from six human tissues of three individuals; seven different mouse brain cell types; and human liver samples from six individuals. The results, and their comparison to different state-of-the-art bioinformatic methods, show that RNentropy can provide a quick and in-depth analysis of significant changes in gene expression profiles over any number of conditions.
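As a rough illustration of the information-theoretic idea mentioned in the abstract, the sketch below computes the Shannon entropy of a single gene's expression values across samples. This is a generic textbook measure, not the RNentropy statistic or its significance test, which the abstract does not specify; the function names `expression_entropy` and `max_entropy` and the example values are hypothetical.

```python
import math


def expression_entropy(values):
    """Shannon entropy (in bits) of one gene's expression across samples.

    Illustrative only: a generic entropy measure, NOT the RNentropy method.
    `values` holds non-negative expression estimates, one per sample.
    """
    if any(v < 0 for v in values):
        raise ValueError("expression estimates must be non-negative")
    total = sum(values)
    if total <= 0:
        raise ValueError("at least one expression estimate must be positive")
    # Turn the expression values into a probability distribution over samples;
    # samples with zero expression contribute nothing to the entropy.
    probs = [v / total for v in values if v > 0]
    return -sum(p * math.log2(p) for p in probs)


def max_entropy(n_samples):
    """Entropy (in bits) of a gene expressed identically in all n samples."""
    return math.log2(n_samples)


if __name__ == "__main__":
    uniform = [10.0, 10.0, 10.0, 10.0]    # same level in every sample
    specific = [100.0, 0.0, 0.0, 0.0]     # expressed in one sample only
    print(expression_entropy(uniform), max_entropy(len(uniform)))
    print(expression_entropy(specific))
```

Under this simplified view, a gene expressed at the same level in every sample reaches the maximum entropy, while a gene expressed in a single sample has entropy zero. Deciding whether an intermediate value reflects a significant variation requires a statistical test, which this sketch does not attempt.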