First step toward gene expression data integration: transcriptomic data acquisition with COMMAND>_.

Affiliations

Unit of Computational Biology, Research and Innovation Centre, Fondazione Edmund Mach, via E. Mach 1, 38010, San Michele all'Adige, Italy.

Laboratorio Internacional de Investigación Sobre el Genoma Humano, Universidad Nacional Autónoma De México, 76230, Juriquilla, Querétaro, Mexico.

Publication information

BMC Bioinformatics. 2019 Jan 28;20(1):54. doi: 10.1186/s12859-019-2643-6.

Abstract

BACKGROUND

Exploring cellular responses to stimuli using extensive gene expression profiles has become routine. Raw and processed data from these studies are available in public databases, but the opportunity to fully exploit such rich datasets is limited by the large heterogeneity of data formats. In recent years, several approaches have been proposed to effectively integrate gene expression data for analysis and exploration at a broader level. Despite their different goals and approaches, all proposed methods for gene expression data integration share the same first step: data acquisition. Extracting valuable information from a set of downloaded files may seem straightforward, but things can rapidly get complicated, especially as the number of experiments grows. Transcriptomic datasets are deposited in public databases with little regard to data format, so retrieving raw data can become a challenging task. For RNA-seq experiments, this problem is partially mitigated because raw reads are generally available in databases such as the NCBI SRA. For microarray experiments, however, standards are less well established and are not enforced during submission, and a multitude of data formats has emerged as a result.

RESULTS

COMMAND>_ is a specialized tool meant to simplify gene expression data acquisition. It is a flexible multi-user web application that allows users to search and download gene expression experiments, extract only the relevant information from experiment files, re-annotate microarray platforms, and present data in a simple and coherent data model for subsequent analysis.
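
The platform re-annotation step mentioned above can be illustrated with a minimal sketch. It turns a probe-by-sample expression table into a gene-by-sample table. The sketch assumes a probe-to-gene mapping is already available, for example from re-aligning probe sequences to a current reference. It also assumes that several probes mapping to the same gene are averaged. The function name `reannotate`, the identifiers in the demo, and the averaging rule are hypothetical illustrations. They are not COMMAND>_'s actual implementation or data model.

```python
from collections import defaultdict
from statistics import mean


def reannotate(
    expression: dict[str, dict[str, float]],
    probe_to_gene: dict[str, str],
) -> dict[str, dict[str, float]]:
    """Convert a probe x sample table into a gene x sample table (hypothetical helper).

    Probes without a gene mapping are dropped; when several probes map to
    the same gene, their values are averaged per sample (an assumed rule).
    """
    # gene -> sample -> list of probe values collected for that gene
    grouped: dict[str, dict[str, list[float]]] = defaultdict(lambda: defaultdict(list))
    for probe, samples in expression.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # unmapped probe: no current gene annotation
        for sample, value in samples.items():
            grouped[gene][sample].append(value)
    # Collapse multiple probes per gene into one value per sample
    return {
        gene: {sample: mean(values) for sample, values in samples.items()}
        for gene, samples in grouped.items()
    }


if __name__ == "__main__":
    # Toy data with hypothetical probe, gene and sample identifiers
    expression = {
        "probe_1": {"GSM1": 5.0, "GSM2": 7.0},
        "probe_2": {"GSM1": 3.0, "GSM2": 5.0},
        "probe_3": {"GSM1": 9.0, "GSM2": 1.0},
    }
    probe_to_gene = {"probe_1": "geneA", "probe_2": "geneA"}
    print(reannotate(expression, probe_to_gene))
    # {'geneA': {'GSM1': 4.0, 'GSM2': 6.0}}
```

Averaging is only one possible collapsing rule. Taking the median, or keeping only the most variable probe per gene, are common alternatives.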

CONCLUSIONS

COMMAND>_ facilitates the creation of local gene expression datasets from both microarray and RNA-seq experiments and may offer a more efficient way to build integrated gene expression compendia. COMMAND>_ is free and open-source software, with publicly available tutorials and documentation.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ad04/6348648/e883f862b4ed/12859_2019_2643_Fig1_HTML.jpg