European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK.
Database (Oxford). 2018 Jan 1;2018:bay119. doi: 10.1093/database/bay119.
The major goal of sequencing humans and many other species is to understand the link between genomic variation, phenotype and disease. There are numerous valuable and well-established variation resources, but collating and making sense of non-homogeneous, often large-scale data sets from disparate sources remains a challenge. Without a systematic catalogue of these data and appropriate query and annotation tools, understanding the genome sequence of an individual and assessing their disease risk is impossible. In Ensembl, we substantially solve this problem: we develop methods to facilitate data integration and broad access; aggregate information in a consistent manner and make it available in a variety of standard formats, both visually and programmatically; build analysis pipelines to compare variants to comprehensive genomic annotation sets; and make all tools and data publicly available.