European Genome-phenome Archive (EGA), Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain.
Brief Bioinform. 2022 May 13;23(3). doi: 10.1093/bib/bbac136.
Since its launch in 2008, the European Genome-phenome Archive (EGA) has been leading the archiving and distribution of identifiable human genomic data. One concern in the community is the usability of the stored data: data submitters are currently not required to perform any quality control (QC) before uploading their data and associated metadata. Here, we present a new File QC Portal developed at EGA, together with QC reports generated for 1 694 442 files [FASTQ, sequence alignment map (SAM)/binary alignment map (BAM)/CRAM and variant call format (VCF)] submitted to EGA. The QC reports allow anonymous EGA users to view summary-level information about the files within a specific dataset, such as read quality, alignment quality, the number and type of variants, and other features. Researchers can therefore assess data quality before the data access decision, which increases the reusability of the data (https://ega-archive.org/blog/data-upcycling-powered-by-ega/).
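The following Python sketch illustrates one kind of summary-level read-quality metric mentioned above. It is not the EGA File QC Portal's implementation. It assumes four-line FASTQ records with Phred+33-encoded base qualities. The function `summarize_fastq`, the `ReadQualitySummary` record and the Q30 threshold are hypothetical choices made for illustration.

```python
# Minimal sketch of summary-level read-quality QC for FASTQ data.
# Assumptions (not EGA's implementation): 4-line FASTQ records,
# Phred+33 quality encoding, and an illustrative Q30 threshold.
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ReadQualitySummary:
    n_reads: int
    n_bases: int
    mean_quality: float   # mean Phred score over all bases
    fraction_q30: float   # fraction of bases with Phred score >= threshold


def summarize_fastq(lines: Iterable[str], threshold: int = 30) -> ReadQualitySummary:
    """Hypothetical helper: summarize base qualities across FASTQ records."""
    records: List[str] = [line.rstrip("\n") for line in lines]
    if len(records) % 4 != 0:
        raise ValueError("FASTQ input must contain complete 4-line records")
    n_reads = n_bases = total_q = n_high = 0
    for i in range(0, len(records), 4):
        header, seq, plus, qual = records[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at line {i + 1}")
        # Phred+33 encoding: score = ASCII code - 33
        scores = [ord(c) - 33 for c in qual]
        n_reads += 1
        n_bases += len(scores)
        total_q += sum(scores)
        n_high += sum(1 for q in scores if q >= threshold)
    if n_bases == 0:
        return ReadQualitySummary(n_reads, 0, 0.0, 0.0)
    return ReadQualitySummary(n_reads, n_bases, total_q / n_bases, n_high / n_bases)


if __name__ == "__main__":
    example = [
        "@read1", "ACGT", "+", "IIII",  # 'I' encodes Q40
        "@read2", "ACGT", "+", "####",  # '#' encodes Q2
    ]
    s = summarize_fastq(example)
    print(s.n_reads, s.n_bases, s.mean_quality, s.fraction_q30)  # 2 8 21.0 0.5
```

Aggregating per-base scores into a few numbers per file reflects the summary-level view described above. A reader can judge overall read quality from these figures without seeing the underlying identifiable sequences.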