Smith Jennifer R, Tutaj Marek A, Thota Jyothi, Lamers Logan, Gibson Adam C, Kundurthi Akhilanand, Gollapally Varun Reddy, Brodie Kent C, Zacher Stacy, Laulederkind Stanley J F, Hayman G Thomas, Wang Shur-Jen, Tutaj Monika, Kaldunski Mary L, Vedi Mahima, Demos Wendy M, De Pons Jeffrey L, Dwinell Melinda R, Kwitek Anne E
Rat Genome Database, Department of Physiology, Medical College of Wisconsin, 8701 Watertown Plank Rd, Milwaukee, WI 53226, United States.
Clinical and Translational Science Institute, Medical College of Wisconsin, 8701 Watertown Plank Rd, Milwaukee, WI 53226, United States.
Database (Oxford). 2025 Jan 22;2025. doi: 10.1093/database/baae132.
The Rat Genome Database (RGD) is a multispecies knowledgebase which integrates genetic, multiomic, phenotypic, and disease data across 10 mammalian species. To support cross-species, multiomics studies and to enhance and expand on data manually extracted from the biomedical literature by the RGD team of expert curators, RGD imports and integrates data from multiple sources. These include major databases and a substantial number of domain-specific resources, as well as direct submissions by individual researchers. The incorporation of these diverse datatypes is handled by a growing list of automated import, export, data processing, and quality control pipelines. This article outlines the development over time of a standardized infrastructure for automated RGD pipelines with a summary of key design decisions and a focus on lessons learned.