Vitale Dan, Koretsky Mathew J, Kuznetsov Nicole, Hong Samantha, Martin Jessica, James Mikayla, Makarious Mary B, Leonard Hampton, Iwaki Hirotaka, Faghri Faraz, Blauwendraat Cornelis, Singleton Andrew B, Song Yeajin, Levine Kristin, Kumar-Sreelatha Ashwin Ashok, Fang Zih-Hua, Nalls Mike
Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA.
DataTecnica LLC., Washington, DC 20037, USA.
G3 (Bethesda). 2025 Jan 8;15(1). doi: 10.1093/g3journal/jkae268.
GenoTools, a Python package, streamlines population genetics research by integrating ancestry estimation, quality control, and genome-wide association study (GWAS) capabilities into efficient pipelines. By tracking samples, variants, and quality-specific measures throughout fully customizable pipelines, it lets users easily manage genetic data for both large and small studies. GenoTools' "Ancestry" module renders highly accurate predictions, enabling high-quality ancestry-specific studies, and supports training and serializing custom ancestry models specific to the user's genotyping or sequencing platform. As the genotype processing engine that powers several large initiatives, including the NIH's Center for Alzheimer's and Related Dementias and the Global Parkinson's Genetics Program, GenoTools has been used to process and analyze the UK Biobank and major Alzheimer's disease and Parkinson's disease datasets comprising over 400,000 genotypes from arrays and 5,000 whole genome sequencing samples, and it has led to novel discoveries in diverse populations. Through a single command, it has provided replicable ancestry predictions, implemented rigorous quality control, and conducted genetic ancestry-specific GWAS to identify systematic errors or biases. GenoTools is a customizable tool that enables users to efficiently analyze and scale genotyping and sequencing (whole genome sequencing and exome) data with reproducible and scalable ancestry, quality control, and GWAS pipelines.
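The idea of tracking samples through a customizable quality control pipeline can be illustrated with a small sketch. This is not the GenoTools API: the in-memory data layout, the function names (`callrate_filter`, `het_filter`, `run_pipeline`), and the thresholds are hypothetical. They are chosen only to show how each step can record which samples it excludes so that every exclusion remains traceable.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Illustrative sketch only; not the GenoTools API.
# Hypothetical layout: sample ID -> per-variant dosage (0, 1, 2) or None if uncalled.
Genotypes = Dict[str, List[Optional[int]]]
Step = Callable[[Genotypes], Genotypes]


@dataclass
class PipelineLog:
    """Records which samples each step removed, so every exclusion stays traceable."""
    removed: Dict[str, List[str]] = field(default_factory=dict)


def callrate_filter(min_rate: float) -> Step:
    """Keep samples whose fraction of called genotypes is at least min_rate."""
    def step(geno: Genotypes) -> Genotypes:
        return {
            s: calls for s, calls in geno.items()
            if calls and sum(c is not None for c in calls) / len(calls) >= min_rate
        }
    return step


def het_filter(max_rate: float) -> Step:
    """Keep samples whose heterozygous fraction among called genotypes is <= max_rate.

    Simplified on purpose: real QC usually flags heterozygosity outliers in both directions.
    """
    def step(geno: Genotypes) -> Genotypes:
        kept = {}
        for s, calls in geno.items():
            called = [c for c in calls if c is not None]
            if called and sum(c == 1 for c in called) / len(called) <= max_rate:
                kept[s] = calls
        return kept
    return step


def run_pipeline(geno: Genotypes, steps: List[Tuple[str, Step]]) -> Tuple[Genotypes, PipelineLog]:
    """Apply the steps in order, logging the samples that each one drops."""
    log = PipelineLog()
    for name, step in steps:
        kept = step(geno)
        log.removed[name] = sorted(set(geno) - set(kept))
        geno = kept
    return geno, log


# Toy data (hypothetical): three samples, five variants each.
example = {
    "S1": [0, 1, 2, 0, 1],        # call rate 1.0, heterozygosity 0.4
    "S2": [0, None, None, 1, 2],  # call rate 0.6
    "S3": [1, 1, 1, 1, 0],        # call rate 1.0, heterozygosity 0.8
}
final, log = run_pipeline(example, [("callrate>=0.95", callrate_filter(0.95)),
                                    ("het<=0.5", het_filter(0.5))])
print(sorted(final), log.removed)
# -> ['S1'] {'callrate>=0.95': ['S2'], 'het<=0.5': ['S3']}
```

Because each step is just a function from genotypes to genotypes, steps can be reordered, added, or dropped while the log still attributes every removed sample to the step that excluded it.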