GenoTools: an open-source Python package for efficient genotype data quality control and analysis.

Author information

Vitale Dan, Koretsky Mathew J, Kuznetsov Nicole, Hong Samantha, Martin Jessica, James Mikayla, Makarious Mary B, Leonard Hampton, Iwaki Hirotaka, Faghri Faraz, Blauwendraat Cornelis, Singleton Andrew B, Song Yeajin, Levine Kristin, Kumar-Sreelatha Ashwin Ashok, Fang Zih-Hua, Nalls Mike

Affiliations

Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA.

DataTecnica LLC., Washington, DC 20037, USA.

Publication information

G3 (Bethesda). 2025 Jan 8;15(1). doi: 10.1093/g3journal/jkae268.

Abstract

GenoTools, a Python package, streamlines population genetics research by integrating ancestry estimation, quality control, and genome-wide association studies capabilities into efficient pipelines. By tracking samples, variants, and quality-specific measures throughout fully customizable pipelines, users can easily manage genetics data for large and small studies. GenoTools' "Ancestry" module renders highly accurate predictions, allowing for high-quality ancestry-specific studies, and enables custom ancestry model training and serialization specified to the user's genotyping or sequencing platform. As the genotype processing engine that powers several large initiatives, including the NIH's Center for Alzheimer's and Related Dementias and the Global Parkinson's Genetics Program, GenoTools was used to process and analyze the UK Biobank and major Alzheimer's disease and Parkinson's disease datasets with over 400,000 genotypes from arrays and 5,000 whole genome sequencing samples and has led to novel discoveries in diverse populations. It has provided replicable ancestry predictions, implemented rigorous quality control, and conducted genetic ancestry-specific genome-wide association studies to identify systematic errors or biases through a single command. GenoTools is a customizable tool that enables users to efficiently analyze and scale genotyping and sequencing (whole genome sequencing and exome) data with reproducible and scalable ancestry, quality control, and genome-wide association studies pipelines.
