Department of Statistics, University of California, Los Angeles, CA 90095, USA.
Interdepartmental Program in Bioinformatics, University of California, Los Angeles, CA 90095, USA.
Genome Biol. 2021 Oct 11;22(1):288. doi: 10.1186/s13059-021-02506-9.
High-throughput biological data analysis commonly involves identifying, among numerous features measured simultaneously (such as genes, genomic regions, and proteins), those whose values differ between two conditions. The most widely used criterion for ensuring the reliability of such analyses is the false discovery rate (FDR), which is primarily controlled using p-values. However, obtaining valid p-values requires either reasonable assumptions about the data distribution or large numbers of replicates under both conditions. Clipper is a general statistical framework for FDR control that relies on neither p-values nor specific data distributions. Clipper outperforms existing methods in a broad range of high-throughput data analysis applications.
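The abstract does not say how FDR can be controlled without p-values. One well-known p-value-free approach, assumed here purely for illustration, works in two steps. First, it computes a contrast score for each feature (here, the difference of replicate means between the two conditions). Second, it chooses a cutoff using the Barber-Candès (knockoff+) estimate of the false discovery proportion: (#{scores <= -t} + 1) / #{scores >= t}. This estimate is valid when null features have scores roughly symmetric around zero. The following is a minimal sketch under that assumption, not the authors' Clipper implementation. The function names `contrast_scores`, `bc_threshold`, and `select_features` are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def contrast_scores(cond1: Sequence[Sequence[float]],
                    cond2: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical contrast score: per-feature mean under condition 1 minus condition 2.

    cond1[j] and cond2[j] hold the replicate measurements of feature j.
    """
    return [mean(a) - mean(b) for a, b in zip(cond1, cond2)]


def bc_threshold(scores: Sequence[float], q: float) -> float:
    """Smallest cutoff t whose Barber-Candes FDP estimate is <= q.

    Assumes null scores are symmetric around 0, so the count of scores <= -t
    estimates the number of false discoveries among scores >= t.
    """
    # Candidate cutoffs are the distinct absolute values of the nonzero scores.
    for t in sorted({abs(c) for c in scores if c != 0}):
        n_neg = sum(1 for c in scores if c <= -t)
        n_pos = sum(1 for c in scores if c >= t)
        # The +1 in the numerator is the knockoff+ correction for finite-sample FDR control.
        if n_pos > 0 and (n_neg + 1) / n_pos <= q:
            return t
    return float("inf")  # no cutoff reaches the target FDR, so nothing is selected


def select_features(scores: Sequence[float], q: float) -> List[int]:
    """Indices of features whose contrast score reaches the selected cutoff."""
    t = bc_threshold(scores, q)
    return [j for j, c in enumerate(scores) if c >= t]


if __name__ == "__main__":
    # Toy scores: ten clearly changed features plus four null-like features.
    scores = [10.0] * 10 + [1.0, -1.0, 0.5, -0.5]
    print(select_features(scores, q=0.1))  # the ten features with score 10
```

Because the cutoff depends only on the ranks and signs of the contrast scores, this sketch never computes a p-value. With few discoveries, however, the +1 correction can make it select nothing at all.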