findPC: An R package to automatically select the number of principal components in single-cell analysis.

Affiliation

Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27710, USA.

Publication information

Bioinformatics. 2022 May 13;38(10):2949-2951. doi: 10.1093/bioinformatics/btac235.

Abstract

SUMMARY

Principal component analysis is widely used in analyzing single-cell genomic data. Selecting the optimal number of principal components (PCs) is a crucial step for downstream analyses. The elbow method is most commonly used for this task, but it requires one to visually inspect the elbow plot and manually choose the elbow point. To address this limitation, we developed six methods to automatically select the optimal number of PCs based on the elbow method. We evaluated the performance of these methods on real single-cell RNA-seq data from multiple human and mouse tissues and cell types. The perpendicular line method with 30 PCs has the best overall performance, and its results are highly consistent with the numbers of PCs identified manually. We implemented the six methods in an R package, findPC, that objectively selects the number of PCs and can be easily incorporated into any automatic analysis pipeline.
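To make the idea of automating the elbow choice concrete, the following Python sketch shows one common reading of a perpendicular line rule: connect the first and last points of the elbow plot with a straight line and pick the PC that lies farthest from it. This is an illustration under stated assumptions, not the findPC implementation; the function name `perpendicular_line_elbow` and the input `sdev` (a list of PC standard deviations in decreasing order) are hypothetical, and the package itself is written in R.

```python
import math


def perpendicular_line_elbow(sdev):
    """Return the 1-based index of the PC farthest from the chord that
    joins the first and last points of the elbow plot.

    Illustrative sketch only; names and details are assumptions and do
    not reproduce the findPC package's code.
    """
    n = len(sdev)
    if n < 3:
        raise ValueError("need at least three PCs to locate an elbow")

    # End points of the chord: (1, sdev[0]) and (n, sdev[-1]).
    x1, y1 = 1.0, float(sdev[0])
    x2, y2 = float(n), float(sdev[-1])
    chord_length = math.hypot(x2 - x1, y2 - y1)

    best_k, best_dist = 1, -1.0
    for k, y in enumerate(sdev, start=1):
        # Perpendicular distance from point (k, y) to the chord.
        dist = abs((y2 - y1) * k - (x2 - x1) * y + x2 * y1 - y2 * x1) / chord_length
        if dist > best_dist:
            best_k, best_dist = k, dist
    return best_k


# Toy standard deviations for the first 10 PCs (made-up values).
sdev = [10, 6, 3, 1.5, 1.2, 1.1, 1.0, 0.9, 0.8, 0.7]
print(perpendicular_line_elbow(sdev))
```

In this sketch, the number of PCs passed in (for example, the first 30) fixes where the chord ends and therefore influences which point is selected, which is one reason a method and a PC count would be evaluated together.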

AVAILABILITY AND IMPLEMENTATION

The findPC R package is freely available at https://github.com/haotian-zhuang/findPC.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
