Frost H Robert
Dartmouth College, Hanover NH 03755, USA.
Comput Intell Methods Bioinform Biostat. 2025;15276:183-195. doi: 10.1007/978-3-031-89704-7_14. Epub 2025 May 15.
Although single-cell RNA sequencing (scRNA-seq) provides unprecedented insights into the biology of complex tissues, analyzing such data on a gene-by-gene basis is challenging: the large number of tested hypotheses lowers statistical power and makes the results difficult to interpret. These issues are magnified by the increased noise, significant sparsity and multi-modal distributions characteristic of single-cell data. One promising approach for addressing these challenges is gene set testing, or pathway analysis. Unfortunately, statistical and biological differences between single-cell and bulk transcriptomic data make it challenging to use existing gene set collections, which were developed for bulk tissue analysis, on scRNA-seq data. In this paper, we describe a procedure for customizing gene set collections originally created for bulk tissue analysis to reflect the structure of gene activity within specific cell types. Our approach leverages information about mean gene expression in the 81 human cell types profiled via scRNA-seq by the Human Protein Atlas (HPA) Single Cell Type Atlas. This HPA information is used to compute cell type-specific gene and gene set weights that can be used to filter or weight gene set collections. As demonstrated through the analysis of immune cell scRNA-seq data using gene sets from the Molecular Signatures Database (MSigDB), accounting for cell type-specificity can significantly improve gene set testing power and interpretability.
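The abstract does not state how the gene and gene set weights are defined, so the following is only a minimal sketch of the general idea, not the paper's formula. It assumes a hypothetical specificity score (a gene's mean expression in the target cell type divided by its average across all profiled cell types) and a gene set weight equal to the mean of its member gene weights. The function names, the threshold, the expression values and the gene set names are all illustrative assumptions rather than HPA or MSigDB content.

```python
from statistics import mean


def gene_weights(mean_expr, cell_type):
    """Return {gene: weight} for one target cell type.

    mean_expr maps gene -> {cell type -> mean expression}.
    ASSUMPTION: the weight is target expression relative to the
    cross-cell-type average; the paper may define it differently.
    """
    weights = {}
    for gene, by_type in mean_expr.items():
        overall = mean(list(by_type.values()))
        weights[gene] = by_type[cell_type] / overall if overall > 0 else 0.0
    return weights


def gene_set_weight(genes, weights):
    """ASSUMPTION: a set's weight is the mean weight of its known genes."""
    known = [weights[g] for g in genes if g in weights]
    return mean(known) if known else 0.0


def filter_collection(collection, weights, min_weight=1.0):
    """Keep only gene sets whose weight reaches a hypothetical threshold."""
    return {
        name: genes
        for name, genes in collection.items()
        if gene_set_weight(genes, weights) >= min_weight
    }


# Toy mean-expression table (made-up numbers, not HPA values).
toy_expr = {
    "CD3E": {"T-cells": 9.0, "B-cells": 0.0, "Hepatocytes": 0.0},
    "MS4A1": {"T-cells": 0.0, "B-cells": 6.0, "Hepatocytes": 0.0},
    "ALB": {"T-cells": 0.0, "B-cells": 0.0, "Hepatocytes": 12.0},
    "ACTB": {"T-cells": 5.0, "B-cells": 5.0, "Hepatocytes": 5.0},
}

# Toy collection with hypothetical set names (not real MSigDB identifiers).
toy_collection = {
    "T_CELL_SIGNALING": ["CD3E", "ACTB"],
    "LIVER_FUNCTION": ["ALB", "ACTB"],
}

t_cell_weights = gene_weights(toy_expr, "T-cells")
t_cell_sets = filter_collection(toy_collection, t_cell_weights)
print(t_cell_weights)
print(sorted(t_cell_sets))
```

In this toy setting a uniformly expressed gene receives a weight of 1, so filtering at 1.0 keeps the gene sets whose members are, on average, at least as active in the target cell type as elsewhere. The same weights could instead be passed to a weighted gene set test rather than used as a hard filter.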