Algabri Yousif A, Li Lingyu, Liu Zhi-Ping
Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan 250061, China.
Bioengineering (Basel). 2022 Jul 30;9(8):353. doi: 10.3390/bioengineering9080353.
Single-cell RNA-sequencing (scRNA-seq) is a recent high-throughput technique that can measure gene expression, reveal cell heterogeneity, identify rare and complex cell populations, and discover cell types and their relationships. The analysis of scRNA-seq data is challenging because of transcript sparsity, replication noise, and outlier cell populations. Gene coexpression network (GCN) analysis effectively deciphers phenotypic differences in specific states by describing pairwise gene-gene relationships. The underlying gene modules, which show different coexpression patterns, partially bridge the gap between genotype and phenotype. This study presents a new framework called scGENA (single-cell gene coexpression network analysis) for GCN analysis based on scRNA-seq data. Although several methods exist for scRNA-seq data analysis, we aim to build an integrative pipeline that covers primary data preprocessing (data exploration, quality control, normalization, imputation, and dimensionality reduction for clustering) together with downstream GCN analysis. To demonstrate this integrated workflow, we applied it to an scRNA-seq dataset of the human diabetic pancreas with 1600 cells and 39,851 genes and carried out all of these steps in practice. The results show that scGENA can uncover interesting gene modules behind complex diseases, which in turn reveal biological mechanisms. scGENA provides a state-of-the-art method for gene coexpression analysis of scRNA-seq data.
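To make the workflow described in the abstract concrete, the following is a minimal sketch of three of its steps (gene-level quality control, normalization, and pairwise coexpression) on a toy count matrix. It is not the scGENA implementation. The function name `coexpression_adjacency`, the `min_cells` filter, the 10,000-count scale factor, the use of Pearson correlation, and the soft-threshold power `beta` are all assumptions chosen for illustration. Imputation, dimensionality reduction, clustering, and module detection are omitted.

```python
import numpy as np

def coexpression_adjacency(counts, min_cells=3, beta=6):
    """Build a gene-gene adjacency matrix from a cells x genes count matrix.

    Hypothetical helper for illustration only; min_cells and beta are
    assumed defaults, not values taken from the paper.
    """
    counts = np.asarray(counts, dtype=float)

    # Quality control: keep genes detected in at least `min_cells` cells.
    keep = (counts > 0).sum(axis=0) >= min_cells
    filtered = counts[:, keep]

    # Normalization: scale each cell to 10,000 counts, then log1p.
    lib_size = filtered.sum(axis=1, keepdims=True)
    lib_size[lib_size == 0] = 1.0  # avoid division by zero for empty cells
    norm = np.log1p(filtered / lib_size * 1e4)

    # Pairwise relationships: Pearson correlation between gene columns.
    with np.errstate(divide="ignore", invalid="ignore"):
        corr = np.atleast_2d(np.corrcoef(norm, rowvar=False))
    corr = np.nan_to_num(corr)  # constant genes yield NaN correlations

    # Soft thresholding: raise |correlation| to a power to suppress weak links.
    adj = np.abs(corr) ** beta
    np.fill_diagonal(adj, 0.0)  # no self-edges
    return adj, keep

# Toy data: 50 cells x 6 genes; the last gene is never detected.
rng = np.random.default_rng(0)
toy = rng.poisson(lam=5.0, size=(50, 6))
toy[:, 5] = 0

adj, keep = coexpression_adjacency(toy)
print(adj.shape, int(keep.sum()))
```

The returned adjacency matrix is symmetric with a zero diagonal, and the never-detected gene is dropped by the filter. In a fuller pipeline, gene modules would then be obtained by clustering this matrix; that step is left out here.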