Jin Ke, Li Bo, Yan Hong, Zhang Xiao-Fei
Department of Statistics, School of Mathematics and Statistics, Central China Normal University, Wuhan 430079, China.
Hubei Key Laboratory of Mathematical Sciences, Central China Normal University, Wuhan 430079, China.
Bioinformatics. 2022 Jun 13;38(12):3222-3230. doi: 10.1093/bioinformatics/btac300.
Single-cell RNA sequencing (scRNA-seq) technologies have proved revolutionary because they enable transcriptome profiling at single-cell resolution. However, excess zeros caused by various sources of technical noise, called dropouts, can mislead downstream analyses. Accurate imputation methods are therefore crucial for addressing the dropout problem.
In this article, we develop a new dropout imputation method for scRNA-seq data based on multi-objective optimization. Existing methods assume that the underlying data has a single preconceived structure and impute dropouts according to the information learned from that structure. In contrast, we assume that the data combines three types of latent structure: the horizontal structure (genes are similar to each other), the vertical structure (cells are similar to each other) and the low-rank structure. The combination weights and the latent structures are learned using multi-objective optimization. The final result is the weighted average of the observed data and the imputation results learned from the three types of structure. Comprehensive downstream experiments show that our method outperforms existing methods in recovering true gene expression profiles, differential expression analysis, cell clustering and cell trajectory inference.
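The weighted-average idea described above can be illustrated with a minimal Python sketch. This is not the scMOO implementation. It makes several assumptions that do not come from the article: the expression matrix has genes as rows and cells as columns, the horizontal and vertical estimates are simple cosine-similarity smoothers, the low-rank estimate is a truncated SVD, and the combination weights are fixed by hand rather than learned by multi-objective optimization. All function and parameter names are hypothetical.

```python
import numpy as np


def low_rank_estimate(X, rank):
    """Truncated-SVD approximation of X (stand-in for the low-rank structure)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]


def similarity_smooth(X):
    """Replace each row of X by a cosine-similarity-weighted average of all rows."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.maximum(norms, 1e-12)
    S = np.clip(unit @ unit.T, 0.0, None)  # keep nonnegative similarities only
    S = S / np.maximum(S.sum(axis=1, keepdims=True), 1e-12)  # rows sum to 1
    return S @ X


def combine_structures(X, weights=(0.25, 0.25, 0.25, 0.25), rank=2):
    """Weighted average of the observed data and three structure-based estimates.

    Assumption: X is genes x cells. The weights are fixed here; in the
    article they are learned by multi-objective optimization.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the result is a weighted average
    gene_based = similarity_smooth(X)      # "horizontal": similar genes
    cell_based = similarity_smooth(X.T).T  # "vertical": similar cells
    low_rank = low_rank_estimate(X, rank)  # low-rank structure
    combined = w[0] * X + w[1] * gene_based + w[2] * cell_based + w[3] * low_rank
    return np.maximum(combined, 0.0)  # expression values cannot be negative


# Toy example: 6 genes x 5 cells with artificial zeros standing in for dropouts.
rng = np.random.default_rng(0)
toy = rng.poisson(3.0, size=(6, 5)).astype(float)
toy[rng.random(toy.shape) < 0.3] = 0.0
imputed = combine_structures(toy)
```

Setting the weights to (1, 0, 0, 0) returns the observed matrix unchanged, which shows how the weights trade off trust in the observed data against each structural assumption.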
The R package is available at https://github.com/Zhangxf-ccnu/scMOO and https://zenodo.org/record/5785195. The code to reproduce the downstream analyses in this article can be found at https://github.com/Zhangxf-ccnu/scMOO_experiments_codes and https://zenodo.org/record/5786211. The detailed list of data sets used in the present study is presented in Supplementary Table S1 of the Supplementary materials.
Supplementary data are available at Bioinformatics online.