Human Genetics Unit, Indian Statistical Institute, 203 B. T. Road, Kolkata 700108, West Bengal, India.
Nucleic Acids Res. 2021 Aug 20;49(14):7909-7924. doi: 10.1093/nar/gkab457.
Dynamic regulation of gene expression is often governed by progression through transient cell states. Bulk RNA-seq analysis can detect only the average change in expression levels and cannot identify these dynamics. Single-cell RNA-seq offers an unprecedented opportunity to place cells on a hypothetical time trajectory that reflects the gradual transition of their transcriptomes. This continuous trajectory, or 'pseudotime', may reveal the developmental pathway and provide information on dynamic transcriptomic changes and other biological processes. Existing approaches to building pseudotime depend heavily on reducing very high-dimensional data to extremely low-dimensional subspaces, which may lead to loss of information. We propose PseudoGA, a genetic-algorithm-based approach that orders cells under the assumption that gene expression varies along a smooth curve over the pseudotime trajectory. We observe superior accuracy of our method on simulated as well as real benchmark datasets. The generality of the assumption behind PseudoGA and its independence from any dimensionality reduction technique make it a robust choice for pseudotime estimation from single-cell transcriptome data. PseudoGA is also time-efficient when applied to large single-cell RNA-seq datasets and is adaptable to parallel computing. R code for PseudoGA is freely available at https://github.com/indranillab/pseudoga.
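The idea of ordering cells with a genetic algorithm under a smoothness assumption can be illustrated with a small sketch. This is not the PseudoGA implementation (the authors' R code is at the URL given in the abstract). The fitness function here (sum of squared expression changes between consecutive cells), the order crossover, the segment-reversal mutation, and all function and parameter names are assumptions chosen for illustration, since the abstract does not specify them.

```python
import math
import random


def roughness(order, expr):
    """Sum of squared expression changes between consecutive cells in `order`.

    Assumed smoothness proxy: lower values mean a smoother ordering.
    """
    total = 0.0
    for a, b in zip(order, order[1:]):
        total += sum((x - y) ** 2 for x, y in zip(expr[a], expr[b]))
    return total


def order_crossover(p1, p2, rng):
    """Order crossover (OX): keep a slice of p1, fill the rest in p2's order."""
    n = len(p1)
    i, j = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[i:j + 1] = p1[i:j + 1]
    kept = set(p1[i:j + 1])
    fill = [c for c in p2 if c not in kept]
    k = 0
    for pos in range(n):
        if child[pos] is None:
            child[pos] = fill[k]
            k += 1
    return child


def mutate(order, rng):
    """Reverse a random segment of the ordering in place."""
    i, j = sorted(rng.sample(range(len(order)), 2))
    order[i:j + 1] = reversed(order[i:j + 1])


def ga_order(expr, pop_size=40, generations=200, mutation_rate=0.3, seed=0):
    """Search for a cell ordering with low roughness (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(expr)
    # Include the input order so the result is never rougher than the input.
    population = [list(range(n))]
    while len(population) < pop_size:
        population.append(rng.sample(range(n), n))
    for _ in range(generations):
        population.sort(key=lambda o: roughness(o, expr))
        survivors = population[:pop_size // 2]  # elitist truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            child = order_crossover(p1, p2, rng)
            if rng.random() < mutation_rate:
                mutate(child, rng)
            children.append(child)
        population = survivors + children
    return min(population, key=lambda o: roughness(o, expr))


# Toy data: 30 cells, 3 "genes" that vary smoothly with a hidden time,
# then shuffled so the true order is unknown to the algorithm.
true_time = [i / 29 for i in range(30)]
cells = [[math.sin(3 * t), t ** 2, math.exp(-t)] for t in true_time]
random.Random(1).shuffle(cells)

best = ga_order(cells)
print("roughness of input order:", roughness(list(range(len(cells))), cells))
print("roughness of GA ordering:", roughness(best, cells))
```

Because survivors are carried over unchanged each generation, the best roughness never increases, but the search is heuristic and need not recover the true order. The roughness score is also unchanged when an ordering is reversed, so a sketch like this cannot tell which end of the trajectory is the start.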