Fang Meichen, Gorin Gennady, Pachter Lior
Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States of America.
Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California, United States of America.
PLoS Comput Biol. 2025 Jan 21;21(1):e1012752. doi: 10.1371/journal.pcbi.1012752. eCollection 2025 Jan.
Single-cell transcriptomics experiments provide gene expression snapshots of heterogeneous cell populations across cell states. These snapshots have been used to infer trajectories and dynamic information, even without intensive time-series data, by ordering cells according to gene expression similarity. However, while single-cell snapshots sometimes offer valuable insights into dynamic processes, current methods for ordering cells are limited by descriptive notions of "pseudotime" that lack intrinsic physical meaning. Instead of pseudotime, we propose inferring "process time" through a principled modeling approach that formulates trajectories and infers latent variables corresponding to the timing of cells subject to a biophysical process. Our implementation of this approach, called Chronocell, provides a biophysical formulation of trajectories built on cell state transitions. The Chronocell model is identifiable, which makes parameter inference meaningful. Furthermore, Chronocell can interpolate between trajectory inference, when cell states lie on a continuum, and clustering, when cells cluster into discrete states. Using a variety of datasets ranging from cluster-like to continuous, we show that Chronocell enables us to assess the suitability of datasets and reveals distinct cellular distributions along process time that are consistent with biological process times. We also compare our degradation rate estimates with those derived from metabolic labeling datasets, thereby showcasing the biophysical utility of Chronocell. Nevertheless, based on performance characterization on simulations, we find that process time inference can be challenging, highlighting the importance of dataset quality and careful model assessment.
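To make the contrast between pseudotime and process time concrete, the toy sketch below shows how a biophysical model gives each cell's latent time physical units. It deliberately assumes a far simpler model than Chronocell's: a single state transition at time 0 in which each gene switches from transcription rate alpha0 to alpha1, mRNA degrades at rate gamma, counts are Poisson, and all parameters are known. Under these assumptions the mean count relaxes as mu(t) = alpha1/gamma + (alpha0 - alpha1)/gamma * exp(-gamma * t), so time is measured in units set by the degradation rates. The function names (mean_expression, poisson_loglik, process_time_posterior) are hypothetical, and the grid posterior with a uniform prior is a generic latent-time computation, not the paper's inference procedure.

```python
import numpy as np

# Hypothetical toy model (NOT Chronocell's formulation): genes start at the
# steady state of rate alpha0, switch to rate alpha1 at process time 0,
# and degrade at rate gamma.
def mean_expression(t, alpha0, alpha1, gamma):
    """Mean count per gene at process time(s) t; returns shape (len(t), genes)."""
    t = np.atleast_1d(t)[:, None]
    ss0, ss1 = alpha0 / gamma, alpha1 / gamma  # steady states before/after switch
    return ss1 + (ss0 - ss1) * np.exp(-gamma * t)

def poisson_loglik(counts, mu):
    """Poisson log-likelihood of each cell at each grid time, dropping log(x!)."""
    # counts: (cells, genes), mu: (grid, genes) -> result: (cells, grid)
    return counts @ np.log(mu).T - mu.sum(axis=1)

def process_time_posterior(counts, grid, alpha0, alpha1, gamma):
    """Posterior over a time grid for each cell, assuming a uniform prior."""
    ll = poisson_loglik(counts, mean_expression(grid, alpha0, alpha1, gamma))
    ll -= ll.max(axis=1, keepdims=True)  # stabilize before exponentiating
    post = np.exp(ll)
    return post / post.sum(axis=1, keepdims=True)

# Simulate cells at known times, then recover their process times.
rng = np.random.default_rng(0)
n_genes, n_cells = 50, 200
alpha0 = rng.uniform(1, 20, n_genes)
alpha1 = rng.uniform(1, 20, n_genes)
gamma = rng.uniform(0.2, 2.0, n_genes)
true_t = rng.uniform(0, 5, n_cells)
counts = rng.poisson(mean_expression(true_t, alpha0, alpha1, gamma))

grid = np.linspace(0, 5, 101)
post = process_time_posterior(counts, grid, alpha0, alpha1, gamma)
t_hat = post @ grid  # posterior mean process time for each cell
print("correlation with true time:", np.corrcoef(true_t, t_hat)[0, 1])
```

Because gamma carries units of inverse time, rescaling the degradation rates rescales the inferred times, which is one way to see why degradation rate estimates can be checked against metabolic labeling data. In realistic settings the parameters are unknown and must be fitted jointly with the latent times, which is where identifiability becomes essential.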