Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213.
Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213.
Proc Natl Acad Sci U S A. 2024 Sep 10;121(37):e2316256121. doi: 10.1073/pnas.2316256121. Epub 2024 Sep 3.
Trajectory inference methods are essential for analyzing the developmental paths of cells in single-cell sequencing datasets. They provide insights into cellular differentiation, transitions, and lineage hierarchies, helping to unravel the dynamic processes underlying development and disease progression. However, many existing tools lack a coherent statistical model and reliable uncertainty quantification, which limits their utility and robustness. In this paper, we introduce VITAE (Variational Inference for Trajectory by AutoEncoder), a statistical approach that integrates a latent hierarchical mixture model with variational autoencoders to infer trajectories. The statistical hierarchical model enhances the interpretability of our framework, while the posterior approximations generated by our variational autoencoder ensure computational efficiency and provide uncertainty quantification for cell projections along trajectories. In particular, VITAE enables simultaneous trajectory inference and data integration, improving the accuracy of learning a joint trajectory structure in the presence of biological and technical heterogeneity across datasets. We show that VITAE outperforms other state-of-the-art trajectory inference methods on both real and synthetic data under various trajectory topologies. Furthermore, we apply VITAE to jointly analyze three distinct single-cell RNA sequencing datasets of the mouse neocortex, unveiling comprehensive developmental lineages of projection neurons. VITAE effectively reduces batch effects within and across datasets and uncovers finer structures that might be overlooked in individual datasets. Additionally, we showcase VITAE's efficacy in integrative analyses of multiomic datasets with continuous cell population structures.
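To make the idea of "uncertainty quantification of cell projections along trajectories" concrete, the sketch below assumes a simplified latent mixture: each cell's latent vector z lies near an edge (k, l) of a trajectory graph whose vertices are cluster centers mu[k], at a hypothetical position w in [0, 1] (w = 1 at vertex k, w = 0 at vertex l). It assumes uniform priors on edges and on w and an isotropic Gaussian likelihood with a shared sigma. The function name `edge_posterior`, the grid approximation, and all parameters are illustrative assumptions, not VITAE's actual model or code; the variational encoder that would produce z is omitted, so z is taken as given.

```python
import numpy as np

def edge_posterior(z, mu, edges, sigma=1.0, n_grid=101):
    """Posterior over trajectory edges and expected position w for one latent point z.

    Hypothetical simplified model (an assumption, not VITAE's implementation):
      edge (k, l) ~ Uniform(edges), w ~ Uniform[0, 1],
      z | (k, l), w ~ N(w * mu[k] + (1 - w) * mu[l], sigma^2 I).
    The integral over w is approximated by a sum over a uniform grid.
    """
    w = np.linspace(0.0, 1.0, n_grid)                 # grid of positions along an edge
    log_lik = np.empty((len(edges), n_grid))
    for e, (k, l) in enumerate(edges):
        # Points on the segment between vertex k (w = 1) and vertex l (w = 0)
        centers = w[:, None] * mu[k] + (1.0 - w[:, None]) * mu[l]
        sq_dist = np.sum((z - centers) ** 2, axis=1)
        # Gaussian log-density, dropping the constant shared by every edge
        log_lik[e] = -0.5 * sq_dist / sigma**2
    # Normalize the joint (edge, w) posterior; subtracting the max avoids overflow
    joint = np.exp(log_lik - log_lik.max())
    joint /= joint.sum()
    p_edge = joint.sum(axis=1)                        # marginal posterior probability of each edge
    # E[w | edge, z]; guard against edges whose mass underflows to zero
    w_mean = (joint * w).sum(axis=1) / np.maximum(p_edge, 1e-300)
    return p_edge, w_mean

# Usage: three cluster centers in 2-D and two candidate edges sharing vertex 0
mu = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
edges = [(0, 1), (0, 2)]
z = np.array([0.5, 0.05])                             # a cell lying close to edge (0, 1)
p_edge, w_mean = edge_posterior(z, mu, edges, sigma=0.1)
print(p_edge, w_mean)
```

Under these assumptions, a cell sitting between two branches would receive split edge probabilities, which is one way a posterior over projections can express uncertainty; a larger sigma spreads the mass more evenly across edges.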