Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA.
Fauna Bio, Emeryville, CA, USA.
Nat Comput Sci. 2024 Sep;4(9):677-689. doi: 10.1038/s43588-024-00689-2. Epub 2024 Sep 20.
Multimodal, single-cell genomics technologies enable simultaneous measurement of multiple facets of DNA and RNA processing in the cell. This creates opportunities for transcriptome-wide, mechanistic studies of cellular processing in heterogeneous cell populations, such as regulation of cell fate by transcriptional stochasticity or tumor proliferation through aberrant splicing dynamics. However, current methods for determining cell types or 'clusters' in multimodal data often rely on ad hoc approaches to balance or integrate measurements, and on assumptions that ignore inherent properties of the data. To enable interpretable and consistent cell cluster determination, we present meK-means (mechanistic K-means), which integrates modalities through a unifying model of transcription to learn underlying, shared biophysical states. With meK-means, we can cluster cells with nascent and mature mRNA measurements, utilizing the causal, physical relationships between these modalities. This identifies the shared transcription dynamics across cells that induce the observed molecule counts, and provides an alternative definition of 'clusters' through the governing parameters of cellular processes.
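The idea of clustering cells by shared model parameters, rather than by distances between count vectors, can be illustrated with a minimal sketch. This is not the published meK-means implementation. It assumes a deliberately simplified constitutive model in which, at steady state, nascent and mature counts for each gene are independent Poisson variables with means k/β and k/γ (transcription rate k, splicing rate β, degradation rate γ). The function names `poisson_loglik`, `farthest_point_init`, and `cluster_cells` are hypothetical, and the hard-assignment loop stands in for whatever inference the authors actually use.

```python
import numpy as np


def poisson_loglik(counts, rates):
    """Log-likelihood of each cell under each cluster, up to a per-cell constant.

    counts: (cells, genes, 2) integer array; last axis is (nascent, mature).
    rates:  (clusters, genes, 2) Poisson means (k/beta, k/gamma in the assumed model).
    """
    rates = np.maximum(rates, 1e-8)  # guard against log(0)
    # sum(x * log(rate)) - sum(rate); the -log(x!) term is identical for every
    # cluster, so it is dropped without changing the argmax.
    return np.einsum("cgm,kgm->ck", counts, np.log(rates)) - rates.sum(axis=(1, 2))[None, :]


def farthest_point_init(counts, k):
    """Deterministic start: cell 0, then repeatedly the cell farthest from all chosen cells."""
    flat = counts.reshape(len(counts), -1).astype(float)
    chosen = [0]
    dist = np.linalg.norm(flat - flat[0], axis=1)
    while len(chosen) < k:
        nxt = int(dist.argmax())
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(flat - flat[nxt], axis=1))
    return counts[chosen] + 0.5  # pseudo-count keeps every rate positive


def cluster_cells(counts, k, n_iter=50):
    """Hard-assignment EM: alternate likelihood-based assignment and rate re-estimation."""
    rates = farthest_point_init(counts, k)
    labels = np.full(len(counts), -1)
    for _ in range(n_iter):
        new_labels = poisson_loglik(counts, rates).argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in range(k):
            members = counts[labels == j]
            if len(members):  # an empty cluster keeps its previous rates
                rates[j] = members.mean(axis=0)
    return labels, rates
```

Calling `cluster_cells(counts, k)` on a `(cells, genes, 2)` count array returns a label per cell and a `(k, genes, 2)` array of fitted means. Under the assumed constitutive model, the ratio of a gene's nascent mean to its mature mean estimates γ/β, so each cluster is described by interpretable rate parameters rather than by a centroid alone.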