Department of Computer Science, Princeton University, Princeton, New Jersey 08540, USA.
Verily Life Sciences, Tel Aviv 6789141, Israel.
Genome Res. 2023 Jul;33(7):1124-1132. doi: 10.1101/gr.277670.123. Epub 2023 Aug 8.
Spatially resolved transcriptomics (SRT) technologies measure messenger RNA (mRNA) expression at thousands of locations in a tissue slice. However, nearly all SRT technologies measure expression in two-dimensional (2D) slices extracted from a 3D tissue, thus losing information that is shared across multiple slices from the same tissue. Integrating SRT data across multiple slices can help recover this information and improve downstream expression analyses, but multislice alignment and integration remains a challenging task. Existing methods for integrating SRT data either do not use spatial information or assume that the morphology of the tissue is largely preserved across slices, an assumption that is often violated for biological or technical reasons. We introduce PASTE2, a method for alignment and 3D reconstruction of multislice SRT data sets that allows for only partial overlap between aligned slices and/or for slice-specific cell types. PASTE2 formulates a novel partial fused Gromov-Wasserstein optimal transport problem, which we solve using a conditional gradient algorithm. PASTE2 includes a model selection procedure to estimate the fraction of overlap between slices, and optionally uses information from histological images that accompany some SRT experiments. We show on both simulated and real data that PASTE2 obtains more accurate alignments than existing methods. We further use PASTE2 to reconstruct a 3D map of gene expression in a Drosophila embryo from a 16-slice Stereo-seq data set. PASTE2 produces accurate alignments of multislice data sets from multiple SRT technologies, enabling detailed studies of spatial gene expression across a wide range of biological applications.
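To make the idea of a fused Gromov-Wasserstein objective with partial overlap more concrete, the following is a minimal sketch in plain Python. It is not the PASTE2 implementation. It assumes a generic form of the problem: an expression dissimilarity matrix `C` between spots of two slices, within-slice spatial distance matrices `D1` and `D2`, a coupling `pi`, a balance parameter `alpha`, spot weights `g1` and `g2`, and an overlap fraction `s`. All of these names, and the squared-difference form of the spatial term, are illustrative assumptions rather than details taken from the abstract.

```python
from itertools import product


def fused_gw_objective(C, D1, D2, pi, alpha):
    """Evaluate a generic fused Gromov-Wasserstein objective (assumed form).

    (1 - alpha) * sum_ij C[i][j] * pi[i][j]
      + alpha * sum_ijkl (D1[i][k] - D2[j][l])**2 * pi[i][j] * pi[k][l]
    """
    n, m = len(pi), len(pi[0])
    # Linear term: expression dissimilarity of matched spots.
    linear = sum(C[i][j] * pi[i][j] for i in range(n) for j in range(m))
    # Quadratic term: distortion of pairwise spatial distances.
    quadratic = sum(
        (D1[i][k] - D2[j][l]) ** 2 * pi[i][j] * pi[k][l]
        for i, j, k, l in product(range(n), range(m), range(n), range(m))
    )
    return (1 - alpha) * linear + alpha * quadratic


def is_partial_coupling(pi, g1, g2, s, tol=1e-9):
    """Check the assumed partial-overlap constraints on a coupling.

    Rows may carry at most g1, columns at most g2, and the total
    transported mass must equal the overlap fraction s.
    """
    n, m = len(pi), len(pi[0])
    row_sums = [sum(pi[i]) for i in range(n)]
    col_sums = [sum(pi[i][j] for i in range(n)) for j in range(m)]
    nonnegative = all(pi[i][j] >= -tol for i in range(n) for j in range(m))
    rows_ok = all(row_sums[i] <= g1[i] + tol for i in range(n))
    cols_ok = all(col_sums[j] <= g2[j] + tol for j in range(m))
    mass_ok = abs(sum(row_sums) - s) <= tol
    return nonnegative and rows_ok and cols_ok and mass_ok


# Toy example: two slices with two spots each and identical geometry.
C = [[0.0, 1.0], [1.0, 0.0]]
D = [[0.0, 1.0], [1.0, 0.0]]
full = [[0.5, 0.0], [0.0, 0.5]]      # every spot is matched
partial = [[0.5, 0.0], [0.0, 0.0]]   # only half of the mass is matched
print(fused_gw_objective(C, D, D, full, alpha=0.5))
print(is_partial_coupling(partial, [0.5, 0.5], [0.5, 0.5], s=0.5))
```

In this sketch, relaxing the row and column constraints from equalities to inequalities is what lets some spots stay unmatched, which is how a transport formulation can accommodate slices that only partially overlap. A solver such as a conditional gradient method would search over couplings satisfying these constraints; the sketch only evaluates the objective and checks feasibility.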