Gorin Gennady, Pachter Lior
Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California.
Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California.
Biophys Rep (N Y). 2022 Dec 27;3(1):100097. doi: 10.1016/j.bpr.2022.100097. eCollection 2023 Mar 8.
Single-cell RNA sequencing data can be modeled using Markov chains to yield genome-wide insights into transcriptional physics. However, quantitative inference with such data requires careful assessment of noise sources. We find that long pre-mRNA transcripts are over-represented in sequencing data. To explain this trend, we propose a length-based model of capture bias, which may produce false-positive observations. We solve this model and use it to find concordant parameter trends as well as systematic, mechanistically interpretable technical and biological differences in paired data sets.
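The length-based capture bias can be illustrated with a toy simulation. This is a minimal sketch, not the authors' model: the abstract does not state the functional form, so the code assumes each pre-mRNA molecule is captured as a Poisson process whose rate grows linearly with gene length. The function names, the per-base rate, the gene lengths, and the mean of 5 molecules per cell are all hypothetical. Under that assumption, a long gene yields more observed counts than a short gene with the same underlying distribution, and observed counts can exceed the true molecule number, which is one possible reading of "false-positive observations".

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's method (adequate for small lam)."""
    if lam <= 0:
        return 0
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def observed_counts(true_count, gene_length, rate_per_base, rng):
    """Assumed capture model: Poisson rate = true count * rate per base * gene length."""
    return poisson_sample(true_count * rate_per_base * gene_length, rng)


rng = random.Random(0)
n_cells = 5000

# Both genes share the same true pre-mRNA counts (hypothetical mean of 5 per cell).
true_counts = [poisson_sample(5.0, rng) for _ in range(n_cells)]

# Hypothetical lengths: a 1 kb gene versus a 100 kb gene, same per-base rate.
obs_short = [observed_counts(n, 1_000, 1e-5, rng) for n in true_counts]
obs_long = [observed_counts(n, 100_000, 1e-5, rng) for n in true_counts]

mean_short = sum(obs_short) / n_cells
mean_long = sum(obs_long) / n_cells

# Cells in which more molecules are observed than truly exist.
excess_cells = sum(o > n for o, n in zip(obs_long, true_counts))

print(f"mean observed, short gene: {mean_short:.3f}")
print(f"mean observed, long gene:  {mean_long:.3f}")
print(f"cells with observed > true (long gene): {excess_cells}")
```

The Poisson form is chosen here only because it is the simplest capture model that allows observed counts above the true count; a binomial (at most one capture per molecule) model would not.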