Laboratoire de physique de l'École normale supérieure, CNRS, PSL University, Sorbonne Université and Université de Paris, Paris, France.
Saber Bio SAS, Institut du Cerveau, iPEPS The Healthtech Hub, Paris, France.
eLife. 2024 Aug 9;13:e86181. doi: 10.7554/eLife.86181.
B-cell repertoires are characterized by a diverse set of receptors of distinct specificities generated through two processes of somatic diversification: V(D)J recombination and somatic hypermutation. B-cell clonal families stem from the same V(D)J recombination event but differ in their hypermutations. Identifying clonal families is key to understanding B-cell repertoire function, evolution, and dynamics. We present HILARy (high-precision inference of lineages in antibody repertoires), an efficient, fast, and precise method to identify clonal families from single- or paired-chain repertoire sequencing datasets. HILARy combines probabilistic models that capture the receptor generation and selection statistics with adapted clustering methods to achieve consistently high inference accuracy. It automatically leverages the phylogenetic signal of shared mutations in difficult repertoire subsets. Exploiting the high sensitivity of the method, we find that the statistics of evolutionary properties such as the site frequency spectrum and the dN/dS ratio do not depend on the junction length. We also identify a broad range of selection pressures spanning two orders of magnitude.
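To make the clonal family problem concrete, the sketch below shows a generic baseline, not the HILARy algorithm: sequences are first grouped by V gene, J gene, and junction length, then linked within each group when their junctions differ at no more than a fixed fraction of positions (single-linkage clustering). The function names, the record layout `(v_gene, j_gene, junction)`, and the 0.1 threshold are all assumptions made for illustration; HILARy instead uses probabilistic models and the phylogenetic signal of shared mutations.

```python
from collections import defaultdict
from itertools import combinations


def hamming_fraction(a, b):
    """Fraction of mismatched positions between two equal-length strings."""
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_clonal_families(records, threshold=0.1):
    """Baseline clonal grouping (illustrative only, not HILARy).

    records: list of (v_gene, j_gene, junction) tuples (hypothetical layout).
    threshold: assumed maximum junction mismatch fraction for linking.
    Returns a sorted list of families, each a sorted list of record indices.
    """
    # Step 1: sequences can only share a family if they have the same
    # V gene, J gene, and junction length.
    classes = defaultdict(list)
    for idx, (v, j, junction) in enumerate(records):
        classes[(v, j, len(junction))].append(idx)

    # Step 2: single-linkage clustering via union-find.
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for members in classes.values():
        for a, b in combinations(members, 2):
            if hamming_fraction(records[a][2], records[b][2]) <= threshold:
                parent[find(a)] = find(b)

    families = defaultdict(list)
    for idx in range(len(records)):
        families[find(idx)].append(idx)
    return sorted(sorted(f) for f in families.values())


if __name__ == "__main__":
    example = [
        ("IGHV1-2", "IGHJ4", "TGTGCGAGAGATTACTGG"),
        ("IGHV1-2", "IGHJ4", "TGTGCGAGAGATTATTGG"),  # one mismatch vs. first
        ("IGHV1-2", "IGHJ4", "TGTACCTCTGGGCCCTGG"),  # distant junction
        ("IGHV3-23", "IGHJ4", "TGTGCGAGAGATTACTGG"),  # different V gene
    ]
    print(cluster_clonal_families(example))
```

A fixed threshold of this kind tends to be least reliable where junctions are short, which is the sort of difficult repertoire subset the abstract says HILARy addresses with shared-mutation signal.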