da Silva José Eduardo H, Bernardino Heder S, de Oliveira Itamar L, Camata José J
Universidade Federal de Juiz de Fora, Rua José Lourenço Kelmer, s/n, Juiz de Fora, 36036-900, Minas Gerais, Brazil.
Biosystems. 2025 Jul;253:105464. doi: 10.1016/j.biosystems.2025.105464. Epub 2025 May 21.
The advent of single-cell RNA sequencing (scRNA-Seq) has provided unprecedented resolution for analyzing gene regulatory networks (GRNs) at the single-cell level. However, it has also introduced new technical and methodological challenges. Several factors make it difficult to interpret the data obtained at the sequencing stage correctly. These include the large number of zeros in reported expression levels, biological variation arising from the stochastic nature of gene expression and from the environmental niche, and cell-cycle effects. In addition, GRN inference methods developed specifically for scRNA-Seq data have been shown to perform at a level similar to that of random predictors. Inadequate pre-processing of gene expression data affects the performance of inference approaches, as do the different ways of modeling networks and network motifs. Such pre-processing includes selecting subsets of genes of interest and smoothing and discretizing expression values. Finally, the ground-truth network is generally unknown, and there are no standardized, appropriate metrics for measuring the quality of inferred networks. Together, these gaps make comparing the performance of algorithms a major problem, given the unbalanced nature of the data and the interpretation bias introduced by the chosen metric. This article brings these issues to light. It uses comparative computational experiments to show how these factors influence both the inference process and the performance evaluation of inferred networks. It also offers suggestions for a more robust methodology for researchers working on GRN inference.
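A small example shows why the choice of metric matters when true edges are rare. The following is a minimal Python sketch under stated assumptions, not the authors' experimental setup. It assumes a hypothetical ground-truth network in which about 2% of candidate regulator-target pairs are true edges. Every pair is scored by a random predictor, and two metrics are compared: AUROC, and AUPRC estimated as average precision. The function names, gene count, and edge density are illustrative assumptions.

```python
import random


def auroc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) statistic.
    Tied scores receive their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(labels, scores):
    """Average precision: mean of precision@k over the ranks k of true edges."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(labels)
    hits = 0
    total = 0.0
    for k, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            total += hits / k
    return total / n_pos


def random_predictor_baseline(n_genes=100, edge_density=0.02, seed=0):
    """Hypothetical setup: random sparse ground truth, random edge scores."""
    rng = random.Random(seed)
    # Candidate directed regulator -> target pairs, excluding self-loops.
    pairs = [(a, b) for a in range(n_genes) for b in range(n_genes) if a != b]
    labels = [1 if rng.random() < edge_density else 0 for _ in pairs]
    scores = [rng.random() for _ in pairs]  # a predictor with no information
    prevalence = sum(labels) / len(labels)
    return auroc(labels, scores), average_precision(labels, scores), prevalence


if __name__ == "__main__":
    roc, ap, prevalence = random_predictor_baseline()
    print(f"AUROC={roc:.3f}  AUPRC={ap:.3f}  edge density={prevalence:.3f}")
```

Under these assumptions, a random predictor's AUROC should stay near 0.5 however sparse the network is. Its average precision, by contrast, tracks the edge density. An AUPRC that looks small in absolute terms may therefore still be well above chance, while a respectable-looking AUROC may not be. Reporting each metric next to its random-predictor baseline is one way to reduce this interpretation bias.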