Center of Intelligent Medicine, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China.
Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae286.
Reconstructing the topology of gene regulatory networks from gene expression data has been extensively studied. With the abundance of functional transcriptomic data now available, it is feasible to systematically decipher regulatory interaction dynamics in a logical form such as a Boolean network (BN) framework, which qualitatively indicates how multiple regulators combine to affect a common target gene. However, inferring both the network topology and the gene interaction dynamics simultaneously remains challenging, since gene expression data are typically noisy and data discretization is prone to information loss. We propose a new method for BN inference from time-series transcriptional profiles, called LogicGep. LogicGep formulates the identification of Boolean functions as a symbolic regression problem that learns the Boolean function expression, and solves it efficiently through multi-objective optimization using an improved gene expression programming algorithm. To avoid overemphasizing dynamic characteristics at the expense of topological structure, as traditional methods often do, a set of promising Boolean formulas is first evolved for each target gene, and a feed-forward neural network trained on continuous expression data is then used to select the final solution. We validated the efficacy of LogicGep using multiple datasets, including both synthetic and real-world experimental data. The results show that LogicGep infers accurate BN models, outperforming other representative BN inference algorithms in both network topology reconstruction and the identification of Boolean functions. Moreover, LogicGep runs hundreds of times faster than other methods, especially for large network inference.
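To make the idea of treating Boolean function identification as a multi-objective search more concrete, the following is a minimal sketch and not LogicGep's implementation. It assumes already-binarized time-series data, replaces gene expression programming with exhaustive enumeration of small truth tables, and uses two illustrative objectives that the abstract does not specify: the number of mismatched transitions and the number of regulators. The toy network in `step` and all function names are hypothetical.

```python
from itertools import combinations, product


def step(state):
    """Hypothetical 3-gene toy network, used only to generate example data."""
    x0, x1, x2 = state
    return (x1 & x2, x0, 1 - x1)


def simulate(start, n_steps):
    """Return a binarized time series of n_steps + 1 states."""
    series = [start]
    for _ in range(n_steps):
        series.append(step(series[-1]))
    return series


def candidate_rules(n_genes, max_inputs):
    """Yield every (regulators, truth_table) pair with up to max_inputs regulators."""
    for k in range(1, max_inputs + 1):
        for regs in combinations(range(n_genes), k):
            for table in product((0, 1), repeat=2 ** k):
                yield regs, table


def predict(regs, table, state):
    """Look up the rule output; regulator bits form the truth-table index."""
    idx = 0
    for r in regs:
        idx = (idx << 1) | state[r]
    return table[idx]


def mismatches(regs, table, series, target):
    """Objective 1 (assumed): transitions where the rule mispredicts the target."""
    return sum(
        predict(regs, table, series[t]) != series[t + 1][target]
        for t in range(len(series) - 1)
    )


def pareto_front(series, target, max_inputs=2):
    """Keep rules not dominated on (mismatches, number of regulators)."""
    n_genes = len(series[0])
    scored = [
        (mismatches(regs, table, series, target), len(regs), regs, table)
        for regs, table in candidate_rules(n_genes, max_inputs)
    ]

    def dominated(s):
        # o dominates s if it is no worse on both objectives and better on one.
        return any(
            o[0] <= s[0] and o[1] <= s[1] and (o[0] < s[0] or o[1] < s[1])
            for o in scored
        )

    return [s for s in scored if not dominated(s)]


series = simulate((1, 1, 1), 5)
front = pareto_front(series, target=0)
for err, size, regs, table in front:
    print(f"mismatches={err} regulators={regs} truth_table={table}")
```

On this short toy trajectory, more than one rule reproduces the target gene's transitions without error, so the dynamics alone do not single out the true regulators. That ambiguity is the kind of situation in which a second selection stage over a set of candidate formulas, such as the neural-network step described in the abstract, becomes useful.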