Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan 250061, China.
Advanced Modeling and Applied Computing Laboratory, Department of Mathematics, The University of Hong Kong, Hong Kong, China.
Bioinformatics. 2023 May 4;39(5). doi: 10.1093/bioinformatics/btad256.
From a systematic perspective, it is crucial to infer and analyze gene regulatory networks (GRNs) from high-throughput single-cell RNA sequencing data. However, most existing GRN inference methods focus mainly on network topology; only a few consider how to explicitly describe the update logic rules of regulation in GRNs to obtain their dynamics. Moreover, some inference methods fail to deal with the over-fitting problem caused by noise in time series data.
In this article, we propose a novel embedded Boolean threshold network method called LogBTF, which effectively infers GRNs by integrating regularized logistic regression with a Boolean threshold function. First, the continuous gene expression values are converted into Boolean values, and an elastic net regression model is fitted to the binarized time series data. Then, the estimated regression coefficients are used to represent the unknown Boolean threshold function of the candidate Boolean threshold network, which serves as its dynamical equations. To overcome the multi-collinearity and over-fitting problems, a new and effective approach is designed to optimize the network topology by adding a perturbation design matrix to the input data and then setting sufficiently small elements of the output coefficient vector to zero. In addition, a cross-validation procedure is incorporated into the Boolean threshold network model framework to strengthen the inference capability. Finally, extensive experiments on one simulated Boolean value dataset, dozens of simulation datasets, and three real single-cell RNA sequencing datasets demonstrate that the LogBTF method can infer GRNs from time series data more accurately than several alternative GRN inference methods.
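To make the pipeline above concrete, the following is a minimal Python sketch of three of its steps: binarizing expression values, zeroing small coefficients, and using the remaining coefficients as a Boolean threshold function. It is not the LogBTF implementation. The mean-based binarization cutoff, the tolerance `eps`, the strict `> 0` tie convention, all function names, and the toy coefficients are assumptions made for illustration. The fitting step (elastic net logistic regression with the perturbation design matrix and cross-validation) is deliberately omitted, so the coefficients here are hand-picked rather than estimated.

```python
from statistics import mean


def binarize(series):
    """Convert one gene's continuous expression series to 0/1.

    Assumption: the gene's own mean is used as the cutoff.
    """
    cutoff = mean(series)
    return [1 if x > cutoff else 0 for x in series]


def prune(weights, eps=1e-2):
    """Set coefficients with magnitude below eps to zero.

    eps is a hypothetical tolerance chosen for this sketch.
    """
    return [[0.0 if abs(w) < eps else w for w in row] for row in weights]


def step(state, weights, intercepts):
    """One synchronous update of a Boolean threshold network.

    Gene i is ON at the next time point when
    intercept_i + sum_j weights[i][j] * state[j] > 0,
    which matches a logistic model predicting probability above 0.5.
    """
    return [
        1 if b + sum(w * x for w, x in zip(row, state)) > 0 else 0
        for row, b in zip(weights, intercepts)
    ]


def edges(weights):
    """Read signed edges (regulator, target, sign) off nonzero coefficients."""
    return [
        (j, i, 1 if w > 0 else -1)
        for i, row in enumerate(weights)
        for j, w in enumerate(row)
        if w != 0.0
    ]


# Toy, hand-picked coefficients for a 3-gene network (not fitted values).
raw_weights = [
    [0.0, -2.1, 0.004],   # gene 0 is repressed by gene 1
    [1.8, 0.0, 0.0],      # gene 1 is activated by gene 0
    [0.003, 1.5, 0.0],    # gene 2 is activated by gene 1
]
intercepts = [1.0, -0.9, -0.7]

weights = prune(raw_weights)
network = edges(weights)
next_state = step([1, 0, 0], weights, intercepts)
```

In this sketch the network topology is read from the nonzero pattern of the pruned coefficients, and the sign of each coefficient marks activation or repression, while `step` supplies the dynamics by iterating states forward in time.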
The source data and code are available at https://github.com/zpliulab/LogBTF.