Shi Qianqian, Zhang Chuanchao, Guo Weifeng, Zeng Tao, Lu Lina, Jiang Zhonglin, Wang Ziming, Liu Juan, Chen Luonan
Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Shanghai, China.
State Key Laboratory of Software Engineering, School of Computer, Wuhan University, Wuhan, China.
Methods. 2017 Jul 15;124:25-35. doi: 10.1016/j.ymeth.2017.06.018. Epub 2017 Jul 12.
Transcription factors (TFs) can regulate physiological transitions or determine stable phenotypic diversity. Accurate estimation of TF regulatory signals or functional activities is of great significance for guiding biological experiments or elucidating molecular mechanisms, but it remains challenging. Traditional methods identify TF regulatory signals at the population level, which masks heterogeneous regulation mechanisms in individuals or subgroups and thus results in inaccurate analyses. Here, we propose a novel computational framework, local network component analysis (LNCA), which exploits data heterogeneity and automatically and accurately quantifies transcription factor activity (TFA) by integrating partitioned expression sets (i.e., local information) with prior TF-gene regulatory knowledge. Specifically, LNCA adopts an adaptive optimization strategy that evaluates the local similarities of regulation controls and corrects biases during data integration in order to construct the TFA landscape. We first numerically demonstrate the effectiveness of LNCA on simulated data sets, compared with traditional methods such as FastNCA, ROBNCA and NINCA. We then apply our model to two real data sets with implicit temporal or spatial regulation variations. The results show that LNCA not only recognizes the periodic mode along the S. cerevisiae cell cycle process, but also substantially outperforms other methods in terms of accuracy and consistency. In addition, the cross-validation study for glioblastoma multiforme (GBM) indicates that the TFAs identified by LNCA can better distinguish clinically distinct tumor groups than the expression values of the corresponding TFs, thus opening a new way to classify tumor subtypes and providing novel insight into cancer heterogeneity.
LNCA is implemented as a MATLAB package, available at http://sysbio.sibcb.ac.cn/cb/chenlab/software.htm/LNCApackage_0.1.rar.
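For readers unfamiliar with the network component analysis (NCA) model that LNCA builds on, the following is a minimal Python sketch of classic NCA only. It is not the authors' LNCA algorithm and not the MATLAB package. It assumes the standard NCA formulation, in which an expression matrix E (genes x samples) is approximated by A @ P, where A (genes x TFs) holds regulatory strengths that may be nonzero only where prior TF-gene knowledge allows, and P (TFs x samples) holds the TF activities. The function name `nca_als`, its parameters, and the choice of alternating least squares are illustrative assumptions.

```python
import numpy as np


def nca_als(E, mask, n_iter=200, seed=0):
    """Fit E ~ A @ P by alternating least squares (illustrative sketch).

    E    : (n_genes, n_samples) expression matrix.
    mask : (n_genes, n_tfs) array, nonzero where prior knowledge
           allows a TF-gene regulatory link.
    Returns A (n_genes, n_tfs) and P (n_tfs, n_samples).
    """
    rng = np.random.default_rng(seed)
    n_genes = mask.shape[0]
    # Random positive start, restricted to the allowed links.
    A = (mask != 0) * rng.uniform(0.5, 1.5, size=mask.shape)

    for _ in range(n_iter):
        # Activity step: with A fixed, solve for P in the least-squares sense.
        P = np.linalg.lstsq(A, E, rcond=None)[0]
        # Strength step: with P fixed, regress each gene on its allowed TFs.
        for g in range(n_genes):
            idx = np.flatnonzero(mask[g])
            if idx.size:
                A[g, idx] = np.linalg.lstsq(P[idx].T, E[g], rcond=None)[0]

    # Final activity estimate that matches the returned A.
    P = np.linalg.lstsq(A, E, rcond=None)[0]
    return A, P
```

Calling `nca_als(E, mask)` on the full sample set corresponds to the population-level estimation that the abstract describes as the traditional approach. LNCA's partitioning of samples, its evaluation of local similarities, and its bias correction are not reproduced here. Note also that A and P are identifiable only up to a per-TF scaling, so recovered activities should be compared after normalization.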