Fang Huaying, Huang Chengcheng, Zhao Hongyu, Deng Minghua
1 LMAM, School of Mathematical Sciences, Peking University, Beijing, China.
2 Center for Quantitative Biology, Peking University, Beijing, China.
J Comput Biol. 2017 Jul;24(7):699-708. doi: 10.1089/cmb.2017.0054. Epub 2017 May 10.
The increasing quality and decreasing cost of high-throughput sequencing technologies for 16S rRNA gene profiling enable researchers to directly analyze microbial communities in natural environments. Knowing the direct interactions among the microbial species of a given ecological system can help us understand the principles of community assembly and maintenance under various conditions. The compositionality and high dimensionality of microbiome data are the two main challenges in inferring the direct interaction network of microbes. In this article, we use the logistic normal distribution to model the underlying mechanism of microbiome data, which appropriately accounts for the compositional nature of the data. The direct interaction relationships are then modeled via the conditional dependence network under this logistic normal assumption. To address the high dimensionality of microbiome data, we propose a novel penalized maximum likelihood method called gCoda, which estimates the sparse structure of the inverse covariance matrix of the latent normal variables. An effective majorization-minimization algorithm is proposed to solve the optimization problem in gCoda. Simulation studies show that gCoda outperforms existing methods (e.g., SPIEC-EASI) in recovering the edges of the inverse covariance matrix for compositional data under a variety of scenarios. gCoda also performs better than SPIEC-EASI in inferring direct microbial interactions from mouse skin microbiome data.
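The logistic normal assumption described in the abstract can be illustrated with a small simulation. The sketch below is not the gCoda estimator and is not taken from the paper's code. It only draws latent normal variables whose sparse inverse covariance encodes a conditional dependence network, then maps them to compositions. The helper names (`make_sparse_precision`, `sample_logistic_normal`, `clr`) and the parameter values (`edge_prob`, `strength`, 20 taxa, 200 samples) are assumptions chosen for illustration.

```python
import numpy as np


def make_sparse_precision(p, rng, edge_prob=0.1, strength=0.3):
    """Hypothetical helper: random sparse, symmetric, positive-definite precision matrix."""
    # Pick edges in the strict upper triangle, then mirror them.
    upper = np.triu(rng.random((p, p)) < edge_prob, k=1)
    omega = np.where(upper, strength, 0.0)
    omega = omega + omega.T
    # Strict diagonal dominance guarantees positive definiteness.
    np.fill_diagonal(omega, np.abs(omega).sum(axis=1) + 1.0)
    return omega


def sample_logistic_normal(omega, n, rng):
    """Draw latent normal variables with precision `omega` and map them to compositions."""
    p = omega.shape[0]
    sigma = np.linalg.inv(omega)
    sigma = (sigma + sigma.T) / 2.0  # remove round-off asymmetry
    latent = rng.multivariate_normal(np.zeros(p), sigma, size=n)
    # Softmax: exponentiate and normalize each sample so it sums to one.
    shifted = latent - latent.max(axis=1, keepdims=True)
    expd = np.exp(shifted)
    comp = expd / expd.sum(axis=1, keepdims=True)
    return latent, comp


def clr(comp):
    """Centered log-ratio transform of strictly positive compositions."""
    logc = np.log(comp)
    return logc - logc.mean(axis=1, keepdims=True)


rng = np.random.default_rng(0)
omega = make_sparse_precision(p=20, rng=rng)      # zeros mark absent direct interactions
latent, comp = sample_logistic_normal(omega, n=200, rng=rng)

# Only `comp` would be observed in practice; the per-sample normalization
# hides the overall scale of `latent`.
centered_latent = latent - latent.mean(axis=1, keepdims=True)
print(np.allclose(comp.sum(axis=1), 1.0))
print(np.allclose(clr(comp), centered_latent))
```

The final check shows why compositionality is a challenge: the log of the observed compositions determines the latent variables only up to an unknown per-sample shift, so the latent inverse covariance cannot be read off by applying a standard sparse estimator to the raw proportions.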