Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, USA.
Department of Genome Sciences, University of Washington, Seattle, USA.
Genome Biol. 2020 Mar 30;21(1):81. doi: 10.1186/s13059-020-01977-6.
The human epigenome has been experimentally characterized by thousands of measurements for every basepair in the human genome. We propose a deep neural network tensor factorization method, Avocado, that compresses this epigenomic data into a dense, information-rich representation. We use this learned representation to impute epigenomic data more accurately than previous methods, and we show that machine learning models that exploit this representation outperform those trained directly on epigenomic data on a variety of genomics tasks. These tasks include predicting gene expression, promoter-enhancer interactions, replication timing, and an element of 3D chromatin architecture.
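The idea of a neural-network tensor factorization can be illustrated with a minimal sketch. It is not the Avocado implementation: the abstract does not specify the architecture, so everything below is an assumption made for illustration, including the toy tensor sizes, the synthetic `signal` function, the single `tanh` hidden layer, and every name used. The sketch treats the data as a tensor indexed by cell type, assay, and genomic position. It learns one embedding table per axis and passes the concatenated embeddings through a small network in place of the dot product used by classical factorization. One cell-type/assay track is held out and then imputed from the learned embeddings.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the toy run is repeatable

# Toy sizes (hypothetical; real data has far more positions and assays).
N_CELLS, N_ASSAYS, N_POS, DIM, HIDDEN = 3, 4, 20, 4, 8


def rand_matrix(rows, cols, scale):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


# One embedding table per tensor axis: cell type, assay, genomic position.
cell_emb = rand_matrix(N_CELLS, DIM, 0.1)
assay_emb = rand_matrix(N_ASSAYS, DIM, 0.1)
pos_emb = rand_matrix(N_POS, DIM, 0.1)

# Small network that replaces the dot product of classical factorization.
W1 = rand_matrix(HIDDEN, 3 * DIM, 0.3)
b1 = [0.0] * HIDDEN
w2 = [rng.gauss(0.0, 0.3) for _ in range(HIDDEN)]
b2 = [0.0]  # one-element list so it can be updated in place


def forward(c, a, p):
    x = cell_emb[c] + assay_emb[a] + pos_emb[p]  # list concatenation
    h = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sum(w * v for w, v in zip(w2, h)) + b2[0]
    return x, h, y


def predict(c, a, p):
    return forward(c, a, p)[2]


def sgd_step(c, a, p, target, lr=0.05):
    """One stochastic gradient step on the loss 0.5 * (y - target)^2."""
    x, h, y = forward(c, a, p)
    err = y - target
    # Backpropagate through the output layer and the tanh hidden layer.
    dz = [err * w2[j] * (1.0 - h[j] ** 2) for j in range(HIDDEN)]
    dx = [sum(dz[j] * W1[j][i] for j in range(HIDDEN)) for i in range(3 * DIM)]
    for j in range(HIDDEN):
        w2[j] -= lr * err * h[j]
        b1[j] -= lr * dz[j]
        for i in range(3 * DIM):
            W1[j][i] -= lr * dz[j] * x[i]
    b2[0] -= lr * err
    # Route the input gradient back to the three embedding tables.
    for i in range(DIM):
        cell_emb[c][i] -= lr * dx[i]
        assay_emb[a][i] -= lr * dx[DIM + i]
        pos_emb[p][i] -= lr * dx[2 * DIM + i]


def signal(c, a, p):
    """Synthetic stand-in for a measured epigenomic signal (an assumption)."""
    return 0.5 + 0.5 * math.sin(p / 3.0 + a) * (c + 1) / N_CELLS


# Hold out one (cell type, assay) track to mimic an experiment never performed.
observed = [(c, a, p) for c in range(N_CELLS) for a in range(N_ASSAYS)
            for p in range(N_POS) if (c, a) != (0, 0)]


def mse(entries):
    return sum((predict(c, a, p) - signal(c, a, p)) ** 2
               for c, a, p in entries) / len(entries)


initial_mse = mse(observed)
for epoch in range(50):
    rng.shuffle(observed)
    for c, a, p in observed:
        sgd_step(c, a, p, signal(c, a, p))
final_mse = mse(observed)

# Impute the held-out track from the learned embeddings.
imputed_track = [predict(0, 0, p) for p in range(N_POS)]
print(initial_mse, final_mse)
```

After training, the rows of `pos_emb` play the role of the dense per-position representation: in this toy setting they are the vectors a downstream model would take as input instead of the raw signal tracks.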