Cieslak Matthew C, Castelfranco Ann M, Roncalli Vittoria, Lenz Petra H, Hartline Daniel K
Pacific Biosciences Research Center, University of Hawai'i at Mānoa, 1993 East-West Rd., Honolulu, HI 96822, USA.
Pacific Biosciences Research Center, University of Hawai'i at Mānoa, 1993 East-West Rd., Honolulu, HI 96822, USA; Department of Genetics, Microbiology and Statistics, Facultat de Biologia, IRBio, Universitat de Barcelona, Av. Diagonal 643, 08028 Barcelona, Spain.
Mar Genomics. 2020 Jun;51:100723. doi: 10.1016/j.margen.2019.100723. Epub 2019 Nov 26.
High-throughput RNA sequencing (RNA-Seq) has transformed the ecophysiological assessment of individual plankton species and communities. However, the technology generates complex data consisting of millions of short-read sequences that can be difficult to analyze and interpret. New bioinformatics workflows are needed to guide experimentation and environmental sampling, and to develop and test hypotheses. One complexity-reducing tool that has been used successfully in other fields is "t-distributed Stochastic Neighbor Embedding" (t-SNE). Its application to transcriptomic data from marine pelagic and benthic systems has yet to be explored. The present study demonstrates its use for evaluating RNA-Seq data from previously published, conventionally analyzed studies on the copepods Calanus finmarchicus and Neocalanus flemingeri. In one application, gene expression profiles were compared among different developmental stages. In another, they were compared among experimental conditions. In a third, they were compared among environmental samples from different locations. The profile categories identified by t-SNE were validated against published results from differential gene expression and Gene Ontology (GO) analyses. The analyses demonstrate how individual samples can be evaluated for differences in global gene expression, as well as for differences in expression related to specific biological processes, such as lipid metabolism and responses to stress. As RNA-Seq data from plankton species and communities become more common, t-SNE analysis should provide a powerful tool for determining trends and classifying samples into groups with similar transcriptional physiology, independent of collection site or time.
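As a rough illustration of the kind of workflow the abstract describes, the sketch below embeds a samples-by-genes expression matrix in two dimensions with t-SNE. It is not the authors' pipeline. The count matrix is synthetic (two simulated groups, labeled "A" and "B"), and the normalization (counts per million followed by a log2 transform), the perplexity value, and all variable names are assumptions chosen for the example. It uses scikit-learn's `TSNE` class and NumPy.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Synthetic stand-in for an RNA-Seq count matrix (rows = samples, columns = genes).
# These values are simulated for illustration and are NOT data from the study.
n_per_group, n_genes = 6, 200
base = rng.gamma(shape=2.0, scale=50.0, size=n_genes)  # hypothetical mean count per gene
group_a = rng.poisson(base, size=(n_per_group, n_genes))

shifted = base.copy()
shifted[:40] *= 4.0  # simulate 40 genes with higher expression in group B
group_b = rng.poisson(shifted, size=(n_per_group, n_genes))

counts = np.vstack([group_a, group_b]).astype(float)
labels = ["A"] * n_per_group + ["B"] * n_per_group

# Assumed preprocessing: scale each sample to counts per million, then log2(x + 1).
cpm = counts / counts.sum(axis=1, keepdims=True) * 1e6
log_expr = np.log2(cpm + 1.0)

# Perplexity must be smaller than the number of samples; 4 is an arbitrary choice here.
tsne = TSNE(n_components=2, perplexity=4, init="pca", random_state=0)
embedding = tsne.fit_transform(log_expr)

# One row per sample: group label and its 2-D t-SNE coordinates.
for label, (x, y) in zip(labels, embedding):
    print(f"{label}\t{x:.2f}\t{y:.2f}")
```

To focus on a specific biological process, as the abstract mentions for lipid metabolism and stress responses, the same call could be applied to the columns of `log_expr` that correspond to a chosen gene set. Which genes belong to that set would have to come from an external annotation such as GO terms.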