Data Imputation in Merged Isobaric Labeling-Based Relative Quantification Datasets.

Author information

Palstrøm Nicolai Bjødstrup, Matthiesen Rune, Beck Hans Christian

Affiliations

Department of Clinical Biochemistry and Pharmacology, Odense University Hospital, Odense C, Denmark.

Computational and Experimental Biology Group, CEDOC, Chronic Diseases Research Centre, NOVA Medical School, Faculdade de Ciências Médicas, Universidade NOVA de Lisboa, Lisbon, Portugal.

Publication information

Methods Mol Biol. 2020;2051:297-308. doi: 10.1007/978-1-4939-9744-2_13.

Abstract

Data-dependent acquisition in mass spectrometry-based proteomics, combined with quantitative analysis using isobaric labeling (iTRAQ and TMT), inevitably introduces missing values in proteomic experiments in which a number of LC runs are combined. This is especially true in the growing field of shotgun clinical proteomics, where the protein profiles obtained from the proteomic analysis of several hundred patient samples are compared and correlated with clinical traits, such as a specific disease or disease treatment, in order to link specific outcomes to one or more proteins. In the context of clinical research, missing values in such datasets reduce the power of the downstream statistical analysis and may therefore hamper linking disease traits to the expression of specific proteins that may be useful for prognostic, diagnostic, or predictive purposes. In our study, we tested three data imputation approaches, initially developed for microarray data, for imputing missing values in datasets generated by several runs of shotgun proteomic experiments, where the data were relative protein abundances based on isobaric tags (iTRAQ and TMT). Our conclusion is that imputation methods based on k-nearest neighbors successfully impute missing values in datasets with up to 50% missing values.
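To make the two ideas in the abstract concrete (missing values appearing when several isobaric-labeling runs are merged, and k-nearest-neighbor imputation filling them), here is a minimal pure-Python sketch. It is not the authors' code and not any library's API: the function names `merge_runs`, `distance`, and `knn_impute`, the toy data, the choice of Euclidean distance over shared channels rescaled to the full row length, and the unweighted mean over the k nearest proteins are all illustrative assumptions. Values are assumed to be log-transformed relative abundances, one row per protein and one column per reporter channel. Established implementations (for example `impute.knn` in the Bioconductor `impute` package) differ in details such as default k and handling of heavily missing rows.

```python
import math
from typing import Dict, List, Optional

# One protein row: one value per reporter channel, None where missing.
Row = List[Optional[float]]


def merge_runs(runs: List[Dict[str, List[float]]]) -> Dict[str, Row]:
    """Outer-join per-run protein tables into one protein x channel matrix.

    Assumes every run is non-empty and all rows within a run have the same
    number of channels. A protein not identified in a run gets None for all
    of that run's channels, which is how missing values accumulate.
    """
    proteins = sorted(set().union(*(run.keys() for run in runs)))
    widths = [len(next(iter(run.values()))) for run in runs]
    merged: Dict[str, Row] = {}
    for protein in proteins:
        row: Row = []
        for run, width in zip(runs, widths):
            row.extend(run.get(protein, [None] * width))
        merged[protein] = row
    return merged


def distance(a: Row, b: Row) -> Optional[float]:
    """Euclidean distance over channels observed in both rows, rescaled to the
    full row length; None if the rows share no observed channel."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(matrix: Dict[str, Row], k: int = 10) -> Dict[str, Row]:
    """Fill each missing value with the mean of that channel in the k proteins
    closest to the incomplete protein (hypothetical simple variant)."""
    imputed: Dict[str, Row] = {}
    for protein, row in matrix.items():
        new_row = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donor candidates: other proteins that are observed in channel j
            # and share at least one observed channel with this protein.
            candidates = []
            for other_name, other in matrix.items():
                if other_name == protein or other[j] is None:
                    continue
                d = distance(row, other)
                if d is not None:
                    candidates.append((d, other[j]))
            candidates.sort()
            neighbours = candidates[:k]
            if not neighbours:
                raise ValueError(f"no donor proteins for {protein}, channel {j}")
            new_row[j] = sum(v for _, v in neighbours) / len(neighbours)
        imputed[protein] = new_row
    return imputed


if __name__ == "__main__":
    # Hypothetical toy data: run 1 has 3 channels, run 2 has 2 channels.
    # P3 was identified only in run 1 and P4 only in run 2.
    run1 = {"P1": [0.0, 1.0, 2.0], "P2": [0.1, 1.1, 2.1], "P3": [5.0, 5.0, 5.0]}
    run2 = {"P1": [3.0, 4.0], "P2": [3.2, 4.2], "P4": [9.0, 9.0]}
    merged = merge_runs([run1, run2])
    for name, values in knn_impute(merged, k=2).items():
        print(name, [round(v, 2) for v in values])
```

In this toy example, P3's two run-2 channels are filled with the mean of P1 and P2 in those channels (about 3.1 and 4.1), because P4 shares no observed channel with P3 and so cannot serve as a neighbor. Complete rows such as P1 and P2 pass through unchanged.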
