
SAMStat: monitoring biases in next generation sequencing data.

Affiliation

Omics Science Center, Riken Yokohama Institute, Tsurumi-ku, Yokohama, Japan.

Publication information

Bioinformatics. 2011 Jan 1;27(1):130-1. doi: 10.1093/bioinformatics/btq614. Epub 2010 Nov 18.

Abstract

MOTIVATION

The sequence alignment/map format (SAM) is a commonly used format to store the alignments between millions of short reads and a reference genome. Often, certain positions within the reads are inherently more likely to contain errors because of the protocols used to prepare the samples. Such biases can adversely affect both mapping rate and accuracy. To understand the relationship between potential protocol biases and poor mapping, we wrote SAMStat, a simple C program that plots nucleotide overrepresentation and other statistics for mapped and unmapped reads in a concise HTML page. Collecting such statistics also makes it easy to highlight problems in the data processing and enables non-experts to track data quality over time.
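To make the kind of statistic described above concrete, here is a minimal sketch, not SAMStat's own implementation, of per-position nucleotide counts split into mapped and unmapped reads. It relies only on the SAM specification: FLAG is column 2, SEQ is column 10, FLAG bit 0x4 marks an unmapped read, and bit 0x10 means SEQ is stored reverse-complemented. The function name `per_position_composition` and the toy records are hypothetical.

```python
from collections import Counter

UNMAPPED = 0x4   # SAM FLAG bit: segment unmapped
REVERSE = 0x10   # SAM FLAG bit: SEQ is reverse complemented

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def per_position_composition(sam_lines):
    """Count bases at each read position, split into mapped and unmapped reads.

    Hypothetical helper for illustration; returns a dict mapping
    "mapped"/"unmapped" to a list of Counters, one per read position.
    """
    counts = {"mapped": [], "unmapped": []}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag, seq = int(fields[1]), fields[9].upper()
        if seq == "*":
            continue  # sequence not stored for this record
        if flag & REVERSE:
            # restore the read's original 5'->3' orientation
            seq = seq.translate(_COMPLEMENT)[::-1]
        group = counts["unmapped" if flag & UNMAPPED else "mapped"]
        for i, base in enumerate(seq):
            if i == len(group):
                group.append(Counter())
            group[i][base] += 1
    return counts


# Toy records (hypothetical): one forward-strand, one reverse-strand, one unmapped.
sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr1\t200\t60\t4M\t*\t0\t0\tAACC\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tGGGA\tIIII",
]
comp = per_position_composition(sam)
```

Reverse-complementing reverse-strand reads matters here. Without that step, a bias at the 5' end of the reads would be smeared across both ends of the position axis.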

RESULTS

We demonstrate that studying sequence features in mapped data can be used to identify biases particular to one sequencing protocol. Once identified, such biases can be considered in the downstream analysis or even be removed by read trimming or filtering techniques.
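As an illustration of removing such a bias by trimming, the sketch below counts how many leading positions are dominated by a single base and then clips that many bases from every read. This is an assumed, simplified rule, not the procedure used in the paper. The 0.5 threshold, the "leading run only" rule, and the names `leading_biased_positions` and `trim_reads` are all hypothetical.

```python
from collections import Counter


def leading_biased_positions(reads, threshold=0.5):
    """Return how many leading positions have a single base above `threshold`.

    `threshold` is a hypothetical cut-off chosen for illustration; uniformly
    random sequence would give a top-base fraction near 0.25.
    """
    n = 0
    max_len = max((len(r) for r in reads), default=0)
    for i in range(max_len):
        column = Counter(r[i] for r in reads if len(r) > i)
        total = sum(column.values())
        if column.most_common(1)[0][1] / total <= threshold:
            break  # first unbiased position ends the leading run
        n += 1
    return n


def trim_reads(reads, k):
    """Remove the first k bases of every read (hypothetical helper)."""
    return [r[k:] for r in reads]


reads = ["GGACGT", "GGTTCA", "GGCATG", "GGGACT"]  # toy data
k = leading_biased_positions(reads)
trimmed = trim_reads(reads, k)
```

In practice a skewed composition alone need not mean a protocol artifact. Comparing the same positions in mapped and unmapped reads, as the abstract suggests, is a better guide before anything is trimmed.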

AVAILABILITY

SAMStat is open source and freely available as a C program running on all Unix-compatible platforms. The source code is available from http://samstat.sourceforge.net.

CONTACT

timolassmann@gmail.com.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd2c/3008642/03cc4a771c17/btq614f1.jpg