Chen Shifu, Huang Tanxiao, Zhou Yanqing, Han Yue, Xu Mingyan, Gu Jia
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Xueyuan Road, Shenzhen, China.
HaploX BioTechnology, Songpingshan Road, Shenzhen, China.
BMC Bioinformatics. 2017 Mar 14;18(Suppl 3):80. doi: 10.1186/s12859-017-1469-3.
Some applications, especially clinical applications that require highly accurate sequencing data, have to cope with unavoidable sequencing errors. Several tools have been proposed to profile sequencing quality, but few of them can quantify or correct sequencing errors. This unmet need motivated us to develop AfterQC, a tool that profiles sequencing errors and corrects most of them, and that also provides highly automated quality control and data filtering. Unlike most tools, AfterQC analyses the overlap between paired reads in paired-end sequencing data. Based on this overlap analysis, AfterQC can detect and cut adapters, and it offers a novel function to correct wrong bases in the overlapping regions. Another new feature is the detection and visualisation of sequencing bubbles, which are commonly found on flowcell lanes and may cause sequencing errors. Besides the usual per-cycle quality and base content plots, AfterQC also provides polyX filtering (polyX being a long sub-sequence of a single base X), automatic trimming and k-mer based strand bias profiling.
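The idea of correcting bases in the overlapping region can be illustrated with a minimal sketch. It is not AfterQC's actual implementation: the function names, the thresholds (`min_overlap`, `max_mismatch`, `min_qual_gap`) and the Phred+33 quality encoding are assumptions made for illustration, and only the case where read 2 begins inside read 1 is handled (no adapter read-through).

```python
# Illustrative sketch only; names and thresholds are hypothetical, not AfterQC's API.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_overlap(r1, r2_rc, min_overlap=10, max_mismatch=2):
    """Return the offset in r1 at which r2_rc starts, or None if no overlap is found.

    Assumes the insert is at least as long as a read, so r2_rc starts inside r1.
    """
    for offset in range(len(r1) - min_overlap + 1):
        overlap_len = min(len(r1) - offset, len(r2_rc))
        mismatches = sum(
            1 for i in range(overlap_len) if r1[offset + i] != r2_rc[i]
        )
        if mismatches <= max_mismatch:
            return offset
    return None


def correct_pair(r1, q1, r2, q2, min_overlap=10, max_mismatch=2, min_qual_gap=10):
    """Correct mismatched bases in the overlap using the higher-quality mate.

    A mismatch is corrected only when one base's Phred score exceeds the
    other's by at least min_qual_gap. Qualities are assumed to be Phred+33.
    Returns (corrected_r1, corrected_r2, number_of_corrections).
    """
    r2_rc = reverse_complement(r2)
    q2_rev = q2[::-1]  # qualities in the same orientation as r2_rc
    offset = find_overlap(r1, r2_rc, min_overlap, max_mismatch)
    if offset is None:
        return r1, r2, 0

    r1_bases, r2_bases = list(r1), list(r2_rc)
    corrected = 0
    overlap_len = min(len(r1) - offset, len(r2_rc))
    for i in range(overlap_len):
        a, b = r1_bases[offset + i], r2_bases[i]
        if a == b:
            continue
        qa = ord(q1[offset + i]) - 33
        qb = ord(q2_rev[i]) - 33
        if qa - qb >= min_qual_gap:
            r2_bases[i] = a  # trust read 1
            corrected += 1
        elif qb - qa >= min_qual_gap:
            r1_bases[offset + i] = b  # trust read 2
            corrected += 1
    return "".join(r1_bases), reverse_complement("".join(r2_bases)), corrected
```

Mismatches where both bases have similar quality are left untouched in this sketch, since neither mate can be trusted over the other.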
For each single FastQ file or pair of FastQ files, AfterQC filters out bad reads, detects and eliminates the sequencer's bubble effects, trims reads at the front and tail, detects sequencing errors and corrects part of them, and finally outputs clean data and generates HTML reports with interactive figures. AfterQC can run in batch mode with multiprocess support. It accepts a single FastQ file, a single pair of FastQ files (for paired-end sequencing), or a folder in which all FastQ files are processed automatically. Based on overlap analysis, AfterQC can estimate the sequencing error rate and profile the error transform distribution. The results of our error profiling tests show that the error distribution is highly platform dependent.
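How an error rate and an error transform distribution might be derived from overlapping regions can be sketched as follows. This is an assumption-laden illustration rather than AfterQC's method: the function name `profile_errors`, the `min_qual_gap` threshold, the Phred+33 encoding and the rate formula (mismatches divided by the number of base observations) are all hypothetical choices.

```python
# Illustrative sketch only; not AfterQC's actual algorithm or API.
from collections import Counter


def profile_errors(overlaps, min_qual_gap=10):
    """Estimate an error rate and count error transforms from aligned overlaps.

    overlaps: iterable of (seq_a, qual_a, seq_b, qual_b), where seq_a is the
    overlapping part of read 1 and seq_b is the matching part of the
    reverse-complemented read 2, both in read 1's orientation.
    Qualities are assumed to be Phred+33.

    Returns (error_rate, transforms), where transforms maps
    (trusted_base, erroneous_base) to a count.
    """
    transforms = Counter()
    positions = 0
    mismatches = 0
    for seq_a, qual_a, seq_b, qual_b in overlaps:
        for a, qa, b, qb in zip(seq_a, qual_a, seq_b, qual_b):
            positions += 1
            if a == b:
                continue
            mismatches += 1
            pa, pb = ord(qa) - 33, ord(qb) - 33
            # Attribute the error to the lower-quality base when the gap is clear.
            if pa - pb >= min_qual_gap:
                transforms[(a, b)] += 1
            elif pb - pa >= min_qual_gap:
                transforms[(b, a)] += 1
    # Each overlapped position contributes two base observations; a mismatch
    # is assumed to reflect exactly one erroneous observation.
    error_rate = mismatches / (2 * positions) if positions else 0.0
    return error_rate, transforms
```

Tallying the resulting `(trusted_base, erroneous_base)` counts per sequencing platform is one way such a distribution could be compared across platforms. Transforms here are reported in read 1's orientation; errors that occurred on read 2 correspond to the complementary bases on its own strand.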
More than just another quality control (QC) tool, AfterQC performs quality control, data filtering, error profiling and base correction automatically. Experimental results show that AfterQC helps to eliminate sequencing errors in paired-end sequencing data and thus provides much cleaner output, which in turn helps to reduce false-positive variants, especially low-frequency somatic mutations. Although it provides rich configurable options, AfterQC can detect and set all of them automatically and requires no arguments in most cases.