Department of Computer Science, University of Kentucky, Lexington, KY 40506, USA.
Nucleic Acids Res. 2010 Oct;38(18):e178. doi: 10.1093/nar/gkq622. Epub 2010 Aug 27.
The accurate mapping of reads that span splice junctions is a critical component of all analytic techniques that work with RNA-seq data. We introduce a second-generation splice detection algorithm, MapSplice, whose focus is high sensitivity and specificity in the detection of splices as well as CPU and memory efficiency. MapSplice can be applied to both short (<75 bp) and long (≥75 bp) reads. MapSplice does not depend on splice site features or intron length; consequently, it can detect novel canonical as well as non-canonical splices. MapSplice leverages the quality and diversity of the read alignments supporting a given splice to increase accuracy. We demonstrate that MapSplice achieves higher sensitivity and specificity than TopHat and SpliceMap on a set of simulated RNA-seq data. Experimental studies also support the accuracy of the algorithm. Splice junctions derived from eight breast cancer RNA-seq datasets recapitulated the extensiveness of alternative splicing on a global level as well as the differences between molecular subtypes of breast cancer. These combined results indicate that MapSplice is a highly accurate algorithm for the alignment of RNA-seq reads to splice junctions. Software download URL: http://www.netlab.uky.edu/p/bioinfo/MapSplice.
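The abstract reports sensitivity and specificity on simulated data without stating how they are computed. As a minimal sketch, assuming each junction is identified by a hypothetical (chromosome, donor position, acceptor position) tuple, and assuming specificity is read as the fraction of reported junctions that are true (true negatives are not well defined for junction discovery), the comparison against a simulated truth set might look like this. The function name and the toy coordinates are illustrative only and are not taken from MapSplice.

```python
def junction_accuracy(predicted, truth):
    """Return (sensitivity, specificity) for two collections of junctions.

    Assumption: a junction is a hashable (chrom, donor, acceptor) tuple, and
    "specificity" means the fraction of predicted junctions found in the truth set.
    """
    predicted, truth = set(predicted), set(truth)
    true_pos = len(predicted & truth)
    # Fraction of true junctions that were recovered.
    sensitivity = true_pos / len(truth) if truth else 0.0
    # Fraction of reported junctions that are true.
    specificity = true_pos / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Toy data (hypothetical coordinates, not from the paper).
truth = {("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 50, 150), ("chr2", 500, 900)}
predicted = {("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 50, 150), ("chr3", 10, 20)}

sens, spec = junction_accuracy(predicted, truth)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Exact coordinate matching is the strictest choice; an evaluation that tolerates small offsets at the splice site would need a different matching rule.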