Cameron Michael, Williams Hugh E, Cannane Adam
School of Computer Science and Information Technology, RMIT University, Melbourne, Australia.
J Comput Biol. 2006 May;13(4):965-78. doi: 10.1089/cmb.2006.13.965.
BLAST is the most popular bioinformatics tool and is used to run millions of queries each day. However, evaluating such queries is slow, taking typically minutes on modern workstations. Therefore, continuing evolution of BLAST--by improving its algorithms and optimizations--is essential to improve search times in the face of exponentially increasing collection sizes. We present an optimization to the first stage of the BLAST algorithm specifically designed for protein search. It produces the same results as NCBI-BLAST but in around 59% of the time on Intel-based platforms; we also present results for other popular architectures. Overall, this is a saving of around 15% of the total typical BLAST search time. Our approach uses a deterministic finite automaton (DFA), inspired by the original scheme used in the 1990 BLAST algorithm. The techniques are optimized for modern hardware, making careful use of cache-conscious approaches to improve speed. Our optimized DFA approach has been integrated into a new version of BLAST that is freely available for download at http://www.fsa-blast.org/.
BLAST是最受欢迎的生物信息学工具,每天用于运行数百万次查询。然而,评估此类查询的速度很慢,在现代工作站上通常需要几分钟。因此,面对数据量呈指数级增长的情况,通过改进算法和优化来持续发展BLAST对于缩短搜索时间至关重要。我们针对专门用于蛋白质搜索的BLAST算法的第一阶段提出了一种优化方法。它在基于英特尔的平台上产生与NCBI-BLAST相同的结果,但所需时间约为NCBI-BLAST的59%;我们还展示了其他流行架构的结果。总体而言,这节省了约15%的典型BLAST总搜索时间。我们的方法使用了确定性有限自动机(DFA),其灵感来自1990年BLAST算法中使用的原始方案。这些技术针对现代硬件进行了优化,谨慎地采用了缓存感知方法来提高速度。我们优化后的DFA方法已集成到新版本的BLAST中,可在http://www.fsa-blast.org/免费下载。