INRIA/IRISA, Campus de Beaulieu, 35042 Rennes Cedex, France.
BMC Bioinformatics. 2009 Oct 12;10:329. doi: 10.1186/1471-2105-10-329.
Sequence similarity searching is an important and challenging task in molecular biology, and next-generation sequencing is expected to further increase the need for faster algorithms to process such vast amounts of data. At the same time, the internal architecture of current microprocessors is moving toward greater parallelism, with chips that integrate two, four, or more cores on the same die. The main purpose of this work was to design an effective algorithm that fits the parallel capabilities of modern microprocessors.
A parallel algorithm for comparing large genomic banks, targeting mid-range computers, has been developed and implemented in the PLAST software. The algorithm exploits two key parallel features of existing and future microprocessors: the SIMD programming model (SSE instruction set) and multithreading (multicore). Compared with multithreaded BLAST, tests performed on an 8-processor server showed speedups ranging from 3 to 6 with a similar level of accuracy.
A parallel algorithmic approach driven by knowledge of the internal microprocessor architecture yields significant speedup while preserving standard sensitivity for similarity search problems.
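The two levels of parallelism named above (SIMD lanes and multiple threads) can be pictured with a small sketch. This is an illustration under stated assumptions, not the PLAST implementation: the function names `score_batch` and `search`, the +1/-1 scoring values, and the choice of ungapped scoring over equal-length windows are all hypothetical. The inner lane loop only emulates in plain Python what a single SSE instruction would do across 16 one-byte lanes, and CPython threads show the work decomposition rather than a real multicore speedup.

```python
from concurrent.futures import ThreadPoolExecutor

LANES = 16      # assumption: a 128-bit SSE register viewed as 16 one-byte lanes
MATCH = 1       # hypothetical scoring values, chosen only for illustration
MISMATCH = -1


def score_batch(query, subjects):
    """Best ungapped local score of each subject window against the query.

    All windows in the batch advance one position at a time, in lockstep.
    The inner loop over lanes stands in for one SIMD instruction.
    Every subject must be at least as long as the query.
    """
    running = [0] * len(subjects)
    best = [0] * len(subjects)
    for pos, q in enumerate(query):
        for lane, subject in enumerate(subjects):
            delta = MATCH if subject[pos] == q else MISMATCH
            # Clamp at zero so a bad prefix does not penalise a later match run.
            running[lane] = max(0, running[lane] + delta)
            best[lane] = max(best[lane], running[lane])
    return best


def search(query, windows, workers=4):
    """Split the windows into lane-sized batches and score them on worker threads."""
    batches = [windows[i:i + LANES] for i in range(0, len(windows), LANES)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields batch results in submission order.
        per_batch = list(pool.map(lambda batch: score_batch(query, batch), batches))
    return [score for batch in per_batch for score in batch]
```

For example, `search("ACGT", ["ACGT", "AAAA", "TTTT", "ACGA"])` scores four windows in a single batch. In a compiled implementation the lane loop would collapse into vector instructions and each batch would run on its own core, which is the combination the abstract credits for the speedup.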