Accelerating Minimap2 for Accurate Long Read Alignment on GPUs.

Authors

Sadasivan Harisankar, Maric Milos, Dawson Eric, Iyer Vishanth, Israeli Johnny, Narayanasamy Satish

Affiliations

Department of Computer Science and Engineering, University of Michigan Ann Arbor, MI 48109, USA.

NVIDIA Corporation, Santa Clara, CA 95051, USA.

Publication information

J Biotechnol Biomed. 2023;6(1):13-23. doi: 10.26502/jbb.2642-91280067. Epub 2023 Jan 20.

Abstract

Long read sequencing technology is becoming increasingly popular for Precision Medicine applications such as Whole Genome Sequencing (WGS) and microbial abundance estimation. Minimap2 is the state-of-the-art aligner and mapper used by the leading long read sequencing technologies today. However, Minimap2 on CPUs is very slow for long noisy reads: roughly 60-70% of the run-time on a CPU comes from Minimap2's highly sequential chaining step. On the other hand, most Point-of-Care computational workflows in long read sequencing use Graphics Processing Units (GPUs). We present minimap2-accelerated (mm2-ax), a heterogeneous design for sequence mapping and alignment in which Minimap2's compute-intensive chaining step is sped up on the GPU, and we demonstrate its time and cost benefits. We extract better intra-read parallelism from chaining, without losing mapping accuracy, by forward transforming Minimap2's chaining algorithm. Moreover, we better utilize the high memory available on modern cloud instances, in addition to achieving better workload balancing, better data locality and minimal branch divergence on the GPU. We show that mm2-ax on an NVIDIA A100 GPU improves the chaining step with a 2.57-5.41X speedup and a 1.93-4.07X speedup:costup over the fastest version of Minimap2, mm2-fast, benchmarked on a Google Cloud Platform instance with 30 SIMD cores.
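The abstract does not spell out what "forward transforming" the chaining algorithm involves. The sketch below is one plausible reading, not the authors' implementation. It assumes chaining is a dynamic program in which each anchor's best chain score depends on a bounded window of earlier anchors. The scoring function `pair_score`, the constants `K` and `MAX_GAP`, and the `window` parameter are hypothetical simplifications introduced only for illustration. Minimap2's real scoring and heuristics are more involved and are omitted here.

```python
from typing import List, Optional, Tuple

Anchor = Tuple[int, int]  # (reference position, query position), sorted by reference position

K = 15        # hypothetical seed length, also the score of a chain with a single anchor
MAX_GAP = 50  # hypothetical maximum distance allowed between chained anchors


def pair_score(a: Anchor, b: Anchor) -> Optional[int]:
    """Hypothetical score for extending a chain ending at `a` with `b`.

    Returns None when `b` cannot follow `a` (not collinear or too far apart).
    """
    dr, dq = b[0] - a[0], b[1] - a[1]
    if dr <= 0 or dq <= 0 or max(dr, dq) > MAX_GAP:
        return None
    # Newly matched bases minus a simple linear gap penalty (an assumption).
    return min(dr, dq, K) - abs(dr - dq)


def chain_backward(anchors: List[Anchor], window: int = 4) -> List[int]:
    """Backward form: anchor i looks back at up to `window` predecessors."""
    f = [K] * len(anchors)
    for i in range(len(anchors)):
        for j in range(max(0, i - window), i):
            s = pair_score(anchors[j], anchors[i])
            if s is not None:
                f[i] = max(f[i], f[j] + s)
    return f


def chain_forward(anchors: List[Anchor], window: int = 4) -> List[int]:
    """Forward form: anchor j pushes its final score to up to `window` successors.

    When the outer loop reaches j, every predecessor of j has already pushed
    its update, so f[j] is final. The inner loop writes to distinct entries
    f[i], so its iterations are independent of one another and could be
    assigned to separate GPU threads.
    """
    n = len(anchors)
    f = [K] * n
    for j in range(n):
        for i in range(j + 1, min(n, j + window + 1)):
            s = pair_score(anchors[j], anchors[i])
            if s is not None:
                f[i] = max(f[i], f[j] + s)
    return f
```

Both functions visit exactly the same (predecessor, successor) pairs, so under these assumptions they return identical scores. The only difference is which loop is innermost, and therefore which loop is free of dependencies between iterations.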


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e6cc/10018915/9251e560729a/nihms-1874402-f0001.jpg
