

paPAML: An Improved Computational Tool to Explore Selection Pressure on Protein-Coding Sequences.

Affiliations

Institute of Experimental Pathology, ZMBE, University of Münster, 48149 Münster, Germany.

Institute of Bioinformatics, Faculty of Medicine, University of Münster, 48149 Münster, Germany.

Publication information

Genes (Basel). 2022 Jun 18;13(6):1090. doi: 10.3390/genes13061090.

Abstract

Evolution is change over time. Whereas neutral changes promoted by drift effects are most reliable for phylogenetic reconstructions, selection-relevant changes are of only limited use to reconstruct phylogenies. On the other hand, comparative analyses of neutral and selected changes of protein-coding DNA sequences (CDS) retrospectively tell us about episodic constrained, relaxed, and adaptive incidences. The ratio of sites with nonsynonymous (amino acid altering) versus synonymous (not altering) mutations directly measures selection pressure and can be analysed by using the Phylogenetic Analysis by Maximum Likelihood (PAML) software package. We developed a CDS extractor for compiling protein-coding sequences (CDS-extractor) and parallel PAML (paPAML) to simplify, amplify, and accelerate selection analyses via parallel processing, including detection of negatively selected sites. paPAML compiles results of site, branch-site, and branch models and detects site-specific negative selection, outputting a codon list labelled with significance values. The tool simplifies selection analyses for casual and inexperienced users and accelerates computation by up to a factor equal to the number of allocated computer threads. We then applied paPAML to examine the evolutionary impact on a new GINS Complex Subunit 3 exon, and neutrophil-associated as well as lysin and apolipoprotein genes. Compared with codeml (PAML version 4.9j) and HyPhy (HyPhy FEL version 2.5.26), all paPAML test runs performed with 10 computing threads led to identical selection pressure results, whereas the total selection analysis via paPAML, including all model comparisons, was about 3 to 5 times faster than the longest running codeml model and about 7 to 15 times faster than the entire processing time of these codeml runs.
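To illustrate the kind of workflow the abstract describes, the following is a minimal sketch, not paPAML's actual implementation. It assumes that independent model fits can be dispatched to a thread pool and that nested models are then compared with a likelihood ratio test, which is the usual way codeml site models such as M1a and M2a are compared. The function names, the `HYPOTHETICAL_LNL` table, and the log-likelihood values in it are invented for illustration; a real pipeline would launch codeml in `run_model` and parse the log-likelihood from its output.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Made-up log-likelihoods standing in for real codeml output (illustration only).
HYPOTHETICAL_LNL = {"M1a": -1234.56, "M2a": -1229.12}


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square distribution (closed forms for df 1 and 2)."""
    if stat <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only supports df = 1 or 2")


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return the test statistic 2 * (lnL_alt - lnL_null) and its p-value."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


def run_model(name):
    """Placeholder for one model fit; here it only looks up a made-up value."""
    return name, HYPOTHETICAL_LNL[name]


def run_models_in_parallel(names, threads):
    """Run independent model fits concurrently and collect their log-likelihoods."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return dict(pool.map(run_model, names))


if __name__ == "__main__":
    lnl = run_models_in_parallel(["M1a", "M2a"], threads=2)
    stat, p_value = likelihood_ratio_test(lnl["M1a"], lnl["M2a"], df=2)
    print(f"2*delta(lnL) = {stat:.2f}, p = {p_value:.4f}")
```

In a scheme like this, wall-clock time cannot drop below the duration of the slowest single task, so the achievable speed-up depends on how finely the work is split as well as on the number of threads.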


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/84d3/9222883/b54bb2d4c068/genes-13-01090-g001.jpg
