LEON-BIS: multiple alignment evaluation of sequence neighbours using a Bayesian inference system.

Authors

Vanhoutreve Renaud, Kress Arnaud, Legrand Baptiste, Gass Hélène, Poch Olivier, Thompson Julie D

Affiliations

Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de médecine translationnelle de Strasbourg, Strasbourg, France.

Publication

BMC Bioinformatics. 2016 Jul 7;17(1):271. doi: 10.1186/s12859-016-1146-y.

DOI:10.1186/s12859-016-1146-y

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4936259/

Abstract

BACKGROUND

A standard procedure in many areas of bioinformatics is to use a multiple sequence alignment (MSA) as the basis for various types of homology-based inference. Applications include 3D structure modelling, protein functional annotation, prediction of molecular interactions, etc. These applications, however sophisticated, are generally highly sensitive to the alignment used, and neglecting non-homologous or uncertain regions in the alignment can lead to significant bias in the subsequent inferences.

RESULTS

Here, we present a new method, LEON-BIS, which uses a robust Bayesian framework to estimate the homologous relations between sequences in a protein multiple alignment. Sequences are clustered into sub-families and relations are predicted at different levels, including 'core blocks', 'regions' and full-length proteins. The accuracy and reliability of the predictions are demonstrated in large-scale comparisons using well annotated alignment databases, where the homologous sequence segments are detected with very high sensitivity and specificity.
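The sensitivity and specificity mentioned above can be made concrete with a small sketch. It assumes a simple per-residue evaluation in which each residue is labelled either homologous or not, in both a prediction and a reference annotation. The function name, the boolean encoding and the toy labels are illustrative assumptions; the abstract does not describe the evaluation protocol actually used.

```python
import math


def sensitivity_specificity(predicted, reference):
    """Return (sensitivity, specificity) for two equal-length label lists.

    Each element is True if the residue is labelled homologous, else False.
    This per-residue encoding is an assumption made for illustration.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")

    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)          # homologous, detected
    fn = sum(1 for p, r in pairs if not p and r)      # homologous, missed
    tn = sum(1 for p, r in pairs if not p and not r)  # non-homologous, rejected
    fp = sum(1 for p, r in pairs if p and not r)      # non-homologous, accepted

    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    specificity = tn / (tn + fp) if (tn + fp) else math.nan
    return sensitivity, specificity


# Toy labels (made up): 4 homologous and 4 non-homologous reference residues.
reference = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, False, False, False, True]
print(sensitivity_specificity(predicted, reference))
```

High sensitivity means few homologous segments are missed; high specificity means few non-homologous segments are wrongly accepted.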

CONCLUSIONS

LEON-BIS uses robust Bayesian statistics to distinguish the portions of multiple sequence alignments that are conserved either across the whole family or within subfamilies. LEON-BIS should thus be useful for automatic, high-throughput genome annotations, 2D/3D structure predictions, protein-protein interaction predictions etc.
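As a rough illustration of the kind of reasoning a Bayesian approach relies on, the sketch below applies Bayes' rule to a single piece of evidence about one aligned segment. It is not the LEON-BIS model: the function name, the prior and both likelihood values are hypothetical numbers chosen only to show the arithmetic.

```python
def posterior_homologous(prior, p_obs_given_hom, p_obs_given_nonhom):
    """Posterior probability that a segment is homologous, via Bayes' rule.

    prior              -- P(homologous) before seeing the observation
    p_obs_given_hom    -- P(observation | homologous)
    p_obs_given_nonhom -- P(observation | not homologous)
    All three inputs are assumed, illustrative probabilities.
    """
    numerator = prior * p_obs_given_hom
    evidence = numerator + (1.0 - prior) * p_obs_given_nonhom
    if evidence == 0.0:
        raise ValueError("observation has zero probability under both hypotheses")
    return numerator / evidence


# Hypothetical values: an uninformative prior, and an observation (say, a
# well-conserved block) that is four times more likely under homology.
posterior = posterior_homologous(prior=0.5,
                                 p_obs_given_hom=0.8,
                                 p_obs_given_nonhom=0.2)
print(round(posterior, 3))
```

With these assumed inputs the posterior is 0.4 / (0.4 + 0.1) = 0.8, so the observation raises the belief in homology from 0.5 to 0.8.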

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/815e/4936259/1b2efca6e30e/12859_2016_1146_Fig1_HTML.jpg
