LEON: multiple aLignment Evaluation Of Neighbours.

Author information

Thompson Julie D, Prigent Véronique, Poch Olivier

Affiliations

Laboratoire de Biologie et Génomique Structurales, Institut de Génétique et de Biologie Moléculaire et Cellulaire, CNRS/INSERM/ULP, BP 163, 67404 Illkirch Cedex, France.

Publication information

Nucleic Acids Res. 2004 Feb 24;32(4):1298-307. doi: 10.1093/nar/gkh294. Print 2004.

Abstract

Sequence alignments are fundamental to a wide range of applications, including database searching, functional residue identification and structure prediction techniques. These applications predict or propagate structural, functional or evolutionary information based on a presumed homology between the aligned sequences. If the initial hypothesis of homology is wrong, no subsequent application, however sophisticated, can be expected to yield accurate results. Here we present a novel method, LEON, to predict homology between proteins based on a multiple alignment of complete sequences (MACS). In a MACS, weak signals from distantly related proteins can be considered in the overall context of the family. Intermediate sequences and the combination of individual weak matches are used to increase the significance of low-scoring regions. Residue composition is also taken into account by incorporating several existing methods for the detection of compositionally biased sequence segments. The accuracy and reliability of the predictions are demonstrated in large-scale comparisons with structural and sequence family databases, where the specificity was shown to be >99% and the sensitivity was estimated to be approximately 76%. LEON can thus be used to reliably identify the complex relationships between large multidomain proteins and should be useful for automatic high-throughput genome annotation, 2D/3D structure prediction, protein-protein interaction prediction, etc.
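For readers less familiar with the two accuracy figures quoted in the abstract, the following minimal sketch shows how sensitivity and specificity are conventionally computed from counts of correct and incorrect homology predictions. It assumes the standard textbook definitions, which may differ in detail from those used in the paper. The function names and the counts are hypothetical, chosen only for illustration, and are not taken from the LEON benchmarks.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of truly homologous cases that were predicted homologous."""
    if tp + fn == 0:
        raise ValueError("no positive cases to evaluate")
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of truly non-homologous cases that were predicted non-homologous."""
    if tn + fp == 0:
        raise ValueError("no negative cases to evaluate")
    return tn / (tn + fp)


# Hypothetical counts, for illustration only (not data from the paper).
tp, fn = 30, 10   # homologous cases: detected / missed
tn, fp = 196, 4   # non-homologous cases: correctly rejected / wrongly accepted

print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
```

Under these definitions, a high specificity means that few unrelated sequences are wrongly reported as homologous, while a moderate sensitivity means that some true relationships go undetected.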
