Chafra Fatma, Borim Correa Felipe, Oni Faith, Konu Karakayalı Özlen, Stadler Peter F, Nunes da Rocha Ulisses
Department of Environmental Microbiology, Helmholtz Centre for Environmental Research-UFZ, Leipzig 04318, Germany.
Department of Molecular Biology and Genetics, Bilkent University, Ankara 06800, Turkey.
Bioinform Adv. 2023 Jun 9;3(1):vbad069. doi: 10.1093/bioadv/vbad069. eCollection 2023.
Several genome annotation tools standardize annotation outputs for comparability. During standardization, however, these tools do not allow user-friendly customization of annotation databases, which limits their flexibility and applicability in downstream analysis.
StandEnA is a user-friendly command-line tool for Linux that facilitates the generation of custom databases by retrieving protein sequences from multiple databases. Directed by a user-defined list of standard names, StandEnA retrieves synonyms to search for corresponding sequences in a set of public databases. Custom databases are used in prokaryotic genome annotation to generate standardized presence-absence matrices and reference files containing standard database identifiers. To showcase StandEnA, we applied it to six metagenome-assembled genomes to analyze three different pathways.
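The idea of a standardized presence-absence matrix can be illustrated with a minimal Python sketch. It does not use StandEnA's code or command-line interface; the synonym table, the genome IDs, the annotation labels, and the function name `presence_absence` are all hypothetical placeholders chosen for illustration. The sketch assumes that each genome's annotation has already been reduced to a list of labels, and that every synonym maps to exactly one user-defined standard name.

```python
# Hypothetical synonym table: each annotation label maps to one standard name.
# In practice such a table would be derived from the user-defined name list.
SYNONYMS = {
    "enzyme A": "enzyme A",
    "protein A1": "enzyme A",   # illustrative synonym of "enzyme A"
    "enzyme B": "enzyme B",
}


def presence_absence(annotations, standard_names, synonyms=SYNONYMS):
    """Return {genome_id: {standard_name: 0 or 1}}.

    annotations: dict mapping a genome ID to a list of annotation labels.
    standard_names: the standard names that form the matrix columns.
    """
    matrix = {}
    for genome, labels in annotations.items():
        # Translate every recognized label to its standard name.
        found = {synonyms[label] for label in labels if label in synonyms}
        # 1 if the standard name was found in this genome, else 0.
        matrix[genome] = {name: int(name in found) for name in standard_names}
    return matrix


# Illustrative input only; these are not results from the study.
example = {
    "MAG_1": ["protein A1", "unrelated protein"],
    "MAG_2": ["enzyme B", "enzyme A"],
    "MAG_3": [],
}
result = presence_absence(example, ["enzyme A", "enzyme B"])
```

Mapping synonyms to a single standard name before building the matrix is what makes rows comparable across genomes that were annotated with different labels for the same protein.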
StandEnA is open-source software available at https://github.com/mdsufz/StandEnA.
Supplementary data are available at Bioinformatics Advances online.