StandEnA: a customizable workflow for standardized annotation and generating a presence-absence matrix of proteins.

Author information

Chafra Fatma, Borim Correa Felipe, Oni Faith, Konu Karakayalı Özlen, Stadler Peter F, Nunes da Rocha Ulisses

Affiliations

Department of Environmental Microbiology, Helmholtz Centre for Environmental Research-UFZ, Leipzig 04318, Germany.

Department of Molecular Biology and Genetics, Bilkent University, Ankara 06800, Turkey.

Publication information

Bioinform Adv. 2023 Jun 9;3(1):vbad069. doi: 10.1093/bioadv/vbad069. eCollection 2023.

DOI:10.1093/bioadv/vbad069

PMID:37448812

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10336186/

Abstract

MOTIVATION

Several genome annotation tools standardize their annotation outputs to make them comparable. During standardization, however, these tools do not allow user-friendly customization of the annotation databases, which limits their flexibility and applicability in downstream analysis.

RESULTS

StandEnA is a user-friendly command-line tool for Linux that facilitates the generation of custom databases by retrieving protein sequences from multiple databases. Directed by a user-defined list of standard names, StandEnA retrieves synonyms to search for corresponding sequences in a set of public databases. Custom databases are used in prokaryotic genome annotation to generate standardized presence-absence matrices and reference files containing standard database identifiers. To showcase StandEnA, we applied it to six metagenome-assembled genomes to analyze three different pathways.
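A minimal sketch, in Python, of the final standardization step described above: annotation labels found in each genome are mapped back to user-defined standard names through a synonym table, then collapsed into a genome-by-protein presence-absence matrix. The synonym table, genome names, and the functions `build_synonym_index` and `presence_absence` are hypothetical illustrations; this is not StandEnA's actual implementation or file format.

```python
from collections import defaultdict

# Hypothetical synonym table: user-defined standard name -> alternative names.
# StandEnA retrieves synonyms from public databases; here they are hard-coded.
SYNONYMS = {
    "nitrate reductase": ["narG", "NarG"],
    "nitrite reductase": ["nirK", "nirS"],
}


def build_synonym_index(synonyms):
    """Invert the table so any synonym (case-insensitive) maps to its standard name."""
    index = {}
    for standard, names in synonyms.items():
        index[standard.lower()] = standard
        for name in names:
            index[name.lower()] = standard
    return index


def presence_absence(hits, synonyms):
    """Build a 0/1 matrix from per-genome annotation labels.

    hits: dict mapping genome name -> list of annotation labels.
    Returns (columns, matrix), where columns are the sorted standard names
    and matrix maps each genome to a row of 0/1 values in column order.
    """
    index = build_synonym_index(synonyms)
    found = defaultdict(set)
    for genome, labels in hits.items():
        for label in labels:
            standard = index.get(label.lower())
            if standard is not None:  # labels with no known synonym are ignored
                found[genome].add(standard)
    columns = sorted(synonyms)
    matrix = {g: [int(name in found[g]) for name in columns] for g in hits}
    return columns, matrix


if __name__ == "__main__":
    hits = {
        "MAG_1": ["NarG", "hypothetical protein"],
        "MAG_2": ["nirS", "narG"],
        "MAG_3": [],
    }
    columns, matrix = presence_absence(hits, SYNONYMS)
    print("genome", *columns, sep="\t")
    for genome, row in matrix.items():
        print(genome, *row, sep="\t")
```

Mapping every synonym to a single standard name before counting is what makes rows from different genomes, and from different source databases, directly comparable.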

AVAILABILITY AND IMPLEMENTATION

StandEnA is an open-source software available at https://github.com/mdsufz/StandEnA.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics Advances online.
