Khleborodova Asya, Gamboa-Tuz Samuel D, Ramos Marcel, Segata Nicola, Waldron Levi, Oh Sehyun
Institute for Implementation Science in Population Health, City University of New York School of Public Health, New York, NY 10027, United States.
Department of Epidemiology and Biostatistics, City University of New York School of Public Health, New York, NY 10027, United States.
Bioinformatics. 2024 Nov 28;40(12). doi: 10.1093/bioinformatics/btae707.
LEfSe is a widely used Python package and Galaxy module for metagenomic biomarker discovery and visualization, utilizing the Kruskal-Wallis test, Wilcoxon Rank-Sum test, and Linear Discriminant Analysis. R/Bioconductor provides a large collection of tools for metagenomic data analysis but has lacked an implementation of this widely used algorithm, hindering benchmarking against other tools and incorporation into R workflows. We present the lefser package to provide comparable functionality within the R/Bioconductor ecosystem of statistical analysis tools, with improvements to the original algorithm for performance, accuracy, and reproducibility. We benchmark the performance of lefser against the original algorithm using human and mouse metagenomic datasets.
Our software, lefser, is distributed through the Bioconductor project (https://www.bioconductor.org/packages/release/bioc/html/lefser.html), and all the source code is available in the GitHub repository https://github.com/waldronlab/lefser.
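As an illustration of the first statistical step named in the abstract, the sketch below computes a tie-corrected Kruskal-Wallis H statistic per feature and keeps the features that differ between two classes. It is a minimal standard-library example, not the lefser or LEfSe implementation: the function names (`average_ranks`, `kruskal_wallis_h`, `screen_features`), the toy abundance values, and the `alpha=0.05` cutoff are assumptions made for illustration. The p-value uses the chi-square approximation with one degree of freedom, so the screen as written applies only to two classes.

```python
import math
from collections import defaultdict


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of sorted positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(values, labels):
    """Tie-corrected Kruskal-Wallis H statistic for one feature."""
    n = len(values)
    ranks = average_ranks(values)
    rank_sums = defaultdict(float)
    counts = defaultdict(int)
    for r, g in zip(ranks, labels):
        rank_sums[g] += r
        counts[g] += 1
    h = 12.0 / (n * (n + 1)) * sum(
        rank_sums[g] ** 2 / counts[g] for g in counts
    ) - 3 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values.
    tie_sizes = defaultdict(int)
    for v in values:
        tie_sizes[v] += 1
    correction = 1 - sum(t ** 3 - t for t in tie_sizes.values()) / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0


def two_class_p_value(h):
    """Chi-square (1 degree of freedom) upper tail, valid for two classes."""
    return math.erfc(math.sqrt(h / 2))


def screen_features(features, labels, alpha=0.05):
    """Keep feature names whose two-class Kruskal-Wallis p-value is below alpha."""
    if len(set(labels)) != 2:
        raise ValueError("this sketch handles exactly two classes")
    return [
        name
        for name, values in features.items()
        if two_class_p_value(kruskal_wallis_h(values, labels)) < alpha
    ]


# Toy relative abundances for two hypothetical taxa across six samples.
labels = ["control"] * 3 + ["case"] * 3
features = {
    "taxonA": [1, 2, 3, 4, 5, 6],  # fully separated between classes
    "taxonB": [1, 4, 2, 5, 3, 6],  # overlapping between classes
}
selected = screen_features(features, labels)
```

In the workflow the abstract describes, features that pass a screen like this would then go through the Wilcoxon Rank-Sum test and Linear Discriminant Analysis. Those steps are not shown here.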