Anders Simon, Pyl Paul Theodor, Huber Wolfgang
Genome Biology Unit, European Molecular Biology Laboratory, 69111 Heidelberg, Germany.
Bioinformatics. 2015 Jan 15;31(2):166-9. doi: 10.1093/bioinformatics/btu638. Epub 2014 Sep 25.
A wide range of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed.
We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.
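To illustrate the idea of counting the overlap of reads with genes, the following is a minimal pure-Python sketch. It does not use the HTSeq API and is not the htseq-count implementation. The `Interval` class, the `count_reads` function, the half-open coordinate convention, the toy coordinates and the `__no_feature` and `__ambiguous` labels are all assumptions made for illustration. A read is assigned to a gene only if it overlaps exactly one gene.

```python
from collections import Counter
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical half-open genomic interval [start, end) on one chromosome."""
    chrom: str
    start: int
    end: int

    def overlaps(self, other: "Interval") -> bool:
        # Two intervals overlap if they share a chromosome and at least one base.
        return (self.chrom == other.chrom
                and self.start < other.end
                and other.start < self.end)


def count_reads(genes: dict, reads: list) -> Counter:
    """Count reads per gene; reads hitting no gene or several genes are tallied separately."""
    counts = Counter({name: 0 for name in genes})
    for read in reads:
        hits = [name for name, gene in genes.items() if gene.overlaps(read)]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            counts["__no_feature"] += 1   # illustrative label (assumption)
        else:
            counts["__ambiguous"] += 1    # illustrative label (assumption)
    return counts


# Toy data (invented for this example, not taken from any real annotation).
genes = {
    "geneA": Interval("chr1", 100, 200),
    "geneB": Interval("chr1", 180, 300),
}
reads = [
    Interval("chr1", 110, 160),  # overlaps geneA only
    Interval("chr1", 190, 240),  # overlaps both genes
    Interval("chr1", 250, 290),  # overlaps geneB only
    Interval("chr2", 10, 60),    # overlaps no gene
]
counts = count_reads(genes, reads)
```

The linear scan over all genes for every read is adequate only for toy data. A practical script would rely on a structure that supports querying via genomic coordinates, such as those the library provides.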
HTSeq is released as open-source software under the GNU General Public Licence and is available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq.