Michael Uhl, Torsten Houwaart, Gianluca Corrado, Patrick R. Wright, Rolf Backofen
Bioinformatics Group, Department of Computer Science, University of Freiburg, Germany.
Department of Information Engineering and Computer Science, University of Trento, Italy.
Methods. 2017 Apr 15;118-119:60-72. doi: 10.1016/j.ymeth.2017.02.006. Epub 2017 Feb 22.
CLIP-seq experiments are currently the most important means of determining the binding sites of RNA-binding proteins on a genome-wide level. The computational analysis can be divided into three steps. In the first step, pre-processing, raw reads have to be trimmed and mapped to the genome. This step has to be adapted specifically to each CLIP-seq protocol. The next step is peak calling, which is required to remove nonspecific signals and to determine bona fide protein binding sites on target RNAs. Here, both protocol-specific approaches and generic peak callers are available. Although some peak callers are more widely used than others, each has its own strengths and drawbacks, and it might be advantageous to compare the results of several methods. Although peak calling is the final step in many CLIP-seq publications, an important follow-up task is the determination of binding models from CLIP-seq data. This is central because CLIP-seq experiments are highly dependent on the transcriptional state of the cell in which the experiment was performed: an RNA that is not expressed in that cell cannot yield a detectable binding site. Thus, relying solely on binding sites determined by CLIP-seq from different cells or conditions can lead to a high false negative rate. This shortcoming can, however, be circumvented by applying models that predict additional putative binding sites.
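As a rough illustration of the peak calling step described above, the following Python sketch computes per-base read coverage from already mapped reads and reports contiguous regions whose depth reaches a fixed cutoff. It is a toy example under stated assumptions, not the method of any particular peak caller: the function names `coverage` and `call_peaks`, the half-open interval convention, the read coordinates, and the `min_depth` cutoff are all hypothetical choices made for this sketch. Real peak callers typically model background signal statistically instead of applying a plain threshold.

```python
from typing import List, Tuple


def coverage(reads: List[Tuple[int, int]], length: int) -> List[int]:
    """Per-base read depth for half-open read intervals [start, end)."""
    depth = [0] * length
    for start, end in reads:
        # Clip each read to the bounds of the transcript or region.
        for pos in range(max(start, 0), min(end, length)):
            depth[pos] += 1
    return depth


def call_peaks(depth: List[int], min_depth: int) -> List[Tuple[int, int]]:
    """Return contiguous half-open regions where depth >= min_depth."""
    peaks = []
    peak_start = None
    for pos, d in enumerate(depth):
        if d >= min_depth and peak_start is None:
            peak_start = pos                 # a candidate peak opens here
        elif d < min_depth and peak_start is not None:
            peaks.append((peak_start, pos))  # the peak closes before pos
            peak_start = None
    if peak_start is not None:
        # A peak that runs to the end of the region is closed there.
        peaks.append((peak_start, len(depth)))
    return peaks


# Hypothetical mapped reads on a 30-nt region (assumed toy data).
reads = [(2, 8), (4, 10), (5, 9), (20, 25)]
depth = coverage(reads, 30)
peaks = call_peaks(depth, min_depth=3)
print(peaks)
```

In this toy input, the single read at positions 20 to 24 never reaches the cutoff and is discarded as background, which mirrors the role of peak calling in removing nonspecific signal.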