Gliding Vertex on the Horizontal Bounding Box for Multi-Oriented Object Detection.

Author Information

Xu Yongchao, Fu Mingtao, Wang Qimeng, Wang Yukang, Chen Kai, Xia Gui-Song, Bai Xiang

Publication Information

IEEE Trans Pattern Anal Mach Intell. 2021 Apr;43(4):1452-1459. doi: 10.1109/TPAMI.2020.2974745. Epub 2021 Mar 5.

Abstract

Object detection has recently experienced substantial progress. Yet, the widely adopted horizontal bounding box representation is not appropriate for ubiquitous oriented objects such as objects in aerial images and scene texts. In this paper, we propose a simple yet effective framework to detect multi-oriented objects. Instead of directly regressing the four vertices, we glide the vertex of the horizontal bounding box on each corresponding side to accurately describe a multi-oriented object. Specifically, we regress four length ratios characterizing the relative gliding offset on each corresponding side. This may facilitate the offset learning and avoid the confusion issue of sequential label points for oriented objects. To further remedy the confusion issue for nearly horizontal objects, we also introduce an obliquity factor based on the area ratio between the object and its horizontal bounding box, guiding the selection of horizontal or oriented detection for each object. We add these five extra target variables to the regression head of Faster R-CNN, which requires negligible extra computation time. Extensive experimental results demonstrate that, without bells and whistles, the proposed method achieves superior performance on multiple multi-oriented object detection benchmarks, including object detection in aerial images, scene text detection, and pedestrian detection in fisheye images.
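
To make the representation concrete, below is a minimal sketch of how the five extra regression targets mentioned in the abstract (four gliding length ratios and one obliquity factor) could be computed from a quadrilateral annotation and its horizontal bounding box. The vertex-to-side assignment (topmost vertex glides along the top side, and so on) and the normalization are assumptions made for illustration; they are not taken verbatim from the paper's target definition.

```python
import numpy as np

def gliding_vertex_targets(quad):
    """Compute the five extra regression targets sketched in the abstract
    (four gliding length ratios and one obliquity factor) for a single
    oriented object given as a 4x2 array of (x, y) vertices."""
    quad = np.asarray(quad, dtype=float)

    # Horizontal bounding box of the quadrilateral.
    x_min, y_min = quad.min(axis=0)
    x_max, y_max = quad.max(axis=0)
    w, h = x_max - x_min, y_max - y_min

    # Assumed convention: the vertex touching the top, right, bottom and
    # left side of the horizontal box glides along that side.
    v_top = quad[np.argmin(quad[:, 1])]
    v_right = quad[np.argmax(quad[:, 0])]
    v_bottom = quad[np.argmax(quad[:, 1])]
    v_left = quad[np.argmin(quad[:, 0])]

    # Four length ratios: gliding offset of each vertex from the nearest
    # corner, normalized by the length of its side.
    alpha1 = (v_top[0] - x_min) / w      # top side
    alpha2 = (v_right[1] - y_min) / h    # right side
    alpha3 = (x_max - v_bottom[0]) / w   # bottom side
    alpha4 = (y_max - v_left[1]) / h     # left side

    # Obliquity factor: area of the quadrilateral (shoelace formula)
    # divided by the area of its horizontal bounding box.
    x, y = quad[:, 0], quad[:, 1]
    quad_area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    obliquity = quad_area / (w * h)

    return np.array([alpha1, alpha2, alpha3, alpha4, obliquity])

# A square rotated by 45 degrees: every vertex glides to the midpoint of
# its side and the square covers half of its horizontal box.
print(gliding_vertex_targets([[5, 0], [10, 5], [5, 10], [0, 5]]))
# -> [0.5 0.5 0.5 0.5 0.5]
```

In the example, all five targets come out as 0.5: each vertex sits at the midpoint of its side and the rotated square covers half of its horizontal bounding box, which is exactly the kind of oriented object the horizontal representation describes poorly.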

