Visual relation detection (VRD) aims to describe images with relation triplets like <subject, predicate, object>, focusing on the interaction between every pair of instances. To detect the visual relations that express the main content of a given image, visual relation of interest detection (VROID) has been proposed as an extension of the traditional VRD task. Existing methods for the general VRD task are mostly based on instance-level features, and the methods that adopt more detailed information use only part-level attention or human body parts; none of them take advantage of general semantic parts. Therefore, building on IPNet for VROID, we further propose an interest propagation from part (IPFP) method, which propagates interest along the “part-instance-pair-triplet” path to detect visual relations of interest. The IPFP method consists of five modules: a Panoptic Object-Part Detection (POPD) module, a Part Interest Prediction (PartIP) module, an Instance Interest Prediction (InstIP) module, a Pair Interest Prediction (PairIP) module, and a Predicate Interest Prediction (PredIP) module. The POPD module extracts instances with instance features and instance parts with part features; the PartIP module predicts interest for every single part; the InstIP module predicts interest for every single instance; the PairIP module predicts interest for each pair of instances; and the PredIP module predicts possible predicates for each instance pair. The interest score of a visual relation is the product of the pair interest score and the predicate possibility for that pair. We evaluate the performance of the IPFP method and the effectiveness of its key components on the ViROI dataset for VROID.
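The final scoring step can be sketched as follows; this is a minimal sketch of the product described above, where the function name, tensor shapes, and the use of PyTorch are assumptions for illustration rather than the paper's implementation:

```python
import torch

def relation_interest_scores(pair_scores: torch.Tensor,
                             predicate_probs: torch.Tensor) -> torch.Tensor:
    """Combine pair interest scores with predicate possibilities.

    pair_scores:     (N,)   interest score for each of N instance pairs
    predicate_probs: (N, P) possibility of each of P predicates per pair
    returns:         (N, P) interest score for every <subject, predicate, object>
                     triplet, i.e. pair interest * predicate possibility
    """
    # Broadcast each pair's interest score across its P candidate predicates.
    return pair_scores.unsqueeze(1) * predicate_probs
```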