HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation

School of Computer Science, Carnegie Mellon University
CVPR 2024

Highlights

  • We propose HiKER-SGG, a novel method for generating scene graphs through a hierarchical inference approach over structured domain knowledge, allowing it to gradually specify increasingly granular classifications through iterative sub-selection.
  • We introduce a new synthetic VG-C benchmark for SGG, containing 20 challenging image corruptions, including simple transformations and severe weather conditions.
  • Extensive experiments demonstrate that HiKER-SGG outperforms current state-of-the-art methods on SGG tasks, while simultaneously providing a strong zero-shot baseline for generating scene graphs from corrupted images.

Abstract

Being able to understand visual scenes is a precursor for many downstream tasks, including autonomous driving, robotics, and other vision-based approaches. A common approach to enabling reasoning over visual data is Scene Graph Generation (SGG); however, many existing methods assume undisturbed vision, i.e., the absence of real-world corruptions such as fog, snow, and smoke, as well as non-uniform perturbations like sun glare or water drops. In this work, we propose a novel SGG benchmark containing procedurally generated weather corruptions and other transformations over the Visual Genome dataset. Further, we introduce a corresponding approach, Hierarchical Knowledge Enhanced Robust Scene Graph Generation (HiKER-SGG), which provides a strong baseline for scene graph generation under such a challenging setting. At its core, HiKER-SGG utilizes a hierarchical knowledge graph to refine its predictions from coarse initial estimates to detailed predictions. In our extensive experiments, we show that HiKER-SGG not only demonstrates superior performance on corrupted images in a zero-shot manner, but also outperforms current state-of-the-art methods on uncorrupted SGG tasks.
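To make the coarse-to-fine refinement concrete, below is a minimal, hypothetical Python sketch of prediction by iterative sub-selection over a two-level class hierarchy. The hierarchy, scores, and helper names are illustrative placeholders only and do not reproduce HiKER-SGG's knowledge graph or inference procedure.

# A minimal, hypothetical sketch of coarse-to-fine prediction by iterative
# sub-selection over a two-level class hierarchy. The hierarchy, scores, and
# helper below are illustrative only and do not reproduce HiKER-SGG itself.

# Hypothetical two-level hierarchy: coarse group -> fine-grained classes.
HIERARCHY = {
    "animal":  ["dog", "cat", "horse"],
    "vehicle": ["car", "bus", "bicycle"],
}
FINE_CLASSES = [c for children in HIERARCHY.values() for c in children]

def hierarchical_predict(fine_scores):
    """First commit to a coarse group, then re-rank only within that group."""
    scores = dict(zip(FINE_CLASSES, fine_scores))
    # Step 1: score each coarse group by aggregating its children's scores.
    group_scores = {g: sum(scores[c] for c in kids) for g, kids in HIERARCHY.items()}
    best_group = max(group_scores, key=group_scores.get)
    # Step 2: sub-select the most likely fine-grained class inside that group.
    return max(HIERARCHY[best_group], key=lambda c: scores[c])

# Example: "car" (0.28) narrowly trails "dog" (0.30), but once the coarse
# decision favours "animal" (0.60 vs. 0.40), the fine choice is made among
# animals only, which is the intuition behind iterative sub-selection.
print(hierarchical_predict([0.30, 0.25, 0.05, 0.28, 0.07, 0.05]))  # -> dog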

VG-C Benchmark

To standardize the evaluation of SGG robustness, we create a corrupted Visual Genome (VG-C) benchmark comprising 20 corruption types designed to simulate corruptions that may occur in real-world scenarios. The first 15 corruption types are those introduced by ImageNet-C, which are widely recognized as a standard benchmark for evaluating robustness. To further align with real-world scenarios, we add 5 types of natural corruption to our evaluation: sun glare, water drop, wildfire smoke, rain, and dust.

Figure 3. All 20 corruption types used in our corruption experiments. The first 15 corruption types are introduced by ImageNet-C, and we introduce 5 additional types of natural corruption for a more comprehensive and practical evaluation.
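As a rough illustration of how such corruptions can be generated procedurally, here is a minimal, self-contained Python sketch of two simplified corruptions (Gaussian noise and a fog-like haze). The severity scales, file paths, and function names are assumptions for illustration; the actual VG-C benchmark relies on the full ImageNet-C corruption pipeline plus the five additional natural corruptions.

# A minimal, self-contained sketch of applying ImageNet-C-style corruptions to
# Visual Genome images. Both corruption functions are simplified illustrations,
# not the VG-C generation code.
import numpy as np
from PIL import Image

def gaussian_noise(img, severity=3):
    """Additive Gaussian noise; higher severity -> larger standard deviation."""
    sigma = [8, 16, 24, 32, 48][severity - 1]          # assumed severity scale
    noisy = img.astype(np.float32) + np.random.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def fog(img, severity=3):
    """Blend the image toward a uniform grey haze; higher severity -> denser fog."""
    alpha = [0.15, 0.3, 0.45, 0.6, 0.75][severity - 1]  # assumed severity scale
    haze = np.full_like(img, 200, dtype=np.float32)
    return (alpha * haze + (1.0 - alpha) * img.astype(np.float32)).astype(np.uint8)

# Example usage on a single image (the file path is a placeholder).
img = np.array(Image.open("vg_image.jpg").convert("RGB"))
Image.fromarray(fog(gaussian_noise(img))).save("vg_image_corrupted.jpg")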

Experimental Results

Results on Clean VG Dataset
Table 1. Performance comparison with the state-of-the-art SGG methods on the Visual Genome dataset. The best results for each metric are in bold, while the second-best results are underlined. "-" denotes unavailable results due to incompatible experimental settings.

Results on Corrupted VG-C Dataset
Table 2. Performance comparison with the state-of-the-art SGG methods for the PredCls task on the corrupted Visual Genome dataset. We report the accuracy in percentage for the mR@20: UC/C, mR@50: UC/C, and mR@100: UC/C metrics, structured in six rows. The best results for each metric are in bold. The last column reports the average mean recall across all 20 types of corruption, together with the percentage decrease (in blue) when compared to the mean recall on clean images. We evaluate these methods using the code provided by the authors.
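For readers unfamiliar with the reported metric, the sketch below illustrates mean recall@K (mR@K), which computes recall@K per predicate class and then averages over classes so that rare predicates count as much as frequent ones. The triplet representation and helper function are simplified assumptions, not the evaluation code used in the paper.

# A simplified sketch of mean recall@K (mR@K) for a single image.
from collections import defaultdict

def mean_recall_at_k(gt_triplets, pred_triplets, k):
    """gt_triplets: ground-truth (subject, predicate, object) triplets.
    pred_triplets: predicted triplets sorted by confidence, highest first."""
    top_k = set(pred_triplets[:k])
    hits, totals = defaultdict(int), defaultdict(int)
    for subj, pred, obj in gt_triplets:
        totals[pred] += 1
        if (subj, pred, obj) in top_k:
            hits[pred] += 1
    # Average per-predicate recall, weighting every predicate class equally.
    per_class_recall = [hits[p] / totals[p] for p in totals]
    return sum(per_class_recall) / len(per_class_recall)

gt = [("man", "riding", "horse"), ("man", "wearing", "hat"), ("hat", "on", "man")]
preds = [("man", "wearing", "hat"), ("man", "on", "horse"), ("hat", "on", "man")]
print(mean_recall_at_k(gt, preds, k=3))  # averages recall over {riding, wearing, on}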

Qualitative Comparison
Figure 4. Qualitative comparisons on the PredCls task. The visualized predicted predicates are picked from the top 50 predicted triplets. Here, red dashed lines denote undetected predicates, solid red lines denote incorrect predictions, and solid green lines indicate correct predictions. For an easier comparison, predicates correctly predicted by our method but incorrectly by GB-Net are highlighted in dark green.

Video Demonstration

Video demonstration coming soon.

BibTeX

@inproceedings{zhang2024hikersgg,
  title={HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation},
  author={Zhang, Ce and Stepputtis, Simon and Campbell, Joseph and Sycara, Katia and Xie, Yaqi},
  booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024}
}