Referring Expression Comprehension as Scene Graph Grounding

Efficient object recognition with linguistic references using Graph Neural Networks and CLIP.

© Copyright 2024 Diego Calanzone. Last updated: November 27, 2024.