Abstract: Vision-language tracking is an emerging topic in intelligent transportation systems, with particular significance for autonomous driving and road surveillance. It is a task that aims to combine ...
CLIP for Unsupervised and Fully Supervised Visual Grounding. This repository is the official PyTorch implementation of the paper CLIP-VG: Self-paced Curriculum Adapting of CLIP for Visual Grounding.