Abstract: Autoregressive generative models have shown their superiority in the fields of vision and language, such as high accuracy, high fidelity, and strong stability. Most existing methods are ...
Abstract: Current multimodal sensor trackers mainly rely on visual cues for target tracking. However, in challenging scenarios, the visual information acquired by multimodal sensors has limited ...