Abstract: Recently, audio-visual speech recognition has attracted increasing attention. However, most existing works only focused on scenarios with two speakers. In this work, we study the effect of ...
We find a commonality of various dirty samples is visual-linguistic inconsistency between images and associated labels. To capture the semantic inconsistency between modalities, we propose versatile ...
Abstract: Animals in nature exhibit remarkable spatial cognition abilities, enabling them to achieve long-distance autonomous navigation efficiently in unknown environments. Neurobiologically inspired ...