Abstract: The Audio-Visual Event Localization (AVEL) task aims to temporally locate and classify video events that are both audible and visible. Most research in this field assumes a closed-set ...