We find a commonality of various dirty samples is visual-linguistic inconsistency between images and associated labels. To capture the semantic inconsistency between modalities, we propose versatile ...
Abstract: It is not uncommon that real-world data are distributed with a long tail. For such data, the learning of deep neural networks becomes challenging because it is hard to classify tail classes ...
Abstract: Data tables are one of the most common ways in which people encounter data. Although mostly built with text and numbers, data tables have a spatial layout and often exhibit visual elements ...