Abstract: Exploiting fine-grained correspondence and visual-semantic alignments has shown great potential in image-text matching. Generally, recent approaches first employ a cross-modal attention unit ...