The figure shows the general architecture of our proposed Gumbel Latent Typing module. Our BERT-SparseLT model can be continually pretrained from the BERT-base-uncased checkpoint on a single V100 GPU ...
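The excerpt names the Gumbel Latent Typing module but does not describe its internals. As a rough illustration only, the sketch below shows the generic Gumbel-softmax relaxation that such a module would plausibly rely on to pick one latent type from a set of scores. The function names (`gumbel_softmax`, `hard_type`), the temperature parameter, and the example logits are all assumptions made for this sketch; none of them come from the paper's code.

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, rng=None):
    """Draw one relaxed one-hot sample over latent types (hypothetical helper)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if rng is None:
        rng = random.Random()
    noisy = []
    for logit in logits:
        u = rng.random()
        # Clamp away from 0 and 1 so both logarithms stay finite.
        u = min(max(u, 1e-12), 1.0 - 1e-12)
        # Gumbel(0, 1) noise is g = -log(-log(u)); add it, then scale by temperature.
        noisy.append((logit - math.log(-math.log(u))) / temperature)
    # Numerically stable softmax over the perturbed logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def hard_type(probs):
    """Index of the selected latent type (argmax of the relaxed sample)."""
    return max(range(len(probs)), key=probs.__getitem__)


if __name__ == "__main__":
    # Assumed scores for three latent types; purely illustrative values.
    sample = gumbel_softmax([2.0, 0.5, -1.0], temperature=0.5, rng=random.Random(0))
    print(sample, hard_type(sample))
```

Lower temperatures push the sample toward a one-hot vector, while higher temperatures make it closer to uniform. A differentiable framework would typically use a straight-through estimator on top of this; that step is omitted here.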