The figure shows the general architecture of our proposed Gumbel Latent Typing module. Our BERT-SparseLT model can be continually pretrained from the BERT-base-uncased checkpoint on a single V100 GPU ...
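The excerpt names the Gumbel Latent Typing module without spelling out its sampling step. As a rough illustration of the Gumbel-softmax relaxation that the module's name suggests, the sketch below draws a relaxed one-hot "type" assignment from a vector of type logits. The function names `gumbel_softmax` and `hard_type`, the temperature `tau`, and the three-type example are assumptions made for illustration only; they are not taken from the BERT-SparseLT implementation.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Draw one relaxed one-hot sample over len(logits) categories.

    Hypothetical helper: a generic Gumbel-softmax sampler, not the
    paper's actual module.
    """
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise via inverse transform: g = -log(-log(u)).
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)

    # Numerically stable softmax over the perturbed, scaled logits.
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def hard_type(soft_sample):
    """Index of the largest entry, i.e. the discrete type that was selected."""
    return max(range(len(soft_sample)), key=soft_sample.__getitem__)


if __name__ == "__main__":
    random.seed(0)
    type_logits = [2.0, 0.5, -1.0]  # assumed scores for three latent types
    sample = gumbel_softmax(type_logits, tau=0.5)
    print(sample, hard_type(sample))
```

A lower `tau` pushes each sample closer to a one-hot vector, while a higher `tau` yields smoother, more uniform assignments. In a trainable model the same computation would be expressed with differentiable tensor operations so that gradients can flow through the relaxed sample.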