Cream (Contrastive Reading Model) is a language-image understanding module designed to enhance the visually-situated natural language understanding capability of Large Language Models (LLMs). The ...