2024.05.27 [22’ NeurIPS] Flamingo: a Visual Language Model for Few-Shot Learning | Multimodal, Foundation Model, In-context Learning
2024.05.26 [21’ ICML] CLIP: Learning Transferable Visual Models From Natural Language Supervision | Multimodal, Contrastive Learning
2024.05.15 [21’ ICML] VL-T5: Unifying Vision-and-Language Tasks via Text Generation | Multimodal, Multi-task Learning