2024.07.08 [21’ ICCV] Understanding Robustness of Transformers for Image Classification | Vision, Analysis, ViT
2024.07.08 [23’ CVPR] ZegCLIP: Towards Adapting CLIP for Zero-shot Semantic Segmentation | Multimodal, Contrastive Learning, Zero-shot Segmentation
2024.07.05 [24’ ICML] Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models | Multimodal, Analysis, Instruction Tuning, Visual Encoder
2024.07.05 [24’ CVPR] Osprey: Pixel Understanding with Visual Instruction Tuning | Multimodal, Instruction Tuning, Visual Encoder, Visual Perception