2024.07.05 [24’] ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models Multimodal Visual Encoder
2024.07.05 [24’] Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs Multimodal Analysis Visual Encoder
2024.07.03 [24’] LaSagnA: Language-based Segmentation Assistant for Complex Queries Multimodal Referring Segmentation
2024.07.03 [24’ CVPR] Compositional Chain-of-Thought Prompting for Large Multimodal Models Multimodal Chain-of-Thought
2024.07.03 [24’ CVPR] AnyRef: Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception Multimodal Referring Segmentation Visual Grounding