2024.07.02 [24’] F-LMM: Grounding Frozen Large Multimodal Models | Tags: Multimodal, Chain-of-Thought, Segmentation, Visual Grounding
2024.07.01 [24’ ICLR] KOSMOS-2: Grounding Multimodal Large Language Models to the World | Tags: Multimodal, Visual Grounding
2024.07.01 [23’ NIPS] KOSMOS-1: Language Is Not All You Need: Aligning Perception with Language Models | Tags: Multimodal, Chain-of-Thought, Foundation Model, In-context Learning
2024.07.01 [24’ CVPR] GROUNDHOG: Grounding Large Language Models to Holistic Segmentation | Tags: Multimodal, Panoptic Segmentation, Segmentation, Visual Grounding
2024.07.01 [24’] GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest | Tags: Multimodal, Instruction Tuning, Visual Grounding