2024.06.26 [23’ NIPS] LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day Medical Dataset Multimodal
2024.06.26 [23’] Shikra: Unleashing Multimodal LLM’s Referential Dialogue Magic Multimodal Chain-of-Thought Detection Visual Grounding
2024.06.26 [24’] MMStar: Are We on the Right Way for Evaluating Large Vision-Language Models? Multimodal Benchmark Bias
2024.06.25 [24’ CVPR] VIVL: Towards Better Vision-Inspired Vision-Language Models Multimodal Adapter Prefix Tuning
2024.06.25 [24’] Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models Multimodal Chain-of-Thought