2024.09.10 [24’] xGen-MM (BLIP-3): A Family of Open Large Multimodal Models | Multimodal | Few-shot Learning, Foundation Model
2024.09.10 [23’ ICML] BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | Multimodal | Foundation Model
2024.09.04 [24’] LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models | Multimodal | Attention, Efficiency
2024.09.04 [24’ ECCV] FlexAttention for Efficient High-Resolution Vision-Language Models | Multimodal | Attention, Efficiency
2024.09.04 [24’ ACL] Spectral Filters, Dark Signals, and Attention Sinks | Language | Attention, Interpretability