2024.09.03 [’24 ECCV] FastV: An Image is Worth 1/2 Tokens After Layer 2 | Tags: Multimodal, Attention, Efficiency
2024.08.29 [’24 ICLR] StreamingLLM: Efficient Streaming Language Models with Attention Sinks | Tags: Language, Attention, Efficiency
2024.08.29 [’24 ICLR] PASTA: Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs | Tags: Language, Attention
2024.08.27 [’24] HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments | Tags: Multimodal, Attention, Efficiency