2024.10.11 [24’] Towards Interpreting Visual Information Processing in Vision-Language Models Multimodal Interpretability
2024.10.11 [24’] Quadratic Is Not What You Need For Multimodal Large Language Models Multimodal Efficiency Pruning
2024.10.11 [24’] Intriguing Properties of Large Language and Vision Models Multimodal Interpretability
2024.10.07 [24’] PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model Multimodal Unified Segmentation
2024.10.07 [23’ EMNLP Findings] Text Augmented Spatial-aware Zero-shot Referring Image Segmentation Vision Referring Image Segmentation Training-free