| Date | Venue | Title | Topic | Tags |
| --- | --- | --- | --- | --- |
| 2024.06.11 | [24’ CVPR] | LLaFS: When Large Language Models Meet Few-Shot Segmentation | Multimodal | Few-Shot Segmentation, In-context Learning |
| 2024.06.11 | [23’] | RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback | Language | RLAIF, RLHF |
| 2024.06.11 | [24’ ICML] | Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study | Language | Analysis, DPO, PPO, RLHF |
| 2024.06.11 | [23’ NIPS] | Direct Preference Optimization: Your Language Model is Secretly a Reward Model | Language | DPO, RLHF |
| 2024.06.10 | [24’ CVPR] | LLaVA-1.5: Improved Baselines with Visual Instruction Tuning | Multimodal | Adapter, Instruction Tuning |
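For quick reference on the two DPO entries above: the objective from the ’23 NIPS paper (Direct Preference Optimization) trains the policy directly on preference pairs, replacing the explicit reward model and PPO loop that the ’24 ICML study compares against:

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
= -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
\left[ \log \sigma\!\left(
\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
- \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
\right) \right]
$$

where $(x, y_w, y_l)$ is a prompt with its preferred and rejected responses, $\pi_{\mathrm{ref}}$ is the frozen reference policy (typically the SFT model), $\sigma$ is the sigmoid, and $\beta$ controls how far the policy may drift from the reference.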