2024.07.01 [24’ ICLR] Ferret: Refer and Ground Anything Anywhere at Any Granularity | Multimodal, Detection, Visual Grounding
2024.06.28 [22’ ECCV] BioViL: Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing | Medical, Contrastive Learning
2024.06.28 [23’] BiomedCLIP: A Multimodal Biomedical Foundation Model Pretrained from Fifteen Million Scientific Image-text Pairs | Medical, Contrastive Learning
2024.06.27 [21’] PubMedCLIP: Does CLIP Benefit Visual Question Answering in the Medical Domain as Much as it Does in the General Domain? | Medical, Contrastive Learning, Dataset