One Patch to Caption Them All: A Unified Zero-Shot Captioning Framework Paper • 2510.02898 • Published Oct 3 • 4
CroQS: Cross-modal Query Suggestion for Text-to-Image Retrieval Paper • 2412.13834 • Published Dec 18, 2024
LoomNet: Enhancing Multi-View Image Generation via Latent Space Weaving Paper • 2507.05499 • Published Jul 7
Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation Paper • 2411.19331 • Published Nov 28, 2024 • 5