Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities Paper • 2505.02567 • Published May 5
UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation Paper • 2506.17202 • Published Jun 20
ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation Paper • 2506.18095 • Published Jun 22
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective Paper • 2507.01925 • Published Jul 2
Pixels, Patterns, but No Poetry: To See The World like Humans Paper • 2507.16863 • Published Jul 21