HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning • Paper • 2509.08519 • Published Sep 10 • 127
BannerAgency: Advertising Banner Design with Multimodal LLM Agents • Paper • 2503.11060 • Published Mar 14 • 3
Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement • Paper • 2503.06520 • Published Mar 9 • 11
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching • Paper • 2503.05179 • Published Mar 7 • 46
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs • Paper • 2503.01743 • Published Mar 3 • 89