HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels Paper • 2507.21809 • Published Jul 29, 2025 • 136
The Invisible Leash: Why RLVR May Not Escape Its Origin Paper • 2507.14843 • Published Jul 20, 2025 • 85
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination Paper • 2507.10532 • Published Jul 14, 2025 • 89
Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling Paper • 2507.07982 • Published Jul 10, 2025 • 33
Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models Paper • 2507.07104 • Published Jul 9, 2025 • 45
KV Cache Steering for Inducing Reasoning in Small Language Models Paper • 2507.08799 • Published Jul 11, 2025 • 40
OmniGen2: Exploration to Advanced Multimodal Generation Paper • 2506.18871 • Published Jun 23, 2025 • 78
VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory Paper • 2506.18903 • Published Jun 23, 2025 • 22
PAROAttention: Pattern-Aware ReOrdering for Efficient Sparse and Quantized Attention in Visual Generation Models Paper • 2506.16054 • Published Jun 19, 2025 • 60
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning Paper • 2506.09985 • Published Jun 11, 2025 • 29
Discrete Diffusion in Large Language and Multimodal Models: A Survey Paper • 2506.13759 • Published Jun 16, 2025 • 43
Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency Paper • 2506.08343 • Published Jun 10, 2025 • 54