Collections including paper arxiv:2507.17744

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 29
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 14
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23

- HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels
  Paper • 2507.21809 • Published • 132
- Yume: An Interactive World Generation Model
  Paper • 2507.17744 • Published • 85
- Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models
  Paper • 2507.13344 • Published • 57
- ComfyUI-Copilot: An Intelligent Assistant for Automated Workflow Development
  Paper • 2506.05010 • Published • 79

- Yume: An Interactive World Generation Model
  Paper • 2507.17744 • Published • 85
- SSRL: Self-Search Reinforcement Learning
  Paper • 2508.10874 • Published • 95
- The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
  Paper • 2506.06941 • Published • 15
- Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
  Paper • 2506.01939 • Published • 185

- A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality
  Paper • 2507.07202 • Published • 24
- StreamDiT: Real-Time Streaming Text-to-Video Generation
  Paper • 2507.03745 • Published • 31
- LongAnimation: Long Animation Generation with Dynamic Global-Local Memory
  Paper • 2507.01945 • Published • 78
- TokensGen: Harnessing Condensed Tokens for Long Video Generation
  Paper • 2507.15728 • Published • 7

- Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation
  Paper • 2508.07981 • Published • 58
- CharacterShot: Controllable and Consistent 4D Character Animation
  Paper • 2508.07409 • Published • 39
- ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing
  Paper • 2508.10881 • Published • 52
- Puppeteer: Rig and Animate Your 3D Models
  Paper • 2508.10898 • Published • 32

- Arbitrary-steps Image Super-resolution via Diffusion Inversion
  Paper • 2412.09013 • Published • 13
- Deep Researcher with Test-Time Diffusion
  Paper • 2507.16075 • Published • 66
- ∇NABLA: Neighborhood Adaptive Block-Level Attention
  Paper • 2507.13546 • Published • 123
- Yume: An Interactive World Generation Model
  Paper • 2507.17744 • Published • 85

- AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models
  Paper • 2506.19851 • Published • 60
- SeqTex: Generate Mesh Textures in Video Sequence
  Paper • 2507.04285 • Published • 9
- Yume: An Interactive World Generation Model
  Paper • 2507.17744 • Published • 85
- STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer
  Paper • 2508.10893 • Published • 31