HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning • Paper • 2509.08519 • Published Sep 10 • 127
EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control • Paper • 2508.21112 • Published Aug 28 • 75
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency • Paper • 2508.18265 • Published Aug 25 • 205
Skywork-UniPic2 • Collection • A Unified DiT Multimodal Model for Image Generation, Editing, and Understanding • 8 items • Updated Aug 22 • 10
SVDQuant • Collection • Models and datasets for "SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models" • 20 items • Updated May 29 • 64
Skywork UniPic: Unified Autoregressive Modeling for Visual Understanding and Generation • Paper • 2508.03320 • Published Aug 5 • 61
Skywork-UniPic • Collection • Unified Autoregressive Modeling for Visual Understanding and Generation • 2 items • Updated Aug 13 • 12
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models • Paper • 2502.10458 • Published Feb 12 • 37
Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework • Paper • 2506.02454 • Published Jun 3 • 7
CSVQA: A Chinese Multimodal Benchmark for Evaluating STEM Reasoning Capabilities of VLMs • Paper • 2505.24120 • Published May 30 • 49
Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning • Paper • 2505.07263 • Published May 12 • 30