Rui Zhao

ruizhaocv

https://ruizhaocv.github.io/

AI & ML interests

Multimodal and GenAI

Recent Activity

authored a paper 6 days ago

UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback

upvoted a paper 5 days ago

VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation

Paper • 2511.02778 • Published 5 days ago • 95

upvoted a paper 6 days ago

UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback

Paper • 2511.01678 • Published 6 days ago • 33

upvoted a paper 27 days ago

QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs

Paper • 2510.11696 • Published 27 days ago • 173

upvoted 3 papers about 1 month ago

Paper2Video: Automatic Video Generation from Scientific Papers

Paper • 2510.05096 • Published Oct 6 • 111

See, Point, Fly: A Learning-Free VLM Framework for Universal Unmanned Aerial Navigation

Paper • 2509.22653 • Published Sep 26 • 23

LongLive: Real-time Interactive Long Video Generation

Paper • 2509.22622 • Published Sep 26 • 181

upvoted 2 papers about 2 months ago

Video models are zero-shot learners and reasoners

Paper • 2509.20328 • Published Sep 24 • 96

PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation

Paper • 2509.20358 • Published Sep 24 • 14

upvoted a paper 4 months ago

ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning

Paper • 2507.16815 • Published Jul 22 • 39

upvoted 2 papers 5 months ago

D-AR: Diffusion via Autoregressive Models

Paper • 2505.23660 • Published May 29 • 34

UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning

Paper • 2505.23380 • Published May 29 • 22

upvoted 5 papers 6 months ago

Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

Paper • 2505.21497 • Published May 27 • 109

OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data

Paper • 2505.18445 • Published May 24 • 64

RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning

Paper • 2505.17540 • Published May 23 • 7

Visual Planning: Let's Think Only with Images

Paper • 2505.11409 • Published May 16 • 56

Flow-GRPO: Training Flow Matching Models via Online RL

Paper • 2505.05470 • Published May 8 • 85

upvoted 4 papers 7 months ago

BookWorld: From Novels to Interactive Agent Societies for Creative Story Generation

Paper • 2504.14538 • Published Apr 20 • 30

Packing Input Frame Context in Next-Frame Prediction Models for Video Generation

Paper • 2504.12626 • Published Apr 17 • 51

One-Minute Video Generation with Test-Time Training

Paper • 2504.05298 • Published Apr 7 • 110

SkyReels-A2: Compose Anything in Video Diffusion Transformers

Paper • 2504.02436 • Published Apr 3 • 39