MotionAgent: Fine-grained Controllable Video Generation via
Motion Field Agent
International Conference on Computer Vision, ICCV 2025.
¹ Nanyang Technological University · ² StepFun · ³ Westlake University
🧩 Overview
MotionAgent is a novel framework that enables fine-grained motion control for text-guided image-to-video generation. At its core is a motion field agent that parses motion information in text prompts and converts it into explicit object trajectories and camera extrinsics. These motion representations are analytically integrated into a unified optical flow, which conditions a diffusion-based image-to-video model to generate videos with precise and flexible motion control. An optional rethinking step further refines motion alignment by iteratively correcting the agent's previous actions.
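In sketch form, the pipeline reads roughly as follows (every name below is a hypothetical placeholder, not the repo's actual API; the real entry point is run_agent.py, described under "Running the Demos"):

# Illustrative sketch of the MotionAgent pipeline described above.
# All names are hypothetical placeholders, not the repo's actual API.
def generate_video(image, prompt, agent, unify_flow, i2v_model, rethink_steps=0):
    # The motion field agent parses motion cues in the text prompt into
    # explicit object trajectories and camera extrinsics.
    trajectories, extrinsics = agent.parse(image, prompt)
    for step in range(rethink_steps + 1):
        # Both motion representations are analytically fused into one
        # unified dense optical-flow field ...
        flow = unify_flow(trajectories, extrinsics)
        # ... which conditions the diffusion image-to-video model.
        video = i2v_model(image, flow)
        # Optional rethinking: inspect the output and correct the agent's
        # previous actions before the next pass.
        if step < rethink_steps:
            trajectories, extrinsics = agent.rethink(video, prompt, trajectories, extrinsics)
    return video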
🎥 Demo
Click the image above to watch the full video on YouTube 🎬
🛠️ Dependencies and Installation
Follow the steps below to set up MotionAgent and run the demo smoothly.
🔹 1. Clone the Repository
Clone the official GitHub repository and enter the project directory:
git clone https://github.com/leoisufa/MotionAgent.git
cd MotionAgent
🔹 2. Environment Setup
# Create and activate conda environment
conda create -n motionagent python=3.10 -y
conda activate motionagent
# Install PyTorch with CUDA 12.4 support
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu124
# Install project dependencies
pip install -r requirements.txt
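A quick optional check that the environment matches the versions above:

# Optional sanity check for the environment set up above
import torch
print(torch.__version__)          # expect 2.4.1+cu124
print(torch.cuda.is_available())  # True on a machine with CUDA 12.4 drivers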
🔹 3. Install Grounded-Segment-Anything Dependencies
MotionAgent relies on external segmentation and grounding models. Follow the steps below to install Grounded-Segment-Anything:
# Navigate to models directory
cd models
# Clone the Grounded-Segment-Anything repository
git clone https://github.com/IDEA-Research/Grounded-Segment-Anything.git
# Enter the cloned directory
cd Grounded-Segment-Anything
# Install Segment Anything
python -m pip install -e segment_anything
# Install Grounding DINO
pip install --no-build-isolation -e GroundingDINO
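To confirm both editable installs succeeded, a minimal import check (module paths as exposed by the two upstream repos):

# Optional import check after the editable installs above
from segment_anything import sam_model_registry, SamPredictor  # Segment Anything
from groundingdino.util.inference import load_model            # Grounding DINO
print("Grounded-Segment-Anything imports OK")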
🔹 4. Install Metric3D Dependencies
MotionAgent relies on an external monocular depth estimation model. Follow the steps below to install Metric3D:
# From the project root, navigate to the models directory
cd models
# Clone the Metric3D repository
git clone https://github.com/YvanYin/Metric3D.git
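Metric3D also exposes its models via torch.hub, which makes for a quick sanity check (entry-point names are taken from the Metric3D repo; this downloads hub weights separately from the checkpoint placed in ckpts/ below):

# Optional: load Metric3D through its torch.hub entry point
import torch
model = torch.hub.load("yvanyin/metric3d", "metric3d_vit_small", pretrain=True)
model.eval()
print("Metric3D ViT-Small loaded")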
🧱 Download Models
To run MotionAgent, please download all pretrained and auxiliary models listed below, and organize them under the ckpts/ directory as shown in the example structure.
1️⃣ Optical Flow ControlNet Weights
Download from 🔗 Hugging Face (MotionAgent) and place the files in ckpts.
2️⃣ Stable Video Diffusion
Download from 🔗 Hugging Face (MOFA-Video-Hybrid/stable-video-diffusion-img2vid-xt-1-1) and save the model to ckpts.
3️⃣ Grounding DINO
Download the grounding model checkpoint using the command below:
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
Then place it directly under ckpts.
4️⃣ Segment Anything
Download the segmentation model using:
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
Then place it under ckpts.
5️⃣ Metric Depth Estimator
Download from 🔗 Hugging Face (Metric3D) and place the files in ckpts.
6️⃣ CMP
Download from 🔗 Hugging Face (MOFA-Video-Hybrid/cmp) and save the model to models/cmp/experiments/semiauto_annot/resnet50_vip+mpii_liteflow/checkpoints.
After all downloads and installations, your ckpts folder should look like this:
ckpts/
├── controlnet/
├── stable-video-diffusion-img2vid-xt-1-1/
├── groundingdino_swint_ogc.pth
├── metric_depth_vit_small_800k.pth
└── sam_vit_h_4b8939.pth
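If you prefer scripting the Hugging Face downloads, a minimal sketch using huggingface_hub (the repo ID is a placeholder; substitute the repos linked in steps 1️⃣–6️⃣ above):

# Minimal download helper; <namespace>/<repo> is a placeholder for the
# Hugging Face repos linked above.
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="<namespace>/<repo>",
    local_dir="ckpts",  # adjust per the layout above (e.g. models/cmp/... for CMP)
)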
🚀 Running the Demos
python run_agent.py
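Before launching, an optional pre-flight check that the checkpoint layout documented above is in place (assumes you run it from the repo root):

# Optional pre-flight check for the ckpts/ layout documented above
from pathlib import Path
expected = [
    "ckpts/controlnet",
    "ckpts/stable-video-diffusion-img2vid-xt-1-1",
    "ckpts/groundingdino_swint_ogc.pth",
    "ckpts/metric_depth_vit_small_800k.pth",
    "ckpts/sam_vit_h_4b8939.pth",
]
missing = [p for p in expected if not Path(p).exists()]
print("ready to run" if not missing else f"missing: {missing}")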
📖 BibTeX
If you find MotionAgent useful for your research and applications, please cite using this BibTeX:
@article{liao2025motionagent,
  title={{MotionAgent}: Fine-grained Controllable Video Generation via Motion Field Agent},
  author={Liao, Xinyao and Zeng, Xianfang and Wang, Liao and Yu, Gang and Lin, Guosheng and Zhang, Chi},
  journal={arXiv preprint arXiv:2502.03207},
  year={2025}
}
🙏 Acknowledgements
We thank the following projects for their excellent open-source work, on which MotionAgent builds: MOFA-Video, Grounded-Segment-Anything, Metric3D, CMP, and Stable Video Diffusion.