StereoPilot: Learning Unified and Efficient Stereo Conversion via Generative Priors

Guibao Shen1,3†, Yihua Du1, Wenhang Ge1,3*†, Jing He1, Chirui Chang3, Donghao Zhou4, Zhen Yang1, Luozhou Wang1, Xin Tao3, Ying-Cong Chen1,2‑

1HKUST(GZ), 2HKUST, 3Kling Team, Kuaishou Technology, 4CUHK

(*Equal contribution, †This work was conducted during the authors' internship at Kling, ‑Corresponding author)

📖 Introduction

TL;DR: We propose StereoPilot, an efficient feed-forward architecture that leverages pretrained video diffusion transformers to directly synthesize novel views, overcoming the limitations of Depth-Warp-Inpaint methods without iterative denoising. With a domain switcher and cycle consistency loss, it enables robust multi-format stereo conversion. We also introduce UniStereo, the first large-scale unified dataset featuring both parallel and converged stereo formats.

🎬 Click the image to view our showcase video

🔥 Updates

βš™οΈ Requirements

Our inference environment:

  • Python 3.12
  • CUDA 12.1
  • PyTorch 2.4.1
  • GPU: NVIDIA A800 (only ~23GB VRAM required)
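
Before installing, you can optionally confirm that your driver and PyTorch build roughly match this setup. A minimal check, assuming PyTorch is already present in the target environment:

nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"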

πŸ› οΈ Installation

Step 1: Clone the repository

git clone https://github.com/KlingTeam/StereoPilot.git

cd StereoPilot

Step 2: Create conda environment

conda create -n StereoPilot python=3.12

conda activate StereoPilot

Step 3: Install dependencies

pip install -r requirements.txt

pip install flash-attn==2.7.4.post1 --no-build-isolation
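
To confirm that the flash-attn build installed correctly, an optional sanity check (assumes the environment created above):

python -c "import flash_attn; print(flash_attn.__version__)"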

Step 4: Download model checkpoints

Place the following files in the ckpt/ directory:

  • StereoPilot.safetensors: StereoPilot model weights
  • Wan2.1-T2V-1.3B/: Base Wan2.1 model directory

Download StereoPilot.safetensors and the Wan2.1-T2V-1.3B base model:

pip install "huggingface_hub[cli]"

huggingface-cli download KlingTeam/StereoPilot StereoPilot.safetensors --local-dir ./ckpt

huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir ./ckpt/Wan2.1-T2V-1.3B
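
After both downloads finish, the ckpt/ directory should look roughly like this (the exact contents of the base model folder follow the upstream Wan-AI/Wan2.1-T2V-1.3B repository):

ckpt/
├── StereoPilot.safetensors
└── Wan2.1-T2V-1.3B/    (base model weights, tokenizer, and config files)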

🚀 Inference

Input Requirements

For each input video, you need:

  1. Video file (.mp4): Monocular video, 81 frames, 832×480 resolution, 16 fps (a generic ffmpeg sketch for preparing such a clip follows the example below)
  2. Prompt file (.txt): Text description of the video content (same name as video)

Example (you can try the cases in the sample/ folder):

sample/
├── my_video.mp4
└── my_video.txt
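
If your source clip does not already match the spec above, a generic ffmpeg command such as the following can resample it (a sketch, not part of the repository; it assumes a roughly 16:9 source and paths of your choosing):

ffmpeg -i input.mp4 -vf "scale=832:480,fps=16" -frames:v 81 -an sample/my_video.mp4
echo "A short description of the video content." > sample/my_video.txt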

Running Inference

Basic usage:

# Edit toml/infer.toml to customize model paths; if you followed the steps above, no changes are needed.
python sample.py \
  --config toml/infer.toml \
  --input /path/to/input_video.mp4 \
  --output_folder /path/to/output \
  --device cuda:0

Using the example script:

bash sample.sh
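
To process every clip in a folder, a simple loop over the same entry point works (a sketch assuming each .mp4 has a matching .txt prompt beside it, as described above; the output folder name is arbitrary):

for video in sample/*.mp4; do
  python sample.py \
    --config toml/infer.toml \
    --input "$video" \
    --output_folder outputs/ \
    --device cuda:0
done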

Generate Stereo Visualization

After inference, you can generate Side-by-Side (SBS) and Red-Cyan anaglyph stereo videos for visualization:

python utils/stereo_video.py \
  --left /path/to/left_eye.mp4 \
  --right /path/to/right_eye.mp4

Output files:

  • {name}_sbs.mp4: Side-by-Side stereo video (view with a VR headset)
  • {name}_anaglyph.mp4: Red-Cyan anaglyph stereo video (view with red-cyan 3D glasses)
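
For a quick preview without the helper script, ffmpeg's hstack and stereo3d filters can produce comparable outputs (a rough sketch; the script above remains the reference path, and the filter options are assumptions about your ffmpeg build):

# Side-by-Side
ffmpeg -i left_eye.mp4 -i right_eye.mp4 -filter_complex "[0:v][1:v]hstack=inputs=2[v]" -map "[v]" output_sbs.mp4
# Red-Cyan anaglyph from the Side-by-Side layout
ffmpeg -i left_eye.mp4 -i right_eye.mp4 -filter_complex "[0:v][1:v]hstack=inputs=2,stereo3d=in=sbsl:out=arcd[v]" -map "[v]" output_anaglyph.mp4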

📊 Dataset

We introduce UniStereo, the first large-scale unified stereo video dataset featuring both parallel and converged stereo formats.

UniStereo consists of two parts:

  • 3DMovie - Converged stereo format from 3D movies
  • Stereo4D - Parallel stereo format (coming soon)

For detailed data processing instructions, please refer to StereoPilot_Dataprocess.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

🌟 Citation

If you find our work helpful, please consider citing:

@misc{shen2025stereopilot,
  title={StereoPilot: Learning Unified and Efficient Stereo Conversion via Generative Priors},
  author={Shen, Guibao and Du, Yihua and Ge, Wenhang and He, Jing and Chang, Chirui and Zhou, Donghao and Yang, Zhen and Wang, Luozhou and Tao, Xin and Chen, Ying-Cong},
  year={2025},
  eprint={2512.16915},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2512.16915}, 
}