Create README.md
README.md
ADDED
@@ -0,0 +1,42 @@
| 1 |
+
<div align="center">
|
| 2 |
+
<h2><center>👉 Ctrl-World: A Controllable Generative World Model for Robot Manipulation</center></h2>
|
| 3 |
+
|
| 4 |
+
[Yanjiang Guo*](https://robert-gyj.github.io), [Lucy Xiaoyang Shi*](https://lucys0.github.io), [Jianyu Chen](http://people.iiis.tsinghua.edu.cn/~jychen/), [Chelsea Finn](https://ai.stanford.edu/~cbfinn/)
|
| 5 |
+
|
| 6 |
+
\*Equal contribution; Stanford University, Tsinghua University
|
| 7 |
+
|
| 8 |
+
|
| 9 |
+
<a href='https://arxiv.org/abs/2510.10125'><img src='https://img.shields.io/badge/ArXiv-2510.10125-red'></a>
|
| 10 |
+
<a href='https://ctrl-world.github.io/'><img src='https://img.shields.io/badge/Project-Page-Blue'></a>
|
| 11 |
+
|
| 12 |
+
</div>
|
| 13 |
+
|
| 14 |
+
## TL;DR:
|
| 15 |
+
[**Ctrl-World**](https://sites.google.com/view/ctrl-world) is an action-conditioned world model that is compatible with modern vision-language-action (VLA) policies. It enables policy-in-the-loop rollouts entirely in imagination, which can be used to evaluate and improve the **instruction-following** ability of VLA policies.
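
The idea of a policy-in-the-loop rollout can be illustrated with a minimal sketch. This is not the Ctrl-World API: `imagine_rollout`, `toy_policy`, and `toy_world_model` are hypothetical names, and plain lists of floats stand in for camera frames and robot actions. The only point is the control flow, in which the policy picks an action from the current (imagined) observation and the world model predicts the next observation from that action, with no real robot involved.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-ins: real observations would be multi-view camera frames
# and real actions would be robot commands.
Observation = List[float]
Action = List[float]


@dataclass
class Rollout:
    observations: List[Observation]
    actions: List[Action]


def imagine_rollout(
    policy: Callable[[Observation, str], Action],
    world_model: Callable[[Observation, Action], Observation],
    first_obs: Observation,
    instruction: str,
    horizon: int,
) -> Rollout:
    """Alternate policy and world-model calls for `horizon` steps."""
    obs = list(first_obs)
    observations = [obs]
    actions = []
    for _ in range(horizon):
        action = policy(obs, instruction)   # policy acts on the imagined observation
        obs = world_model(obs, action)      # world model predicts the next observation
        actions.append(action)
        observations.append(obs)
    return Rollout(observations, actions)


def toy_policy(obs: Observation, instruction: str) -> Action:
    # Toy assumption: move halfway toward the origin, ignoring the instruction text.
    return [-0.5 * x for x in obs]


def toy_world_model(obs: Observation, action: Action) -> Observation:
    # Toy assumption: the next state is the current state plus the action.
    return [x + a for x, a in zip(obs, action)]


rollout = imagine_rollout(
    toy_policy, toy_world_model, [1.0, -2.0], "reach the origin", horizon=3
)
print(rollout.observations[-1])
```

In a real setup the two toy functions would be replaced by a VLA policy and the world model checkpoint; the imagined trajectory could then be scored against the instruction to evaluate the policy.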
|
| 16 |
+
|
| 17 |
+
<p>
|
| 18 |
+
<img src="assets/ctrl_world.jpg" alt="wild-data" width="100%" />
|
| 19 |
+
</p>
|
| 20 |
+
|
| 21 |
+
## Model Details:
|
| 22 |
+
This repo includes the Ctrl-World model checkpoint trained on the open-source [**DROID dataset**](https://droid-dataset.github.io/) (~95k trajectories, 564 scenes).
|
| 23 |
+
The DROID platform consists of a Franka Panda robotic arm equipped with a Robotiq gripper and three cameras: two randomly placed third-person cameras and one wrist-mounted camera.
|
| 24 |
+
|
| 25 |
+
## Usage
|
| 26 |
+
See the official [**Ctrl-World GitHub repo**](https://github.com/Robert-gyj/Ctrl-World/tree/main) for detailed usage.
|
| 27 |
+
|
| 28 |
+
## Acknowledgement
|
| 29 |
+
|
| 30 |
+
Ctrl-World is built on the open-source video foundation model [Stable-Video-Diffusion](https://github.com/Stability-AI/generative-models). The VLA model used in this repo comes from [openpi](https://github.com/Physical-Intelligence/openpi). We thank the authors for their efforts!
|
| 31 |
+
|
| 32 |
+
|
| 33 |
+
## BibTeX
|
| 34 |
+
If you find our work helpful, please leave us a star and cite our paper. Thank you!
|
| 35 |
+
```
|
| 36 |
+
@article{guo2025ctrl,
|
| 37 |
+
title={Ctrl-World: A Controllable Generative World Model for Robot Manipulation},
|
| 38 |
+
author={Guo, Yanjiang and Shi, Lucy Xiaoyang and Chen, Jianyu and Finn, Chelsea},
|
| 39 |
+
journal={arXiv preprint arXiv:2510.10125},
|
| 40 |
+
year={2025}
|
| 41 |
+
}
|
| 42 |
+
```
|