Enhance model card with pipeline tag, paper, project, and code links (#1)
- Enhance model card with pipeline tag, paper, project, and code links (5c1476c7250111adf51f685c5f60a665371929ea)
Co-authored-by: Niels Rogge <[email protected]>
README.md
CHANGED
@@ -1,3 +1,87 @@
----
-license: cc-by-nc-sa-4.0
-
+---
+license: cc-by-nc-sa-4.0
+pipeline_tag: image-to-image
+---
+
+# 🪶 MagicQuill V2: Precise and Interactive Image Editing with Layered Visual Cues
+
+- **Paper:** [MagicQuillV2: Precise and Interactive Image Editing with Layered Visual Cues](https://huggingface.co/papers/2512.03046)
+- **Project Page:** https://magicquill.art/v2/
+- **Code Repository:** https://github.com/zliucz/MagicQuillV2
+- **Hugging Face Spaces Demo:** https://huggingface.co/spaces/AI4Editing/MagicQuillV2
+
+<br>
+
+<div align="center">
+<video src="https://github.com/user-attachments/assets/58079152-7729-48ed-9bb4-0ddfd1873dd0" width="100%" controls autoplay muted loop></video>
+</div>
+
+<br>
+
+**TLDR:** MagicQuill V2 introduces a layered composition paradigm to generative image editing, disentangling creative intent into controllable visual cues (Content, Spatial, Structural, Color) for precise and intuitive control.
+
+## Hardware Requirements
+
+Our model is based on Flux Kontext, which is large and computationally intensive.
+- **VRAM**: Approximately **40GB** of VRAM is required for inference.
+- **Speed**: It takes about **30 seconds** to generate a single image.
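Before launching the demo, it can help to confirm that a GPU roughly meets the 40GB figure stated above. The following is a minimal sketch and not part of the MagicQuill V2 repository: the helper name `has_enough_vram` and the 40000 MiB default threshold are assumptions chosen for illustration, and the check relies on `nvidia-smi` being installed.

```bash
#!/usr/bin/env bash
# Rough pre-flight check: does each GPU report enough total memory?
# The helper name and the 40000 MiB default are illustrative assumptions.

# has_enough_vram TOTAL_MIB [REQUIRED_MIB]
# Exit 0 when TOTAL_MIB >= REQUIRED_MIB, 1 when it is lower,
# and 2 when TOTAL_MIB is not a plain non-negative integer.
has_enough_vram() {
    local total_mib="$1"
    local required_mib="${2:-40000}"
    case "$total_mib" in
        ''|*[!0-9]*) return 2 ;;
    esac
    [ "$total_mib" -ge "$required_mib" ]
}

# Only query the driver when nvidia-smi is actually available.
if command -v nvidia-smi >/dev/null 2>&1; then
    # One line per GPU, value in MiB, without header or unit suffix.
    nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits |
    while read -r total; do
        if has_enough_vram "$total"; then
            echo "GPU with ${total} MiB: likely sufficient"
        else
            echo "GPU with ${total} MiB: likely insufficient, consider the online demo"
        fi
    done
else
    echo "nvidia-smi not found; cannot check GPU memory"
fi
```

The threshold is passed as an optional second argument, so `has_enough_vram "$total" 24000` can be used to test against a different budget.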
+
+> **Important**: This is a research project focused on pushing the boundaries of interactive image editing. If you do not have sufficient GPU memory, we recommend checking out our [**MagicQuill V1**](https://github.com/ant-research/MagicQuill) or trying the online demo on [**Hugging Face Spaces**](https://huggingface.co/spaces/AI4Editing/MagicQuillV2).
+
+## Setup
+
+1. **Clone the repository**
+```bash
+git clone https://github.com/magic-quill/MagicQuillV2.git
+cd MagicQuillV2
+```
+
+2. **Create environment**
+```bash
+conda create -n MagicQuillV2 python=3.10 -y
+conda activate MagicQuillV2
+```
+
+3. **Install dependencies**
+```bash
+pip install -r requirements.txt
+```
+
+4. **Download models**
+Download the models from [Hugging Face](https://huggingface.co/LiuZichen/MagicQuillV2-models) and place them in the `models/` directory.
+
+```bash
+huggingface-cli download LiuZichen/MagicQuillV2-models --local-dir models
+```
+
+5. **Run the demo**
+```bash
+python app.py
+```
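The setup steps above assume the downloaded weights end up under `models/` before `python app.py` is run. As a hedged sketch, the hypothetical `models_ready` helper below (not something the repository provides) only checks that the directory exists and contains at least one file or symlink. It does not validate which files are present, since the expected file names are not listed here.

```bash
#!/usr/bin/env bash
# models_ready [DIR]
# Exit 0 when DIR (default: models) exists and contains at least one
# regular file or symlink anywhere below it; exit non-zero otherwise.
# The helper name is an illustrative assumption.
models_ready() {
    local dir="${1:-models}"
    [ -d "$dir" ] || return 1
    # head -n 1 keeps only the first match, which is enough to prove non-emptiness.
    [ -n "$(find "$dir" \( -type f -o -type l \) 2>/dev/null | head -n 1)" ]
}

if models_ready models; then
    echo "models/ looks populated; start the demo with: python app.py"
else
    echo "models/ is missing or empty; run:"
    echo "  huggingface-cli download LiuZichen/MagicQuillV2-models --local-dir models"
fi
```

Run from the repository root, this prints either the launch command or the download command from step 4, depending on the state of `models/`.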
+
+## System Overview
+
+The MagicQuill V2 interactive system is designed to unify our layered composition framework.
+
+<div align="center">
+<img src="https://github.com/zliucz/MagicQuillV2/raw/main/assets/V2_UI.png" alt="MagicQuill V2 UI" width="100%">
+</div>
+
+### Key Upgrades from V1
+
+1. **Toolbar (A)**: Features a new **Local Edit Brush** for defining the target editing area, along with tools for sketching edges and applying color.
+2. **Visual Cue Manager (B)**: Holds all content layer visual cues (**foreground props**) that users can drag onto the canvas to define what to generate.
+3. **Image Segmentation Panel (C)**: Accessed via the segment icon, this panel allows precise object extraction using SAM (Segment Anything Model) with positive/negative dots or bounding boxes.
+
+## Citation
+
+If you find MagicQuill V2 useful for your research, please cite our paper:
+
+```bibtex
+@article{liu2025magicquillv2,
+title={MagicQuill V2: Precise and Interactive Image Editing with Layered Visual Cues},
+author={Zichen Liu and Yue Yu and Hao Ouyang and Qiuyu Wang and Shuailei Ma and Ka Leong Cheng and Wen Wang and Qingyan Bai and Yuxuan Zhang and Yanhong Zeng and Yixuan Li and Xing Zhu and Yujun Shen and Qifeng Chen},
+journal={arXiv preprint arXiv:2512.03046},
+year={2025}
+}
+```