---
license: apache-2.0
datasets:
- WeiChow/CrispEdit-2M
language:
- en
pipeline_tag: image-to-image
tags:
- image-edit
base_model:
- google/gemma-2-2b-it
---

# EditMGT
[![arXiv](https://img.shields.io/badge/arXiv-XXXX.XXXXX-b31b1b.svg)]() [![Dataset](https://img.shields.io/badge/🤗%20CrispEdit2M-Dataset-yellow)](https://huggingface.co/datasets/WeiChow/CrispEdit-2M) [![Checkpoint](https://img.shields.io/badge/🧨%20EditMGT-CKPT-blue)](https://huggingface.co/WeiChow/EditMGT) [![GitHub](https://img.shields.io/badge/GitHub-Repo-181717?logo=github)](https://github.com/weichow23/editmgt/tree/main) [![Page](https://img.shields.io/badge/🏠%20Home-Page-b3.svg)](https://weichow23.github.io/editmgt/) [![Python 3.9](https://img.shields.io/badge/Python-3.9-blue.svg?logo=python)](https://www.python.org/downloads/release/python-392/)
## 🌟 Overview

This is the official repository for **EditMGT: Unleashing the Potential of Masked Generative Transformer in Image Editing** ✨. EditMGT is a framework that leverages masked generative transformers for image editing, enabling precise and controllable modifications while preserving the integrity of the original content.

*EditMGT Architecture*
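For intuition, masked generative transformers in the MaskGIT family generate or edit images by iteratively unmasking discrete image tokens, committing only the most confident predictions at each step under a cosine schedule. The toy sketch below (pure Python, with a random stand-in for the transformer; the token count, step count, and vocabulary size are illustrative, not EditMGT's actual configuration) shows the general idea:

```python
import math
import random

def cosine_mask_schedule(t: float) -> float:
    """Fraction of tokens still masked at progress t in [0, 1]."""
    return math.cos(math.pi / 2 * t)

def iterative_decode(num_tokens: int = 16, num_steps: int = 4, seed: int = 0):
    """Toy MaskGIT-style loop: at each step, 'predict' every masked position,
    commit the most confident predictions, and re-mask the rest."""
    rng = random.Random(seed)
    tokens = [None] * num_tokens  # None marks a masked position
    for step in range(1, num_steps + 1):
        # How many tokens may remain masked after this step.
        keep_masked = int(num_tokens * cosine_mask_schedule(step / num_steps))
        masked = [i for i, tok in enumerate(tokens) if tok is None]
        # Stand-in for the transformer: a random token id plus a confidence score.
        preds = {i: (rng.randrange(1024), rng.random()) for i in masked}
        # Commit the most confident predictions; the rest stay masked.
        by_confidence = sorted(masked, key=lambda i: preds[i][1], reverse=True)
        for i in by_confidence[: len(masked) - keep_masked]:
            tokens[i] = preds[i][0]
    return tokens
```

Because the schedule reaches zero at the final step, every position is committed by the end; in the real model, the confidence scores come from the transformer's token logits rather than a random number generator.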

## ✨ Features

- 🎨 Strong style-transfer capabilities
- 🔍 Attention-based control over editing regions
- ⚡ A backbone of only 960M parameters, enabling fast inference
- 📊 Trained on the [CrispEdit-2M](https://huggingface.co/datasets/WeiChow/CrispEdit-2M) dataset

## ⚡ Quick Start

First, clone the repository and navigate to the project root:

```shell
git clone https://github.com/weichow23/editmgt
cd editmgt
```

## 🔧 Environment Setup

```bash
# Create and activate conda environment
conda create --name editmgt python=3.9.2
conda activate editmgt

# Optional: Install system dependencies
sudo apt-get install libgl1-mesa-glx libglib2.0-0 -y

# Install Python dependencies
pip3 install git+https://github.com/openai/CLIP
pip3 install -r requirements.txt
```

⚠️ **Note**: If you encounter unexpected library errors during setup, please refer to [this issue](https://github.com/viiika/Meissonic/issues/14) to find the library versions that may fix them.

## 🚀 Inference

Run the following script from the `editmgt` directory:

```python
import os
import sys
sys.path.append("./")
from PIL import Image
from src.editmgt import init_edit_mgt
from src.v2_model import negative_prompt

if __name__ == "__main__":
    pipe = init_edit_mgt(device='cuda:0')
    # Forcing bf16 improves speed but costs quality: we observed a drop of
    # about 0.8 on GEdit-Bench. Disable it with:
    # pipe = init_edit_mgt(device='cuda:0', enable_bf16=False)
    # pipe.local_guidance = 0.01  # Enable the local guidance-scale auxiliary mode
    # pipe.local_query_text = 'owl'  # Use specific words as attention queries
    # pipe.attention_enable_blocks = [i for i in range(28, 37)]  # Attention layers to use

    input_image = Image.open('assets/case_5.jpg')
    result = pipe(
        prompt=['Make it into Ghibli style'],
        height=1024,
        width=1024,
        num_inference_steps=36,  # For some simple tasks, 16 steps are enough!
        guidance_scale=6,
        reference_strength=1.1,
        reference_image=[input_image.resize((1024, 1024))],
        negative_prompt=negative_prompt or None,
    )

    output_dir = "./output"
    os.makedirs(output_dir, exist_ok=True)
    file_path = os.path.join(output_dir, "edited_case_5.png")
    w, h = input_image.size
    result.images[0].resize((w, h)).save(file_path)
```

## 📑 Citation

```bibtex
```

## 🙏 Acknowledgements

We extend our sincere gratitude to all contributors and to the research community for their valuable feedback and support during the development of this project.
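The script above squeezes the input to a fixed 1024×1024 and stretches the result back, which distorts the aspect ratio of non-square images. If the pipeline accepts other resolutions (an assumption worth checking against the repo, as is the multiple-of-16 token grid used below), a small helper can pick an aspect-preserving working size with a similar pixel budget:

```python
def fit_to_grid(width: int, height: int, target_area: int = 1024 * 1024,
                multiple: int = 16) -> tuple:
    """Pick an aspect-ratio-preserving resolution with roughly `target_area`
    pixels, rounded to multiples of `multiple` (a common constraint for
    patch-based transformers; the exact grid size is an assumption here)."""
    scale = (target_area / (width * height)) ** 0.5
    w = max(multiple, round(width * scale / multiple) * multiple)
    h = max(multiple, round(height * scale / multiple) * multiple)
    return w, h
```

You would then pass the computed `(w, h)` as `width`/`height` and resize `reference_image` to match, instead of hard-coding 1024×1024.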