---
license: apache-2.0
datasets:
- WeiChow/CrispEdit-2M
language:
- en
pipeline_tag: image-to-image
tags:
- image-edit
base_model:
- google/gemma-2-2b-it
---

# EditMGT
[![arXiv](https://img.shields.io/badge/arXiv-XXXX.XXXXX-b31b1b.svg)]() [![Dataset](https://img.shields.io/badge/🤗%20CrispEdit2M-Dataset-yellow)](https://huggingface.co/datasets/WeiChow/CrispEdit-2M) [![Checkpoint](https://img.shields.io/badge/🧨%20EditMGT-CKPT-blue)](https://huggingface.co/WeiChow/EditMGT) [![GitHub](https://img.shields.io/badge/GitHub-Repo-181717?logo=github)](https://github.com/weichow23/editmgt/tree/main) [![Page](https://img.shields.io/badge/🏠%20Home-Page-b3.svg)](https://weichow23.github.io/editmgt/) [![Python 3.9](https://img.shields.io/badge/Python-3.9-blue.svg?logo=python)](https://www.python.org/downloads/release/python-392/)
## 🌟 Overview

This is the official repository for **EditMGT: Unleashing the Potential of Masked Generative Transformer in Image Editing** ✨. EditMGT is a framework that leverages masked generative transformers for image editing, enabling precise and controllable modifications while preserving the integrity of the original content.

*EditMGT Architecture*
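For intuition, masked generative transformers in the MaskGIT family generate or edit images by iteratively unmasking discrete image tokens, committing only the most confident predictions at each step under a cosine schedule. The toy sketch below (pure Python, with a random stand-in for the transformer; the token count, step count, and vocabulary size are illustrative, not EditMGT's actual configuration) shows the general idea:

```python
import math
import random

def cosine_mask_schedule(t: float) -> float:
    """Fraction of tokens still masked at progress t in [0, 1]."""
    return math.cos(math.pi / 2 * t)

def iterative_decode(num_tokens: int = 16, num_steps: int = 4, seed: int = 0):
    """Toy MaskGIT-style loop: at each step, 'predict' every masked position,
    commit the most confident predictions, and re-mask the rest."""
    rng = random.Random(seed)
    tokens = [None] * num_tokens  # None marks a masked position
    for step in range(1, num_steps + 1):
        # How many tokens may remain masked after this step.
        keep_masked = int(num_tokens * cosine_mask_schedule(step / num_steps))
        masked = [i for i, tok in enumerate(tokens) if tok is None]
        # Stand-in for the transformer: a random token id plus a confidence score.
        preds = {i: (rng.randrange(1024), rng.random()) for i in masked}
        # Commit the most confident predictions; the rest stay masked.
        by_confidence = sorted(masked, key=lambda i: preds[i][1], reverse=True)
        for i in by_confidence[: len(masked) - keep_masked]:
            tokens[i] = preds[i][0]
    return tokens
```

Because the schedule reaches zero at the final step, every position is committed by the end; in the real model, the confidence scores come from the transformer's token logits rather than a random number generator.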

## ✨ Features

- 🎨 Strong style-transfer capabilities
- 🔍 Attention-based control over editing regions
- ⚡ A backbone of only 960M parameters, enabling fast inference
- 📊 Trained on the [CrispEdit-2M](https://huggingface.co/datasets/WeiChow/CrispEdit-2M) dataset

## ⚡ Quick Start

First, clone the repository and navigate to the project root:

```shell
git clone https://github.com/weichow23/editmgt
cd editmgt
```

## 🔧 Environment Setup

```bash
# Create and activate conda environment
conda create --name editmgt python=3.9.2
conda activate editmgt

# Optional: Install system dependencies
sudo apt-get install libgl1-mesa-glx libglib2.0-0 -y

# Install Python dependencies
pip3 install git+https://github.com/openai/CLIP
pip3 install -r requirements.txt
```

⚠️ **Note**: If you encounter unexpected library errors during setup, please refer to [this issue](https://github.com/viiika/Meissonic/issues/14) to find the library versions that may fix them.

## 🚀 Inference

Run the following script from the `editmgt` directory:

```python
import os
import sys
sys.path.append("./")
from PIL import Image
from src.editmgt import init_edit_mgt
from src.v2_model import negative_prompt

if __name__ == "__main__":
    pipe = init_edit_mgt(device='cuda:0')
    # Forcing bf16 improves speed but costs quality: we observed a drop of
    # about 0.8 on GEdit-Bench. Disable it with:
    # pipe = init_edit_mgt(device='cuda:0', enable_bf16=False)
    # pipe.local_guidance = 0.01  # Enable the local guidance-scale auxiliary mode
    # pipe.local_query_text = 'owl'  # Use specific words as attention queries
    # pipe.attention_enable_blocks = [i for i in range(28, 37)]  # Attention layers to use

    input_image = Image.open('assets/case_5.jpg')
    result = pipe(
        prompt=['Make it into Ghibli style'],
        height=1024,
        width=1024,
        num_inference_steps=36,  # For some simple tasks, 16 steps are enough!
        guidance_scale=6,
        reference_strength=1.1,
        reference_image=[input_image.resize((1024, 1024))],
        negative_prompt=negative_prompt or None,
    )

    output_dir = "./output"
    os.makedirs(output_dir, exist_ok=True)
    file_path = os.path.join(output_dir, "edited_case_5.png")
    w, h = input_image.size
    result.images[0].resize((w, h)).save(file_path)
```

## 📑 Citation

```bibtex
```

## 🙏 Acknowledgements

We extend our sincere gratitude to all contributors and to the research community for their valuable feedback and support during the development of this project.
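The script above squeezes the input to a fixed 1024×1024 and stretches the result back, which distorts the aspect ratio of non-square images. If the pipeline accepts other resolutions (an assumption worth checking against the repo, as is the multiple-of-16 token grid used below), a small helper can pick an aspect-preserving working size with a similar pixel budget:

```python
def fit_to_grid(width: int, height: int, target_area: int = 1024 * 1024,
                multiple: int = 16) -> tuple:
    """Pick an aspect-ratio-preserving resolution with roughly `target_area`
    pixels, rounded to multiples of `multiple` (a common constraint for
    patch-based transformers; the exact grid size is an assumption here)."""
    scale = (target_area / (width * height)) ** 0.5
    w = max(multiple, round(width * scale / multiple) * multiple)
    h = max(multiple, round(height * scale / multiple) * multiple)
    return w, h
```

You would then pass the computed `(w, h)` as `width`/`height` and resize `reference_image` to match, instead of hard-coding 1024×1024.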