---
license: apache-2.0
language:
- en
- zh
pipeline_tag: image-to-image
library_name: transformers
---

<div align="center">
<img src="assets/longcat-image_logo.svg" width="45%" alt="LongCat-Image" />
</div>
<hr>

<div align="center" style="line-height: 1;">
<a href='https://arxiv.org/pdf/2512.07584'><img src='https://img.shields.io/badge/Technical-Report-red'></a>
<a href='https://github.com/meituan-longcat/LongCat-Image'><img src='https://img.shields.io/badge/GitHub-Code-black'></a>
<a href='https://github.com/meituan-longcat/LongCat-Flash-Chat/blob/main/figures/wechat_official_accounts.png'><img src='https://img.shields.io/badge/WeChat-LongCat-brightgreen?logo=wechat&logoColor=white'></a>
<a href='https://x.com/Meituan_LongCat'><img src='https://img.shields.io/badge/Twitter-LongCat-white?logo=x&logoColor=white'></a>
</div>

<div align="center" style="line-height: 1;">

[//]: # ( <a href='https://meituan-longcat.github.io/LongCat-Image/'><img src='https://img.shields.io/badge/Project-Page-green'></a>)
<a href='https://huggingface.co/meituan-longcat/LongCat-Image'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-LongCat--Image-blue'></a>
<a href='https://huggingface.co/meituan-longcat/LongCat-Image-Dev'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-LongCat--Image--Dev-blue'></a>
<a href='https://huggingface.co/meituan-longcat/LongCat-Image-Edit'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-LongCat--Image--Edit-blue'></a>
</div>

## Introduction
We introduce **LongCat-Image-Edit**, the image editing version of LongCat-Image. LongCat-Image-Edit supports bilingual (Chinese-English) editing and achieves state-of-the-art performance among open-source image editing models, delivering leading instruction following and image quality with superior visual consistency.

<div align="center">
<img src="assets/model_struct_edit.png" width="90%" alt="LongCat-Image-Edit model" />
</div>

### Key Features
- **Superior Precise Editing**: LongCat-Image-Edit supports a range of editing tasks, including global editing, local editing, text modification, and reference-guided editing. Its strong semantic understanding lets it edit precisely according to instructions.
- **Consistency Preservation**: LongCat-Image-Edit keeps attributes in non-edited regions, such as layout, texture, color tone, and subject identity, unchanged unless the instruction targets them. This is well demonstrated in multi-turn editing.
- **Strong Benchmark Performance**: LongCat-Image-Edit achieves state-of-the-art (SOTA) performance among open-source image editing models while significantly improving inference efficiency.

## Showcase

<div align="center">
<img src="assets/image_edit_gallery.jpg" width="90%" alt="LongCat-Image-Edit gallery." />
</div>

## Quick Start

[Hugging Face app](https://huggingface.co/spaces/anycoderapps/LongCat-Image-Edit)

### Installation

Clone the repo:

```shell
git clone --single-branch --branch main https://github.com/meituan-longcat/LongCat-Image
cd LongCat-Image
```

Install dependencies:

```shell
# create conda environment
conda create -n longcat-image python=3.10
conda activate longcat-image

# install other requirements
pip install -r requirements.txt
python setup.py develop
```
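The editing example in this README loads weights from `./weights/LongCat-Image-Edit` and reads its `tokenizer` and `transformer` subfolders. The sketch below is one possible way to fetch them. It assumes the `huggingface_hub` package is installed and that the Hub repository `meituan-longcat/LongCat-Image-Edit` (the one linked in the badges) has that layout. The helper names `missing_subfolders` and `ensure_weights` are hypothetical and not part of the LongCat-Image codebase.

```python
from pathlib import Path

# Assumption: repo id taken from the Hugging Face badge in this README.
REPO_ID = "meituan-longcat/LongCat-Image-Edit"
# Same directory the editing example uses as `checkpoint_dir`.
CHECKPOINT_DIR = Path("./weights/LongCat-Image-Edit")
# Subfolders that the editing example loads explicitly.
REQUIRED_SUBFOLDERS = ("tokenizer", "transformer")


def missing_subfolders(checkpoint_dir, required=REQUIRED_SUBFOLDERS):
    """Return the required subfolders that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in required if not (root / name).is_dir()]


def ensure_weights(checkpoint_dir=CHECKPOINT_DIR, repo_id=REPO_ID):
    """Download the checkpoint only if a required subfolder is missing."""
    if missing_subfolders(checkpoint_dir):
        # Imported lazily so the check above works without huggingface_hub.
        from huggingface_hub import snapshot_download

        snapshot_download(repo_id=repo_id, local_dir=str(checkpoint_dir))
    return Path(checkpoint_dir)
```

Calling `ensure_weights()` once before running the editing example should leave the weights where `checkpoint_dir` expects them. Later calls skip the download as long as both subfolders exist.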

### Run Image Editing

> [!CAUTION]
> **Special Handling for Text Rendering**
>
> For both Text-to-Image and Image Editing tasks involving text generation, **you must enclose the target text within single or double quotation marks** (both English '...' / "..." and Chinese ‘...’ / “...” styles are supported).
>
> **Reasoning:** The model uses a specialized **character-level encoding** strategy specifically for quoted content. Without explicit quotation marks this mechanism is not triggered, which severely compromises text rendering.

```python
import torch
from PIL import Image
from transformers import AutoProcessor
from longcat_image.models import LongCatImageTransformer2DModel
from longcat_image.pipelines import LongCatImageEditPipeline

device = torch.device('cuda')
checkpoint_dir = './weights/LongCat-Image-Edit'

# Load the text processor and the transformer from their checkpoint subfolders.
text_processor = AutoProcessor.from_pretrained(checkpoint_dir, subfolder='tokenizer')
transformer = LongCatImageTransformer2DModel.from_pretrained(
    checkpoint_dir,
    subfolder='transformer',
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
).to(device)

pipe = LongCatImageEditPipeline.from_pretrained(
    checkpoint_dir,
    transformer=transformer,
    text_processor=text_processor,
)
# pipe.to(device, torch.bfloat16)  # Uncomment on high-VRAM devices (faster inference)
pipe.enable_model_cpu_offload()  # Offload to CPU to save VRAM (requires ~19 GB); slower but prevents OOM

# Fixed seed for reproducible results.
generator = torch.Generator("cpu").manual_seed(43)
img = Image.open('assets/test.png')
prompt = 'Change the cat into a dog'  # Chinese prompts are supported as well
image = pipe(
    img,
    prompt,
    negative_prompt='',
    guidance_scale=4.5,
    num_inference_steps=50,
    num_images_per_prompt=1,
    generator=generator,
).images[0]

image.save('./edit_example.png')
```
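The caution above requires quoting any text that should be rendered. As a minimal sketch, the hypothetical helpers below (not part of `longcat_image`) wrap the target text in quotation marks and check that a prompt contains at least one quoted span before it is passed to the pipeline. The regular expression only approximates the four quotation styles listed in the caution. How the model itself detects quoted content is not specified here.

```python
import re

# Quoted spans in the four styles named in the caution.
# Limitation: apostrophes in ordinary words (e.g. "don't") can be
# mistaken for English single quotes by this simple pattern.
QUOTED_SPAN = re.compile(
    "'[^']+'"                      # English single quotes
    '|"[^"]+"'                     # English double quotes
    "|\u2018[^\u2019]+\u2019"      # Chinese single quotes
    "|\u201c[^\u201d]+\u201d"      # Chinese double quotes
)

# Opening quotation mark mapped to its closing counterpart.
CLOSING_QUOTE = {"'": "'", '"': '"', "\u2018": "\u2019", "\u201c": "\u201d"}


def has_quoted_text(prompt):
    """Return True if the prompt contains at least one quoted span."""
    return QUOTED_SPAN.search(prompt) is not None


def text_edit_prompt(instruction, target_text, quote='"'):
    """Fill the `{text}` placeholder of instruction with quoted target_text."""
    closing = CLOSING_QUOTE[quote]
    return instruction.format(text=f"{quote}{target_text}{closing}")


prompt = text_edit_prompt("Change the sign to read {text}", "OPEN 24 HOURS")
print(prompt)                   # Change the sign to read "OPEN 24 HOURS"
print(has_quoted_text(prompt))  # True
print(has_quoted_text("Change the sign to read OPEN 24 HOURS"))  # False
```

A prompt built this way can be passed as `prompt` in the editing example. Running `has_quoted_text` on hand-written prompts is a cheap guard against silently degraded text rendering.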