Update README.md
README.md
CHANGED
@@ -1,85 +1 @@
-
-
-This is the implementation of `Stable Diffusion 3 Inpainting Pipeline`.
-
-| input image | input mask image | output |
-|:-------------------------:|:-------------------------:|:-------------------------:|
-|<img src="./overture-creations-5sI6fQgYIuo.png" width = "400" /> | <img src="./overture-creations-5sI6fQgYIuo_mask.png" width = "400" /> | <img src="./overture-creations-5sI6fQgYIuo_output.jpg" width = "400" /> |
-|<img src="./overture-creations-5sI6fQgYIuo.png" width = "400" /> | <img src="./overture-creations-5sI6fQgYIuo_mask.png" width = "400" /> | <img src="./overture-creations-5sI6fQgYIuo_tiger.jpg" width = "400" /> |
-|<img src="./overture-creations-5sI6fQgYIuo.png" width = "400" /> | <img src="./overture-creations-5sI6fQgYIuo_mask.png" width = "400" /> | <img src="./overture-creations-5sI6fQgYIuo_panda.jpg" width = "400" /> |
-
-**Please ensure that the version of diffusers >= 0.29.1**
-
-## Model
-
-[Stable Diffusion 3 Medium](https://stability.ai/news/stable-diffusion-3-medium) is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features greatly improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.
-
-For more technical details, please refer to the [Research paper](https://stability.ai/news/stable-diffusion-3-research-paper).
-
-Please note: this model is released under the Stability Non-Commercial Research Community License. For a Creator License or an Enterprise License visit Stability.ai or [contact us](https://stability.ai/license) for commercial licensing details.
-
-
-### Model Description
-
-- **Developed by:** Stability AI
-- **Model type:** MMDiT text-to-image generative model
-- **Model Description:** This is a model that can be used to generate images based on text prompts. It is a Multimodal Diffusion Transformer
-(https://arxiv.org/abs/2403.03206) that uses three fixed, pretrained text encoders
-([OpenCLIP-ViT/G](https://github.com/mlfoundations/open_clip), [CLIP-ViT/L](https://github.com/openai/CLIP/tree/main) and [T5-xxl](https://huggingface.co/google/t5-v1_1-xxl))
-
-## Demo
-
-Make sure you upgrade to the latest version of diffusers: pip install -U diffusers. And then you can run:
-
-```python
-import torch
-from torchvision import transforms
-
-from pipeline_stable_diffusion_3_inpaint import StableDiffusion3InpaintPipeline
-from diffusers.utils import load_image
-
-def preprocess_image(image):
-    image = image.convert("RGB")
-    image = transforms.CenterCrop((image.size[1] // 64 * 64, image.size[0] // 64 * 64))(image)
-    image = transforms.ToTensor()(image)
-    image = image.unsqueeze(0).to("cuda")
-    return image
-
-def preprocess_mask(mask):
-    mask = mask.convert("L")
-    mask = transforms.CenterCrop((mask.size[1] // 64 * 64, mask.size[0] // 64 * 64))(mask)
-    mask = transforms.ToTensor()(mask)
-    mask = mask.to("cuda")
-    return mask
-
-pipe = StableDiffusion3InpaintPipeline.from_pretrained(
-    "stabilityai/stable-diffusion-3-medium-diffusers",
-    torch_dtype=torch.float16,
-).to("cuda")
-
-prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
-source_image = load_image(
-    "./overture-creations-5sI6fQgYIuo.png"
-)
-source = preprocess_image(source_image)
-mask = preprocess_mask(
-    load_image(
-        "./overture-creations-5sI6fQgYIuo_mask.png"
-    )
-)
-
-image = pipe(
-    prompt=prompt,
-    image=source,
-    mask_image=mask,
-    height=1024,
-    width=1024,
-    num_inference_steps=50,
-    guidance_scale=7.0,
-    strength=0.6,
-).images[0]
-
-image.save("overture-creations-5sI6fQgYIuo_output.jpg")
-```
-
-
+pip install diffusers transformers torch accelerate safetensors
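
The removed README asked for diffusers >= 0.29.1, a constraint the bare `pip install` line does not encode. A minimal sketch of checking it at runtime, using only the standard library, might look like the following. The helper names (`parse_version`, `meets_minimum`, `check_diffusers`) are hypothetical and not part of this repository, and the parser is a simplification that ignores pre-release ordering (it treats `0.29.1rc1` the same as `0.29.1`).

```python
from importlib import metadata

# Minimum version stated in the removed README text.
MIN_DIFFUSERS = (0, 29, 1)


def parse_version(text):
    """Turn '0.29.1' or '0.30.0.dev0' into a tuple of its leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum=MIN_DIFFUSERS):
    """Return True when the installed version string is at least `minimum`."""
    return parse_version(installed) >= minimum


def check_diffusers():
    """Return True if diffusers is installed and new enough, else False."""
    try:
        installed = metadata.version("diffusers")
    except metadata.PackageNotFoundError:
        return False
    return meets_minimum(installed)


if __name__ == "__main__":
    if check_diffusers():
        print("diffusers OK")
    else:
        print("diffusers missing or older than 0.29.1")
```

If the `packaging` library is available, its `Version` class would be a more robust way to compare versions than the hand-rolled tuple parser above.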