---
license: apache-2.0
pipeline_tag: text-to-image
---
# Chroma1-Base

Chroma1-Base is an **8.9B** parameter text-to-image foundational model based on **FLUX.1-schnell**. It is fully **Apache 2.0 licensed**, ensuring that anyone can use, modify, and build upon it.

As a **base model**, Chroma1 is intentionally designed to be an excellent starting point for **finetuning**. It provides a strong, neutral foundation for developers, researchers, and artists to create specialized models.

For the fast, CFG-"baked" version, see [Chroma1-Flash](https://huggingface.co/lodestones/Chroma1-Flash).

### Key Features
* **High-Performance Base:** 8.9B parameters, built on the powerful FLUX.1 architecture.
* **Easily Finetunable:** Designed as an ideal checkpoint for creating custom, specialized models.
* **Community-Driven & Open-Source:** Fully transparent, with an Apache 2.0 license and a public training history.
* **Flexible by Design:** Provides a flexible foundation for a wide range of generative tasks.

## Special Thanks
A massive thank you to our supporters who make this project possible.
* **Anonymous donor** whose incredible generosity funded the pretraining run and data collection. Your support has been transformative for open-source AI.
* **Fictional.ai** for their fantastic support and for helping push the boundaries of open-source AI. You can try Chroma on their platform:

[](https://fictional.ai/?ref=chroma_hf)

## How to Use

### `diffusers` Library

```python
import torch
from diffusers import ChromaPipeline

pipe = ChromaPipeline.from_pretrained("lodestones/Chroma1-Base", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()

prompt = [
    "A high-fashion close-up portrait of a blonde woman in clear sunglasses. The image uses a bold teal and red color split for dramatic lighting. The background is a simple teal-green. The photo is sharp and well-composed, and is designed for viewing with anaglyph 3D glasses for optimal effect. It looks professionally done."
]
negative_prompt = ["low quality, ugly, unfinished, out of focus, deformed, disfigured, blurry, smudged, restricted palette, flat colors"]

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    generator=torch.Generator("cpu").manual_seed(433),
    num_inference_steps=40,
    guidance_scale=3.0,
    num_images_per_prompt=1,
).images[0]
image.save("chroma.png")
```

### ComfyUI
For advanced users and customized workflows, you can use Chroma with ComfyUI.

**Requirements:**
* A working ComfyUI installation.
* [Chroma checkpoint](https://huggingface.co/lodestones/Chroma) (latest version).
* [T5 XXL text encoder](https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/t5xxl_fp16.safetensors).
* [FLUX VAE](https://huggingface.co/lodestones/Chroma/resolve/main/ae.safetensors).
* [Chroma workflow JSON](https://huggingface.co/lodestones/Chroma/resolve/main/ChromaSimpleWorkflow20250507.json).

**Setup:**
1. Place the T5 XXL text encoder in your `ComfyUI/models/clip` folder.
2. Place the FLUX VAE in your `ComfyUI/models/vae` folder.
3. Place the Chroma checkpoint in your `ComfyUI/models/diffusion_models` folder.
4. Load the Chroma workflow file into ComfyUI and run.
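
The file placement in steps 1–3 can be sketched as a small script. The paths assume ComfyUI lives in the current directory, and the filenames are hypothetical (taken from the download links above); adjust both to your setup:

```python
import os
import shutil

# Hypothetical filenames: use whatever names your downloads actually have.
placements = {
    "t5xxl_fp16.safetensors": "ComfyUI/models/clip",
    "ae.safetensors": "ComfyUI/models/vae",
    "chroma.safetensors": "ComfyUI/models/diffusion_models",
}

for filename, folder in placements.items():
    os.makedirs(folder, exist_ok=True)        # create the target folder if missing
    if os.path.exists(filename):              # move only files that were downloaded
        shutil.move(filename, os.path.join(folder, filename))
```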

## Model Details
* **Architecture:** Based on the 8.9B parameter FLUX.1-schnell model.
* **Training Data:** Trained on a 5M sample dataset curated from a 20M pool, including artistic, photographic, and niche styles.
* **Technical Report:** A comprehensive technical paper detailing the architectural modifications and training process is forthcoming.

## Intended Use
Chroma is intended to be used as a **base model** for researchers and developers to build upon. It is ideal for:
* Finetuning on specific styles, concepts, or characters.
* Research into generative model behavior, alignment, and safety.
* Serving as a foundational component in larger AI systems.

## Limitations and Bias Statement
Chroma is trained on a broad, filtered dataset from the internet. As such, it may reflect the biases and stereotypes present in its training data. The model is released as-is and has not been aligned with a specific safety filter.

Users are responsible for their own use of this model. It has the potential to generate content that may be considered harmful, explicit, or offensive. I encourage developers to implement appropriate safeguards and ethical considerations in their downstream applications.

## Summary of Architectural Modifications
*(Full breakdown coming in the tech report, soon-ish.)*

* **12B → 8.9B Parameters:**
  * **TL;DR:** I replaced a 3.3B parameter timestep-encoding layer with a more efficient 250M parameter FFN, as the original was vastly oversized for its task.
* **MMDiT Masking:**
  * **TL;DR:** Masking T5 padding tokens enhanced fidelity and increased training stability by preventing the model from attending to irrelevant `<pad>` tokens.
* **Custom Timestep Distributions:**
  * **TL;DR:** I implemented a custom timestep sampling distribution (`-x^2`) to prevent loss spikes and ensure the model trains effectively on both high-noise and low-noise regions.
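
To make the last point concrete, here is a rough sketch of how a quadratic (`-x^2`-shaped) timestep sampler could be implemented via rejection sampling. The card does not give the exact density, so the parabola used below is an illustrative assumption, not the actual training code:

```python
import numpy as np

def sample_timesteps(n, rng=None):
    """Rejection-sample n timesteps in [0, 1] from a quadratic density.

    Hypothetical: the model card only names a "-x^2"-shaped distribution;
    the density used here, p(t) proportional to 1 - (2t - 1)**2, is an
    illustrative stand-in.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    samples = []
    while len(samples) < n:
        t = rng.uniform(0.0, 1.0, size=n)   # proposal: uniform on [0, 1]
        u = rng.uniform(0.0, 1.0, size=n)   # acceptance threshold
        # Accept each proposal with probability equal to the (<= 1) density.
        accepted = t[u < 1.0 - (2.0 * t - 1.0) ** 2]
        samples.extend(accepted.tolist())
    return np.array(samples[:n])
```

Reweighting which timesteps the loss sees is a common trick in diffusion training; the actual distribution used for Chroma is whatever the forthcoming tech report specifies.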

## P.S.
Chroma1-Base is Chroma-v.48.

## Citation
```bibtex
@misc{rock2025chroma,
  author       = {Lodestone Rock},
  title        = {Chroma1-Base},
  year         = {2025},
  publisher    = {Hugging Face},
  journal      = {Hugging Face repository},
  howpublished = {\url{https://huggingface.co/lodestones/Chroma1-Base}},
}
```