PRX: Open Text-to-Image Generative Model

PRX (Photoroom Experimental) is a 1.3-billion-parameter text-to-image model trained entirely from scratch and released under an Apache 2.0 license.

It is part of Photoroom’s broader effort to open-source the complete process behind training large-scale text-to-image models — covering architecture design, optimization strategies, and post-training alignment. The goal is to make PRX both a strong open baseline and a transparent research reference for those developing or studying diffusion-transformer models.

For more information, please read our announcement blog post.

Model description

PRX is designed to be lightweight yet capable, easy to fine-tune or extend, and fully open.

PRX generates high-quality images from text using a simplified MMDiT architecture in which the text tokens are not updated as they pass through the transformer blocks. It uses flow matching with discrete scheduling for efficient sampling, and Google's T5-Gemma-2B-2B-UL2 model for multilingual text encoding. The model has around 1.3B parameters and delivers fast inference without sacrificing quality. You can choose between the Flux VAE for balanced quality and speed, or DC-AE for higher latent compression and faster processing.
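
To make that design concrete, here is a minimal, illustrative sketch of the block structure described above: image tokens attend jointly over themselves and the text tokens, but only the image stream is updated from block to block. This is not the official implementation; all module names, dimensions, and omissions (e.g. timestep conditioning) are assumptions for illustration.

import torch
import torch.nn as nn

class SimplifiedMMDiTBlock(nn.Module):
    def __init__(self, dim, num_heads):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim),
            nn.GELU(),
            nn.Linear(4 * dim, dim),
        )

    def forward(self, img, txt):
        # Joint attention: queries come from the image tokens only;
        # keys/values come from image and (frozen) text tokens together.
        q = self.norm1(img)
        ctx = torch.cat([q, txt], dim=1)
        attn_out, _ = self.attn(q, ctx, ctx)
        img = img + attn_out
        img = img + self.mlp(self.norm2(img))
        return img  # txt is passed through unchanged by design

blocks = nn.ModuleList(SimplifiedMMDiTBlock(512, 8) for _ in range(4))
img = torch.randn(1, 256, 512)  # patchified latent image tokens
txt = torch.randn(1, 64, 512)   # projected text-encoder embeddings
for block in blocks:
    img = block(img, txt)       # text tokens stay fixed across blocks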

This card in particular describes Photoroom/prx-256-t2i, one of the PRX model variants:

  • Resolution: 256 pixels
  • Architecture: PRX (MMDiT-like diffusion transformer variant)
  • Latent backbone: Flux's VAE
  • Text encoder: T5-Gemma-2B-2B-UL2
  • Training stage: Base model
  • License: Apache 2.0

For other checkpoints, browse the full PRX collection.

Example usage

You can use PRX directly in Diffusers:

import torch
from diffusers.pipelines.prx import PRXPipeline

# Load the pipeline in bfloat16 and move it to the GPU
pipe = PRXPipeline.from_pretrained(
    "Photoroom/prx-256-t2i",
    torch_dtype=torch.bfloat16
).to("cuda")

prompt = "A front-facing portrait of a lion in the golden savanna at sunset"
image = pipe(prompt, num_inference_steps=28, guidance_scale=5.0).images[0]
image.save("lion.png")
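
For reproducible outputs, you can pass a seeded torch.Generator. This follows the standard Diffusers convention, which we assume PRXPipeline shares with other pipelines:

# Assumed: PRXPipeline accepts the standard Diffusers `generator` argument
generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(prompt, num_inference_steps=28, guidance_scale=5.0, generator=generator).images[0]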

Visual examples and demo

Here are some examples from one of our best checkpoints so far (Photoroom/prx-1024-t2i-beta).

PRX Demo on Hugging Face Spaces – interactive text-to-image demo for Photoroom/prx-1024-t2i-beta.

Training details

PRX models were trained from scratch using recent advances in diffusion and flow-matching training. We experimented with a range of modern techniques for efficiency, stability, and alignment, which we’ll cover in more detail in our upcoming series of research posts:

  • Part 0: Overview and release
  • Part 1: Design experiments and architecture benchmark (coming soon)
  • Part 2: Accelerating training (coming soon)
  • Part 3: Post-pretraining (coming soon)
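
Until those posts are out, the sketch below shows the generic flow-matching objective this kind of training builds on: clean latents are linearly interpolated toward Gaussian noise, and the model regresses the constant velocity of that path. PRX's exact losses, weightings, and schedules may differ; model, x0, and text_emb are placeholders.

import torch

def flow_matching_step(model, x0, text_emb):
    # Sample the noise endpoint and a random time t in [0, 1]
    noise = torch.randn_like(x0)
    t = torch.rand(x0.shape[0], device=x0.device)
    t_ = t.view(-1, *([1] * (x0.dim() - 1)))
    # Linear interpolation between clean latents (t=0) and noise (t=1)
    x_t = (1 - t_) * x0 + t_ * noise
    # The true velocity of this path is constant: d(x_t)/dt = noise - x0
    v_target = noise - x0
    v_pred = model(x_t, t, text_emb)
    # Regress the predicted velocity onto the target (flow-matching MSE loss)
    return torch.mean((v_pred - v_target) ** 2)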

Other PRX models

You can find additional checkpoints in the PRX collection:

  • Base – pretrained model before alignment; the best starting point for fine-tuning or research
  • SFT – supervised fine-tuned model; produces more aesthetically pleasing, ready-to-use generations
  • Latent backbones – Flux's VAE and DC-AE
  • Distilled – 8-step generation with LADD (latent adversarial diffusion distillation); a usage sketch follows this list
  • Resolutions – 256, 512, and 1024 pixels
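
As a hypothetical usage sketch for the distilled variants: LADD-distilled checkpoints are typically sampled with very few steps and with classifier-free guidance disabled. The repo id below is a placeholder; substitute the actual distilled checkpoint from the PRX collection.

import torch
from diffusers.pipelines.prx import PRXPipeline

pipe = PRXPipeline.from_pretrained(
    "Photoroom/<prx-distilled-checkpoint>",  # placeholder: pick from the PRX collection
    torch_dtype=torch.bfloat16
).to("cuda")

# Few-step sampling; guidance_scale=1.0 disables classifier-free guidance
image = pipe("a red fox in fresh snow", num_inference_steps=8, guidance_scale=1.0).images[0]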

License

PRX is available under an Apache 2.0 license.

Use restrictions

You must not use PRX models for:

  1. any of the restricted uses set forth in the Gemma Prohibited Use Policy; or
  2. any activity that violates applicable laws or regulations.