PRX: Open Text-to-Image Generative Model

PRX (Photoroom Experimental) is a 1.3-billion-parameter text-to-image model trained entirely from scratch and released under an Apache 2.0 license.

It is part of Photoroom’s broader effort to open-source the complete process behind training large-scale text-to-image models — covering architecture design, optimization strategies, and post-training alignment. The goal is to make PRX both a strong open baseline and a transparent research reference for those developing or studying diffusion-transformer models.

For more information, please read our announcement blog post.

Model description

PRX is designed to be lightweight yet capable, easy to fine-tune or extend, and fully open.

PRX generates high-quality images from text using a simplified MMDiT architecture in which the text tokens are not updated as they pass through the transformer blocks. It uses flow matching with discrete scheduling for efficient sampling, and Google's T5-Gemma-2B-2B-UL2 model for multilingual text encoding. The model has around 1.3B parameters and delivers fast inference without sacrificing quality. You can choose between the Flux VAE for balanced quality and speed, or DC-AE for higher latent compression and faster processing.
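
To make that design concrete, here is a minimal, illustrative sketch of the block structure described above: image tokens attend jointly over themselves and the text tokens, but only the image stream is updated from block to block. This is not the official implementation; all module names, dimensions, and omissions (e.g. timestep conditioning) are assumptions for illustration.

import torch
import torch.nn as nn

class SimplifiedMMDiTBlock(nn.Module):
    def __init__(self, dim, num_heads):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim),
            nn.GELU(),
            nn.Linear(4 * dim, dim),
        )

    def forward(self, img, txt):
        # Joint attention: queries come from the image tokens only;
        # keys/values come from image and (frozen) text tokens together.
        q = self.norm1(img)
        ctx = torch.cat([q, txt], dim=1)
        attn_out, _ = self.attn(q, ctx, ctx)
        img = img + attn_out
        img = img + self.mlp(self.norm2(img))
        return img  # txt is passed through unchanged by design

blocks = nn.ModuleList(SimplifiedMMDiTBlock(512, 8) for _ in range(4))
img = torch.randn(1, 256, 512)  # patchified latent image tokens
txt = torch.randn(1, 64, 512)   # projected text-encoder embeddings
for block in blocks:
    img = block(img, txt)       # text tokens stay fixed across blocks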

This card in particular describes Photoroom/prx-256-t2i, one of the PRX model variants:

  • Resolution: 256 pixels
  • Architecture: PRX (MMDiT-like diffusion transformer variant)
  • Latent backbone: Flux's VAE
  • Text encoder: T5-Gemma-2B-2B-UL2
  • Training stage: Base model
  • License: Apache 2.0

For other checkpoints, browse the full PRX collection.

Example usage

You can use PRX directly in Diffusers:

import torch
from diffusers.pipelines.prx import PRXPipeline

# Load the pipeline in bfloat16 and move it to the GPU
pipe = PRXPipeline.from_pretrained(
    "Photoroom/prx-256-t2i",
    torch_dtype=torch.bfloat16
).to("cuda")

prompt = "A front-facing portrait of a lion in the golden savanna at sunset"
image = pipe(prompt, num_inference_steps=28, guidance_scale=5.0).images[0]
image.save("lion.png")
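
For reproducible outputs, you can pass a seeded torch.Generator. This follows the standard Diffusers convention, which we assume PRXPipeline shares with other pipelines:

# Assumed: PRXPipeline accepts the standard Diffusers `generator` argument
generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(prompt, num_inference_steps=28, guidance_scale=5.0, generator=generator).images[0]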

Visual examples and demo

Here are some examples from one of our best checkpoints so far (Photoroom/prx-1024-t2i-beta).

PRX Demo on Hugging Face Spaces – interactive text-to-image demo for Photoroom/prx-1024-t2i-beta.

Training details

PRX models were trained from scratch using recent advances in diffusion and flow-matching training. We experimented with a range of modern techniques for efficiency, stability, and alignment, which we’ll cover in more detail in our upcoming series of research posts:

  • Part 0: Overview and release
  • Part 1: Design experiments and architecture benchmark (coming soon)
  • Part 2: Accelerating training (coming soon)
  • Part 3: Post-pretraining (coming soon)
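
Until those posts are out, the sketch below shows the generic flow-matching objective this kind of training builds on: clean latents are linearly interpolated toward Gaussian noise, and the model regresses the constant velocity of that path. PRX's exact losses, weightings, and schedules may differ; model, x0, and text_emb are placeholders.

import torch

def flow_matching_step(model, x0, text_emb):
    # Sample the noise endpoint and a random time t in [0, 1]
    noise = torch.randn_like(x0)
    t = torch.rand(x0.shape[0], device=x0.device)
    t_ = t.view(-1, *([1] * (x0.dim() - 1)))
    # Linear interpolation between clean latents (t=0) and noise (t=1)
    x_t = (1 - t_) * x0 + t_ * noise
    # The true velocity of this path is constant: d(x_t)/dt = noise - x0
    v_target = noise - x0
    v_pred = model(x_t, t, text_emb)
    # Regress the predicted velocity onto the target (flow-matching MSE loss)
    return torch.mean((v_pred - v_target) ** 2)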

Other PRX models

You can find additional checkpoints in the PRX collection:

  • Base – pretrained model before alignment; the best starting point for fine-tuning or research
  • SFT – supervised fine-tuned model; produces more aesthetically pleasing, ready-to-use generations
  • Latent backbones – Flux's VAE and DC-AE
  • Distilled – 8-step generation with LADD (latent adversarial diffusion distillation); a usage sketch follows this list
  • Resolutions – 256, 512, and 1024 pixels
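
As a hypothetical usage sketch for the distilled variants: LADD-distilled checkpoints are typically sampled with very few steps and with classifier-free guidance disabled. The repo id below is a placeholder; substitute the actual distilled checkpoint from the PRX collection.

import torch
from diffusers.pipelines.prx import PRXPipeline

pipe = PRXPipeline.from_pretrained(
    "Photoroom/<prx-distilled-checkpoint>",  # placeholder: pick from the PRX collection
    torch_dtype=torch.bfloat16
).to("cuda")

# Few-step sampling; guidance_scale=1.0 disables classifier-free guidance
image = pipe("a red fox in fresh snow", num_inference_steps=8, guidance_scale=1.0).images[0]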

License

PRX is available under an Apache 2.0 license.

Use restrictions

You must not use PRX models for:

  1. any of the restricted uses set forth in the Gemma Prohibited Use Policy; or
  2. any activity that violates applicable laws or regulations.