---
license: openrail++
base_model: stabilityai/stable-diffusion-xl-base-1.0
tags:
- stable-diffusion
- stable-diffusion-xl
- text-to-image
- diffusers
- tensorrt
- tensorrt-rtx
- nvidia
- ampere
- bf16
---

# SDXL TensorRT-RTX BF16 Ampere

TensorRT-RTX optimized engines for Stable Diffusion XL on the NVIDIA Ampere architecture (RTX 30 series, Compute Capability 8.6) with BF16 precision.

## Model Details

- **Base Model**: stabilityai/stable-diffusion-xl-base-1.0
- **Architecture**: Ampere (Compute Capability 8.6)
- **Precision**: BF16 (bfloat16, 16-bit brain floating point)
- **TensorRT-RTX Version**: 1.0.0.21
- **Image Resolution**: 1024x1024
- **Batch Size**: 1 (static)

## Engine Files

This repository contains four TensorRT-RTX engine files:

- `clip.trt1.0.0.21.plan` - CLIP text encoder
- `clip2.trt1.0.0.21.plan` - second CLIP text encoder
- `unetxl.trt1.0.0.21.plan` - SDXL U-Net diffusion model
- `vae.trt1.0.0.21.plan` - VAE decoder

**Total Size**: ~6.5 GB

## Hardware Requirements

- NVIDIA RTX 30 series GPU (RTX 3060, 3070, 3080, 3090)
- Compute Capability 8.6 (note: the A100 is also Ampere but is Compute Capability 8.0, which these engines do not target)
- At least 12 GB VRAM recommended
- TensorRT-RTX 1.0.0.21 runtime

## Usage

Example with the `imageai_server` TensorRT-RTX backend:

```python
# Example usage with the TensorRT-RTX backend
from imageai_server.shared.tensorrt_rtx_backend import TensorRTRTXBackend

backend = TensorRTRTXBackend()
backend.load_engines("path/to/engines")  # directory containing the four .plan files
image = backend.generate("A beautiful sunset over mountains")
```

## Performance

- **Inference Speed**: ~2-3 seconds per image (RTX 3090)
- **Memory Usage**: ~6-8 GB VRAM
- **Optimizations**: Static shapes, BF16 precision, Ampere-specific kernels

## License

This model is released under the same license as the base SDXL model (OpenRAIL++).

## Built With

- [TensorRT-RTX 1.0.0.21](https://developer.nvidia.com/tensorrt)
- [NVIDIA Diffusion Demo](https://github.com/NVIDIA/TensorRT/tree/release/10.6/demo/Diffusion)
- Built on an NVIDIA GeForce RTX 3090 (Ampere, Compute Capability 8.6)
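
## Inspecting an Engine

For reference, a serialized `.plan` file can be deserialized and inspected directly with the TensorRT runtime API. The sketch below is a minimal, illustrative example only: it assumes the TensorRT-RTX 1.0.0.21 Python bindings expose the same `Runtime` / `deserialize_cuda_engine` interface as standard TensorRT and that `unetxl.trt1.0.0.21.plan` is in the working directory. It is not this repository's intended loading path, which is the backend shown in the Usage section.

```python
# Minimal sketch: deserialize one engine plan and list its I/O tensors.
# Assumption: the installed TensorRT-RTX Python bindings follow the standard
# TensorRT Runtime API and are importable as `tensorrt`; adjust the import if
# your TensorRT-RTX 1.0.0.21 installation ships bindings under another name.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)

# Load the serialized SDXL U-Net engine built for Compute Capability 8.6.
with open("unetxl.trt1.0.0.21.plan", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

context = engine.create_execution_context()

# Print each I/O tensor's name, shape, and direction (input/output); the
# shapes are fixed at build time because the engines use a static batch size of 1.
for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    print(name, engine.get_tensor_shape(name), engine.get_tensor_mode(name))
```

This only deserializes and inspects a single engine; a full SDXL generation additionally needs the tokenizers, scheduler, and the other three engines, which the backend in the Usage section wires together.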