---
license: openrail++
base_model: stabilityai/stable-diffusion-xl-base-1.0
tags:
- stable-diffusion
- stable-diffusion-xl
- text-to-image
- diffusers
- tensorrt
- tensorrt-rtx
- nvidia
- ampere
- bf16
---

# SDXL TensorRT-RTX BF16 Ampere

TensorRT-RTX optimized engines for Stable Diffusion XL on the NVIDIA Ampere architecture (RTX 30 series, Compute Capability 8.6) with BF16 precision.

## Model Details

- **Base Model**: stabilityai/stable-diffusion-xl-base-1.0
- **Architecture**: Ampere (Compute Capability 8.6)
- **Precision**: BF16 (bfloat16, 16-bit brain floating point)
- **TensorRT-RTX Version**: 1.0.0.21
- **Image Resolution**: 1024x1024
- **Batch Size**: 1 (static)

## Engine Files

This repository contains four TensorRT-RTX engine files:

- `clip.trt1.0.0.21.plan` - CLIP text encoder
- `clip2.trt1.0.0.21.plan` - second CLIP text encoder
- `unetxl.trt1.0.0.21.plan` - SDXL U-Net diffusion model
- `vae.trt1.0.0.21.plan` - VAE decoder

**Total Size**: ~6.5 GB

## Hardware Requirements

- NVIDIA RTX 30 series GPU (RTX 3060, 3070, 3080, 3090)
- Compute Capability 8.6 (note: the A100 is also Ampere but is Compute Capability 8.0, which these engines do not target)
- At least 12 GB VRAM recommended
- TensorRT-RTX 1.0.0.21 runtime

## Usage

Example with the `imageai_server` TensorRT-RTX backend:

```python
# Example usage with the TensorRT-RTX backend
from imageai_server.shared.tensorrt_rtx_backend import TensorRTRTXBackend

backend = TensorRTRTXBackend()
backend.load_engines("path/to/engines")  # directory containing the four .plan files
image = backend.generate("A beautiful sunset over mountains")
```

## Performance

- **Inference Speed**: ~2-3 seconds per image (RTX 3090)
- **Memory Usage**: ~6-8 GB VRAM
- **Optimizations**: Static shapes, BF16 precision, Ampere-specific kernels

## License

This model is released under the same license as the base SDXL model (OpenRAIL++).

## Built With

- [TensorRT-RTX 1.0.0.21](https://developer.nvidia.com/tensorrt)
- [NVIDIA Diffusion Demo](https://github.com/NVIDIA/TensorRT/tree/release/10.6/demo/Diffusion)
- Built on an NVIDIA GeForce RTX 3090 (Ampere, Compute Capability 8.6)
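
## Inspecting an Engine

For reference, a serialized `.plan` file can be deserialized and inspected directly with the TensorRT runtime API. The sketch below is a minimal, illustrative example only: it assumes the TensorRT-RTX 1.0.0.21 Python bindings expose the same `Runtime` / `deserialize_cuda_engine` interface as standard TensorRT and that `unetxl.trt1.0.0.21.plan` is in the working directory. It is not this repository's intended loading path, which is the backend shown in the Usage section.

```python
# Minimal sketch: deserialize one engine plan and list its I/O tensors.
# Assumption: the installed TensorRT-RTX Python bindings follow the standard
# TensorRT Runtime API and are importable as `tensorrt`; adjust the import if
# your TensorRT-RTX 1.0.0.21 installation ships bindings under another name.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)

# Load the serialized SDXL U-Net engine built for Compute Capability 8.6.
with open("unetxl.trt1.0.0.21.plan", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

context = engine.create_execution_context()

# Print each I/O tensor's name, shape, and direction (input/output); the
# shapes are fixed at build time because the engines use a static batch size of 1.
for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    print(name, engine.get_tensor_shape(name), engine.get_tensor_mode(name))
```

This only deserializes and inspects a single engine; a full SDXL generation additionally needs the tokenizers, scheduler, and the other three engines, which the backend in the Usage section wires together.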