license: apache-2.0
pipeline_tag: text-to-video
library_name: diffusers
PSA: Pyramid Sparse Attention for Efficient Video Understanding and Generation
Paper | Project Page | Code
Official PyTorch implementation of PSA: Pyramid Sparse Attention for Efficient Video Understanding and Generation.
Visual comparison of sparse attention methods at similar sparsity levels (~90%). PSA maintains visual fidelity close to full attention while other methods show noticeable artifacts.
Pyramid Sparse Attention (PSA) is a versatile attention module designed to overcome the quadratic complexity bottleneck of attention in foundation models. It introduces multi-level pooled Key-Value (KV) representations, which give a finer mask granularity than traditional binary (keep-or-drop) masking. Critical KV blocks receive full-resolution attention, while less important blocks use progressively pooled representations, creating an informative interpolation between full retention and complete pruning. This mitigates information loss while preserving computational efficiency. PSA applies to both video understanding and video generation tasks, where it matches or outperforms existing sparse attention baselines with better efficiency-quality trade-offs.
Note: This release supports inference only, with bidirectional attention. Support for causal attention masks and backward propagation (training) is still being optimized and will be released in a future update.
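To make the pyramid idea concrete, here is a toy, pure-Python sketch of attention for a single query over KV blocks kept at different pooling levels. It is not the released kernel: the block-importance score (query dotted with the block's mean key), the per-level quotas, the use of mean pooling, and all function names are assumptions chosen for illustration only.

```python
import math

def mean_pool(rows, factor):
    """Average consecutive groups of `factor` vectors into one vector each."""
    pooled = []
    for start in range(0, len(rows), factor):
        group = rows[start:start + factor]
        dim = len(group[0])
        pooled.append([sum(vec[d] for vec in group) / len(group) for d in range(dim)])
    return pooled

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def assign_levels(query, keys, block_size, level_quota):
    """Rank KV blocks by a cheap score and give each one a pyramid level.

    level_quota[i] blocks get level i (pool factor 2**i); any block left
    over is pruned (level None). The scoring rule is an assumption.
    """
    n_blocks = len(keys) // block_size
    scores = []
    for b in range(n_blocks):
        block = keys[b * block_size:(b + 1) * block_size]
        centroid = mean_pool(block, block_size)[0]
        scores.append(dot(query, centroid))
    order = sorted(range(n_blocks), key=lambda b: scores[b], reverse=True)
    levels = [None] * n_blocks
    rank = 0
    for level, quota in enumerate(level_quota):
        for b in order[rank:rank + quota]:
            levels[b] = level
        rank += quota
    return levels

def pyramid_attention(query, keys, values, block_size, levels):
    """Softmax attention of one query over per-block pooled keys and values."""
    ks, vs = [], []
    for b, level in enumerate(levels):
        if level is None:
            continue  # pruned block contributes nothing
        factor = 2 ** level  # level 0 keeps full resolution
        span = slice(b * block_size, (b + 1) * block_size)
        ks.extend(mean_pool(keys[span], factor))
        vs.extend(mean_pool(values[span], factor))
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, k) * scale for k in ks]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    dim = len(vs[0])
    out = [sum(w * v[d] for w, v in zip(weights, vs)) / total for d in range(dim)]
    return out, len(ks)

if __name__ == "__main__":
    q = [0.5, -0.25, 1.0, 0.75]
    keys = [[math.sin(i + d) for d in range(4)] for i in range(16)]
    values = [[math.cos(0.5 * i + d) for d in range(4)] for i in range(16)]
    levels = assign_levels(q, keys, block_size=4, level_quota=[1, 1, 1])
    out, n_tokens = pyramid_attention(q, keys, values, 4, levels)
    print(levels, n_tokens)
```

With four blocks of four tokens and one block at each of levels 0, 1, and 2, the query attends over 4 + 2 + 1 = 7 pooled tokens instead of 16, while a binary mask at the same budget would have to drop whole blocks outright.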
Installation
Using uv (Recommended)
uv venv --python 3.11
source .venv/bin/activate
uv pip install -e .
Using pip
python -m venv .venv
source .venv/bin/activate
pip install -e .
For best performance, we recommend using a PyTorch nightly build.
Download Weights
CogVideoX-5B LoRA (4-step)
huggingface-cli download GYP666/BLADE cogvideox-5b-psa-lora/pytorch_lora_weights.safetensors --local-dir ./weights
Note: After downloading, update the lora_path in examples/configs/model_configs.py to point to your weights directory.
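As a convenience, the small helper below prints the absolute path where the LoRA file should land when the download command above is run with `--local-dir ./weights`, so it can be pasted into `lora_path`. The helper name `resolve_lora_path` is hypothetical and not part of this repository; it only assumes that the file keeps its repository-relative path under the local directory.

```python
from pathlib import Path

def resolve_lora_path(local_dir="./weights"):
    """Return the absolute path where the LoRA file is expected after download."""
    relative = Path("cogvideox-5b-psa-lora") / "pytorch_lora_weights.safetensors"
    return (Path(local_dir) / relative).resolve()

if __name__ == "__main__":
    path = resolve_lora_path()
    status = "found" if path.is_file() else "missing"
    print(f"{path} ({status})")
```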
Quick Start (Inference)
CogVideoX1.5-5B
python examples/inference/cogvideo/cogvideo_5b.py \
--model cogvideo1.5_5b \
--prompt "your prompt here" \
--use_psa
Wan2.1-1.3B
python examples/inference/wan21/wan21_1.3b.py \
--prompt "your prompt here" \
--use_psa --no_warmup
For more inference examples, see examples/README.md.
Citation
If you find this work useful, please cite our paper:
@misc{li2025psapyramidsparseattention,
title={PSA: Pyramid Sparse Attention for Efficient Video Understanding and Generation},
author={Xiaolong Li and Youping Gu and Xi Lin and Weijie Wang and Bohan Zhuang},
year={2025},
eprint={2512.04025},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2512.04025},
}