license: apache-2.0
pipeline_tag: text-to-video
library_name: diffusers

PSA: Pyramid Sparse Attention for Efficient Video Understanding and Generation

📖 Paper | 🚀 Project Page | 💻 Code

Official PyTorch implementation of PSA: Pyramid Sparse Attention for Efficient Video Understanding and Generation.

Visual comparison of sparse attention methods at similar sparsity levels (~90%). PSA maintains visual fidelity close to full attention while other methods show noticeable artifacts.

Pyramid Sparse Attention (PSA) is a versatile attention module designed to overcome the quadratic complexity bottleneck of attention in foundation models. It introduces multi-level pooled key-value (KV) representations, which give a finer mask granularity than traditional binary masking. Critical KV blocks receive full-resolution attention, while less important blocks use progressively pooled representations, so each block falls somewhere between full retention and complete pruning rather than at one of the two extremes. This mitigates information loss while preserving computational efficiency. PSA applies to both video understanding and generation tasks, where it consistently matches or outperforms existing sparse attention baselines and offers better efficiency-quality trade-offs.
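The idea of multi-level pooled KV blocks can be illustrated with a small pure-Python sketch. This is not the PSA kernel from this repository: the importance scores, the thresholds, the block size, and the 2x mean pooling per level are all assumptions made up for the example, and the helper names (`mean_pool`, `assign_level`, `build_pyramid_kv`) are hypothetical.

```python
# Illustrative sketch only, NOT the PSA implementation. Scores, thresholds,
# block size, and the 2x pooling factor per level are assumed for the example.

def mean_pool(rows, factor):
    """Average consecutive groups of `factor` rows (each row is a list of floats)."""
    pooled = []
    for start in range(0, len(rows), factor):
        group = rows[start:start + factor]
        dim = len(group[0])
        pooled.append([sum(r[d] for r in group) / len(group) for d in range(dim)])
    return pooled


def assign_level(score, thresholds):
    """Map an importance score to a pyramid level.

    Level 0 keeps full resolution and each further level halves the token
    count. A score below every threshold returns None (block is pruned).
    """
    for level, threshold in enumerate(thresholds):
        if score >= threshold:
            return level
    return None


def build_pyramid_kv(kv_blocks, scores, thresholds):
    """Return the (possibly pooled) tokens kept for each KV block."""
    kept = []
    for block, score in zip(kv_blocks, scores):
        level = assign_level(score, thresholds)
        if level is None:
            kept.append([])  # pruned: contributes no tokens
        else:
            kept.append(mean_pool(block, 2 ** level))
    return kept


# Four KV blocks of 8 tokens each, with 2-dimensional features.
blocks = [[[float(b), float(t)] for t in range(8)] for b in range(4)]
scores = [0.9, 0.5, 0.2, 0.01]   # hypothetical per-block importance
thresholds = [0.8, 0.4, 0.1]     # hypothetical cut-offs for levels 0, 1, 2

kept = build_pyramid_kv(blocks, scores, thresholds)
tokens_per_block = [len(k) for k in kept]
kept_fraction = sum(tokens_per_block) / (len(blocks) * 8)

print(tokens_per_block)          # [8, 4, 2, 0]
print(f"fraction of KV tokens attended: {kept_fraction:.2f}")
```

A binary mask could only assign 8 or 0 tokens to each block. In this toy setup the two middle blocks still contribute 4 and 2 pooled tokens, which is the in-between behaviour the paragraph above describes.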

Note: This release supports inference only, with bidirectional attention. Support for causal attention masks and the backward pass (training) is still being optimized and will be released in a future update.

Installation

Using uv (Recommended)

uv venv --python 3.11
source .venv/bin/activate
uv pip install -e .

Using pip

python -m venv .venv
source .venv/bin/activate
pip install -e .

For best performance, we recommend a PyTorch nightly build.

Download Weights

CogVideoX-5B LoRA (4-step)

huggingface-cli download GYP666/BLADE cogvideox-5b-psa-lora/pytorch_lora_weights.safetensors --local-dir ./weights

Note: After downloading, update the lora_path in examples/configs/model_configs.py to point to your weights directory.
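If you would rather fetch the weights from Python than from the command line, a minimal sketch using `hf_hub_download` from the `huggingface_hub` package might look like this. It assumes `huggingface_hub` is installed; the function name `download_lora` is hypothetical and not part of this repository.

```python
# Sketch only: assumes the `huggingface_hub` package is installed.
# `download_lora` is a hypothetical helper, not part of this repository.

REPO_ID = "GYP666/BLADE"
FILENAME = "cogvideox-5b-psa-lora/pytorch_lora_weights.safetensors"


def download_lora(local_dir="./weights"):
    """Download the LoRA weights file and return its local path."""
    # Imported lazily so the constants above can be used without the package.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=FILENAME, local_dir=local_dir)
```

Calling `download_lora()` returns the path of the downloaded file, which helps when you update `lora_path` as described in the note above.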

Quick Start (Inference)

CogVideoX1.5-5B

python examples/inference/cogvideo/cogvideo_5b.py \
    --model cogvideo1.5_5b \
    --prompt "your prompt here" \
    --use_psa

Wan2.1-1.3B

python examples/inference/wan21/wan21_1.3b.py \
    --prompt "your prompt here" \
    --use_psa --no_warmup

For more inference examples, see examples/README.md.

Citation

If you find this work useful, please cite our paper:

@misc{li2025psapyramidsparseattention,
      title={PSA: Pyramid Sparse Attention for Efficient Video Understanding and Generation}, 
      author={Xiaolong Li and Youping Gu and Xi Lin and Weijie Wang and Bohan Zhuang},
      year={2025},
      eprint={2512.04025},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.04025}, 
}