---
license: apache-2.0
pipeline_tag: text-to-video
library_name: diffusers
---
# PSA: Pyramid Sparse Attention for Efficient Video Understanding and Generation

[Paper](https://huggingface.co/papers/2512.04025) | [Project Page](http://ziplab.co/PSA) | [Code](https://github.com/ziplab/Pyramid-Sparse-Attention)

Official PyTorch implementation of [PSA: Pyramid Sparse Attention for Efficient Video Understanding and Generation](https://huggingface.co/papers/2512.04025).

<p align="center">
<img src="https://github.com/ziplab/Pyramid-Sparse-Attention/raw/main/figures/prompt007comparison.jpg" width="100%">
</p>

<p align="center"><em>Visual comparison of sparse attention methods at similar sparsity levels (~90%). PSA maintains visual fidelity close to full attention, while other methods show noticeable artifacts.</em></p>

Pyramid Sparse Attention (PSA) is a versatile attention module designed to overcome the quadratic complexity bottleneck of attention mechanisms in foundation models. It introduces multi-level pooled Key-Value (KV) representations, enabling finer mask granularity than traditional binary masking: critical KV blocks receive full-resolution attention, while less important blocks use progressively pooled representations, forming an informative interpolation between full retention and complete pruning. This design mitigates information loss while preserving computational efficiency. PSA applies to both video understanding and generation tasks, matching or outperforming existing sparse attention baselines with a superior efficiency-quality trade-off.
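
To make the pooled-KV idea concrete, here is a minimal, self-contained PyTorch sketch: the most important KV blocks are kept at full resolution, the remaining blocks are replaced by progressively coarser average-pooled copies, and dense attention runs over the shortened KV sequence. The block-importance heuristic, pooling schedule, and function name below are illustrative assumptions for exposition, not the repository's optimized kernels, and the actual module's mask construction is more fine-grained.

```python
# Illustrative pooled-KV attention sketch (NOT the repo's optimized kernel).
# The block-importance score and pooling schedule are simplifying assumptions.
import torch
import torch.nn.functional as F

def pooled_kv_attention(q, k, v, block_size=64, keep_ratio=0.1, pool_factors=(4, 16)):
    """q, k, v: [B, H, N, D], with N divisible by block_size."""
    B, H, N, D = q.shape
    nb = N // block_size
    kb = k.view(B, H, nb, block_size, D)
    vb = v.view(B, H, nb, block_size, D)

    # Rough global importance per KV block: similarity between the mean query
    # and each key block's mean (averaged over batch and heads for simplicity).
    q_mean = q.mean(dim=(0, 1, 2))                      # [D]
    k_mean = kb.mean(dim=(0, 1, 3))                     # [nb, D]
    rank = (k_mean @ q_mean).argsort(descending=True)   # most important first

    # Level 0 keeps the top blocks at full resolution; the rest are split
    # across progressively coarser average-pooling levels.
    n_full = max(1, int(keep_ratio * nb))
    groups = [rank[:n_full]] + list(rank[n_full:].chunk(len(pool_factors)))

    k_parts, v_parts = [], []
    for level, idx in enumerate(groups):
        if idx.numel() == 0:
            continue
        k_sel = kb[:, :, idx].reshape(B, H, -1, D)
        v_sel = vb[:, :, idx].reshape(B, H, -1, D)
        if level > 0:  # pool less important blocks along the sequence axis
            f = pool_factors[level - 1]
            k_sel = F.avg_pool1d(k_sel.transpose(-1, -2).reshape(B * H, D, -1), f)
            k_sel = k_sel.reshape(B, H, D, -1).transpose(-1, -2)
            v_sel = F.avg_pool1d(v_sel.transpose(-1, -2).reshape(B * H, D, -1), f)
            v_sel = v_sel.reshape(B, H, D, -1).transpose(-1, -2)
        k_parts.append(k_sel)
        v_parts.append(v_sel)

    # Dense (bidirectional) attention over the shortened KV sequence.
    k_red, v_red = torch.cat(k_parts, dim=2), torch.cat(v_parts, dim=2)
    return F.scaled_dot_product_attention(q, k_red, v_red)

# Example: 2k tokens, ~10% of KV blocks kept at full resolution.
q, k, v = (torch.randn(1, 8, 2048, 64) for _ in range(3))
out = pooled_kv_attention(q, k, v)   # [1, 8, 2048, 64]
```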
> **Note:** This release is **inference-only** and supports **bidirectional attention**. Causal attention masks and backward propagation (training) are still being optimized and will be released in a future update.
## Installation

### Using uv (Recommended)

```bash
uv venv --python 3.11
source .venv/bin/activate
uv pip install -e .
```

### Using pip

```bash
python -m venv .venv
source .venv/bin/activate
pip install -e .
```
> For best performance, we recommend using a PyTorch nightly build.
## Download Weights

### CogVideoX-5B LoRA (4-step)

```bash
huggingface-cli download GYP666/BLADE cogvideox-5b-psa-lora/pytorch_lora_weights.safetensors --local-dir ./weights
```
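
If you prefer to download from Python rather than the CLI, the equivalent call with `huggingface_hub` (same repo and filename as above) looks like this:

```python
from huggingface_hub import hf_hub_download

# Fetch the CogVideoX-5B PSA LoRA weights into ./weights, mirroring the
# huggingface-cli command above.
lora_file = hf_hub_download(
    repo_id="GYP666/BLADE",
    filename="cogvideox-5b-psa-lora/pytorch_lora_weights.safetensors",
    local_dir="./weights",
)
print(lora_file)  # local path to the downloaded .safetensors file
```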
**Note:** After downloading, update the `lora_path` in `examples/configs/model_configs.py` to point to your weights directory.
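
For reference, the edit amounts to pointing `lora_path` at the downloaded file; the snippet below is illustrative, and the actual layout of `model_configs.py` may differ:

```python
# In examples/configs/model_configs.py (illustrative; the real config structure may differ):
lora_path = "./weights/cogvideox-5b-psa-lora/pytorch_lora_weights.safetensors"
```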
## Quick Start (Inference)

### CogVideoX1.5-5B

```bash
python examples/inference/cogvideo/cogvideo_5b.py \
    --model cogvideo1.5_5b \
    --prompt "your prompt here" \
    --use_psa
```

### Wan2.1-1.3B

```bash
python examples/inference/wan21/wan21_1.3b.py \
    --prompt "your prompt here" \
    --use_psa --no_warmup
```

For more inference examples, see [examples/README.md](https://github.com/ziplab/Pyramid-Sparse-Attention/blob/main/examples/README.md).
## Citation

If you find this work useful, please cite our paper:

```bibtex
@misc{li2025psapyramidsparseattention,
      title={PSA: Pyramid Sparse Attention for Efficient Video Understanding and Generation},
      author={Xiaolong Li and Youping Gu and Xi Lin and Weijie Wang and Bohan Zhuang},
      year={2025},
      eprint={2512.04025},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.04025},
}
```