Transform Trained Transformer: Accelerating Native 4K Video Generation Over 10×
Abstract
The T3-Video model enhances 4K video generation efficiency and quality through a multi-scale weight-sharing window attention mechanism and hierarchical blocking.
Native 4K (2160×3840) video generation remains a critical challenge due to the quadratic computational explosion of full attention as spatiotemporal resolution increases, making it difficult for models to strike a balance between efficiency and quality. This paper proposes a novel Transformer retrofit strategy termed T3 (Transform Trained Transformer) that, without altering the core architecture of full-attention pretrained models, significantly reduces compute requirements by optimizing their forward logic. Specifically, T3-Video introduces a multi-scale weight-sharing window attention mechanism and, via hierarchical blocking together with an axis-preserving full-attention design, can effect an "attention pattern" transformation of a pretrained model using only modest compute and data. Results on 4K-VBench show that T3-Video substantially outperforms existing approaches: while delivering performance improvements (+4.29↑ VQA and +0.08↑ VTC), it accelerates native 4K video generation by more than 10×. Project page: https://zhangzjn.github.io/projects/T3-Video
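To make the core idea concrete, below is a minimal sketch (not the authors' code) of what a multi-scale weight-sharing window attention layer can look like: the same QKV and output projections inherited from a full-attention pretrained layer are reused at every window scale, so only the attention pattern changes while the weights stay fixed. The window sizes, the averaging across scales, and the class/parameter names are illustrative assumptions, not details from the paper.

```python
# Illustrative sketch of multi-scale weight-sharing window attention.
# Assumption: the pretrained layer's QKV/output projections are reused
# unchanged; only the (local, windowed) attention pattern differs from
# global full attention, so cost grows linearly with sequence length.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleSharedWindowAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int, window_sizes=(16, 64)):
        super().__init__()
        self.num_heads = num_heads
        self.window_sizes = window_sizes          # hypothetical scales for illustration
        # One set of projections shared by all scales (the "weight-sharing" part).
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def _window_attn(self, x: torch.Tensor, window: int) -> torch.Tensor:
        b, n, d = x.shape
        pad = (window - n % window) % window      # pad so tokens split evenly into windows
        if pad:
            x = F.pad(x, (0, 0, 0, pad))
        xw = x.reshape(b * (x.shape[1] // window), window, d)
        q, k, v = self.qkv(xw).chunk(3, dim=-1)
        h = self.num_heads
        q, k, v = (t.reshape(t.shape[0], window, h, -1).transpose(1, 2) for t in (q, k, v))
        out = F.scaled_dot_product_attention(q, k, v)   # full attention only inside each window
        out = out.transpose(1, 2).reshape(b, -1, d)[:, :n]
        return self.proj(out)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Aggregate several window scales with the same shared weights.
        return torch.stack([self._window_attn(x, w) for w in self.window_sizes]).mean(0)


if __name__ == "__main__":
    tokens = torch.randn(2, 1024, 256)            # (batch, spatiotemporal tokens, dim)
    attn = MultiScaleSharedWindowAttention(256, num_heads=8)
    print(attn(tokens).shape)                     # torch.Size([2, 1024, 256])
```

Because the projections are untouched, a retrofit along these lines only has to adapt the model to the new attention pattern, which is consistent with the paper's claim that the transformation needs only modest compute and data.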