arXiv:2510.27258

Higher-order Linear Attention

Published on Oct 31 · Submitted by Yifan Zhang on Nov 3

Abstract

Higher-order Linear Attention (HLA) is a scalable, causal, and efficient mechanism for long-context autoregressive language models, combining attention-like mixing with recurrent architecture efficiency.

AI-generated summary

The quadratic cost of scaled dot-product attention is a central obstacle to scaling autoregressive language models to long contexts. Linear-time attention and State Space Models (SSMs) provide scalable alternatives but are typically restricted to first-order or kernel-based approximations, which can limit expressivity. We introduce Higher-order Linear Attention (HLA), a causal, streaming mechanism that realizes higher-order interactions via compact prefix sufficient statistics. In the second-order case, HLA maintains a constant-size state and computes per-token outputs in linear time without materializing any n × n matrices. We give closed-form streaming identities, a strictly causal masked variant using two additional summaries, and a chunk-parallel training scheme based on associative scans that reproduces the activations of a serial recurrence exactly. We further outline extensions to third and higher orders. Collectively, these results position HLA as a principled, scalable building block that combines attention-like, data-dependent mixing with the efficiency of modern recurrent architectures. Project Page: https://github.com/yifanzhang-pro/HLA.
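To make the streaming picture concrete, below is a minimal Python/NumPy sketch of attention computed from prefix sufficient statistics: a first-order (kernel-style) linear attention baseline and a hypothetical second-order variant whose state is a d × d × d prefix tensor contracted against two per-token queries. The function names, the choice of state tensor, and the unnormalized readout are illustrative assumptions, not the paper's exact HLA construction.

```python
# Illustrative sketch only: streaming attention from prefix sufficient statistics.
# This is NOT the paper's exact HLA recurrence. It shows the pattern the abstract
# describes: a state whose size depends only on the head dimension d (not on the
# sequence length n), updated once per token, with no n x n matrix materialized.
import numpy as np


def first_order_stream(Q, K, V):
    """Unnormalized first-order linear attention.

    State: S_t = sum_{i<=t} k_i v_i^T, a d x d matrix.
    Readout: y_t = S_t^T q_t = sum_{i<=t} (k_i . q_t) v_i.
    """
    n, d = Q.shape
    S = np.zeros((d, d))
    out = np.zeros_like(V)
    for t in range(n):
        S += np.outer(K[t], V[t])   # O(d^2) state update per token
        out[t] = S.T @ Q[t]         # O(d^2) readout, causal by construction
    return out


def second_order_stream(Q1, Q2, K1, K2, V):
    """Hypothetical second-order variant (an assumption, not the paper's form).

    State: T_t = sum_{i<=t} k_i (x) k'_i (x) v_i, a d x d x d tensor.
    Readout: y_t = sum_{i<=t} (k_i . q_t)(k'_i . q'_t) v_i, i.e. past tokens are
    mixed by a weight that is bilinear in the per-token query pair.
    """
    n, d = Q1.shape
    T = np.zeros((d, d, d))
    out = np.zeros_like(V)
    for t in range(n):
        T += np.einsum('a,b,c->abc', K1[t], K2[t], V[t])   # constant-size state
        out[t] = np.einsum('abc,a,b->c', T, Q1[t], Q2[t])  # linear total cost in n
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 16, 8
    Q1, Q2, K1, K2, V = (rng.standard_normal((n, d)) for _ in range(5))
    print(first_order_stream(Q1, K1, V).shape)            # (16, 8)
    print(second_order_stream(Q1, Q2, K1, K2, V).shape)   # (16, 8)
```

Note that the illustrative d × d × d state grows cubically in the head dimension, which is why compact summaries matter at higher orders; how HLA actually organizes its prefix statistics is detailed in the paper.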

Community

Paper author · Paper submitter

The quadratic cost of scaled dot-product attention is a central obstacle to scaling autoregressive language models to long contexts. Linear-time attention and State Space Models (SSMs) provide scalable alternatives but are typically restricted to first-order or kernel-based approximations, which can limit expressivity. We introduce Higher-order Linear Attention (HLA), a causal, streaming mechanism that realizes higher-order interactions via compact prefix sufficient statistics. In the second-order case, HLA maintains a constant-size state and computes per-token outputs in linear time without materializing any n × n matrices.

Project Page: https://github.com/yifanzhang-pro/HLA
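The abstract also mentions a chunk-parallel training scheme based on associative scans that reproduces the serial recurrence exactly. The sketch below illustrates that general idea for the simple additive first-order state from the earlier sketch: increments are scanned locally within chunks and combined with a scan over chunk summaries, and the result matches the serial recurrence. The chunking scheme shown here is an assumption for illustration; the paper's actual scan operators for higher-order HLA states are not reproduced.

```python
# Illustrative sketch only: chunk-parallel computation of a streaming state via an
# associative (prefix-sum) scan. Shown for the simple additive first-order state
# S_t = sum_{i<=t} k_i v_i^T; this is not the paper's scan for higher-order states.
import numpy as np


def serial_states(K, V):
    """Serial recurrence: S_t = S_{t-1} + k_t v_t^T, returned for every t."""
    n, d = K.shape
    S, states = np.zeros((d, d)), []
    for t in range(n):
        S = S + np.outer(K[t], V[t])
        states.append(S.copy())
    return np.stack(states)


def chunk_parallel_states(K, V, chunk=4):
    """Local scans inside each chunk plus a carry-in scan over chunk summaries."""
    n, d = K.shape
    assert n % chunk == 0, "illustration assumes n is a multiple of the chunk size"
    increments = np.einsum('ti,tj->tij', K, V)                      # one increment per token
    local = np.cumsum(increments.reshape(-1, chunk, d, d), axis=1)  # scan inside chunks
    totals = local[:, -1]                                           # one summary per chunk
    carry = np.cumsum(totals, axis=0) - totals                      # exclusive scan of summaries
    return (local + carry[:, None]).reshape(n, d, d)                # add carry-in to each chunk


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    K, V = rng.standard_normal((2, 16, 8))
    assert np.allclose(serial_states(K, V), chunk_parallel_states(K, V))
    print("chunk-parallel scan matches the serial recurrence")
```

Because the state update here is associative (plain addition), the two scan levels reproduce the serial activations exactly; the abstract claims the analogous exactness for HLA's richer higher-order state.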
