Collections

5

Nuclear Norm Regularization for Deep Learning

Paper • 2405.14544 • Published May 23, 2024 • 1
Token embeddings violate the manifold hypothesis

Paper • 2504.01002 • Published Apr 1 • 1
Approximate Nullspace Augmented Finetuning for Robust Vision Transformers

Paper • 2403.10476 • Published Mar 15, 2024 • 1
ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning

Paper • 2504.00254 • Published Mar 31 • 1

22

RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published Jan 24 • 28
Towards General-Purpose Model-Free Reinforcement Learning

Paper • 2501.16142 • Published Jan 27 • 30
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 123
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization

Paper • 2412.12098 • Published Dec 16, 2024 • 4

-

Energy-Based Transformers are Scalable Learners and Thinkers

Paper • 2507.02092 • Published Jul 2 • 69


2


Byte Latent Transformer: Patches Scale Better Than Tokens

Causal Diffusion Transformers for Generative Modeling

Tensor Product Attention Is All You Need

TransMLA: Multi-head Latent Attention Is All You Need


MOSPA: Human Motion Generation Driven by Spatial Audio

Sound and Complete Neuro-symbolic Reasoning with LLM-Grounded Interpretations

Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling
