AutoDeco

Official Implementation of "The End of Manual Decoding: Towards Truly End-to-End Language Models"

AutoDeco is a framework that adds token-level, adaptive prediction of decoding parameters to Large Language Models (LLMs). Lightweight prediction heads placed on top of a pre-trained model dynamically predict the optimal temperature and top-p for each token during decoding.

🎯 Key Features

  • Token-Level Decoding Parameter Prediction: Dynamically predict decoding parameters (temperature and top-p) for each generated token
  • Lightweight Design: Only adds two small MLP prediction heads (~5MB), without modifying the base model
  • Universal Architecture: Supports multiple mainstream LLM architectures (Llama, Qwen2/2.5, Qwen3, MoE models, etc.)
  • End-to-End Training: Trained end to end with only the standard cross-entropy loss; gradients reach the heads implicitly, with no extra objective
  • Flexible Training: Train the temperature head, the top-p head, or both jointly
  • Efficient Deployment: Only the AutoDeco head weights are saved during training; they are merged back into the base model for decoding

πŸ—οΈ Architecture

The AutoDeco framework consists of two core components:

AutoDeco Architecture

Model Workflow

Input Tokens
    ↓
Base LLM (frozen during head training)
    ↓
Hidden States
    ├──→ LM Head → Logits
    ├──→ TempHead → Temperature
    └──→ TopPHead → Top-P

During training, the base LLM parameters are frozen, and only the two prediction heads are trained.
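
The heads themselves are just small MLPs on top of the last hidden states. A minimal PyTorch sketch of that idea follows; the layer sizes, activation, and the softplus/clamp squashing are illustrative assumptions, not the exact implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DecodingHead(nn.Module):
    """Small MLP mapping each hidden state to one positive scalar per token."""
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, bottleneck),
            nn.SiLU(),
            nn.Linear(bottleneck, 1),
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: [batch, seq_len, hidden_size] -> [batch, seq_len]
        return F.softplus(self.mlp(hidden_states)).squeeze(-1)

# Per-token temperature and top-p are predicted from the same hidden states
# that feed the LM head; the base LLM stays frozen while these heads train.
hidden_size = 4096                      # assumed; e.g. an 8B-class model
temp_head = DecodingHead(hidden_size)   # temperature > 0
top_p_head = DecodingHead(hidden_size)  # squashed into (0, 1] below

hidden_states = torch.randn(1, 10, hidden_size)
temperature = temp_head(hidden_states)              # shape [1, 10]
top_p = top_p_head(hidden_states).clamp(max=1.0)    # shape [1, 10]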

🤖 Supported Models

AutoDeco works with current autoregressive LLMs; the architectures below are all unified behind a single AutoDecoModelForCausalLM interface.

Base Model                      #Base Params    #AutoDeco Params    Download
Llama-3.1-Nemotron-Nano-8B-v1   8B              2.1M                🤗 HuggingFace
DeepSeek-R1-Distill-Qwen-7B     7B              1.84M               🤗 HuggingFace
Qwen3-30B-A3B-Instruct-2507     30B             1.05M               🤗 HuggingFace
OpenAI-GPT-OSS-20B              20B             1.48M               🤗 HuggingFace
OpenAI-GPT-OSS-120B             120B            1.48M               🤗 HuggingFace
Qwen3-235B-A22B-Thinking        235B            2.1M                🤗 HuggingFace
DeepSeek-V3.1-Terminus          671B            -                   Coming Soon

🚀 Installation

Recommended Requirements

  • Python >= 3.10
  • PyTorch >= 2.0
  • CUDA >= 12.0 (recommended for training)

Install Dependencies

# Clone repository
cd AutoDeco

# Install core dependencies
pip install -r requirements.txt

# Optional: for training monitoring
pip install wandb
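
A quick sanity check that the installed environment matches the requirements above (assumes PyTorch is already installed; the script name is just an example):

# sanity_check.py -- optional; prints the installed versions
import torch
print("PyTorch:", torch.__version__)
print("CUDA build:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())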

💡 Quick Start

Initialize AutoDeco Model

python script/construct_autodeco.py \
    --base_model_name_or_path path_to_your_base_LLM \
    --output_dir path_to_your_AutoDeco_model
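
Once constructed, the checkpoint can be loaded through the unified interface mentioned above. A minimal loading sketch, assuming AutoDecoModelForCausalLM is exposed by model/templlm_auto.py and follows the usual transformers from_pretrained convention (the exact import path and signature may differ):

from transformers import AutoTokenizer
from model.templlm_auto import AutoDecoModelForCausalLM  # assumed import path

model_path = "path_to_your_AutoDeco_model"
model = AutoDecoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)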

🔥 Training

Prepare Training Data

Training data should be in JSONL format, one sample per line. Each sample is a prompt/completion pair, with any chat template already applied to the prompt:

{
  "prompt": "formatted prompt text",
  "completion": "expected completion"
}

# example
{
  "prompt": "<|im_start|>user\nEvaluate the limit:$$\\lim_{(x, y) \\to (1, 2)} \\frac{(x-1)(y-2)-x+3}{x^2-2x+y^2-4}$$\nMake sure you output the final answer within \\boxed{}<|im_end|>\n< im_start>assistant\n",
  "completion": "......### βœ… Final Answer:\n$$\n\\boxed{-1}\n$$""
}
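
A small helper for writing data in this format (standard library only; the data/ directory and train_data.jsonl file name follow the training-script conventions below):

import json

samples = [
    {
        "prompt": "<|im_start|>user\nWhat is 2 + 2?\nMake sure you output the final answer within \\boxed{}<|im_end|>\n<|im_start|>assistant\n",
        "completion": "2 + 2 = 4, so the answer is \\boxed{4}.",
    },
]

with open("data/train_data.jsonl", "w", encoding="utf-8") as f:
    for sample in samples:
        f.write(json.dumps(sample, ensure_ascii=False) + "\n")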

Train AutoDeco Heads

Use the provided training script:

# Edit script/trl_train.sh to configure parameters
# Key parameters:
# - MODEL_NAME_OR_PATH: Your initialized AutoDeco Model Path
# - DATA_NAME: Training data filename (in data directory)
# - MAX_LENGTH: Maximum sequence length
# - train_temp: Whether to train temperature head
# - train_top_p: Whether to train top-p head

bash script/trl_train.sh

Training configuration examples:

# Train only temperature head
accelerate launch trl_train.py \
    --model_name_or_path AutoDeco-Llama-3.1-8B \
    --dataset_name train_data.jsonl \
    --train_temp true \
    --train_top_p false \
    --learning_rate 5e-6 \
    --num_train_epochs 1 \
    --output_dir ckpt/llama3_temp_head
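
Conceptually, the temperature head needs nothing beyond the ordinary next-token cross-entropy: the predicted per-token temperature rescales the frozen base model's logits before the loss, so gradients reach the head implicitly. A simplified sketch of that objective (not the actual trainer code in trainer/trl_Temp.py; shapes and masking details are reduced to the essentials):

import torch.nn.functional as F

def autodeco_ce_loss(logits, temperature, labels):
    # logits:      [batch, seq, vocab] from the frozen base LM
    # temperature: [batch, seq] predicted by the trainable TempHead
    # labels:      [batch, seq] next-token targets, -100 where ignored
    scaled = logits / temperature.unsqueeze(-1).clamp(min=1e-4)
    return F.cross_entropy(
        scaled.view(-1, scaled.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )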

📊 Inference

Batch Evaluation with vLLM

# Single evaluation
python llm_eval.py \
    --model_name_or_path ckpt/autodeco_model \
    --dataset aime24 \
    --temp 1.0 \
    --top_p 1.0 \
    --k 16 \
    --seed 42

# Batch evaluation with script (automatically generates multiple random seeds)
bash script/test_generation.sh aime24 1.0 1.0 -1 1.0 path/to/model

Evaluation results are saved in the generation_log/ directory, including:

  • Pass@K metrics (the standard estimator is sketched right after this list)
  • Average accuracy
  • Detailed generation results for each sample
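
For reference, Pass@K over n samples per problem with c correct completions is usually computed with the standard unbiased estimator; the sketch below mirrors that common formula and may not match llm_eval.py exactly.

import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples drawn from n (with c correct) is correct."""
    if n - c < k:
        return 1.0
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# e.g. 16 generations per problem (--k 16 above), 5 of them correct:
print(pass_at_k(n=16, c=5, k=4))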

Deploy with vLLM

# example: serve the merged full model (see "Merge AutoDeco Heads into the Base Model" below)
vllm serve path_to_your_full_model
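
Once served, the model is reachable through vLLM's OpenAI-compatible API (port 8000 by default). A minimal client sketch, assuming the openai package is installed and the served model name matches the path passed to vllm serve:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="path_to_your_full_model",  # must match the name vllm serve registered
    messages=[{"role": "user", "content": "Evaluate the limit ..."}],
)
print(response.choices[0].message.content)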

πŸ“ Project Structure

AutoDeco/
├── model/                          # Model definitions
│   └── templlm_auto.py             # Unified AutoDeco model (recommended)
│
├── trainer/                        # Trainers
│   └── trl_Temp.py                 # AutoDeco trainer
│
├── script/                         # Scripts
│   ├── trl_train.sh                # Training launch script
│   ├── test_generation.sh          # Batch evaluation script
│   └── merge_autodeco.py           # Merge or split heads
│
├── config/                         # Configuration files
│   └── deepspeed/                  # DeepSpeed configuration
│       └── deepspeed_zero3_gradaccu4.yaml
│
├── trl_train.py                    # Training main program
├── llm_eval.py                     # Evaluation main program (vLLM)
├── boxed_extract.py                # Answer extraction tool
├── requirements.txt                # requirements
└── README.md                       # This document

🔧 Advanced Usage

1. Extract AutoDeco Heads from AutoDeco Model

python script/merge_autodeco.py split \
    --full-checkpoint path_to_your_full_model \
    --output path_to_split_head

This generates a lightweight checkpoint (~5MB) containing:

  • config.json: AutoDeco configuration (including base_model_name_or_path)
  • autodeco_heads.safetensors: AutoDeco head weights
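
To inspect what the split checkpoint actually contains, the safetensors file can be opened directly. A short sketch (safetensors ships with the usual transformers stack; the tensor names printed are simply whatever the split script stored):

from safetensors.torch import load_file

heads = load_file("path_to_split_head/autodeco_heads.safetensors")
for name, tensor in heads.items():
    print(name, tuple(tensor.shape), tensor.dtype)
print("total head parameters:", sum(t.numel() for t in heads.values()))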

2. Merge AutoDeco Heads into the Base Model (for vLLM Deployment)

If you need a complete model checkpoint with the heads merged in (e.g. for inference engines such as vLLM):

python script/merge_autodeco.py merge \
    --autodeco-path path_to_autodeco_heads \
    --base-model-path path_to_base_LLM \
    --output path_to_your_full_model

πŸ“ Citation

If you use AutoDeco in your research, please cite:

@misc{wang2025endmanualdecodingtruly,
      title={The End of Manual Decoding: Towards Truly End-to-End Language Models}, 
      author={Zhichao Wang and Dongyang Ma and Xinting Huang and Deng Cai and Tian Lan and Jiahao Xu and Haitao Mi and Xiaoying Tang and Yan Wang},
      year={2025},
      eprint={2510.26697},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.26697}, 
}