Create README.md

#1
by ldsjmdy - opened
Files changed (1)
  1. README.md +287 -0
README.md ADDED
@@ -0,0 +1,287 @@
---
datasets:
- zwhe99/DeepMath-103K
base_model:
- openai/gpt-oss-20b
---

# AutoDeco
Official implementation of "[The End of Manual Decoding: Towards Truly End-to-End Language Models](https://arxiv.org/abs/2510.26697)"

**AutoDeco** is a framework that adds token-level adaptive decoding-parameter prediction to Large Language Models (LLMs). By attaching lightweight prediction heads on top of a pre-trained model, AutoDeco dynamically predicts the optimal temperature and top-p for each token during decoding.

## 🎯 Key Features

- **Token-Level Decoding Parameter Prediction**: Dynamically predicts the decoding parameters (temperature and top-p) for each generated token; see the sketch after this list
- **Lightweight Design**: Adds only two small MLP prediction heads (~5MB) and leaves the base model unmodified
- **Universal Architecture**: Supports mainstream LLM architectures (Llama, Qwen2/2.5, Qwen3, MoE models, etc.)
- **End-to-End Training**: Trained end to end, with gradients backpropagated implicitly through the cross-entropy loss alone
- **Flexible Training**: Supports training the temperature head, the top-p head, or both jointly
- **Efficient Deployment**: Saves only the AutoDeco prediction-head weights during training and merges them with the base model for decoding

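To make the first point concrete, here is a minimal sketch of how per-token predicted values could be applied when sampling the next token. This is an illustration, not the repository's implementation; the function and variable names are our own:

```python
import torch

def sample_next_token(logits: torch.Tensor, temperature: float, top_p: float) -> torch.Tensor:
    """Sample one token id using this step's predicted temperature and top-p."""
    # Temperature scaling of the next-token logits ([vocab_size]).
    probs = torch.softmax(logits / temperature, dim=-1)

    # Nucleus (top-p) filtering: keep the smallest prefix of tokens, in
    # descending probability order, whose cumulative mass reaches top_p.
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    keep = cumulative - sorted_probs < top_p  # the top token is always kept
    sorted_probs[~keep] = 0.0
    sorted_probs /= sorted_probs.sum()

    # Sample from the renormalized nucleus.
    choice = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_idx[choice]
```

With AutoDeco, `temperature` and `top_p` above are not fixed hyperparameters; they are re-predicted by the heads at every decoding step.
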
+ ## πŸ—οΈ Architecture
23
+
24
+ The AutoDeco framework consists of two core components:
25
+
26
+ ![AutoDeco Architecture](figure/arch.png)
27
+
28
+ ### Model Workflow
29
+
30
+ ```
31
+ Input Tokens
32
+ ↓
33
+ Base LLM (frozen during head training)
34
+ ↓
35
+ Hidden States
36
+ β”œβ”€β”€β†’ LM Head β†’ Logits
37
+ β”œβ”€β”€β†’ TempHead β†’ Temperature
38
+ └──→ TopPHead β†’ Top-P
39
+ ```
40
+
41
+ During training, the base LLM parameters are frozen, and only the two prediction heads are trained.
42
+
43
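As a rough illustration of what such heads can look like, here is a sketch of a small per-token MLP head over the hidden states. The layer sizes and activations are assumptions chosen to land near the head parameter counts in the table below, not the actual architecture:

```python
import torch
import torch.nn as nn

class AutoDecoHead(nn.Module):
    """Illustrative per-token scalar head over hidden states (assumed design)."""

    def __init__(self, hidden_size: int, head_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, head_dim),
            nn.SiLU(),
            nn.Linear(head_dim, 1),
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: [batch, seq_len, hidden_size] -> [batch, seq_len]
        return self.mlp(hidden_states).squeeze(-1)

temp_head = AutoDecoHead(hidden_size=4096)
top_p_head = AutoDecoHead(hidden_size=4096)

hidden = torch.randn(1, 10, 4096)  # stand-in for base-LLM hidden states
temperature = nn.functional.softplus(temp_head(hidden))  # keeps temperature positive
top_p = torch.sigmoid(top_p_head(hidden))                # squashes top-p into (0, 1)
```
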
## πŸ€– Supported Models

AutoDeco supports current autoregressive LLMs, which we unify under a single `AutoDecoModelForCausalLM` interface.

<div align="center">

| **Base Model** | **#Base Params** | **#AutoDeco Params** | **Download** |
| :------------: | :------------: | :------------: | :------------: |
| Llama-3.1-Nemotron-Nano-8B-v1 | 8B | 2.1M | [πŸ€— HuggingFace](https://huggingface.co/Jadeislaw/AutoDeco-Llama-Nemotron-8B) |
| DeepSeek-R1-Distill-Qwen-7B | 7B | 1.84M | [πŸ€— HuggingFace](https://huggingface.co/Jadeislaw/AutoDeco-R1-Distill-Qwen-7B) |
| Qwen3-30B-A3B-Instruct-2507 | 30B | 1.05M | [πŸ€— HuggingFace](https://huggingface.co/Jadeislaw/AutoDeco-Qwen3-30B-A3B-Instruct-2507) |
| OpenAI-GPT-OSS-20B | 20B | 1.48M | [πŸ€— HuggingFace](https://huggingface.co/Jadeislaw/AutoDeco-GPT-Oss-20B) |
| OpenAI-GPT-OSS-120B | 120B | - | Coming Soon |
| Qwen3-235B-A22B-Thinking | 235B | - | Coming Soon |
| DeepSeek-V3.1-Terminus | 671B | - | Coming Soon |

</div>

## πŸš€ Installation

### Recommended Requirements

- Python >= 3.10
- PyTorch >= 2.0
- CUDA >= 12.0 (recommended for training)

### Install Dependencies

```bash
# Clone the repository and enter it
git clone <AutoDeco-repo-url>
cd AutoDeco

# Install core dependencies
pip install -r requirements.txt

# Optional: for training monitoring
pip install wandb
```

## πŸ’‘ Quick Start

### Initialize AutoDeco Model

```bash
python script/construct_autodeco.py \
    --base_model_name_or_path path_to_your_base_LLM \
    --output_dir path_to_your_AutoDeco_model
```

<!-- ### 2. Inference

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/model")
inputs = tokenizer("What is the meaning of life?", return_tensors="pt")

# Forward pass to get predictions
outputs = model(**inputs)

# outputs contains:
# - outputs.logits: Regular language model logits
# - outputs.temp_logits: Predicted temperature values
# - outputs.top_p_logits: Predicted top-p values
```

### 3. Efficient Inference with vLLM

We have integrated AutoDeco with vLLM for efficient batch inference:

- Install vLLM from source code first
```bash
cd vllm
pip install -e .
```

- Inference
```bash
# Use training script for evaluation
python llm_eval.py \
    --model_name_or_path path/to/autodeco_model \
    --dataset aime24 \
    --temp 1.0 \
    --top_p 1.0 \
    --k 16 \
    --tp_size 4
``` -->

## πŸ”₯ Training

### Prepare Training Data

Training data should be in JSONL format, with one sample per line. AutoDeco supports a standard prompt/completion format:

```json
{
  "prompt": "formatted prompt text",
  "completion": "expected completion"
}
```

For example:

```json
{
  "prompt": "<|im_start|>user\nEvaluate the limit:$$\\lim_{(x, y) \\to (1, 2)} \\frac{(x-1)(y-2)-x+3}{x^2-2x+y^2-4}$$\nMake sure you output the final answer within \\boxed{}<|im_end|>\n<|im_start|>assistant\n",
  "completion": "......### βœ… Final Answer:\n$$\n\\boxed{-1}\n$$"
}
```

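For convenience, here is a small sketch of how such a file could be produced with a chat template. The tokenizer path, sample data, and file names are illustrative assumptions:

```python
import json
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path_to_your_base_LLM")

samples = [
    {"question": "What is 2 + 2?", "answer": "### βœ… Final Answer:\n$$\n\\boxed{4}\n$$"},
]

with open("data/train_data.jsonl", "w", encoding="utf-8") as f:
    for s in samples:
        # Build the prompt with the model's own chat template, leaving the
        # assistant turn open so the completion continues from the prompt.
        prompt = tokenizer.apply_chat_template(
            [{"role": "user", "content": s["question"]}],
            tokenize=False,
            add_generation_prompt=True,
        )
        f.write(json.dumps({"prompt": prompt, "completion": s["answer"]}) + "\n")
```
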
### Train AutoDeco Heads

Use the provided training script:

```bash
# Edit script/trl_train.sh to configure parameters
# Key parameters:
#   - MODEL_NAME_OR_PATH: your initialized AutoDeco model path
#   - DATA_NAME: training data filename (in the data directory)
#   - MAX_LENGTH: maximum sequence length
#   - train_temp: whether to train the temperature head
#   - train_top_p: whether to train the top-p head

bash script/trl_train.sh
```

Example training configuration:

```bash
# Train only the temperature head
accelerate launch trl_train.py \
    --model_name_or_path AutoDeco-Llama-3.1-8B \
    --dataset_name train_data.jsonl \
    --train_temp true \
    --train_top_p false \
    --learning_rate 5e-6 \
    --num_train_epochs 1 \
    --output_dir ckpt/llama3_temp_head
```

## πŸ“Š Inference

### Batch Evaluation with vLLM

```bash
# Single evaluation
python llm_eval.py \
    --model_name_or_path ckpt/autodeco_model \
    --dataset aime24 \
    --temp 1.0 \
    --top_p 1.0 \
    --k 16 \
    --seed 42

# Batch evaluation with script (automatically generates multiple random seeds)
bash script/test_generation.sh aime24 1.0 1.0 -1 1.0 path/to/model
```

Evaluation results are saved in the `generation_log/` directory, including:
- Pass@K metrics
- Average accuracy
- Detailed generation results for each sample

### Deploy with vLLM

```bash
# Example: serve a merged AutoDeco model (see "Advanced Usage" below)
vllm serve path_to_your_full_model
```

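Once the server is running, it exposes vLLM's standard OpenAI-compatible API. A minimal client call might look like the following; the port is vLLM's default, and we assume no sampling parameters need to be passed since the AutoDeco model predicts temperature and top-p itself:

```python
from openai import OpenAI

# vLLM serves an OpenAI-compatible API, by default at http://localhost:8000/v1.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.completions.create(
    model="path_to_your_full_model",
    prompt="Evaluate 1 + 1 and put the result in \\boxed{}.",
    max_tokens=256,
)
print(response.choices[0].text)
```
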
+ ## πŸ“ Project Structure
215
+ ```
216
+ AutoDeco/
217
+ β”œβ”€β”€ model/ # Model definitions
218
+ β”‚ β”œβ”€β”€ templlm_auto.py # Unified AutoDeco model (recommended)
219
+ definitions
220
+ β”‚
221
+ β”œβ”€β”€ trainer/ # Trainers
222
+ β”‚ └── trl_Temp.py # AutoDeco trainer
223
+ β”‚
224
+ β”œβ”€β”€ script/ # Scripts
225
+ β”‚ β”œβ”€β”€ trl_train.sh # Training launch script
226
+ β”‚ β”œβ”€β”€ test_generation.sh # Batch evaluation script
227
+ β”‚ └── merge_autodeco.py # Merge or split heads
228
+ β”‚
229
+ β”œβ”€β”€ config/ # Configuration files
230
+ β”‚ └── deepspeed/ # DeepSpeed configuration
231
+ β”‚ └── deepspeed_zero3_gradaccu4.yaml
232
+ β”‚
233
+ β”œβ”€β”€ trl_train.py # Training main program
234
+ β”œβ”€β”€ llm_eval.py # Evaluation main program (vLLM)
235
+ β”œβ”€β”€ boxed_extract.py # Answer extraction tool
236
+ β”œβ”€β”€ requirements.txt # requirements
237
+ └── README.md # This document
238
+
239
+ ```
240
+
241
## πŸ”§ Advanced Usage

### 1. Extract AutoDeco Heads from an AutoDeco Model

```bash
python merge_autodeco.py split \
    --full-checkpoint path_to_your_full_model \
    --output path_to_split_head
```

This generates a lightweight checkpoint (~5MB) containing:
- `config.json`: AutoDeco configuration (including `base_model_name_or_path`)
- `autodeco_heads.safetensors`: head weights

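For reference, such a checkpoint can be inspected with the `safetensors` library; the exact tensor names inside the file depend on how the heads are registered, so the output below is indicative only:

```python
from safetensors.torch import load_file

# Load the extracted head weights (~5MB) from the split checkpoint.
state_dict = load_file("path_to_split_head/autodeco_heads.safetensors")

# List the stored tensors and their shapes.
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))
```
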
### 2. Merge AutoDeco Heads into a Base Model (for vLLM Deployment)

If you need a complete model checkpoint with heads for inference engines such as vLLM:

```bash
python merge_autodeco.py merge \
    --autodeco-path path_to_autodeco_heads \
    --base-model-path path_to_base_LLM \
    --output path_to_your_full_model
```

+ ## πŸ“ Citation
268
+
269
+ If you use AutoDeco in your research, please cite:
270
+
271
+ ```bibtex
272
+ @misc{wang2025endmanualdecodingtruly,
273
+ title={The End of Manual Decoding: Towards Truly End-to-End Language Models},
274
+ author={Zhichao Wang and Dongyang Ma and Xinting Huang and Deng Cai and Tian Lan and Jiahao Xu and Haitao Mi and Xiaoying Tang and Yan Wang},
275
+ year={2025},
276
+ eprint={2510.26697},
277
+ archivePrefix={arXiv},
278
+ primaryClass={cs.CL},
279
+ url={https://arxiv.org/abs/2510.26697},
280
+ }
281
+ ```
282
+
283
<!-- ## Acknowledgments

- Built on [Transformers](https://github.com/huggingface/transformers) and [TRL](https://github.com/huggingface/trl)
- Training framework uses [DeepSpeed](https://github.com/microsoft/DeepSpeed)
- Inference optimization uses [vLLM](https://github.com/vllm-project/vllm) -->