File size: 8,234 Bytes
137b674 ae06222 137b674 ae06222 137b674 ae06222 137b674 ae06222 137b674 ae06222 137b674 ae06222 137b674 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 |
---
license: other
license_link: LICENSE
library_name: transformers
pipeline_tag: text-generation
datasets:
- amd/SAND-Post-Training-Dataset
language:
- en
base_model:
- Qwen/Qwen2.5-32B-Instruct
---
# State-of-the-art Large Reasoning Model Built Using Only Synthetic Data on AMD GPUs
<div align="center">
| [](https://arxiv.org/pdf/2507.20527) | [](https://huggingface.co/datasets/amd/SAND-Post-Training-Dataset) | [](https://github.com/AMD-AGI/sand-pipeline) | [](https://rocm.blogs.amd.com/artificial-intelligence/sand-math/README.html) |
| :---: | :---: | :---: | :---: |
</div>
## Model Summary
We introduce **SAND-Math-Qwen2.5-32B** and **SAND-MathScience-DeepSeek-Qwen32B**, state-of-the-art reasoning models in the 32B parameter range, built entirely using a synthetic data pipeline running on the **AMD ROCm™ stack** and **AMD Instinct™ MI325 GPUs**.
By prioritizing data difficulty along with quantity, we demonstrate that high-difficulty synthetic data can elevate prior-generation models to match or exceed modern proprietary models. `SAND-Math-Qwen2.5-32B` is fine-tuned from **Qwen2.5-32B-Instruct** on just **14k synthetic math samples**, achieving strong reasoning capabilities with minimal data outperforming other data distillation and post training approaches. `SAND-MathScience-DeepSeek-Qwen32B` is fine-tuned from **DeepSeek-R1-Distill-Qwen-32B** on a compact dataset of **27k samples** (15k Math + 12k Science), achieving a generational leap in performance that rivals **Qwen3-32B**.
We are releasing the models, datasets, and code to empower the community to build their own state-of-the-art reasoning models using AMD hardware.
## 📊 Benchmark Results
We conducted extensive experiments to validate that our pipeline yields superior results compared to models trained on significantly larger datasets.
### 1. Bridging the Generational Gap
Fine-tuning the Qwen2.5-based **DeepSeek-R1-Distill-Qwen-32B** on our mixed Math/Science dataset allows it to rival and even surpass the next-generation **Qwen3-32B** on key benchmarks.
| Model | AIME24 | AIME25 | MATH500 | GPQA |
| :--- | :---: | :---: | :---: | :---: |
| DeepSeek-Distilled-Qwen32B (Base) | 72.6 | 54.9 | 94.3 | 62.1 |
| EXAONE Deep 32B | 72.1 | 65.8 | 95.8 | 66.1 |
| Qwen3-32B (Thinking mode) | 81.4 | 72.9 | **97.0** | 68.4 |
| **SAND-MathScience-DeepSeek-Qwen32B (Ours)** | **83.85** | **78.33** | 93.85 | **68.72** |
### 2. Efficiency: Unlocking Reasoning with Less Data
Using only **14k synthetic math samples** and standard SFT (no RL), our approach outperforms models trained on datasets 5x to 50x larger.
| Model | Data Size | AIME24 | AIME25 | MATH500 | GPQA |
| :--- | :--- | :---: | :---: | :---: | :---: |
| Qwen2.5-32B-Instruct (Base) | - | 16.7 | 13.3 | 83.4 | 53.5 |
| DeepSeek-R1-Distill-Qwen-32B | 800k | 72.6 | 54.9 | **94.3** | **62.1** |
| Light-R1-32B | 79k | 73.0 | 64.3 | 93.3 | 60.6 |
| OpenThinker-32B | 114k | 66.0 | 53.3 | 89.4 | 57.6 |
| **SAND-Math-Qwen2.5-32B (Ours)** | **14k** | **74.01** | **68.18** | 92.05 | 60.8 |
---
## ⚙️ The Synthetic Data Pipeline
Our results are powered by a 4-stage automated pipeline running on AMD hardware that prioritizes **difficulty and novelty** over volume. Unlike datasets that recycle easy problems, our pipeline leverages a Teacher Model (`GPT-OSS120b`) to generate, validate, and systematically "hike" the difficulty of reasoning problems.

### Pipeline Stages
1. **Stage 1: QA Generation & Consistency** 🛠️
- Generates novel problems from scratch
- Enforces correctness by requiring the teacher to generate multiple independent solution paths
- Only questions where all answers align are kept
2. **Stage 2: De-duplication & Decontamination** 🧹
- Removes internal duplicates via embedding similarity
- **Crucial Step:** Scans against known test sets (AIME, MATH, GPQA) to ensure zero contamination
3. **Stage 3: Difficulty Hiking** 🏔️
- Moderately challenging questions are rewritten by the teacher model
- Introduces deeper reasoning chains, added constraints, or cross-domain logic
- Systematically elevates complexity
- Configurable step primarily used when initial generation yields insufficient volume of high-difficulty samples
---
## 🚀 Quick Start
### Python Inference (Transformers)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "amd/SAND-Math-Qwen2.5-32B"
model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype="auto",
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Example prompt
prompt = "A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?"
messages = [
{"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(
**model_inputs,
max_new_tokens=4096,
temperature=0.7, # Recommended temperature
do_sample=True
)
generated_ids = [
output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print("Response:", response)
```
### Serving (vLLM & SGLang)
You can easily serve this model as an OpenAI-compatible API endpoint.
**Using SGLang:**
```bash
python -m sglang.launch_server --model-path amd/SAND-Math-Qwen2.5-32B --max-model-len 32768
```
**Using vLLM:**
```bash
vllm serve amd/SAND-Math-Qwen2.5-32B --max-model-len 32768
```
---
## 💡 Usage Recommendations
To replicate our performance benchmarks and achieve the best reasoning results, we strongly recommend the following configurations:
* **Temperature:** Set `temperature=0.7`. **DO NOT use greedy decoding**, as it can lead to performance degradation and repetitive loops.
* **Prompting:** For mathematical problems, include a directive to enforce structure:
> "Please reason step by step, and put your final answer within \boxed{}."
* **Context Length:** We recommend allowing an output length of **32,768 tokens**. This ensures the model has sufficient space for long Chain-of-Thought (CoT) generation.
* **Thinking Token:** It is recommended to enforce the model to initiate its response with the `<think>\n` token to trigger the reasoning mode effectively.
* **Evaluation:** When benchmarking, conduct multiple passes (Pass@K) and average the results for stability.
---
## 📜 License
This project is licensed under the **Open RAIL-MSD** license. This is an open, royalty-free license that permits commercial use, modification, and distribution of the dataset, models, and source code.
The license includes standard use-based restrictions to prevent harmful applications (e.g., illegal activities, generating harmful content, high-risk applications). These restrictions are designed to promote responsible AI development while keeping the license permissive for legitimate use cases.
For full license terms and conditions, please see the [LICENSE](./LICENSE) file.
---
## Citation
If you use this model, dataset, or pipeline in your research, please cite our work:
```bibtex
@misc{manem025sandmathusingllmsgenerate,
title={SAND-Math: Using LLMs to Generate Novel, Difficult and Useful Mathematics Questions and Answers},
author={Chaitanya Manem and Pratik Prabhanjan Brahma and Prakamya Mishra and Zicheng Liu and Emad Barsoum},
year={2025},
eprint={2507.20527},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2507.20527},
}
```
|