Prakamya committed on
Commit ae06222 · verified · 1 Parent(s): 137b674

Upload folder using huggingface_hub

Files changed (3)
  1. .gitattributes +1 -0
  2. README.md +6 -6
  3. SAND-MATH-Blog.png +3 -0
.gitattributes CHANGED
@@ -4,4 +4,5 @@
  *.pth filter=lfs diff=lfs merge=lfs -text
 
  PipelineSimple.png filter=lfs diff=lfs merge=lfs -text
+ SAND-MATH-Blog.png filter=lfs diff=lfs merge=lfs -text
  tokenizer.json filter=lfs diff=lfs merge=lfs -text
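The hunk above adds one Git LFS tracking rule for `SAND-MATH-Blog.png`. As a rough illustration of what that rule does, the sketch below lists the patterns marked `filter=lfs` and checks a filename against them. The helper names `lfs_patterns` and `is_lfs_tracked` are hypothetical, and `fnmatch` only approximates Git's own attribute pattern matching, so treat it as a sanity check rather than a reimplementation of Git.

```python
from fnmatch import fnmatch

# Rules visible in the .gitattributes hunk after this commit (other rules in the file are not shown).
GITATTRIBUTES_LINES = [
    "*.pth filter=lfs diff=lfs merge=lfs -text",
    "PipelineSimple.png filter=lfs diff=lfs merge=lfs -text",
    "SAND-MATH-Blog.png filter=lfs diff=lfs merge=lfs -text",
    "tokenizer.json filter=lfs diff=lfs merge=lfs -text",
]


def lfs_patterns(lines):
    """Return the path patterns whose attributes include filter=lfs."""
    patterns = []
    for line in lines:
        parts = line.split()
        # Skip blank lines and comments.
        if not parts or parts[0].startswith("#"):
            continue
        pattern, attrs = parts[0], parts[1:]
        if "filter=lfs" in attrs:
            patterns.append(pattern)
    return patterns


def is_lfs_tracked(filename, lines):
    """Approximate check: does any LFS pattern match this filename?"""
    return any(fnmatch(filename, pattern) for pattern in lfs_patterns(lines))


if __name__ == "__main__":
    for name in ("SAND-MATH-Blog.png", "model.pth", "README.md"):
        print(name, is_lfs_tracked(name, GITATTRIBUTES_LINES))
```

Under these assumptions the new image and any `*.pth` file are reported as LFS-tracked, while `README.md` is not.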
README.md CHANGED
@@ -12,7 +12,7 @@ base_model:
  - Qwen/Qwen2.5-32B-Instruct
  ---
 
- # SAND-Reasoning: Best-in-class Large Reasoning Model Built with Synthetic Data only using AMD GPUs
+ # State-of-the-art Large Reasoning Model Built Using Only Synthetic Data on AMD GPUs
 
  <div align="center">
 
@@ -22,7 +22,7 @@ base_model:
 
  ## Model Summary
 
- We introduce **SAND-Math-Qwen2.5-32B** and **SAND-MathScience-DeepSeek-Qwen32B**, reasoning models built entirely using a synthetic data pipeline running on the **AMD ROCm™ stack** and **AMD Instinct™ MI325 GPUs**.
+ We introduce **SAND-Math-Qwen2.5-32B** and **SAND-MathScience-DeepSeek-Qwen32B**, state-of-the-art reasoning models in the 32B parameter range, built entirely using a synthetic data pipeline running on the **AMD ROCm™ stack** and **AMD Instinct™ MI325 GPUs**.
 
  By prioritizing data difficulty along with quantity, we demonstrate that high-difficulty synthetic data can elevate prior-generation models to match or exceed modern proprietary models. `SAND-Math-Qwen2.5-32B` is fine-tuned from **Qwen2.5-32B-Instruct** on just **14k synthetic math samples**, achieving strong reasoning capabilities with minimal data outperforming other data distillation and post training approaches. `SAND-MathScience-DeepSeek-Qwen32B` is fine-tuned from **DeepSeek-R1-Distill-Qwen-32B** on a compact dataset of **27k samples** (15k Math + 12k Science), achieving a generational leap in performance that rivals **Qwen3-32B**.
 
@@ -48,10 +48,10 @@ Using only **14k synthetic math samples** and standard SFT (no RL), our approach
  | Model | Data Size | AIME24 | AIME25 | MATH500 | GPQA |
  | :--- | :--- | :---: | :---: | :---: | :---: |
  | Qwen2.5-32B-Instruct (Base) | - | 16.7 | 13.3 | 83.4 | 53.5 |
- | DeepSeek-R1-Distill-Qwen-32B | 800k | 72.6 | 54.9 | 94.3 | 62.1 |
+ | DeepSeek-R1-Distill-Qwen-32B | 800k | 72.6 | 54.9 | **94.3** | **62.1** |
  | Light-R1-32B | 79k | 73.0 | 64.3 | 93.3 | 60.6 |
  | OpenThinker-32B | 114k | 66.0 | 53.3 | 89.4 | 57.6 |
- | **SAND-Math-Qwen2.5-32B (Ours)** | **14k** | **74.01** | **68.18** | **92.05** | **60.8** |
+ | **SAND-Math-Qwen2.5-32B (Ours)** | **14k** | **74.01** | **68.18** | 92.05 | 60.8 |
 
  ---
 
@@ -59,7 +59,7 @@ Using only **14k synthetic math samples** and standard SFT (no RL), our approach
 
  Our results are powered by a 4-stage automated pipeline running on AMD hardware that prioritizes **difficulty and novelty** over volume. Unlike datasets that recycle easy problems, our pipeline leverages a Teacher Model (`GPT-OSS120b`) to generate, validate, and systematically "hike" the difficulty of reasoning problems.
 
- ![Pipeline Overview](PipelineSimple.png)
+ ![Pipeline Overview](SAND-MATH-Blog.png)
 
  ### Pipeline Stages
 
@@ -97,7 +97,7 @@ model = AutoModelForCausalLM.from_pretrained(
  tokenizer = AutoTokenizer.from_pretrained(model_name)
 
  # Example prompt
- prompt = "Find the number of pairs of positive integers $(m, n)$ such that $m^2 + n < 22$ and $n^2 + m < 22$."
+ prompt = "A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?"
  messages = [
  {"role": "user", "content": prompt}
  ]
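For reference when checking model output against the new example prompt, the bat-and-ball question has a short closed-form solution. This is a worked derivation added here for convenience, not part of the commit; the symbol $b$ (the ball's price in dollars) is introduced only for this note.

```latex
\begin{align*}
  b + (b + 1.00) &= 1.10 \\
  2b &= 0.10 \\
  b &= 0.05
\end{align*}
```

So the ball costs $0.05 and the bat $1.05; the intuitive answer of $0.10 would make the total $1.20.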
SAND-MATH-Blog.png ADDED

Git LFS Details

  • SHA256: f1841bc904fb83d026983cf6fb1eae6188dcd10d1ddd9d4e4c295eab7458640f
  • Pointer size: 132 Bytes
  • Size of remote file: 1.31 MB
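The SHA256 listed above can be used to confirm that a downloaded copy of the image matches the file stored in LFS. The sketch below is one way to do that with the Python standard library; the helper names `sha256_of_file` and `matches_digest` and the local path `SAND-MATH-Blog.png` are assumptions for illustration, not part of the repository.

```python
import hashlib

# Digest copied from the "Git LFS Details" listing for SAND-MATH-Blog.png.
EXPECTED_SHA256 = "f1841bc904fb83d026983cf6fb1eae6188dcd10d1ddd9d4e4c295eab7458640f"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Return True if the file's SHA256 equals the expected hex digest."""
    return sha256_of_file(path) == expected_hex.lower()


if __name__ == "__main__":
    # Hypothetical local path; adjust to wherever the file was downloaded.
    print(matches_digest("SAND-MATH-Blog.png", EXPECTED_SHA256))
```

A `False` result would suggest either a corrupted download or that the local file is still the small LFS pointer rather than the image itself.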