MathLlama 3.2 - Enhanced Mathematical Reasoning Model
Model Overview
MathLlama 3.2 is a fine-tuned version of Meta's Llama-3.2-3B-Instruct model, specifically enhanced for advanced mathematical reasoning tasks. This model demonstrates significant improvements in mathematical problem-solving capabilities through targeted fine-tuning with synthetic Chain-of-Thought (CoT) data.
Key Improvements
Mathematical Reasoning Enhancement
- 12% improvement on the MMLU abstract_algebra subset compared to the base Llama-3.2-3B-Instruct model (a reproduction sketch follows this list)
- Enhanced capability in complex mathematical reasoning tasks
- Improved performance across various mathematical domains including algebra, calculus, and abstract mathematical concepts
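The abstract_algebra gain can be sanity-checked with a simple multiple-choice evaluation. Below is a minimal sketch, assuming the cais/mmlu dataset on the Hugging Face Hub and a naive "last letter mentioned" answer-extraction rule; it is not necessarily the protocol used to produce the reported figure.
# Illustrative accuracy check on MMLU abstract_algebra (prompt format and
# answer extraction here are assumptions, not the original evaluation harness)
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HenryShan/MathLlama3.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

data = load_dataset("cais/mmlu", "abstract_algebra", split="test")
letters = ["A", "B", "C", "D"]
correct = 0

for row in data:
    choices = ", ".join(f'{l}: "{c}"' for l, c in zip(letters, row["choices"]))
    prompt = (f'{row["question"]} Answer Choices: {choices}. '
              "Reason step by step, then give the letter of your answer.")
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True, tokenize=True,
        return_dict=True, return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512)
    reply = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:],
                             skip_special_tokens=True)
    # Naive extraction: take the last A/B/C/D character mentioned in the reply
    predicted = next((ch for ch in reversed(reply) if ch in letters), None)
    correct += int(predicted == letters[row["answer"]])

print(f"abstract_algebra accuracy: {correct / len(data):.3f}")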
Training Methodology
- Synthetic Dataset: Utilized the Gemini-MMLU-CoT dataset consisting of 7,000 advanced math problems
- Chain-of-Thought Training: Each training example includes detailed step-by-step reasoning processes
Model Architecture
- Base Model: Meta Llama-3.2-3B-Instruct
- Parameters: 3 billion parameters
- Context Length: 128k tokens
- Architecture: Transformer decoder with grouped-query attention (GQA), inherited from the base model (verifiable from the model config, as shown below)
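These figures come from the base Llama-3.2-3B-Instruct configuration and can be checked against the published config; the snippet below is a quick sanity check using standard Llama config attribute names.
from transformers import AutoConfig

# Inspect the inherited Llama 3.2 configuration (context length is stored as
# max_position_embeddings; 128k tokens corresponds to 131072)
config = AutoConfig.from_pretrained("HenryShan/MathLlama3.2")
print(config.model_type, config.num_hidden_layers, config.hidden_size)
print(config.num_attention_heads, config.num_key_value_heads)  # GQA: fewer KV heads
print(config.max_position_embeddings)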
Usage
Basic Usage
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("HenryShan/MathLlama3.2")
model = AutoModelForCausalLM.from_pretrained("HenryShan/MathLlama3.2")
messages = [
    {"role": "user", "content": 'Find the degree for the given field extension Q(sqrt(2), sqrt(3), sqrt(18)) over Q. Answer Choices: A: "0", B: "4", C: "2", D: "6"'},
]
# Build the chat-formatted prompt with the model's chat template
inputs = tokenizer.apply_chat_template(
messages,
add_generation_prompt=True,
tokenize=True,
return_dict=True,
return_tensors="pt",
).to(model.device)
# Leave room for the model to write out its chain-of-thought before the final answer
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
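Recent transformers releases also accept the chat-style messages list directly in the text-generation pipeline; the snippet below is an equivalent convenience path, assuming a reasonably up-to-date transformers install.
from transformers import pipeline

pipe = pipeline("text-generation", model="HenryShan/MathLlama3.2")
messages = [
    {"role": "user", "content": 'Find the degree for the given field extension Q(sqrt(2), sqrt(3), sqrt(18)) over Q. Answer Choices: A: "0", B: "4", C: "2", D: "6"'},
]
result = pipe(messages, max_new_tokens=512)
print(result[0]["generated_text"][-1]["content"])  # the assistant's reply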
Training Details
Dataset Creation
- Synthetic Dataset: 7,000 carefully designed advanced mathematical problems
- Chain-of-Thought Format: Each row includes (an illustrative record follows this list):
- Clear problem statement
- Step-by-step reasoning process
- Final answer with justification
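The dataset's exact schema is not reproduced here, but a record following the three-part format above might look like the sketch below; field names and wording are illustrative, not the dataset's actual columns.
# Illustrative CoT record; field names are assumptions, not the real Gemini-MMLU-CoT schema
example = {
    "problem": 'Find the degree for the given field extension Q(sqrt(2), sqrt(3), sqrt(18)) '
               'over Q. Answer Choices: A: "0", B: "4", C: "2", D: "6"',
    "reasoning": (
        "Step 1: sqrt(18) = 3*sqrt(2), so Q(sqrt(2), sqrt(3), sqrt(18)) = Q(sqrt(2), sqrt(3)). "
        "Step 2: [Q(sqrt(2)) : Q] = 2, and [Q(sqrt(2), sqrt(3)) : Q(sqrt(2))] = 2 since sqrt(3) "
        "is not in Q(sqrt(2)). "
        "Step 3: By the tower law, the degree is 2 * 2 = 4."
    ),
    "answer": "B",
}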
Fine-tuning Process
- Learning Rate: Optimized for mathematical reasoning tasks
- Batch Size: Configured for stable training with mathematical data
- Training Steps: Sufficient iterations to achieve mathematical reasoning improvements
- Hardware: Trained on an Apple M4 Max using MLX
Training Performance
The model was trained for 50 epochs, and its performance was tracked using Weights & Biases (wandb). The graph below shows the training loss and validation loss (val_loss) throughout the fine-tuning process.
Applications
This model is particularly well-suited for:
- Mathematical tutoring systems
- Automated theorem proving assistance
- Advanced mathematical problem solving
- STEM education applications
- Research assistance in mathematical domains
Limitations
While MathLlama 3.2 shows significant improvements in mathematical reasoning, it:
- Should not be used for safety-critical mathematical applications without verification
- May occasionally make reasoning errors on extremely complex problems
- Performance may vary across different mathematical domains
Ethical Considerations
This model is designed to enhance mathematical education and research. Users should:
- Verify critical mathematical results independently
- Use the model as a tool to augment, not replace, human mathematical reasoning
- Be aware of potential biases in synthetic training data
Citation
@misc{mathllama2025,
title={MathLlama 3.2: Enhanced Mathematical Reasoning with Synthetic CoT Data},
author={Haotian Shan},
year={2025},
note={Fine-tuned Llama-3.2 model for advanced mathematical reasoning}
}
