MathLlama 3.2 - Enhanced Mathematical Reasoning Model

Model Overview

MathLlama 3.2 is a fine-tuned version of Meta's Llama-3.2-3B-Instruct model, specifically enhanced for advanced mathematical reasoning tasks. This model demonstrates significant improvements in mathematical problem-solving capabilities through targeted fine-tuning with synthetic Chain-of-Thought (CoT) data.

Key Improvements

Mathematical Reasoning Enhancement

  • 12% improvement on the MMLU abstract_algebra subset compared to the base Llama-3.2-3B-Instruct model (a reproduction sketch follows this list)
  • Enhanced capability on complex, multi-step mathematical reasoning tasks
  • Improved performance across mathematical domains including algebra, calculus, and abstract mathematical concepts
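
The abstract_algebra figure can be spot-checked with a standard evaluation harness. The snippet below is a minimal sketch using EleutherAI's lm-evaluation-harness (pip install lm-eval); the task name and argument strings are assumptions about that tool and are not part of this model's release.

# Sketch: score the model on the MMLU abstract_algebra subset with lm-evaluation-harness
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=HenryShan/MathLlama3.2,dtype=bfloat16",
    tasks=["mmlu_abstract_algebra"],
)
print(results["results"]["mmlu_abstract_algebra"])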

Training Methodology

  • Synthetic Dataset: Utilized the Gemini-MMLU-CoT dataset consisting of 7,000 advanced math problems
  • Chain-of-Thought Training: Each training example includes detailed step-by-step reasoning processes

Model Architecture

  • Base Model: Meta Llama-3.2-3B-Instruct
  • Parameters: 3 billion
  • Context Length: 128k tokens
  • Architecture: Decoder-only Transformer with grouped-query attention, inherited from the base model (a quick config check is sketched below)
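
The advertised context length can be confirmed from the published configuration without downloading the weights; the field names below follow the standard Llama configuration in transformers.

# Quick config check (no weight download required)
from transformers import AutoConfig

config = AutoConfig.from_pretrained("HenryShan/MathLlama3.2")
print(config.max_position_embeddings)   # context window, expected 131072 (128k)
print(config.num_hidden_layers, config.hidden_size, config.num_attention_heads)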

Usage

Basic Usage


# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("HenryShan/MathLlama3.2")
model = AutoModelForCausalLM.from_pretrained("HenryShan/MathLlama3.2")
messages = [
    {"role": "user", "content": "Find the degree for the given field extension Q(sqrt(2), sqrt(3), sqrt(18)) over Q. Answer Choices: A: "0", B: "4", C: "2", D: "6"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate; 40 new tokens shows only the start of the answer, increase for the full chain of thought
outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens (skip the prompt)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
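
For quick experiments the high-level pipeline interface also works; since 40 new tokens is usually too short to show the full chain of thought, a larger budget is used in this sketch (the example question is illustrative).

# Alternative: chat-style generation through the pipeline API
from transformers import pipeline

pipe = pipeline("text-generation", model="HenryShan/MathLlama3.2")
messages = [
    {"role": "user", "content": "Solve for x: 2x + 6 = 20. Show your reasoning step by step."},
]
result = pipe(messages, max_new_tokens=512)
print(result[0]["generated_text"][-1]["content"])  # last turn is the model's reply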

Training Details

Dataset Creation

  • Synthetic Dataset: 7,000 carefully designed advanced mathematical problems
  • Chain-of-Thought Format: Each training example includes the following (an illustrative row is sketched below):
    1. Clear problem statement
    2. Step-by-step reasoning process
    3. Final answer with justification
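
The row below is a hypothetical illustration of this structure; the field names and wording are invented for this card, not taken from the Gemini-MMLU-CoT dataset.

# Hypothetical training row in the Chain-of-Thought format described above
example_row = {
    "problem": "Find the degree of the field extension Q(sqrt(2), sqrt(3)) over Q.",
    "chain_of_thought": (
        "Step 1: [Q(sqrt(2)) : Q] = 2 since x^2 - 2 is irreducible over Q. "
        "Step 2: sqrt(3) is not in Q(sqrt(2)), so [Q(sqrt(2), sqrt(3)) : Q(sqrt(2))] = 2. "
        "Step 3: By the tower law, the total degree is 2 * 2 = 4."
    ),
    "answer": "4",
}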

Fine-tuning Process

  • Learning Rate: Optimized for mathematical reasoning tasks
  • Batch Size: Configured for stable training with mathematical data
  • Training Steps: 50 epochs over the 7,000-example dataset (see Training Performance below)
  • Hardware: Trained on an Apple M4 Max using the MLX framework (an illustrative command follows this list)
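
The exact training script is not published with this card. As a rough illustration only, LoRA-style fine-tuning with the mlx-lm package is typically launched with a command along the following lines; the data path and hyperparameter values are placeholders rather than the authors' settings, and whether the authors used LoRA or a full fine-tune is not stated.

python -m mlx_lm.lora \
    --model meta-llama/Llama-3.2-3B-Instruct \
    --train \
    --data ./gemini_mmlu_cot \
    --batch-size 4 \
    --iters 1000 \
    --learning-rate 1e-5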

Training Performance

The model was trained for 50 epochs, and its performance was tracked using Weights & Biases (wandb). The graph below shows the training loss and validation loss (val_loss) throughout the fine-tuning process.

[Figure: training and validation loss curves over the 50-epoch run, logged to Weights & Biases]

Applications

This model is particularly well-suited for:

  • Mathematical tutoring systems
  • Automated theorem proving assistance
  • Advanced mathematical problem solving
  • STEM education applications
  • Research assistance in mathematical domains

Limitations

While MathLlama 3.2 shows significant improvements in mathematical reasoning, it:

  • Should not be used for safety-critical mathematical applications without verification
  • May occasionally make reasoning errors on extremely complex problems
  • Performance may vary across different mathematical domains

Ethical Considerations

This model is designed to enhance mathematical education and research. Users should:

  • Verify critical mathematical results independently
  • Use the model as a tool to augment, not replace, human mathematical reasoning
  • Be aware of potential biases in synthetic training data

Citation

@misc{mathllama2025,
      title={MathLlama 3.2: Enhanced Mathematical Reasoning with Synthetic CoT Data}, 
      author={Haotian Shan},
      year={2025},
      note={Fine-tuned Llama-3.2 model for advanced mathematical reasoning}
}
