MathLlama 3.2 - Enhanced Mathematical Reasoning Model

Model Overview

MathLlama 3.2 is a fine-tuned version of Meta's Llama-3.2-3B-Instruct model, specifically enhanced for advanced mathematical reasoning tasks. This model demonstrates significant improvements in mathematical problem-solving capabilities through targeted fine-tuning with synthetic Chain-of-Thought (CoT) data.

Key Improvements

Mathematical Reasoning Enhancement

  • 12% improvement on the MMLU abstract_algebra subset compared to the base Llama-3.2-3B-Instruct model (a reproduction sketch follows this list)
  • Enhanced capability on complex, multi-step mathematical reasoning tasks
  • Improved performance across mathematical domains including algebra, calculus, and abstract mathematical concepts
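
The abstract_algebra figure can be spot-checked with a standard evaluation harness. The snippet below is a minimal sketch using EleutherAI's lm-evaluation-harness (pip install lm-eval); the task name and argument strings are assumptions about that tool and are not part of this model's release.

# Sketch: score the model on the MMLU abstract_algebra subset with lm-evaluation-harness
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=HenryShan/MathLlama3.2,dtype=bfloat16",
    tasks=["mmlu_abstract_algebra"],
)
print(results["results"]["mmlu_abstract_algebra"])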

Training Methodology

  • Synthetic Dataset: Utilized the Gemini-MMLU-CoT dataset consisting of 7,000 advanced math problems
  • Chain-of-Thought Training: Each training example includes detailed step-by-step reasoning processes

Model Architecture

  • Base Model: Meta Llama-3.2-3B-Instruct
  • Parameters: 3 billion
  • Context Length: 128k tokens
  • Architecture: Decoder-only Transformer with grouped-query attention, inherited from the base model (a quick config check is sketched below)
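
The advertised context length can be confirmed from the published configuration without downloading the weights; the field names below follow the standard Llama configuration in transformers.

# Quick config check (no weight download required)
from transformers import AutoConfig

config = AutoConfig.from_pretrained("HenryShan/MathLlama3.2")
print(config.max_position_embeddings)   # context window, expected 131072 (128k)
print(config.num_hidden_layers, config.hidden_size, config.num_attention_heads)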

Usage

Basic Usage


# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("HenryShan/MathLlama3.2")
model = AutoModelForCausalLM.from_pretrained("HenryShan/MathLlama3.2")
messages = [
    {"role": "user", "content": "Find the degree for the given field extension Q(sqrt(2), sqrt(3), sqrt(18)) over Q. Answer Choices: A: "0", B: "4", C: "2", D: "6"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate; 40 new tokens shows only the start of the answer, increase for the full chain of thought
outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens (skip the prompt)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
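
For quick experiments the high-level pipeline interface also works; since 40 new tokens is usually too short to show the full chain of thought, a larger budget is used in this sketch (the example question is illustrative).

# Alternative: chat-style generation through the pipeline API
from transformers import pipeline

pipe = pipeline("text-generation", model="HenryShan/MathLlama3.2")
messages = [
    {"role": "user", "content": "Solve for x: 2x + 6 = 20. Show your reasoning step by step."},
]
result = pipe(messages, max_new_tokens=512)
print(result[0]["generated_text"][-1]["content"])  # last turn is the model's reply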

Training Details

Dataset Creation

  • Synthetic Dataset: 7,000 carefully designed advanced mathematical problems
  • Chain-of-Thought Format: Each training example includes the following (an illustrative row is sketched below):
    1. Clear problem statement
    2. Step-by-step reasoning process
    3. Final answer with justification
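
The row below is a hypothetical illustration of this structure; the field names and wording are invented for this card, not taken from the Gemini-MMLU-CoT dataset.

# Hypothetical training row in the Chain-of-Thought format described above
example_row = {
    "problem": "Find the degree of the field extension Q(sqrt(2), sqrt(3)) over Q.",
    "chain_of_thought": (
        "Step 1: [Q(sqrt(2)) : Q] = 2 since x^2 - 2 is irreducible over Q. "
        "Step 2: sqrt(3) is not in Q(sqrt(2)), so [Q(sqrt(2), sqrt(3)) : Q(sqrt(2))] = 2. "
        "Step 3: By the tower law, the total degree is 2 * 2 = 4."
    ),
    "answer": "4",
}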

Fine-tuning Process

  • Learning Rate: Optimized for mathematical reasoning tasks
  • Batch Size: Configured for stable training with mathematical data
  • Training Steps: 50 epochs over the 7,000-example dataset (see Training Performance below)
  • Hardware: Trained on an Apple M4 Max using the MLX framework (an illustrative command follows this list)
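
The exact training script is not published with this card. As a rough illustration only, LoRA-style fine-tuning with the mlx-lm package is typically launched with a command along the following lines; the data path and hyperparameter values are placeholders rather than the authors' settings, and whether the authors used LoRA or a full fine-tune is not stated.

python -m mlx_lm.lora \
    --model meta-llama/Llama-3.2-3B-Instruct \
    --train \
    --data ./gemini_mmlu_cot \
    --batch-size 4 \
    --iters 1000 \
    --learning-rate 1e-5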

Training Performance

The model was trained for 50 epochs, and its performance was tracked using Weights & Biases (wandb). The graph below shows the training loss and validation loss (val_loss) throughout the fine-tuning process.

[Figure: training and validation loss curves over the 50-epoch run, logged to Weights & Biases]

Applications

This model is particularly well-suited for:

  • Mathematical tutoring systems
  • Automated theorem proving assistance
  • Advanced mathematical problem solving
  • STEM education applications
  • Research assistance in mathematical domains

Limitations

While MathLlama 3.2 shows significant improvements in mathematical reasoning, it:

  • Should not be used for safety-critical mathematical applications without verification
  • May occasionally make reasoning errors on extremely complex problems
  • Performance may vary across different mathematical domains

Ethical Considerations

This model is designed to enhance mathematical education and research. Users should:

  • Verify critical mathematical results independently
  • Use the model as a tool to augment, not replace, human mathematical reasoning
  • Be aware of potential biases in synthetic training data

Citation

@misc{mathllama2025,
      title={MathLlama 3.2: Enhanced Mathematical Reasoning with Synthetic CoT Data}, 
      author={Haotian Shan},
      year={2025},
      note={Fine-tuned Llama-3.2 model for advanced mathematical reasoning}
}
