Model Card for faresfawzi/Qwen3-8B-SCRIBE

Abstract

Language models can be used to provide interactive, personalized student feedback in educational settings. However, real-world deployment faces three key challenges: privacy concerns, limited computational resources, and the need for pedagogically valid responses. These constraints require small, open-source models that can run locally and reliably ground their outputs in correct information. We introduce SCRIBE, a framework for multi-hop, tool-augmented reasoning designed to generate valid responses to student questions about feedback reports. SCRIBE combines domain-specific tools with a self-reflective inference pipeline that supports iterative reasoning, tool use, and error recovery. We distil these capabilities into 3B and 8B models via two-stage LoRA fine-tuning on synthetic GPT-4o-generated data. Evaluation with a human-aligned GPT-Judge and a user study with 108 students show that 8B-SCRIBE models achieve comparable or superior quality to much larger models in key dimensions such as relevance and actionability, while being perceived by students as on par with GPT-4o and Llama-3.3 70B. These findings demonstrate the viability of SCRIBE for low-resource, privacy-sensitive educational applications.


Model Description

Qwen3-8B-SCRIBE is a fine-tuned large language model for interactive educational feedback.
It builds on Qwen/Qwen3-8B and incorporates the SCRIBE framework: structured chain reasoning with multi-hop tool calling and self-reflection, enabling small models to deliver pedagogically valid, actionable, and context-grounded explanations to student questions.

  • Developed by: EPFL (Machine Learning for Education Lab)
  • Paper: SCRIBE: Structured Chain Reasoning for Interactive Behavior Explanations using Tool Calling
  • Authors: Fares Fawzi, Vinitra Swamy, Dominik Glandorf, Tanya Nazaretsky, Tanja Käser
  • Model type: Tool-augmented 8B LLM fine-tuned with two-stage LoRA
  • Languages: English
  • License: Apache 2.0
  • Finetuned from: Qwen/Qwen3-8B

Uses

Direct Use

The model is designed to:

  • Provide personalized, interactive feedback to students in MOOCs.
  • Generate pedagogically grounded explanations using multi-step reasoning with tool calls.
  • Support privacy-sensitive deployments by running locally.
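
The snippet below is a minimal loading-and-generation sketch using Hugging Face transformers. The student question is illustrative, and the system prompt and tool wiring used in the full SCRIBE pipeline are described in the paper rather than reproduced here.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "faresfawzi/Qwen3-8B-SCRIBE"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the bf16 training regime
    device_map="auto",
)

# Illustrative student question about a feedback report
messages = [{"role": "user", "content": "Why did my engagement score drop in week 4?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))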

Notes

  • Built on a function-calling base model, enabling robust integration with external tools/APIs (see the tool-schema sketch after this list).
  • Ideal for education-focused assistants that need both reasoning and grounded outputs.
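
Continuing the snippet above, tools can be exposed to the model as JSON-schema function definitions via the chat template's tools argument in recent transformers versions. The tool below is a hypothetical stand-in; the actual SCRIBE tools are domain-specific and described in the paper.

get_feedback_report = {
    "type": "function",
    "function": {
        "name": "get_feedback_report",  # hypothetical tool, not from the paper
        "description": "Retrieve a student's weekly feedback report for a course.",
        "parameters": {
            "type": "object",
            "properties": {
                "course_id": {"type": "string", "description": "Course identifier."},
                "week": {"type": "integer", "description": "Week number of the report."},
            },
            "required": ["course_id", "week"],
        },
    },
}

inputs = tokenizer.apply_chat_template(
    messages,
    tools=[get_feedback_report],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)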

Training Details

Training Data

  • Base data: Feedback reports from MOOCs (DSP, GEO, VA, LNV)
  • Synthetic data: ~7,000 student-like questions generated with GPT-4o, including reasoning traces, tool calls, and final responses
  • Real data: 75 student-authored questions annotated into pedagogical categories

Training Procedure

  • Two-stage LoRA fine-tuning:
    1. Stage 1: Initial reasoning + tool selection
    2. Stage 2: Multi-hop reasoning + final answer generation
  • Inference: Closed-loop tool-calling with self-reflection and error recovery
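
As a rough illustration of that loop, the sketch below shows one way a closed-loop, self-reflective tool-calling pipeline can be structured. All helper names (generate_step, parse_tool_call, scribe_style_loop) and the single-JSON-object call format are assumptions made for this sketch, not the paper's actual implementation.

import json

MAX_HOPS = 5  # cap on reasoning / tool-call iterations

def parse_tool_call(reply):
    """Return a {'name': ..., 'arguments': ...} dict if the reply requests a
    tool, else None. Assumes calls are emitted as a single JSON object; the
    real format is model- and template-specific."""
    try:
        call = json.loads(reply)
        return call if isinstance(call, dict) and "name" in call else None
    except json.JSONDecodeError:
        return None

def scribe_style_loop(messages, generate_step, tools):
    """Closed loop: the model reasons, optionally calls a tool, observes the
    result, and self-reflects until it produces a final answer."""
    for _ in range(MAX_HOPS):
        reply = generate_step(messages)  # one model turn
        messages.append({"role": "assistant", "content": reply})
        call = parse_tool_call(reply)
        if call is None:
            return reply  # final answer, loop ends
        try:
            result = tools[call["name"]](**call.get("arguments", {}))
            messages.append({"role": "tool", "content": json.dumps(result)})
        except Exception as err:
            # Error recovery: surface the failure so the model can reflect
            # and retry with corrected arguments or a different tool.
            messages.append({"role": "tool", "content": f"Tool error: {err}"})
    # Hop budget exhausted: ask for a best-effort answer from what was gathered.
    messages.append({"role": "user", "content": "Answer with the information gathered so far."})
    return generate_step(messages)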

Training Hyperparameters

  • Regime: bf16 mixed precision
  • LoRA rank: 256 for the 8B variant
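
For reference, a PEFT configuration consistent with the reported rank might look as follows. Only the rank and precision are stated above; the alpha, dropout, and target modules below are assumptions.

from peft import LoraConfig

lora_config = LoraConfig(
    r=256,                  # rank reported for the 8B variant
    lora_alpha=512,         # assumption: alpha = 2 * r, a common convention
    lora_dropout=0.05,      # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)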

Citation

If you use this model, please cite:

BibTeX

@inproceedings{2025-EMNLP-Scribe,
  author    = {Fares Fawzi and Vinitra Swamy and Dominik Glandorf and Tanya Nazaretsky and Tanja K{\"a}ser},
  booktitle = {Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  title     = {SCRIBE: Structured Chain Reasoning for Interactive Behavior Explanations using Tool Calling},
  year      = {2025}
}