GPT-OSS-20B Fine-tuned for Azerbaijani (SFT)

This model is a fine-tuned version of openai/gpt-oss-20b, specifically optimized for the Azerbaijani language while preserving English and reasoning capabilities.

Model Description

  • Base Model: GPT-OSS-20B (20B parameter Mixture-of-Experts model)
  • Fine-tuning Stage: Supervised Fine-Tuning (SFT)
  • Training Data: ~15M tokens of Azerbaijani conversational data
  • Training Strategy: MoE-Safe approach (full fine-tuning of layer 22-23 attention plus LoRA on the layer 22-23 routers)
  • Languages: Azerbaijani (primary), English (preserved)

Training Strategy

This model uses a MoE-Safe training approach to prevent catastrophic forgetting and MoE collapse:

What was trained:

  • ✅ Layer 22-23 attention projections (q, k, v, o) - full fine-tuning
  • ✅ Layer 22-23 routers - LoRA adaptation

What was preserved:

  • ✅ All embeddings
  • ✅ All MLP experts (MoE structure intact)
  • ✅ All early layers (0-21)
  • ✅ General reasoning and English capabilities

This selective training approach, sketched in code after this list, ensures:

  • Improved Azerbaijani language understanding
  • Preserved English capabilities (~95-98%)
  • Maintained reasoning abilities (~95-98%)
  • No MoE collapse or expert degradation
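
For readers who want to reproduce a similar setup, the sketch below shows one way to express this selective training with the PEFT library, assuming the model has already been loaded as in the Usage section: LoRA adapters on the layer 22-23 routers and full training of the layer 22-23 attention projections via modules_to_save. The module paths and LoRA hyperparameters here are assumptions about the GPT-OSS-20B layout, not the exact configuration used to train this model.

from peft import LoraConfig, get_peft_model

# Sketch only: module paths (e.g. "mlp.router", "self_attn.q_proj") and the
# LoRA rank are assumptions, not the exact configuration behind this model.
lora_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=8,
    lora_alpha=16,
    # LoRA adaptation of the layer 22-23 routers
    target_modules=[
        "model.layers.22.mlp.router",
        "model.layers.23.mlp.router",
    ],
    # Full fine-tuning of the layer 22-23 attention projections
    modules_to_save=[
        f"model.layers.{layer}.self_attn.{proj}"
        for layer in (22, 23)
        for proj in ("q_proj", "k_proj", "v_proj", "o_proj")
    ],
)

peft_model = get_peft_model(model, lora_config)  # everything else stays frozen
peft_model.print_trainable_parameters()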

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "mrhuseyn4/gpt-oss-20b-azerbaijani-sft"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Generate text
prompt = "İstifadəçi: Azərbaycan haqqında nə bilirsən?\nAsistent:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
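
If the tokenizer ships a chat template (GPT-OSS checkpoints generally do), the prompt can also be built with apply_chat_template instead of a hand-written "İstifadəçi:/Asistent:" string. The sketch below assumes the fine-tuned tokenizer kept the base model's template:

messages = [
    # "What do you know about Azerbaijan?"
    {"role": "user", "content": "Azərbaycan haqqında nə bilirsən?"},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)

# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))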

Training Details

Training Hyperparameters

  • Training regime: Step-based (1000 steps)
  • Batch size: 1 per device
  • Gradient accumulation: 16 steps
  • Effective batch size: 16
  • Learning rate: 2e-5
  • Warmup steps: 100
  • LR scheduler: Cosine
  • Weight decay: 0.01
  • Max gradient norm: 1.0
  • Precision: bfloat16
  • Optimizer: AdamW
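
For reference, the sketch below shows how these hyperparameters map onto Hugging Face TrainingArguments; the output directory is a placeholder, and the original training script may differ in other details.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt-oss-20b-azerbaijani-sft",  # placeholder, not from the original run
    max_steps=1000,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,            # effective batch size 16
    learning_rate=2e-5,
    warmup_steps=100,
    lr_scheduler_type="cosine",
    weight_decay=0.01,
    max_grad_norm=1.0,
    bf16=True,
    optim="adamw_torch",
)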

Dataset

  • Source: Azerbaijani conversational data
  • Size: ~15M tokens
  • Format: Conversational (user-assistant pairs)
  • Max sequence length: 512 tokens
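
The dataset itself is not distributed with this card; the snippet below is only a hypothetical illustration of what a single user-assistant pair in this format might look like and how the 512-token limit would be applied.

# Hypothetical training example (illustrative only; the actual dataset schema
# is not published with this model card).
example = {
    "messages": [
        # "What is the capital of Azerbaijan?"
        {"role": "user", "content": "Azərbaycanın paytaxtı haradır?"},
        # "The capital of Azerbaijan is Baku."
        {"role": "assistant", "content": "Azərbaycanın paytaxtı Bakı şəhəridir."},
    ]
}

# During preprocessing, each rendered conversation would be truncated to the
# 512-token maximum, e.g. tokenizer(text, truncation=True, max_length=512).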

Intended Use

This model is designed for:

  • Azerbaijani text generation
  • Conversational AI in Azerbaijani
  • Question answering in Azerbaijani
  • Multilingual tasks (Azerbaijani + English)

Limitations

  • Primary focus on Azerbaijani; other languages may have reduced performance
  • May occasionally mix Azerbaijani and English
  • Inherits limitations from the base GPT-OSS-20B model
  • Requires significant computational resources (20B parameters; roughly 40 GB of GPU memory for the bfloat16 weights alone)

Evaluation

The model was trained using a conservative MoE-Safe approach intended to ensure:

  • ✅ Improved Azerbaijani fluency and grammar
  • ✅ Better handling of Azerbaijani morphology
  • ✅ Preserved English capabilities
  • ✅ Maintained reasoning abilities
  • ✅ No MoE collapse

Citation

If you use this model, please cite:

@misc{gpt-oss-20b-azerbaijani-sft,
  author = {mrhuseyn4},
  title = {GPT-OSS-20B Fine-tuned for Azerbaijani (SFT)},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/mrhuseyn4/gpt-oss-20b-azerbaijani-sft}}
}

License

This model inherits the license from the base model (Apache 2.0).

Acknowledgments

  • Base model: openai/gpt-oss-20b
  • Training framework: HuggingFace Transformers
  • Fine-tuning approach: MoE-Safe strategy