GPT-OSS-20B Fine-tuned for Azerbaijani (SFT)

This model is a fine-tuned version of openai/gpt-oss-20b, specifically optimized for the Azerbaijani language while preserving English and reasoning capabilities.

Model Description

  • Base Model: GPT-OSS-20B (20B parameter Mixture-of-Experts model)
  • Fine-tuning Stage: Supervised Fine-Tuning (SFT)
  • Training Data: ~15M tokens of Azerbaijani conversational data
  • Training Strategy: MoE-Safe approach (full fine-tuning of layer 22-23 attention plus LoRA on the layer 22-23 routers)
  • Languages: Azerbaijani (primary), English (preserved)

Training Strategy

This model uses a MoE-Safe training approach to prevent catastrophic forgetting and MoE collapse:

What was trained:

  • ✅ Layer 22-23 attention projections (q, k, v, o) - full fine-tuning
  • ✅ Layer 22-23 routers - LoRA adaptation

What was preserved:

  • ✅ All embeddings
  • ✅ All MLP experts (MoE structure intact)
  • ✅ All early layers (0-21)
  • ✅ General reasoning and English capabilities

This selective training approach, sketched in code after this list, ensures:

  • Improved Azerbaijani language understanding
  • Preserved English capabilities (~95-98%)
  • Maintained reasoning abilities (~95-98%)
  • No MoE collapse or expert degradation
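
For readers who want to reproduce a similar setup, the sketch below shows one way to express this selective training with the PEFT library, assuming the model has already been loaded as in the Usage section: LoRA adapters on the layer 22-23 routers and full training of the layer 22-23 attention projections via modules_to_save. The module paths and LoRA hyperparameters here are assumptions about the GPT-OSS-20B layout, not the exact configuration used to train this model.

from peft import LoraConfig, get_peft_model

# Sketch only: module paths (e.g. "mlp.router", "self_attn.q_proj") and the
# LoRA rank are assumptions, not the exact configuration behind this model.
lora_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=8,
    lora_alpha=16,
    # LoRA adaptation of the layer 22-23 routers
    target_modules=[
        "model.layers.22.mlp.router",
        "model.layers.23.mlp.router",
    ],
    # Full fine-tuning of the layer 22-23 attention projections
    modules_to_save=[
        f"model.layers.{layer}.self_attn.{proj}"
        for layer in (22, 23)
        for proj in ("q_proj", "k_proj", "v_proj", "o_proj")
    ],
)

peft_model = get_peft_model(model, lora_config)  # everything else stays frozen
peft_model.print_trainable_parameters()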

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "mrhuseyn4/gpt-oss-20b-azerbaijani-sft"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Generate text
prompt = "İstifadəçi: Azərbaycan haqqında nə bilirsən?\nAsistent:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
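
If the tokenizer ships a chat template (GPT-OSS checkpoints generally do), the prompt can also be built with apply_chat_template instead of a hand-written "İstifadəçi:/Asistent:" string. The sketch below assumes the fine-tuned tokenizer kept the base model's template:

messages = [
    # "What do you know about Azerbaijan?"
    {"role": "user", "content": "Azərbaycan haqqında nə bilirsən?"},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)

# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))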

Training Details

Training Hyperparameters

  • Training regime: Step-based (1000 steps)
  • Batch size: 1 per device
  • Gradient accumulation: 16 steps
  • Effective batch size: 16
  • Learning rate: 2e-5
  • Warmup steps: 100
  • LR scheduler: Cosine
  • Weight decay: 0.01
  • Max gradient norm: 1.0
  • Precision: bfloat16
  • Optimizer: AdamW
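
For reference, the sketch below shows how these hyperparameters map onto Hugging Face TrainingArguments; the output directory is a placeholder, and the original training script may differ in other details.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt-oss-20b-azerbaijani-sft",  # placeholder, not from the original run
    max_steps=1000,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,            # effective batch size 16
    learning_rate=2e-5,
    warmup_steps=100,
    lr_scheduler_type="cosine",
    weight_decay=0.01,
    max_grad_norm=1.0,
    bf16=True,
    optim="adamw_torch",
)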

Dataset

  • Source: Azerbaijani conversational data
  • Size: ~15M tokens
  • Format: Conversational (user-assistant pairs)
  • Max sequence length: 512 tokens
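
The dataset itself is not distributed with this card; the snippet below is only a hypothetical illustration of what a single user-assistant pair in this format might look like and how the 512-token limit would be applied.

# Hypothetical training example (illustrative only; the actual dataset schema
# is not published with this model card).
example = {
    "messages": [
        # "What is the capital of Azerbaijan?"
        {"role": "user", "content": "Azərbaycanın paytaxtı haradır?"},
        # "The capital of Azerbaijan is Baku."
        {"role": "assistant", "content": "Azərbaycanın paytaxtı Bakı şəhəridir."},
    ]
}

# During preprocessing, each rendered conversation would be truncated to the
# 512-token maximum, e.g. tokenizer(text, truncation=True, max_length=512).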

Intended Use

This model is designed for:

  • Azerbaijani text generation
  • Conversational AI in Azerbaijani
  • Question answering in Azerbaijani
  • Multilingual tasks (Azerbaijani + English)

Limitations

  • Primary focus on Azerbaijani; other languages may have reduced performance
  • May occasionally mix Azerbaijani and English
  • Inherits limitations from the base GPT-OSS-20B model
  • Requires significant computational resources (20B parameters; roughly 40 GB of GPU memory for the bfloat16 weights alone)

Evaluation

The model was trained using a conservative MoE-Safe approach intended to ensure:

  • ✅ Improved Azerbaijani fluency and grammar
  • ✅ Better handling of Azerbaijani morphology
  • ✅ Preserved English capabilities
  • ✅ Maintained reasoning abilities
  • ✅ No MoE collapse

Citation

If you use this model, please cite:

@misc{gpt-oss-20b-azerbaijani-sft,
  author = {mrhuseyn4},
  title = {GPT-OSS-20B Fine-tuned for Azerbaijani (SFT)},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/mrhuseyn4/gpt-oss-20b-azerbaijani-sft}}
}

License

This model inherits the license from the base model (Apache 2.0).

Acknowledgments

  • Base model: openai/gpt-oss-20b
  • Training framework: HuggingFace Transformers
  • Fine-tuning approach: MoE-Safe strategy