# GPT-OSS-20B Fine-tuned for Azerbaijani (SFT)

This model is a fine-tuned version of openai/gpt-oss-20b, optimized for the Azerbaijani language while preserving English and reasoning capabilities.
## Model Description
- Base Model: GPT-OSS-20B (20B parameter Mixture-of-Experts model)
- Fine-tuning Stage: Supervised Fine-Tuning (SFT)
- Training Data: ~15M tokens of Azerbaijani conversational data
- Training Strategy: MoE-Safe approach (layers 22-23 attention + LoRA on routers)
- Languages: Azerbaijani (primary), English (preserved)
## Training Strategy
This model uses a MoE-Safe training approach to prevent catastrophic forgetting and MoE collapse:
**What was trained:**
- ✅ Layer 22-23 attention projections (q, k, v, o) - full fine-tuning
- ✅ Layer 22-23 routers - LoRA adaptation
**What was preserved:**
- ✅ All embeddings
- ✅ All MLP experts (MoE structure intact)
- ✅ All early layers (0-21)
- ✅ General reasoning and English capabilities
This selective training approach (a code sketch follows below) ensures:
- Improved Azerbaijani language understanding
- Preserved English capabilities (~95-98% retained)
- Maintained reasoning abilities (~95-98% retained)
- No MoE collapse or expert degradation
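A minimal sketch of how this selective setup could be expressed is shown below. The module names (`model.layers.22/23`, `q_proj`, `k_proj`, `v_proj`, `o_proj`, `router`) and the LoRA rank/alpha are assumptions about the GPT-OSS-20B implementation, not the card's actual training script.

```python
# Hedged sketch of the MoE-Safe selective-freezing idea.
# Module names below are assumptions, not taken from the actual training code.
TRAINED_LAYERS = ("model.layers.22.", "model.layers.23.")
ATTN_PROJECTIONS = ("q_proj", "k_proj", "v_proj", "o_proj")

for name, param in model.named_parameters():
    # Freeze everything by default: embeddings, early layers, and all MoE experts stay intact.
    param.requires_grad = False
    # Fully fine-tune only the attention projections of layers 22-23.
    if name.startswith(TRAINED_LAYERS) and any(proj in name for proj in ATTN_PROJECTIONS):
        param.requires_grad = True

# The routers of layers 22-23 would then be adapted with LoRA rather than unfrozen directly,
# e.g. with peft (module name "router" and rank/alpha are illustrative assumptions):
# from peft import LoraConfig, get_peft_model
# lora_cfg = LoraConfig(
#     r=8, lora_alpha=16,             # illustrative values
#     target_modules=["router"],      # assumed router module name
#     layers_to_transform=[22, 23],
# )
# model = get_peft_model(model, lora_cfg)
```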
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "mrhuseyn4/gpt-oss-20b-azerbaijani-sft"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Generate text
# Prompt translation: "User: What do you know about Azerbaijan?\nAssistant:"
prompt = "İstifadəçi: Azərbaycan haqqında nə bilirsən?\nAsistent:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
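If the tokenizer for this checkpoint ships a chat template, a conversational call can also be made through it. The message content and generation settings below are illustrative assumptions, not part of the card:

```python
# Hedged sketch: uses the tokenizer's chat template, assuming this checkpoint provides one.
messages = [
    {"role": "user", "content": "Azərbaycan haqqında nə bilirsən?"},  # "What do you know about Azerbaijan?"
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```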
## Training Details

### Training Hyperparameters
- Training regime: Step-based (1000 steps)
- Batch size: 1 per device
- Gradient accumulation: 16 steps
- Effective batch size: 16
- Learning rate: 2e-5
- Warmup steps: 100
- LR scheduler: Cosine
- Weight decay: 0.01
- Max gradient norm: 1.0
- Precision: bfloat16
- Optimizer: AdamW
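For reference, the hyperparameters above map roughly onto a Hugging Face `TrainingArguments` configuration like the following. The output directory is a placeholder and this is a sketch, not the exact training script:

```python
from transformers import TrainingArguments

# Sketch of the listed hyperparameters as TrainingArguments; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="gpt-oss-20b-azerbaijani-sft",
    max_steps=1000,                      # step-based regime
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,      # effective batch size 16
    learning_rate=2e-5,
    warmup_steps=100,
    lr_scheduler_type="cosine",
    weight_decay=0.01,
    max_grad_norm=1.0,
    bf16=True,                           # bfloat16 precision
    optim="adamw_torch",                 # AdamW optimizer
)
```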
### Dataset
- Source: Azerbaijani conversational data
- Size: ~15M tokens
- Format: Conversational (user-assistant pairs)
- Max sequence length: 512 tokens
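The raw corpus is not published with the card, but user-assistant pairs at a 512-token maximum sequence length would typically be represented as records like the following (the content here is purely illustrative):

```python
# Illustrative record shape for a user-assistant pair; not an actual sample from the corpus.
example = {
    "messages": [
        {"role": "user", "content": "Azərbaycanın paytaxtı haradır?"},       # "What is the capital of Azerbaijan?"
        {"role": "assistant", "content": "Azərbaycanın paytaxtı Bakıdır."},  # "The capital of Azerbaijan is Baku."
    ]
}

# Sequences longer than the 512-token limit would be truncated during tokenization, e.g.:
# tokenizer.apply_chat_template(example["messages"], tokenize=True, max_length=512, truncation=True)
```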
## Intended Use
This model is designed for:
- Azerbaijani text generation
- Conversational AI in Azerbaijani
- Question answering in Azerbaijani
- Multilingual tasks (Azerbaijani + English)
## Limitations
- Primary focus on Azerbaijani; other languages may have reduced performance
- May occasionally mix Azerbaijani and English
- Inherits limitations from the base GPT-OSS-20B model
- Requires significant computational resources (20B parameters)
## Evaluation
The model has been trained using a conservative MoE-Safe approach to ensure:
- ✅ Improved Azerbaijani fluency and grammar
- ✅ Better handling of Azerbaijani morphology
- ✅ Preserved English capabilities
- ✅ Maintained reasoning abilities
- ✅ No MoE collapse
## Citation
If you use this model, please cite:
```bibtex
@misc{gpt-oss-20b-azerbaijani-sft,
  author       = {mrhuseyn4},
  title        = {GPT-OSS-20B Fine-tuned for Azerbaijani (SFT)},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/mrhuseyn4/gpt-oss-20b-azerbaijani-sft}}
}
```
## License
This model inherits the license from the base model (Apache 2.0).
## Acknowledgments
- Base model: openai/gpt-oss-20b
- Training framework: HuggingFace Transformers
- Fine-tuning approach: MoE-Safe strategy