Smilyai-labs-community/Qwen-distill

🧠 Qwen3 Distilled Student (0.6B GPT2-style)

A compact, CPU-friendly student model distilled from Qwen3-0.6B and optimized for lightweight deployment and real-time chat. It is designed for resource-limited environments such as the browser, Colab, or mobile devices.

🏗 Architecture

Based on the GPT2Config schema for compatibility
Patches applied:
- n_inner, layer_norm_epsilon, activation_function, etc.
- Handles missing dropout attributes gracefully
Supports attention streaming and assistant-style prompting
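
The config patching listed above can be illustrated with a minimal sketch. It assumes the patches amount to filling in attributes that a GPT2-style model expects but the config lacks. The helper name `patch_gpt2_style_config`, the `GPT2_DEFAULTS` table, and the example config values are hypothetical and not taken from this repository. The default values mirror the documented defaults of `GPT2Config` in Hugging Face Transformers.

```python
from types import SimpleNamespace

# Assumed fallback values; these match the documented GPT2Config defaults.
GPT2_DEFAULTS = {
    "n_inner": None,                  # None means the MLP width is 4 * n_embd
    "layer_norm_epsilon": 1e-5,
    "activation_function": "gelu_new",
    "resid_pdrop": 0.1,               # dropout attributes that may be missing
    "embd_pdrop": 0.1,
    "attn_pdrop": 0.1,
}


def patch_gpt2_style_config(config, defaults=GPT2_DEFAULTS):
    """Set each missing attribute on `config` in place.

    Existing attributes are left untouched. Returns the list of
    attribute names that had to be filled in.
    """
    patched = []
    for name, value in defaults.items():
        if not hasattr(config, name):
            setattr(config, name, value)
            patched.append(name)
    return patched


# Hypothetical incomplete config: sizes are illustrative only.
config = SimpleNamespace(n_embd=768, n_layer=12, n_head=12,
                         layer_norm_epsilon=1e-6)
patched = patch_gpt2_style_config(config)
print(patched)
```

Because the helper only uses `hasattr` and `setattr`, the same approach should work on any attribute-style config object. Whether the actual patches behave this way is not stated in this card.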

🛠 Training Setup

Source: Qwen3-0.6B
Distillation: direct next-token distillation using custom prompt logic
Platform: Kaggle GPU (A100, 40GB)
Framework: TensorFlow / PyTorch hybrid flow, minimal dependencies
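
As a rough illustration of next-token distillation, the sketch below computes one common form of the objective: the KL divergence between the teacher's and the student's next-token distributions, softened by a temperature. This card does not specify the loss, so the KL form, the temperature value, and the function names are all assumptions. The example also assumes both models produce logits over the same vocabulary. It uses plain Python to stay dependency-free; a real training loop would use tensor operations.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) for one next-token position.

    The result is scaled by temperature ** 2, a common convention that
    keeps gradient magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * (math.log(pi) - math.log(qi))
             for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Toy logits over a 4-token vocabulary (illustrative values only).
teacher = [2.0, 0.5, -1.0, 0.1]
matching_student = [2.0, 0.5, -1.0, 0.1]
diverging_student = [0.0, 1.5, 0.3, -0.7]

loss_same = distillation_loss(teacher, matching_student)
loss_diff = distillation_loss(teacher, diverging_student)
print(loss_same, loss_diff)
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, which is the signal the student is trained to minimize.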
