Overview
Newstar-Qwen3-0.6B is a fine-tuned version of the Qwen3-0.6B base model: Newstar's instruction tuning applied on top of Qwen3's pretrained weights, using the ITP-v2 dataset.
This model is deliberately built without thinking capabilities and intentionally omits the reasoning mode that Qwen3 supports. Its primary purpose is to offer a Qwen3-based model focused on straightforward instruction following, without engaging in complex reasoning.
Model Details
- Base model: Qwen3-0.6B (causal language model)
- Finetuning: Instruction tuning by Newstar on ITP-v2 dataset
- Parameters: 0.6 billion
- Layers: 28
- Attention heads: 16 query, 8 key/value (grouped-query attention)
- Context length: 32,768 tokens
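
The architecture numbers above can be verified directly from the model config with `transformers`. A minimal sketch; the repo id `Newstar/Newstar-Qwen3-0.6B` is an assumption, so substitute the actual model path:

```python
from transformers import AutoConfig

# Repo id is an assumption; replace with the actual model path.
config = AutoConfig.from_pretrained("Newstar/Newstar-Qwen3-0.6B")

print(config.num_hidden_layers)    # layers: expected 28
print(config.num_attention_heads)  # query heads: expected 16
print(config.num_key_value_heads)  # key/value heads: expected 8 (GQA)
```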
Intended Use
- General instruction following without reasoning or "thinking"
- Simple and efficient dialogue or text generation tasks
- Scenarios where disabling complex logical or mathematical reasoning is preferred
Performance and Limitations
- Benchmarks: not yet evaluated
- Informally observed to be less capable than the instruction-tuned Qwen3 in reasoning and complex tasks
- Lacks the advanced thinking mode of the original Qwen3 model
- Best suited to cases where reasoning is not required, or where the overhead of thinking mode should be avoided
How It Differs from Qwen3-0.6B
- Qwen3 supports both thinking and non-thinking modes
- Newstar-Qwen3-0.6B disables thinking mode completely
- Less capable at math, coding, and logic, but simpler for basic instruction following
Usage Notes
- Thinking mode (`enable_thinking=True`) is disabled
- The model does not generate `<think>...</think>` reasoning blocks
- Recommended generation settings (non-thinking mode; a usage sketch follows this list):
  - Temperature: 0.7
  - Top-p: 0.95
  - Top-k: 64
  - Repetition penalty: 1.03
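
A minimal generation sketch with Hugging Face `transformers`, using the recommended settings above. The repo id `Newstar/Newstar-Qwen3-0.6B` and the example prompt are assumptions; passing `enable_thinking=False` to `apply_chat_template` follows the Qwen3 chat-template convention:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Newstar/Newstar-Qwen3-0.6B"  # assumed repo id; replace as needed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Summarize the water cycle in two sentences."}]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # this model has no thinking mode; keep it off
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,        # recommended non-thinking settings
    top_p=0.95,
    top_k=64,
    repetition_penalty=1.03,
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```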
Citations
```bibtex
@misc{qwen3technicalreport,
      title={Qwen3 Technical Report},
      author={Qwen Team},
      year={2025},
      eprint={2505.09388},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.09388},
}
```