qwen3-1.7b-18432-r16-a32-lr1e4-ep3-bs4x2-adamw8bit-no-think - GGUF

This model was finetuned and converted to GGUF format using Unsloth.

Example usage:

  • For text-only LLMs: llama-cli --hf repo_id/model_name -p "why is the sky blue?" (a concrete example follows this list)
  • For multimodal models: llama-mtmd-cli -m model_name.gguf --mmproj mmproj_file.gguf
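
For example, with this repo's quantized file downloaded locally (the prompt is just illustrative):

  llama-cli -m qwen3-1.7b.Q4_K_M.gguf -p "why is the sky blue?"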

Available model files:

  • qwen3-1.7b.Q4_K_M.gguf

Ollama

An Ollama Modelfile is included for easy deployment.
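
A minimal Modelfile sketch, assuming the quantized file sits next to it; the sampling parameter and stop token here are illustrative assumptions, not necessarily the shipped file's contents:

  FROM ./qwen3-1.7b.Q4_K_M.gguf
  PARAMETER temperature 0.7
  PARAMETER stop "<|im_end|>"

To build and run it (the local model name qwen3-var-no-think is hypothetical):

  ollama create qwen3-var-no-think -f Modelfile
  ollama run qwen3-var-no-think "why is the sky blue?"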

Dataset

https://huggingface.co/datasets/tisu1902/var-full-no-think
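
A minimal sketch of loading this dataset for the trainer below. The "text" field matches dataset_text_field in the config, but the 95/5 train/validation split and its seed are assumptions; the card does not show how train_dataset and val_dataset were built:

from datasets import load_dataset

# Load the full training split from the Hub
dataset = load_dataset("tisu1902/var-full-no-think", split="train")

# Hypothetical split into the train/val sets passed to SFTTrainer below
splits = dataset.train_test_split(test_size=0.05, seed=3407)
train_dataset, val_dataset = splits["train"], splits["test"]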

Wandb

https://wandb.ai/quangphamm1902/huggingface/runs/4bv2nr1q?nw=nwuserquangphamm1902
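
Logging to the run above presumably went through the trainer's built-in W&B integration; a minimal sketch, where the project name "huggingface" is read from the run URL and report_to is not shown in the config below:

import os

os.environ["WANDB_PROJECT"] = "huggingface"  # project segment of the run URL
# then pass report_to="wandb" in the SFTConfig below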

Training code

import torch
from trl import SFTTrainer, SFTConfig

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    args=SFTConfig(
        dataset_text_field="text",
        max_seq_length=18432,
        
        # Reduce effective batch for more steps
        per_device_train_batch_size=4,  # Down from 6
        gradient_accumulation_steps=2,  # Down from 6
        # Effective batch = 8 (instead of 36)
        # New steps per epoch = 410 / 8 ≈ 51
        # Total steps = 51 × 3 = 153 steps
        per_device_eval_batch_size=1,
        
        # Training
        num_train_epochs=3,
        learning_rate=1e-4,
        warmup_ratio=0.1,
        lr_scheduler_type="cosine",
        
        # Optimization
        optim="adamw_8bit",
        weight_decay=0.01,
        max_grad_norm=0.3,
        
        # Memory
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        gradient_checkpointing=True,
        
        # Evaluation & Saving - ADJUSTED FOR FEWER STEPS
        eval_strategy="steps",
        eval_steps=25,
        save_strategy="steps",
        save_steps=25,
        save_total_limit=3,
        load_best_model_at_end=True,
        metric_for_best_model="eval_loss",
        greater_is_better=False,
        
        # Logging
        logging_steps=5,
        logging_first_step=True,
        
        # Other
        seed=3407,
        output_dir="outputs",
        remove_unused_columns=False,
        
        # TRL specific
        dataset_num_proc=4,
        packing=False,
    ),
)
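
For reference, here is a plausible Unsloth setup producing the model and tokenizer used above. The base checkpoint name is an assumption, and the LoRA rank/alpha are read from the r16-a32 tags in this model's name rather than from the original notebook:

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-1.7B",  # assumed base checkpoint
    max_seq_length=18432,             # matches SFTConfig above
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                             # "r16" in this model's name
    lora_alpha=32,                    # "a32" in this model's name
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
    random_state=3407,                # same seed as SFTConfig
)

After training, Unsloth can export straight to GGUF; a sketch of the assumed export step, where "q4_k_m" matches the shipped qwen3-1.7b.Q4_K_M.gguf:

trainer.train()
model.save_pretrained_gguf("gguf_out", tokenizer, quantization_method="q4_k_m")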
Model details

  • Format: GGUF (4-bit, Q4_K_M)
  • Model size: 2B params
  • Architecture: qwen3