Qwen 2.5 72B (CPT + SFT)

Model type: Causal Language Model
Base model: ubitech-edg/qwen2.5-72b-cpt (which builds on Qwen/Qwen2.5-72B)
License: Apache 2.0
Framework: Axolotl + DeepSpeed ZeRO-1


Overview

qwen2.5-72b-cpt-sft is a two-stage adaptation of Qwen 2.5-72B that combines continual pretraining (CPT) with supervised fine-tuning (SFT), using LoRA adapters trained on a 4-bit NF4-quantized base for efficient adaptation. This release contains only the SFT-stage LoRA adapters and the training configuration; they are intended to be loaded on top of the CPT adapters, which in turn load onto the official Qwen 2.5-72B base model. The CPT stage enhances domain knowledge, while the SFT stage refines question-answering and conversational skills using synthetic QA data.

Training was performed on the Leonardo EuroHPC supercomputer using Axolotl 0.6 with DeepSpeed ZeRO-1 optimization and bfloat16 computation.


Training Setup

Stage 1 (CPT): Domain-adaptive continual pretraining
Stage 2 (SFT): Instruction fine-tuning
Adapter type: LoRA
Quantization: 4-bit NF4 (bitsandbytes)
Precision: bfloat16
Hardware: 8 nodes × 2 NVIDIA A100 64 GB GPUs (16 GPUs total)
Framework: DeepSpeed ZeRO-1, Axolotl, PyTorch 2.5.1+cu121


Datasets

CPT Stage:

  • arxiv.jsonl
  • gov.jsonl
  • news.jsonl
  • wiki.jsonl

SFT Stage:

  • axolotl_deduplicated_synthetic_qa.jsonl

Hyperparameters

Sequence length: 2048
Micro batch size: 1
Gradient accumulation: 4
Epochs: 1
Learning rate: 0.0001
LR scheduler: cosine
Optimizer: AdamW (8-bit)
Warmup steps: 20
Weight decay: 0.0
LoRA rank (r): 16
LoRA alpha: 32
LoRA dropout: 0.05
LoRA target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Gradient checkpointing: enabled
Flash attention: enabled
Auto resume: enabled
bnb 4-bit compute dtype: bfloat16
bnb 4-bit quant type: nf4
bnb double quant: true
Validation set size: 0.3
Evals per epoch: 10
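
For reference, the LoRA rows above map directly onto a peft LoraConfig. The sketch below is an illustrative reconstruction of those settings, not the original Axolotl configuration file; the bnb 4-bit rows correspond in the same way to a transformers BitsAndBytesConfig.

from peft import LoraConfig

# Illustrative reconstruction of the LoRA hyperparameters listed above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)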

Tokenizer

Tokenizer type: AutoTokenizer
Special token: <|end_of_text|> as pad_token
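
Since the tokenizer files are shipped with this repository (see Files Included below), the pad token setting can be checked directly. A quick sketch, assuming the saved tokenizer carries the pad token configured during training:

from transformers import AutoTokenizer

# Load the tokenizer saved alongside the SFT adapters and inspect its pad token.
tokenizer = AutoTokenizer.from_pretrained("ubitech-edg/qwen2.5-72b-cpt-sft")
print(tokenizer.pad_token)  # expected per the training config: <|end_of_text|>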


Files Included

This repository hosts LoRA adapters and Axolotl metadata only.

Contents:

  • adapter_config.json
  • adapter_model.safetensors
  • config.json
  • special_tokens_map.json
  • tokenizer_config.json
  • tokenizer.json
  • README.md

Usage: Load and Apply the Adapters

To use this CPT + SFT variant in Python, chain the CPT adapters and then the SFT adapters on the base model:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "Qwen/Qwen2.5-72B"
cpt_adapter = "ubitech-edg/qwen2.5-72b-cpt"
sft_adapter = "ubitech-edg/qwen2.5-72b-cpt-sft"

# Load base and tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, device_map="auto", torch_dtype="bfloat16"
)

# Load CPT LoRA adapters
model = PeftModel.from_pretrained(model, cpt_adapter)

# Load SFT LoRA adapters
model = PeftModel.from_pretrained(model, sft_adapter)
model.eval()

prompt = "What is the role of AI in renewable energy optimization?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
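
The 72B base model is large; if GPU memory is limited, it can instead be loaded with the same 4-bit NF4 quantization used during training before attaching the adapters. A minimal sketch of that variant (same repositories as above; the quantization values mirror the Hyperparameters section):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Quantize the base model at load time with the NF4 settings used in training.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-72B")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-72B", device_map="auto", quantization_config=bnb_config
)

# Chain the CPT adapters, then the SFT adapters, as in the bf16 example above.
model = PeftModel.from_pretrained(model, "ubitech-edg/qwen2.5-72b-cpt")
model = PeftModel.from_pretrained(model, "ubitech-edg/qwen2.5-72b-cpt-sft")
model.eval()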