Qwen 2.5 72B CPT + SFT
Model type: Causal Language Model
Base model: ubitech-edg/qwen2.5-72b-cpt (which builds on Qwen/Qwen2.5-72B)
License: Apache 2.0
Framework: Axolotl + DeepSpeed ZeRO-1
Overview
qwen2.5-72b-cpt-sft is a two-stage trained version of Qwen 2.5-72B that combines continual pretraining (CPT) and supervised fine-tuning (SFT), using LoRA adapters over a 4-bit NF4-quantized base for efficient adaptation. This release contains only the SFT-stage LoRA adapters and the training configuration; users load them on top of the CPT adapters, which in turn load on the official Qwen 2.5-72B base model. The CPT stage enhances domain knowledge, while the SFT stage refines question-answering and conversational skills using synthetic QA data.
Training was performed on the Leonardo EuroHPC supercomputer with Axolotl 0.6 and DeepSpeed ZeRO-1, using bfloat16 computation.
Training Setup
Stage 1 (CPT): Domain-adaptive continual pretraining
Stage 2 (SFT): Instruction fine-tuning
Adapter type: LoRA
Quantization: 4-bit NF4 (bnb)
Precision: bfloat16
Hardware: 8 nodes × 2 × NVIDIA A100 64GB GPUs
Framework: DeepSpeed ZeRO-1, Axolotl, PyTorch 2.5.1+cu121
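For reference, the 4-bit NF4 quantization with bfloat16 compute listed above corresponds to a standard bitsandbytes setup. The snippet below is a minimal sketch of how the quantized base could be prepared for QLoRA-style adaptation, not the exact training script used for this release.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Sketch: 4-bit NF4 quantization, bfloat16 compute, double quantization,
# matching the settings reported in the hyperparameter table below.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-72B",
    quantization_config=bnb_config,
    device_map="auto",
)
```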
Datasets
CPT Stage:
- arxiv.jsonl
- gov.jsonl
- news.jsonl
- wiki.jsonl

SFT Stage:
- axolotl_deduplicated_synthetic_qa.jsonl
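The files above are plain JSONL. As a hedged illustration only (the field names are an assumption, e.g. a `text` field for the CPT corpora; the actual schema may differ), they can be inspected with the `datasets` library:

```python
from datasets import load_dataset

# Assumption: each CPT file is newline-delimited JSON with one record per line,
# e.g. {"text": "..."}.
cpt_data = load_dataset(
    "json",
    data_files={"train": ["arxiv.jsonl", "gov.jsonl", "news.jsonl", "wiki.jsonl"]},
)
print(cpt_data["train"][0])
```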
Hyperparameters
| Parameter | Value |
|---|---|
| Sequence length | 2048 |
| Micro batch size | 1 |
| Gradient accumulation | 4 |
| Epochs | 1 |
| Learning rate | 0.0001 |
| LR scheduler | cosine |
| Optimizer | AdamW (8-bit) |
| Warmup steps | 20 |
| Weight decay | 0.0 |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| LoRA target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Gradient checkpointing | true |
| Flash attention | true |
| Auto resume | true |
| bnb 4-bit compute dtype | bfloat16 |
| bnb 4-bit quant type | nf4 |
| bnb double quant | true |
| Validation set size | 0.3 |
| Evals per epoch | 10 |
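The LoRA rows in the table above map directly onto a `peft` `LoraConfig`. The sketch below mirrors those values; the `task_type` is an assumption based on the causal-LM setup, not taken from the training config itself.

```python
from peft import LoraConfig

# LoRA settings as listed in the hyperparameter table
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",  # assumption: causal language modeling
)
```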
Tokenizer
Tokenizer type: AutoTokenizer
Special token: <|end_of_text|> as pad_token
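Since the repository ships the tokenizer files, the configured pad token can be checked directly; a small sketch:

```python
from transformers import AutoTokenizer

# The adapter repo includes tokenizer_config.json / special_tokens_map.json
tokenizer = AutoTokenizer.from_pretrained("ubitech-edg/qwen2.5-72b-cpt-sft")
print(tokenizer.pad_token)  # expected per this card: <|end_of_text|>
```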
Files Included
This repository hosts LoRA adapters and Axolotl metadata only.
Contents:
- adapter_config.json
- adapter_model.safetensors
- config.json
- special_tokens_map.json
- tokenizer_config.json
- tokenizer.json
- README.md
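To fetch these files without instantiating the model, a standard `huggingface_hub` download works; the local directory name below is arbitrary.

```python
from huggingface_hub import snapshot_download

# Download the SFT-stage adapter files and tokenizer metadata locally
local_dir = snapshot_download(
    repo_id="ubitech-edg/qwen2.5-72b-cpt-sft",
    local_dir="qwen2.5-72b-cpt-sft",  # arbitrary local path
)
print(local_dir)
```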
Usage: Load and Apply the Adapters
To use this CPT + SFT variant in Python (chain CPT then SFT adapters):
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "Qwen/Qwen2.5-72B"
cpt_adapter = "ubitech-edg/qwen2.5-72b-cpt"
sft_adapter = "ubitech-edg/qwen2.5-72b-cpt-sft"

# Load base model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, device_map="auto", torch_dtype="bfloat16"
)

# Load CPT LoRA adapters, then stack the SFT LoRA adapters on top
model = PeftModel.from_pretrained(model, cpt_adapter)
model = PeftModel.from_pretrained(model, sft_adapter)
model.eval()

prompt = "What is the role of AI in renewable energy optimization?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
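As an alternative to stacking two PeftModel wrappers, the CPT adapter can be merged into the base weights before applying the SFT adapter. This is a hedged sketch rather than the card's canonical recipe; adapter-stacking behaviour can vary across peft versions.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-72B", device_map="auto", torch_dtype=torch.bfloat16
)

# Fold the CPT LoRA deltas into the base weights, then apply the SFT adapter
model = PeftModel.from_pretrained(model, "ubitech-edg/qwen2.5-72b-cpt")
model = model.merge_and_unload()
model = PeftModel.from_pretrained(model, "ubitech-edg/qwen2.5-72b-cpt-sft")
model.eval()
```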