---
library_name: peft
base_model: Qwen/Qwen2.5-7B-Instruct
tags:
- lora
- adapter
---

# LoRA Adapter - Checkpoint 200

LoRA adapter fine-tuned from Qwen/Qwen2.5-7B-Instruct (checkpoint 200).

## Quick Start with Quantization

```bash
pip install torch transformers peft accelerate bitsandbytes
```

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel, prepare_model_for_kbit_training
from accelerate import PartialState

# Configure 4-bit quantization for memory efficiency
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct", use_fast=False)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load base model with quantization
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    quantization_config=bnb_config,
    # Place the whole model on this process's device (works for single- and multi-GPU launches)
    device_map={"": PartialState().process_index},
    torch_dtype=torch.float16,
)

# Prepare the quantized model for k-bit training
# (only needed if you plan to continue fine-tuning; safe to skip for inference-only use)
base_model = prepare_model_for_kbit_training(base_model)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "YOUR_USERNAME/YOUR_REPO_NAME")

print("✅ Model loaded with 4-bit quantization!")
```

## Interactive Chat

```python
def generate_text(model, tokenizer, prompt, max_new_tokens=500, temperature=0.7):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=temperature,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Decode only the newly generated tokens, not the prompt
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True,
    )

# Interactive mode
print("🤖 Interactive Chat (type 'quit' to exit)")
while True:
    prompt = input("\nPrompt: ").strip()
    if prompt.lower() in ["quit", "exit", "q"]:
        break
    if prompt:
        response = generate_text(model, tokenizer, prompt)
        print(f"Response: {response}")
```

Because the base model is instruction-tuned, raw prompts work, but formatting them with the chat template usually gives better results; see the chat template sketch further down.

## Memory Requirements

- **4-bit quantization**: ~4GB VRAM (7B model)
- **8-bit quantization**: ~7GB VRAM (7B model)
- **No quantization**: ~14GB VRAM (7B model)

An 8-bit loading variant is sketched below.

## Training Details

- Base model: Qwen/Qwen2.5-7B-Instruct
- Training framework: LoRA (Low-Rank Adaptation)
- Checkpoint: 200
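
## Chat Template Usage

The `generate_text` helper above sends raw text to the model. Since the base model is an instruction-tuned chat model, prompts formatted with the tokenizer's chat template generally produce better responses. This is a minimal sketch, assuming `model` and `tokenizer` from the Quick Start are already loaded; the system and user messages are illustrative only.

```python
import torch  # already imported in the Quick Start

# Illustrative conversation; replace with your own messages
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain LoRA in one paragraph."},
]

# Render the conversation into the model's chat format and append the assistant turn marker
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=500,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```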
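
## 8-bit Loading Variant

The memory table above also lists an 8-bit option. Loading works the same way as the 4-bit Quick Start; only the `BitsAndBytesConfig` changes. A minimal sketch under the same assumptions (adapter repo placeholder as above):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# 8-bit quantization: roughly 7GB VRAM for the 7B base, higher fidelity than 4-bit
bnb_config_8bit = BitsAndBytesConfig(load_in_8bit=True)

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    quantization_config=bnb_config_8bit,
    device_map="auto",
    torch_dtype=torch.float16,
)

# Attach the LoRA adapter exactly as in the 4-bit example
model = PeftModel.from_pretrained(base_model, "YOUR_USERNAME/YOUR_REPO_NAME")
```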
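
## Merging the Adapter (Optional)

If you want a standalone checkpoint with no PEFT dependency at inference time, the LoRA weights can be merged into the base model. Merging into a quantized base is not recommended, so this sketch loads the base in fp16 (roughly 14GB, consistent with the table above); the output directory name is an arbitrary choice for this example.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model in half precision (merge into full-precision weights, not a quantized model)
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Attach the adapter, then fold its weights into the base layers
model = PeftModel.from_pretrained(base_model, "YOUR_USERNAME/YOUR_REPO_NAME")
merged = model.merge_and_unload()

# "merged-model" is a hypothetical output path
merged.save_pretrained("merged-model")
tokenizer.save_pretrained("merged-model")
```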