LoRA Adapters: TinyLlama-1.1B Quote Generator

This repository contains the LoRA (Low-Rank Adaptation) adapter weights for a version of TinyLlama/TinyLlama-1.1B-Chat-v1.0 fine-tuned to generate motivational quotes.

These are only the adapter weights, not the full model. You must load these adapters onto the base TinyLlama model to use them.

The adapters were trained with QLoRA in Google Colab on a T4 GPU. Fine-tuning specialized the model, and the result was a 2.4x inference speedup over the base model on the same GPU.

Base Model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
Dataset: Abirate/english_quotes

⚡ Quick Start (How to use)

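This shows how to load the 4-bit quantized base model and attach these adapters for fast inference on a GPU. The adapters are applied on top of the base weights at runtime; the code below does not merge them into the base model.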

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, BitsAndBytesConfig
from peft import PeftModel

base_model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
adapter_repo_name = "bkqz/tinyllama-quotes-adapters" # This repo

# 1. Load the 4-bit base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# 2. Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# 3. Load the LoRA adapters from this repo
finetuned_model = PeftModel.from_pretrained(base_model, adapter_repo_name)
print("Base model and LoRA adapters loaded.")

# 4. Cast to float16 so the adapter weights match the compute dtype set in bnb_config (avoids a dtype mismatch)
finetuned_model.to(torch.float16)

# 5. Set up the generation pipeline
pipe = pipeline(
    "text-generation",
    model=finetuned_model,
    tokenizer=tokenizer,
    device_map="auto"
)

# 6. Generate a quote
prompt = "Keyword: life\nQuote:"

result = pipe(
    prompt,
    max_new_tokens=80,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    eos_token_id=tokenizer.eos_token_id
)

print(result[0]['generated_text'])

💬 Prompt Format

The model was trained on one specific prompt format. For best results, your prompt must follow it exactly and end with \nQuote: (where \n is a newline character).

Keyword: [YOUR_KEYWORD]\nQuote:

The model generates a single quote and appends "- Unknown" in place of an author name.
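
The prompt format above can be wrapped in two small helpers. This is a minimal sketch, not part of the released code: the names build_prompt and extract_quote are hypothetical, and it assumes the generated text starts with the prompt (the default behavior of the text-generation pipeline) and that the quote sits on the first line of the completion.

```python
PROMPT_TEMPLATE = "Keyword: {keyword}\nQuote:"
SUFFIX = "- Unknown"  # attribution the model was trained to append


def build_prompt(keyword: str) -> str:
    """Return a prompt in the trained format, ending with '\\nQuote:'."""
    return PROMPT_TEMPLATE.format(keyword=keyword.strip())


def extract_quote(generated_text: str, prompt: str) -> str:
    """Strip the echoed prompt and the trailing '- Unknown' from the output."""
    # Drop the prompt if the pipeline echoed it back.
    if generated_text.startswith(prompt):
        completion = generated_text[len(prompt):]
    else:
        completion = generated_text
    # Assumption: keep only the first line in case generation ran on.
    first_line = completion.strip().split("\n", 1)[0].strip()
    if first_line.endswith(SUFFIX):
        first_line = first_line[: -len(SUFFIX)].rstrip()
    return first_line
```

With the Quick Start pipeline, this would be used as prompt = build_prompt("life"), followed by extract_quote(result[0]['generated_text'], prompt).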

🛠️ Training Procedure

This model was fine-tuned using trl.SFTTrainer with QLoRA.

Dataset: The Abirate/english_quotes dataset was "exploded" so that each (quote, tag) pair became a unique training example.
Format: The training text was formatted as Keyword: [tag]\nQuote: [quote] - Unknown. This was done to override the base model's habit of adding real authors.
Evaluation: The model was trained with a 10% evaluation split and early_stopping_patience=3 to prevent overfitting.
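
The data preparation described above can be sketched in plain Python. This is an illustration under stated assumptions, not the original training script: it assumes each record has "quote" and "tags" fields, uses made-up sample records, and picks a shuffled split with seed 42 because the card does not say how the 10% split was drawn.

```python
import random


def explode(records):
    """Turn each record into one (quote, tag) pair per tag."""
    pairs = []
    for record in records:
        for tag in record["tags"]:
            pairs.append((record["quote"], tag))
    return pairs


def format_example(quote, tag):
    """Format one pair as: Keyword: [tag]\\nQuote: [quote] - Unknown"""
    return f"Keyword: {tag}\nQuote: {quote} - Unknown"


def split_train_eval(examples, eval_fraction=0.1, seed=42):
    """Shuffle and hold out a fraction for evaluation (seed is an assumption)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, round(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]


# Hypothetical sample records, shaped like the assumed dataset schema.
records = [
    {"quote": "Be yourself.", "author": "A", "tags": ["life", "honesty"]},
    {"quote": "Stay curious.", "author": "B", "tags": ["learning"]},
]
texts = [format_example(quote, tag) for quote, tag in explode(records)]
train_texts, eval_texts = split_train_eval(texts)
```

Here the two sample records become three training texts, since the first quote carries two tags. The author field is ignored on purpose, which matches the "- Unknown" suffix in the format.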

Framework Versions

TRL: 0.25.0
Transformers: 4.57.1
PyTorch: 2.8.0+cu126
Datasets: 4.0.0
Tokenizers: 0.22.1
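
To compare a local environment against the versions listed above, a small check along these lines may help. It is a sketch using only the standard library; the distribution names (for example torch for PyTorch) are assumed to be the usual PyPI names, and a mismatch does not necessarily mean the adapters will fail to load.

```python
from importlib import metadata

# Versions listed in this card, keyed by assumed PyPI distribution name.
EXPECTED = {
    "trl": "0.25.0",
    "transformers": "4.57.1",
    "torch": "2.8.0+cu126",
    "datasets": "4.0.0",
    "tokenizers": "0.22.1",
}


def installed_version(name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(expected, lookup=installed_version):
    """Map each differing package to a (listed, installed) tuple."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in version_mismatches(EXPECTED).items():
        print(f"{name}: card lists {wanted}, installed {found or 'not installed'}")
```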
