---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
---

# GRPO-LoRA-Base

This is a LoRA adapter trained with the **GRPO (Group Relative Policy Optimization)** algorithm and a **multi-label reward model**, fine-tuned on Qwen2.5-1.5B-Instruct for safe and aligned language generation.

## Overview

- **Base Model**: Qwen/Qwen2.5-1.5B-Instruct
- **Tuning Method**: GRPO (no value critic; group-based relative rewards)
- **LoRA Adapter**: applied to attention and MLP projection layers
- **Epochs**: 3
- **Steps**: 1,000
- **GPU Memory Usage**: ~50% (4-bit quantization + LoRA)

## Reward Model

A RoBERTa-based multi-label regression model was used to compute rewards on four alignment axes:

- **Politeness**
- **Meaningfulness**
- **Actionability**
- **Safety**

Each axis is scored in [0, 1], and the **sum** of the four scores is used as the scalar reward.

## Training Data

- **Dataset**: 7,000 adversarial prompts crafted to challenge LLM alignment
- **Format**: prompt-response pairs with human-annotated alignment scores
- **Split**: 6K training / 1K validation

## Evaluation

| Metric         | Base | Fine-Tuned | Δ     |
|----------------|------|------------|-------|
| Politeness     | 0.48 | 0.59       | +0.11 |
| Meaningfulness | 0.61 | 0.65       | +0.04 |
| Actionability  | 0.53 | 0.66       | +0.13 |
| Safety         | 0.42 | 0.70       | +0.28 |
| **Combined**   | 0.54 | 0.66       | +0.12 |

## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and tokenizer, then attach the LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
adapter = PeftModel.from_pretrained(base_model, "hydroxai/grpo_saved_lora_15")

inputs = tokenizer("How can we improve online safety?", return_tensors="pt")
outputs = adapter.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Citation

If you use this model, please cite:

```bibtex
@article{li2025safegrpo,
  title   = {Optimizing Safe and Aligned Language Generation: A Multi-Objective GRPO Approach},
  author  = {Li, Xuying and Li, Zhuo and Kosuga, Yuji and Bian, Victor},
  journal = {arXiv preprint arXiv:2503.21819},
  year    = {2025},
  url     = {https://arxiv.org/abs/2503.21819}
}
```

Maintained by HydroX AI.
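## Reward Computation Sketch

The training code is not included in this repository. As an illustration only, the reward aggregation and group-relative advantage described above can be sketched as follows; the function names and the group-normalization details (mean/std over a sampled group) are assumptions, not the exact implementation:

```python
import statistics

def combined_reward(scores):
    """Sum the four per-axis scores (each in [0, 1]) into one scalar reward."""
    axes = ("politeness", "meaningfulness", "actionability", "safety")
    return sum(scores[axis] for axis in axes)

def group_relative_advantages(rewards):
    """GRPO-style advantages: normalize each sampled response's reward
    against the mean and std of its own group (no value critic)."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against identical rewards
    return [(r - mean) / std for r in rewards]
```

For example, two responses scoring a combined 2.6 and 1.0 within one group yield advantages of +1.0 and -1.0: the better response is reinforced relative to its group, without any learned baseline.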