---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
---

# GRPO-LoRA-Base

This is a LoRA adapter trained with the **GRPO (Group Relative Policy Optimization)** algorithm and a **multi-label reward model**, fine-tuned on Qwen2.5-1.5B-Instruct for safe and aligned language generation.

## Overview

- **Base Model**: Qwen/Qwen2.5-1.5B-Instruct
- **Tuning Method**: GRPO (no value critic; group-based relative rewards)
- **LoRA Adapter**: applied to attention and MLP projection layers
- **Epochs**: 3
- **Steps**: 1,000
- **GPU Memory Usage**: ~50% (4-bit quantization + LoRA)

## Reward Model

A RoBERTa-based multi-label regression model was used to compute rewards on four alignment axes:

- **Politeness**
- **Meaningfulness**
- **Actionability**
- **Safety**

Each axis is scored in [0, 1], and the **sum** of the four scores is used as the scalar reward.

## Training Data

- **Dataset**: 7,000 adversarial prompts crafted to challenge LLM alignment
- **Format**: prompt-response pairs with human-annotated alignment scores
- **Split**: 6K training / 1K validation

## Evaluation

| Metric         | Base | Fine-Tuned | Δ     |
|----------------|------|------------|-------|
| Politeness     | 0.48 | 0.59       | +0.11 |
| Meaningfulness | 0.61 | 0.65       | +0.04 |
| Actionability  | 0.53 | 0.66       | +0.13 |
| Safety         | 0.42 | 0.70       | +0.28 |
| **Combined**   | 0.54 | 0.66       | +0.12 |

## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and tokenizer, then attach the LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
adapter = PeftModel.from_pretrained(base_model, "hydroxai/grpo_saved_lora_15")

inputs = tokenizer("How can we improve online safety?", return_tensors="pt")
outputs = adapter.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Citation

If you use this model, please cite:

```bibtex
@article{li2025safegrpo,
  title   = {Optimizing Safe and Aligned Language Generation: A Multi-Objective GRPO Approach},
  author  = {Li, Xuying and Li, Zhuo and Kosuga, Yuji and Bian, Victor},
  journal = {arXiv preprint arXiv:2503.21819},
  year    = {2025},
  url     = {https://arxiv.org/abs/2503.21819}
}
```

Maintained by HydroX AI.
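## Reward Computation Sketch

The training code is not included in this repository. As an illustration only, the reward aggregation and group-relative advantage described above can be sketched as follows; the function names and the group-normalization details (mean/std over a sampled group) are assumptions, not the exact implementation:

```python
import statistics

def combined_reward(scores):
    """Sum the four per-axis scores (each in [0, 1]) into one scalar reward."""
    axes = ("politeness", "meaningfulness", "actionability", "safety")
    return sum(scores[axis] for axis in axes)

def group_relative_advantages(rewards):
    """GRPO-style advantages: normalize each sampled response's reward
    against the mean and std of its own group (no value critic)."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against identical rewards
    return [(r - mean) / std for r in rewards]
```

For example, two responses scoring a combined 2.6 and 1.0 within one group yield advantages of +1.0 and -1.0: the better response is reinforced relative to its group, without any learned baseline.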