This model is a fine-tuned version of Qwen2.5-0.5B-Instruct, trained on the ProlificAI/social-reasoning-rlhf dataset using ORPO (Odds Ratio Preference Optimization). The primary objective was to experiment with Reinforcement Learning from Human Feedback (RLHF)-style preference alignment via ORPO.
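For reference, a comparable ORPO run can be set up with TRL's ORPOTrainer and a LoRA adapter from PEFT. The sketch below is illustrative rather than the exact training script used for this model: the hyperparameters, the LoRA settings, and the assumption that the dataset maps onto prompt/chosen/rejected-style columns are placeholders.
# Illustrative ORPO training sketch (not the exact script used for this adapter).
# Assumes the preference dataset exposes "prompt", "chosen", "rejected" columns;
# rename columns first if the dataset uses different names.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig
from trl import ORPOConfig, ORPOTrainer

base_id = "unsloth/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map={"": 0})

dataset = load_dataset("ProlificAI/social-reasoning-rlhf", split="train")

peft_config = LoraConfig(               # example LoRA hyperparameters
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

training_args = ORPOConfig(             # example training hyperparameters
    output_dir="orpo-social-rlhf",
    beta=0.1,                           # weight of the odds-ratio preference term
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=8e-6,
    num_train_epochs=1,
    max_length=1024,
    max_prompt_length=512,
    logging_steps=10,
)

trainer = ORPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,         # older TRL versions take tokenizer= instead
    peft_config=peft_config,
)
trainer.train()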
Use the code below to get started with the model.
from huggingface_hub import login
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Authenticate with the Hugging Face Hub (paste your access token here)
login(token="")

# Load the tokenizer and base model, then attach the Social-RLHF adapter
tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen2.5-0.5B-Instruct")
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Qwen2.5-0.5B-Instruct",
    device_map={"": 0},
    token="",
)
model = PeftModel.from_pretrained(base_model, "khazarai/Social-RLHF")
prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{}
### Input:
{}
### Response:
{}"""
# Fill the template with an instruction, the scenario as input, and an empty response slot
inputs = tokenizer(
    [
        prompt.format(
            "You are an AI assistant that helps people find information",
            "A stranger shares private information with you on public transportation. How might you respond sensitively?",
            "",
        )
    ],
    return_tensors="pt",
).to("cuda")
# Stream generated tokens to stdout as they are produced
from transformers import TextStreamer

text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=512)
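If you prefer a standalone checkpoint, the adapter can be folded into the base weights with PEFT's merge_and_unload; this is standard PEFT usage, and the output directory name below is only an example.
# Optionally merge the LoRA adapter into the base model for standalone inference
merged = model.merge_and_unload()
merged.save_pretrained("social-rlhf-merged")
tokenizer.save_pretrained("social-rlhf-merged")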
Base model: Qwen/Qwen2.5-0.5B