vincentoh committed on
Commit 7213397 · verified · 1 Parent(s): a83f896

Update README.md

Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -22,6 +22,8 @@ language:
 
  # Llama 3.2 3B GuardReasoner Exp 19: HS-DPO Toy (10% Dataset)
 
+ Binary Classifier
+
  This is a LoRA adapter fine-tuned on **10% of the full GuardReasoner dataset** using **Harmonic Sampling Direct Preference Optimization (HS-DPO)**.
 
  **Base Model:** [unsloth/Llama-3.2-3B-Instruct](https://huggingface.co/unsloth/Llama-3.2-3B-Instruct)