Model Sources

Model checkpoints from before and after RLHF are released to facilitate reproduction of the Energy Loss Phenomenon introduced in [1], which offers a new perspective on mitigating reward hacking in RLHF. The corresponding implementation is available in the Energy-Loss-Phenomenon repository on GitHub.

[1] Y. Miao, S. Zhang, L. Ding, Y. Zhang, L. Zhang, and D. Tao. The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking. In Proceedings of the 42nd International Conference on Machine Learning (ICML 2025).

Citation

If you find our work useful in your research, please consider citing our paper:

@inproceedings{miao2025the,
  title={The Energy Loss Phenomenon in {RLHF}: A New Perspective on Mitigating Reward Hacking},
  author={Yuchun Miao and Sen Zhang and Liang Ding and Yuqi Zhang and Lefei Zhang and Dacheng Tao},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025},
  url={https://openreview.net/forum?id=82A81az3V5}
}


Model tree for mycccc/Energy-Loss-Phenomenon-Demo

Base model: meta-llama/Llama-2-7b. This model is a fine-tune of it.
