- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  Paper • 2503.14476 • Published • 141
- Training language models to follow instructions with human feedback
  Paper • 2203.02155 • Published • 24
- Llama 2: Open Foundation and Fine-Tuned Chat Models
  Paper • 2307.09288 • Published • 247
- The Llama 3 Herd of Models
  Paper • 2407.21783 • Published • 117
Dmitrij Gusev
mftrash
Recent Activity
updated a collection 11 days ago: Post-training
liked a Space 11 days ago: HuggingFaceTB/smol-training-playbook