sail's Collections
Precision-RL
🚀 Active PRM
🌾 Oat-Zero: Understanding R1-Zero-Like Training
🔱 Sailor2 Language Models
🧬 RegMix: Data Mixture as Regression
📈 Scaling Laws with Vocabulary
💡 DICE
⚓️ Sailor Language Models

Precision-RL

updated 9 days ago

Defeating the Training-Inference Mismatch via FP16


  • Defeating the Training-Inference Mismatch via FP16

    Paper • 2510.26788 • Published 24 days ago • 27

  • sail/Sanity-Test-R1D-1.5B

    Viewer • Updated 8 days ago • 1.52k • 72 • 6