purbeshmitra/semantic-soft-bootstrapping

Text Generation


purbeshmitra committed 17 days ago

Commit 35b5e73 (verified) · 1 parent: 4195603

Update README.md

Files changed (1)

README.md +1 -1

README.md CHANGED

@@ -24,7 +24,7 @@ metrics:
 🔗 Github link: [Training code](https://github.com/purbeshmitra/semantic-soft-bootstrapping)
-Semantic Soft Bootstrapping (SSB), an RL-free self-distillation framework that improves long-context reasoning in LLMs by training the model on its own hinted reasoning as a teacher. Rather than relying on a separate larger teacher or on-policy gradient with sparse rewards, SSB uses the same base model in two semantic roles: a hinted teacher that sees both correct and incorrect solutions and synthesizes a robust explanation, and a hint-free student that learns to reproduce this behavior from the bare question alone. Starting from a raw problem–answer dataset, we construct paired teacher–student conversations and then precompute teacher logits over the answer tokens, enabling efficient offline distillation without any human annotation or online RL loop. This is depicted as following:
+Semantic Soft Bootstrapping (SSB) is an RL-free self-distillation framework that improves long-context reasoning in LLMs by training the model on its own hinted reasoning as a teacher. Rather than relying on a separate larger teacher or on-policy gradient with sparse rewards, SSB uses the same base model in two semantic roles: a hinted teacher that sees both correct and incorrect solutions and synthesizes a robust explanation, and a hint-free student that learns to reproduce this behavior from the bare question alone. Starting from a raw problem–answer dataset, we construct paired teacher–student conversations and then precompute teacher logits over the answer tokens, enabling efficient offline distillation without any human annotation or online RL loop. This is depicted as following:
 <p align="center">
   <img src="assets/ssb.png" alt="Alt Text" width="750">
 </p>
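The README paragraph describes offline distillation from teacher logits precomputed over the answer tokens. Below is a minimal sketch of such a soft-label loss, assuming a token-level KL divergence KL(teacher || student) between the hinted teacher's stored logits and the hint-free student's logits on the same answer tokens. The function names, the temperature parameter, and the KL formulation are illustrative assumptions, not taken from the SSB training code.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax of one token's logits at a given temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def offline_distill_loss(
    teacher_logits: Sequence[Sequence[float]],  # hypothetical: precomputed offline, answer tokens only
    student_logits: Sequence[Sequence[float]],  # hypothetical: student forward pass, same answer tokens
    temperature: float = 1.0,
) -> float:
    """Mean per-token KL(teacher || student) over the answer span.

    Assumption (not from the SSB repo): both inputs are already sliced to the
    answer tokens, so row i of each refers to the same answer position even
    though the teacher prompt (with hints) and student prompt (bare question)
    have different lengths.
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must cover the same answer tokens")
    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        log_p = log_softmax(t_row, temperature)  # teacher distribution (soft targets)
        log_q = log_softmax(s_row, temperature)  # student distribution
        total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return total / max(len(teacher_logits), 1)


if __name__ == "__main__":
    teacher = [[2.0, 0.5, -1.0], [0.0, 3.0, 0.0]]
    student = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    print(offline_distill_loss(teacher, teacher))  # identical logits give zero loss
    print(offline_distill_loss(teacher, student))  # a uniform student gives a positive loss
```

Because the teacher logits are stored ahead of time, each training step in this sketch needs only a student forward pass, which is what makes the distillation offline rather than an online RL loop.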