Running
Wechat Style Sft
QLoRA fine-tuning demo on WeChat essays for style adaptation
SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization
Diffusion Language Models are Super Data Learners