A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning Paper • 2510.15444 • Published 29 days ago • 145
CWM: An Open-Weights LLM for Research on Code Generation with World Models Paper • 2510.02387 • Published Sep 30 • 7
SmolLM3 evaluation datasets Collection Datasets to decontaminate the post-training mixtures against. Use the subset and column values described per entry • 13 items • Updated Jul 8 • 7
SmolLM3 pretraining datasets Collection datasets used in SmolLM3 pretraining • 15 items • Updated Aug 12 • 37
Rope to Nope and Back Again: A New Hybrid Attention Strategy Paper • 2501.18795 • Published Jan 30 • 12
nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 Text Generation • 253B • Updated about 1 month ago • 9.17k • • 339