Update paper reference and citation in model card
#12 by nielsr (HF Staff) - opened

README.md CHANGED
@@ -1,102 +1,21 @@
 ---
+datasets:
+- tomg-group-umd/huginn-dataset
+language:
+- en
 library_name: transformers
+license: apache-2.0
+pipeline_tag: text-generation
 tags:
 - code
 - math
 - reasoning
 - llm
-license: apache-2.0
-language:
-- en
-pipeline_tag: text-generation
-datasets:
-- tomg-group-umd/huginn-dataset
-# datasets: # cannot order these nicely
-# - HuggingFaceTB/smollm-corpus
-# - jon-tow/starcoderdata-python-edu
-# - ubaada/booksum-complete-cleaned
-# - euirim/goodwiki
-# - togethercomputer/RedPajama-Data-1T
-# - allenai/dolma
-# - bigcode/the-stack-v2-train-smol-ids
-# - bigcode/starcoderdata
-# - m-a-p/Matrix
-# - cerebras/SlimPajama-627B
-# - open-phi/textbooks
-# - open-phi/textbooks_grounded
-# - open-phi/programming_books_llama
-# - nampdn-ai/tiny-strange-textbooks
-# - nampdn-ai/tiny-textbooks
-# - nampdn-ai/tiny-code-textbooks
-# - nampdn-ai/tiny-orca-textbooks
-# - SciPhi/textbooks-are-all-you-need-lite
-# - vikp/textbook_quality_programming
-# - EleutherAI/proof-pile-2
-# - open-web-math/open-web-math
-# - biglam/blbooks-parquet
-# - storytracer/LoC-PD-Books
-# - GAIR/MathPile
-# - tomg-group-umd/CLRS-Text-train
-# - math-ai/AutoMathText
-# - bigcode/commitpackft
-# - bigcode/stack-dedup-python-fns
-# - vikp/python_code_instructions_filtered
-# - mlabonne/chessllm
-# - Waterhorse/chess_data
-# - EleutherAI/lichess-puzzles
-# - chargoddard/WebInstructSub-prometheus
-# - Locutusque/hercules-v5.0
-# - nvidia/OpenMathInstruct-1
-# - meta-math/MetaMathQA
-# - m-a-p/CodeFeedback-Filtered-Instruction
-# - nvidia/Daring-Anteater
-# - nvidia/sft_datablend_v1
-# - BAAI/Infinity-Instruct
-# - anthracite-org/Stheno-Data-Filtered
-# - Nopm/Opus_WritingStruct
-# - xinlai/Math-Step-DPO-10K
-# - bigcode/self-oss-instruct-sc2-exec-filter-50k
-# - HuggingFaceTB/everyday-conversations
-# - hkust-nlp/gsm8k-fix
-# - HuggingFaceH4/no_robots
-# - THUDM/LongWriter-6k
-# - THUDM/webglm-qa
-# - AlgorithmicResearchGroup/ArXivDLInstruct
-# - allenai/tulu-v2-sft-mixture-olmo-4096
-# - bigscience/P3
-# - Gryphe/Sonnet3.5-SlimOrcaDedupCleaned
-# - Gryphe/Opus-WritingPrompts
-# - nothingiisreal/Reddit-Dirty-And-WritingPrompts
-# - nothingiisreal/Kalomaze-Opus-Instruct-25k-filtered
-# - internlm/Lean-Github
-# - pkuAI4M/LeanWorkbook
-# - casey-martin/multilingual-mathematical-autoformalization
-# - AI4M/leandojo-informalized
-# - casey-martin/oa_cpp_annotate_gen
-# - l3lab/ntp-mathlib-instruct-st
-# - ajibawa-2023/Maths-College
-# - ajibawa-2023/Maths-Grade-School
-# - ajibawa-2023/General-Stories-Collection
-# - XinyaoHu/AMPS_mathematica
-# - XinyaoHu/AMPS_khan
-# - Magpie-Align/Magpie-Pro-MT-300K-v0.1
-# - Magpie-Align/Magpie-Reasoning-150K
-# - gair-prox/FineWeb-pro
-# - gair-prox/c4-pro
-# - gair-prox/RedPajama-pro
-# - gair-prox/open-web-math-pro
-# - togethercomputer/Long-Data-Collections
-# - emozilla/pg19
-# - MathGenie/MathCode-Pile
-# - KingNish/reasoning-base-20k
-# - nvidia/OpenMathInstruct-2
-# - LLM360/TxT360
-# - neuralwork/arxiver
 ---
 
 # Huginn-0125
 This is Huginn, version 01/25, a latent recurrent-depth model with 3.5B parameters, trained for 800B tokens on AMD MI250X machines. This is a proof-of-concept model, but surprisingly capable in reasoning and code given its training budget and size.
-All details on this model can be found in the
+All details on this model can be found in the paper: [Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer](https://huggingface.co/papers/2507.02199).
 
 8 intermediate checkpoints of the model can be found in its collection. Additional intermediate checkpoints are available upon request while we find a place to host all ~350 of them. The data used to train
 this model is publicly available (entirely on Hugging Face), and scripts provided with the pretraining code at https://github.com/seal-rg/recurrent-pretraining can be used to repeat our preprocessing and our entire training run.
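To make the metadata hunk concrete: a minimal sketch of consuming these fields with the standard `datasets`/`transformers` APIs. The dataset id comes from the `datasets:` field; the model id `tomg-group-umd/huginn-0125`, the `train` split name, and `trust_remote_code=True` are assumptions based on the card, not part of this diff.

```python
from datasets import load_dataset
from transformers import pipeline

# datasets: tomg-group-umd/huginn-dataset -- stream to avoid a full download
# (split name assumed to be "train").
pretrain_data = load_dataset("tomg-group-umd/huginn-dataset", split="train", streaming=True)
print(next(iter(pretrain_data)))

# pipeline_tag: text-generation maps to the standard pipeline; trust_remote_code
# is needed because the model defines a custom depth-recurrent architecture.
generator = pipeline("text-generation", model="tomg-group-umd/huginn-0125",
                     trust_remote_code=True)
print(generator("The Pythagorean theorem states", max_new_tokens=32))
```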
@@ -105,7 +24,7 @@ this model is publicly available (entirely on Hugging Face), and scripts provide
 
 
 
-##
+## Table of Contents
 
 1. [How to Use](#downloading-and-using-the-model)
 2. [Advanced Usage](#advanced-features)
@@ -220,8 +139,6 @@ At each generation step, the recurrence can be warmstarted with the final state
 model.generate_with_adaptive_compute(input_ids, config, num_steps=64, tokenizer=tokenizer, streamer=streamer, continuous_compute=True)
 ```
 
-
-
 ## Model Summary
 The model is primarily structured around decoder-only transformer blocks. However these blocks are structured into three functional groups, the __prelude__ \\(P\\),
 which embeds the input data into a latent space using multiple transformer layers, then the core __recurrent block__ \\(R\\), which is the central unit of recurrent

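The hunk above shows `model.generate_with_adaptive_compute(...)` without its setup. A minimal sketch of reaching that call, assuming the `tomg-group-umd/huginn-0125` checkpoint and the custom generation method its remote code provides; the prompt and generation-config values are illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, TextStreamer

# Assumed repo id for this card; trust_remote_code loads the custom
# depth-recurrent architecture, which defines generate_with_adaptive_compute.
model = AutoModelForCausalLM.from_pretrained(
    "tomg-group-umd/huginn-0125", torch_dtype=torch.bfloat16, trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("tomg-group-umd/huginn-0125")
streamer = TextStreamer(tokenizer)

input_ids = tokenizer("What is 13 * 17?", return_tensors="pt").input_ids
config = GenerationConfig(max_new_tokens=64, do_sample=False)

# num_steps sets the recurrence depth per token; continuous_compute=True
# warmstarts each token's recurrence from the previous token's final state.
model.generate_with_adaptive_compute(
    input_ids, config, num_steps=64, tokenizer=tokenizer,
    streamer=streamer, continuous_compute=True,
)
```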
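The Model Summary context cuts off mid-sentence: the three functional groups it begins to describe are the prelude \\(P\\) that embeds the input, the recurrent block \\(R\\) that is iterated a chosen number of steps, and a final decoding group (the coda, in the paper's terminology). A schematic sketch of that split; the module internals, the random state initialization, and the way the embedded input is re-injected each step are simplifications, not the model's exact wiring:

```python
import torch
import torch.nn as nn

class DepthRecurrentSketch(nn.Module):
    """Schematic of the three groups: prelude P, iterated recurrent block R, coda."""

    def __init__(self, prelude: nn.Module, core: nn.Module, coda: nn.Module):
        super().__init__()
        self.prelude, self.core, self.coda = prelude, core, coda

    def forward(self, x: torch.Tensor, num_steps: int) -> torch.Tensor:
        e = self.prelude(x)          # embed the input into the latent space
        s = torch.randn_like(e)      # latent state starts from noise (schematic)
        for _ in range(num_steps):   # one shared block R, applied repeatedly
            s = self.core(s + e)     # re-inject the embedded input each step
        return self.coda(s)          # decode the final latent state

# Smoke test with identity placeholders (shapes only, no semantics).
sketch = DepthRecurrentSketch(nn.Identity(), nn.Identity(), nn.Identity())
out = sketch(torch.randn(1, 8, 16), num_steps=4)
```

This is why `num_steps` appears in the generation call above: test-time compute scales with the number of iterations of \\(R\\), independent of parameter count.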
@@ -254,22 +171,21 @@ This model was trained on 21 segments of 4096 AMD MI-250X GPUs on the OLCF Front
 This model is released under the [apache-2.0](https://choosealicense.com/licenses/apache-2.0/) licence.
 
 ## Citation
-```
-@article{geiping_scaling_2025,
-title = {Scaling up {{Test-Time Compute}} with {{Latent Reasoning}}: {{A Recurrent Depth Approach}}},
-shorttitle = {Scaling up {{Test-Time Compute}} with {{Latent Reasoning}}},
+```bibtex
+@article{geiping_latent_2025,
+title = {Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer},
 author = {Geiping, Jonas and McLeish, Sean and Jain, Neel and Kirchenbauer, John and Singh, Siddharth and Bartoldson, Brian R. and Kailkhura, Bhavya and Bhatele, Abhinav and Goldstein, Tom},
 year = {2025},
-month = feb,
-eprint = {2502.05171},
+month = jul,
+eprint = {2507.02199},
 primaryclass = {cs},
 publisher = {arXiv},
-doi = {10.48550/arXiv.2502.05171},
-url = {http://arxiv.org/abs/2502.05171},
-urldate = {2025-02},
+doi = {10.48550/arXiv.2507.02199},
+url = {http://arxiv.org/abs/2507.02199},
+urldate = {2025-07-02},
 archiveprefix = {arXiv},
 keywords = {Computer Science - Computation and Language,Computer Science - Machine Learning},
-journal = {arxiv:2502.05171[cs]}
+journal = {arxiv:2507.02199[cs]}
 }
 ```
 