Update paper reference and citation in model card

#12
opened by nielsr (HF Staff)
Files changed (1):
  1. README.md +17 -101
README.md CHANGED
@@ -1,102 +1,21 @@
  ---
+ datasets:
+ - tomg-group-umd/huginn-dataset
+ language:
+ - en
  library_name: transformers
+ license: apache-2.0
+ pipeline_tag: text-generation
  tags:
  - code
  - math
  - reasoning
  - llm
- license: apache-2.0
- language:
- - en
- pipeline_tag: text-generation
- datasets:
- - tomg-group-umd/huginn-dataset
- # datasets: # cannot order these nicely
- # - HuggingFaceTB/smollm-corpus
- # - jon-tow/starcoderdata-python-edu
- # - ubaada/booksum-complete-cleaned
- # - euirim/goodwiki
- # - togethercomputer/RedPajama-Data-1T
- # - allenai/dolma
- # - bigcode/the-stack-v2-train-smol-ids
- # - bigcode/starcoderdata
- # - m-a-p/Matrix
- # - cerebras/SlimPajama-627B
- # - open-phi/textbooks
- # - open-phi/textbooks_grounded
- # - open-phi/programming_books_llama
- # - nampdn-ai/tiny-strange-textbooks
- # - nampdn-ai/tiny-textbooks
- # - nampdn-ai/tiny-code-textbooks
- # - nampdn-ai/tiny-orca-textbooks
- # - SciPhi/textbooks-are-all-you-need-lite
- # - vikp/textbook_quality_programming
- # - EleutherAI/proof-pile-2
- # - open-web-math/open-web-math
- # - biglam/blbooks-parquet
- # - storytracer/LoC-PD-Books
- # - GAIR/MathPile
- # - tomg-group-umd/CLRS-Text-train
- # - math-ai/AutoMathText
- # - bigcode/commitpackft
- # - bigcode/stack-dedup-python-fns
- # - vikp/python_code_instructions_filtered
- # - mlabonne/chessllm
- # - Waterhorse/chess_data
- # - EleutherAI/lichess-puzzles
- # - chargoddard/WebInstructSub-prometheus
- # - Locutusque/hercules-v5.0
- # - nvidia/OpenMathInstruct-1
- # - meta-math/MetaMathQA
- # - m-a-p/CodeFeedback-Filtered-Instruction
- # - nvidia/Daring-Anteater
- # - nvidia/sft_datablend_v1
- # - BAAI/Infinity-Instruct
- # - anthracite-org/Stheno-Data-Filtered
- # - Nopm/Opus_WritingStruct
- # - xinlai/Math-Step-DPO-10K
- # - bigcode/self-oss-instruct-sc2-exec-filter-50k
- # - HuggingFaceTB/everyday-conversations
- # - hkust-nlp/gsm8k-fix
- # - HuggingFaceH4/no_robots
- # - THUDM/LongWriter-6k
- # - THUDM/webglm-qa
- # - AlgorithmicResearchGroup/ArXivDLInstruct
- # - allenai/tulu-v2-sft-mixture-olmo-4096
- # - bigscience/P3
- # - Gryphe/Sonnet3.5-SlimOrcaDedupCleaned
- # - Gryphe/Opus-WritingPrompts
- # - nothingiisreal/Reddit-Dirty-And-WritingPrompts
- # - nothingiisreal/Kalomaze-Opus-Instruct-25k-filtered
- # - internlm/Lean-Github
- # - pkuAI4M/LeanWorkbook
- # - casey-martin/multilingual-mathematical-autoformalization
- # - AI4M/leandojo-informalized
- # - casey-martin/oa_cpp_annotate_gen
- # - l3lab/ntp-mathlib-instruct-st
- # - ajibawa-2023/Maths-College
- # - ajibawa-2023/Maths-Grade-School
- # - ajibawa-2023/General-Stories-Collection
- # - XinyaoHu/AMPS_mathematica
- # - XinyaoHu/AMPS_khan
- # - Magpie-Align/Magpie-Pro-MT-300K-v0.1
- # - Magpie-Align/Magpie-Reasoning-150K
- # - gair-prox/FineWeb-pro
- # - gair-prox/c4-pro
- # - gair-prox/RedPajama-pro
- # - gair-prox/open-web-math-pro
- # - togethercomputer/Long-Data-Collections
- # - emozilla/pg19
- # - MathGenie/MathCode-Pile
- # - KingNish/reasoning-base-20k
- # - nvidia/OpenMathInstruct-2
- # - LLM360/TxT360
- # - neuralwork/arxiver
  ---

  # Huginn-0125
  This is Huginn, version 01/25, a latent recurrent-depth model with 3.5B parameters, trained for 800B tokens on AMD MI250X machines. This is a proof-of-concept model, but surprisingly capable in reasoning and code given its training budget and size.
- All details on this model can be found in the tech report: "Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach." (https://www.arxiv.org/abs/2502.05171)
+ All details on this model can be found in the paper: [Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer](https://huggingface.co/papers/2507.02199).

  8 intermediate checkpoints of the model can be found in its collection. Additional intermediate checkpoints are available upon request while we find a place to host all ~350 of them. The data used to train
  this model is publicly available (entirely on Hugging Face), and scripts provided with the pretraining code at https://github.com/seal-rg/recurrent-pretraining can be used to repeat our preprocessing and our entire training run.
@@ -105,7 +24,7 @@ this model is publicly available (entirely on Hugging Face), and scripts provide



- ## Table of Contents
+ ## Table of Contents

  1. [How to Use](#downloading-and-using-the-model)
  2. [Advanced Usage](#advanced-features)
@@ -220,8 +139,6 @@ At each generation step, the recurrence can be warmstarted with the final state
  model.generate_with_adaptive_compute(input_ids, config, num_steps=64, tokenizer=tokenizer, streamer=streamer, continuous_compute=True)
  ```

-
-
  ## Model Summary
  The model is primarily structured around decoder-only transformer blocks. However these blocks are structured into three functional groups, the __prelude__ \\(P\\),
  which embeds the input data into a latent space using multiple transformer layers, then the core __recurrent block__ \\(R\\), which is the central unit of recurrent
@@ -254,22 +171,21 @@ This model was trained on 21 segments of 4096 AMD MI-250X GPUs on the OLCF Front
  This model is released under the [apache-2.0](https://choosealicense.com/licenses/apache-2.0/) licence.

  ## Citation
- ```
- @article{geiping_scaling_2025,
- title = {Scaling up {{Test-Time Compute}} with {{Latent Reasoning}}: {{A Recurrent Depth Approach}}},
- shorttitle = {Scaling up {{Test-Time Compute}} with {{Latent Reasoning}}},
+ ```bibtex
+ @article{geiping_latent_2025,
+ title = {Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer},
  author = {Geiping, Jonas and McLeish, Sean and Jain, Neel and Kirchenbauer, John and Singh, Siddharth and Bartoldson, Brian R. and Kailkhura, Bhavya and Bhatele, Abhinav and Goldstein, Tom},
  year = {2025},
- month = feb,
- eprint = {2502.05171},
+ month = jul,
+ eprint = {2507.02199},
  primaryclass = {cs},
  publisher = {arXiv},
- doi = {10.48550/arXiv.2502.05171},
- url = {http://arxiv.org/abs/2502.05171},
- urldate = {2025-02-10},
+ doi = {10.48550/arXiv.2507.02199},
+ url = {http://arxiv.org/abs/2507.02199},
+ urldate = {2025-07-02},
  archiveprefix = {arXiv},
  keywords = {Computer Science - Computation and Language,Computer Science - Machine Learning},
- journal = {arxiv:2502.05171[cs]}
+ journal = {arxiv:2507.02199[cs]}
  }
  ```
191