Update paper reference and citation in model card
#12 by nielsr (HF Staff) - opened

README.md CHANGED
@@ -1,102 +1,21 @@
 ---
+datasets:
+- tomg-group-umd/huginn-dataset
+language:
+- en
 library_name: transformers
+license: apache-2.0
+pipeline_tag: text-generation
 tags:
 - code
 - math
 - reasoning
 - llm
-license: apache-2.0
-language:
-- en
-pipeline_tag: text-generation
-datasets:
-- tomg-group-umd/huginn-dataset
-# datasets: # cannot order these nicely
-# - HuggingFaceTB/smollm-corpus
-# - jon-tow/starcoderdata-python-edu
-# - ubaada/booksum-complete-cleaned
-# - euirim/goodwiki
-# - togethercomputer/RedPajama-Data-1T
-# - allenai/dolma
-# - bigcode/the-stack-v2-train-smol-ids
-# - bigcode/starcoderdata
-# - m-a-p/Matrix
-# - cerebras/SlimPajama-627B
-# - open-phi/textbooks
-# - open-phi/textbooks_grounded
-# - open-phi/programming_books_llama
-# - nampdn-ai/tiny-strange-textbooks
-# - nampdn-ai/tiny-textbooks
-# - nampdn-ai/tiny-code-textbooks
-# - nampdn-ai/tiny-orca-textbooks
-# - SciPhi/textbooks-are-all-you-need-lite
-# - vikp/textbook_quality_programming
-# - EleutherAI/proof-pile-2
-# - open-web-math/open-web-math
-# - biglam/blbooks-parquet
-# - storytracer/LoC-PD-Books
-# - GAIR/MathPile
-# - tomg-group-umd/CLRS-Text-train
-# - math-ai/AutoMathText
-# - bigcode/commitpackft
-# - bigcode/stack-dedup-python-fns
-# - vikp/python_code_instructions_filtered
-# - mlabonne/chessllm
-# - Waterhorse/chess_data
-# - EleutherAI/lichess-puzzles
-# - chargoddard/WebInstructSub-prometheus
-# - Locutusque/hercules-v5.0
-# - nvidia/OpenMathInstruct-1
-# - meta-math/MetaMathQA
-# - m-a-p/CodeFeedback-Filtered-Instruction
-# - nvidia/Daring-Anteater
-# - nvidia/sft_datablend_v1
-# - BAAI/Infinity-Instruct
-# - anthracite-org/Stheno-Data-Filtered
-# - Nopm/Opus_WritingStruct
-# - xinlai/Math-Step-DPO-10K
-# - bigcode/self-oss-instruct-sc2-exec-filter-50k
-# - HuggingFaceTB/everyday-conversations
-# - hkust-nlp/gsm8k-fix
-# - HuggingFaceH4/no_robots
-# - THUDM/LongWriter-6k
-# - THUDM/webglm-qa
-# - AlgorithmicResearchGroup/ArXivDLInstruct
-# - allenai/tulu-v2-sft-mixture-olmo-4096
-# - bigscience/P3
-# - Gryphe/Sonnet3.5-SlimOrcaDedupCleaned
-# - Gryphe/Opus-WritingPrompts
-# - nothingiisreal/Reddit-Dirty-And-WritingPrompts
-# - nothingiisreal/Kalomaze-Opus-Instruct-25k-filtered
-# - internlm/Lean-Github
-# - pkuAI4M/LeanWorkbook
-# - casey-martin/multilingual-mathematical-autoformalization
-# - AI4M/leandojo-informalized
-# - casey-martin/oa_cpp_annotate_gen
-# - l3lab/ntp-mathlib-instruct-st
-# - ajibawa-2023/Maths-College
-# - ajibawa-2023/Maths-Grade-School
-# - ajibawa-2023/General-Stories-Collection
-# - XinyaoHu/AMPS_mathematica
-# - XinyaoHu/AMPS_khan
-# - Magpie-Align/Magpie-Pro-MT-300K-v0.1
-# - Magpie-Align/Magpie-Reasoning-150K
-# - gair-prox/FineWeb-pro
-# - gair-prox/c4-pro
-# - gair-prox/RedPajama-pro
-# - gair-prox/open-web-math-pro
-# - togethercomputer/Long-Data-Collections
-# - emozilla/pg19
-# - MathGenie/MathCode-Pile
-# - KingNish/reasoning-base-20k
-# - nvidia/OpenMathInstruct-2
-# - LLM360/TxT360
-# - neuralwork/arxiver
 ---
 
 # Huginn-0125
 This is Huginn, version 01/25, a latent recurrent-depth model with 3.5B parameters, trained for 800B tokens on AMD MI250X machines. This is a proof-of-concept model, but surprisingly capable in reasoning and code given its training budget and size.
-All details on this model can be found in the
+All details on this model can be found in the paper: [Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer](https://huggingface.co/papers/2507.02199).
 
 8 intermediate checkpoints of the model can be found in its collection. Additional intermediate checkpoints are available upon request while we find a place to host all ~350 of them. The data used to train
 this model is publicly available (entirely on Hugging Face), and scripts provided with the pretraining code at https://github.com/seal-rg/recurrent-pretraining can be used to repeat our preprocessing and our entire training run.
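To make the metadata hunk concrete: a minimal sketch of consuming these fields with the standard `datasets`/`transformers` APIs. The dataset id comes from the `datasets:` field; the model id `tomg-group-umd/huginn-0125`, the `train` split name, and `trust_remote_code=True` are assumptions based on the card, not part of this diff.

```python
from datasets import load_dataset
from transformers import pipeline

# datasets: tomg-group-umd/huginn-dataset -- stream to avoid a full download
# (split name assumed to be "train").
pretrain_data = load_dataset("tomg-group-umd/huginn-dataset", split="train", streaming=True)
print(next(iter(pretrain_data)))

# pipeline_tag: text-generation maps to the standard pipeline; trust_remote_code
# is needed because the model defines a custom depth-recurrent architecture.
generator = pipeline("text-generation", model="tomg-group-umd/huginn-0125",
                     trust_remote_code=True)
print(generator("The Pythagorean theorem states", max_new_tokens=32))
```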
@@ -105,7 +24,7 @@ this model is publicly available (entirely on Hugging Face), and scripts provide
 
 
 
-##
+## Table of Contents
 
 1. [How to Use](#downloading-and-using-the-model)
 2. [Advanced Usage](#advanced-features)
@@ -220,8 +139,6 @@ At each generation step, the recurrence can be warmstarted with the final state
 model.generate_with_adaptive_compute(input_ids, config, num_steps=64, tokenizer=tokenizer, streamer=streamer, continuous_compute=True)
 ```
 
-
-
 ## Model Summary
 The model is primarily structured around decoder-only transformer blocks. However these blocks are structured into three functional groups, the __prelude__ \\(P\\),
 which embeds the input data into a latent space using multiple transformer layers, then the core __recurrent block__ \\(R\\), which is the central unit of recurrent

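The hunk above shows `model.generate_with_adaptive_compute(...)` without its setup. A minimal sketch of reaching that call, assuming the `tomg-group-umd/huginn-0125` checkpoint and the custom generation method its remote code provides; the prompt and generation-config values are illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, TextStreamer

# Assumed repo id for this card; trust_remote_code loads the custom
# depth-recurrent architecture, which defines generate_with_adaptive_compute.
model = AutoModelForCausalLM.from_pretrained(
    "tomg-group-umd/huginn-0125", torch_dtype=torch.bfloat16, trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("tomg-group-umd/huginn-0125")
streamer = TextStreamer(tokenizer)

input_ids = tokenizer("What is 13 * 17?", return_tensors="pt").input_ids
config = GenerationConfig(max_new_tokens=64, do_sample=False)

# num_steps sets the recurrence depth per token; continuous_compute=True
# warmstarts each token's recurrence from the previous token's final state.
model.generate_with_adaptive_compute(
    input_ids, config, num_steps=64, tokenizer=tokenizer,
    streamer=streamer, continuous_compute=True,
)
```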
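The Model Summary context cuts off mid-sentence: the three functional groups it begins to describe are the prelude \\(P\\) that embeds the input, the recurrent block \\(R\\) that is iterated a chosen number of steps, and a final decoding group (the coda, in the paper's terminology). A schematic sketch of that split; the module internals, the random state initialization, and the way the embedded input is re-injected each step are simplifications, not the model's exact wiring:

```python
import torch
import torch.nn as nn

class DepthRecurrentSketch(nn.Module):
    """Schematic of the three groups: prelude P, iterated recurrent block R, coda."""

    def __init__(self, prelude: nn.Module, core: nn.Module, coda: nn.Module):
        super().__init__()
        self.prelude, self.core, self.coda = prelude, core, coda

    def forward(self, x: torch.Tensor, num_steps: int) -> torch.Tensor:
        e = self.prelude(x)          # embed the input into the latent space
        s = torch.randn_like(e)      # latent state starts from noise (schematic)
        for _ in range(num_steps):   # one shared block R, applied repeatedly
            s = self.core(s + e)     # re-inject the embedded input each step
        return self.coda(s)          # decode the final latent state

# Smoke test with identity placeholders (shapes only, no semantics).
sketch = DepthRecurrentSketch(nn.Identity(), nn.Identity(), nn.Identity())
out = sketch(torch.randn(1, 8, 16), num_steps=4)
```

This is why `num_steps` appears in the generation call above: test-time compute scales with the number of iterations of \\(R\\), independent of parameter count.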
@@ -254,22 +171,21 @@ This model was trained on 21 segments of 4096 AMD MI-250X GPUs on the OLCF Front
 This model is released under the [apache-2.0](https://choosealicense.com/licenses/apache-2.0/) licence.
 
 ## Citation
-```
-@article{geiping_scaling_2025,
-title = {Scaling up {{Test-Time Compute}} with {{Latent Reasoning}}: {{A Recurrent Depth Approach}}},
-shorttitle = {Scaling up {{Test-Time Compute}} with {{Latent Reasoning}}},
+```bibtex
+@article{geiping_latent_2025,
+title = {Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer},
 author = {Geiping, Jonas and McLeish, Sean and Jain, Neel and Kirchenbauer, John and Singh, Siddharth and Bartoldson, Brian R. and Kailkhura, Bhavya and Bhatele, Abhinav and Goldstein, Tom},
 year = {2025},
-month = feb,
-eprint = {2502.05171},
+month = jul,
+eprint = {2507.02199},
 primaryclass = {cs},
 publisher = {arXiv},
-doi = {10.48550/arXiv.2502.05171},
-url = {http://arxiv.org/abs/2502.05171},
-urldate = {2025-02},
+doi = {10.48550/arXiv.2507.02199},
+url = {http://arxiv.org/abs/2507.02199},
+urldate = {2025-07-02},
 archiveprefix = {arXiv},
 keywords = {Computer Science - Computation and Language,Computer Science - Machine Learning},
-journal = {arxiv:2502.05171[cs]}
+journal = {arxiv:2507.02199[cs]}
 }
 ```
 