---
license: mit
language:
- en
base_model:
- Salesforce/codet5-small
tags:
- ARC-AGI
- ARC
- code
datasets:
- mindware/arc-mega
- Open-Orca/SlimOrca
- camel-ai/math
- skeskinen/TinyStories-GPT4
- rajpurkar/squad_v2
- garage-bAInd/Open-Platypus
- Sharathhebbar24/arxiv-math-instruct-50k
- AlgorithmicResearchGroup/arxiv-physics-instruct-tune-30k
- TIGER-Lab/MathInstruct
- neoneye/histogram-comparisons-small-v1
- ise-uiuc/Magicoder-Evol-Instruct-110K
- PrimeIntellect/INTELLECT-MATH-SFT-Data
- PrimeIntellect/verifiable-math-problems
- sethapun/arithmetic_2md_1to1000
- EleutherAI/proof-pile-2
- MMInstruction/M3IT
- stingning/ultrachat
- timdettmers/openassistant-guanaco
- Dahoas/instruct-synthetic-prompt-responses
- pankajmathur/WizardLM_Orca
---

This checkpoint is the compact CodeT5-based solver we shipped with the MindsAI @ Tufa Labs ARC Prize 2025 entry. It was trained with the same curriculum and TPU-v4 setup as [`mindware/arc-codet5-660m`](https://huggingface.co/mindware/arc-codet5-660m), but starts from the `Salesforce/codet5-small` initialization, so it can be deployed on modest GPUs while still supporting our Test-Time-Finetuning (TTT) workflow.

Key details:

- **Shared training recipe** – same long-horizon span-corruption + instruction + ARC fine-tuning schedule we used for the 660M model; only the base capacity differs.
- **Encoder-preserving pruning** – we retain the full encoder but prune the decoder depth, after observing that shallow decoders converge faster for smaller checkpoints.
- **TTT + AIRV ready** – the configuration matches the refactored runtime (self-ensemble, AIRV voting, optional span-corruption refinement hooks), so you can drop it directly into `MODEL_PATHS`.

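The `MODEL_PATHS` hook mentioned above can be sketched as follows. This is a hypothetical illustration: the `MODEL_PATHS` name comes from this card, but its exact shape, the registry keys, and the small checkpoint's repository id are assumptions.

```python
# Hypothetical sketch of a MODEL_PATHS registry; only the 660M id below is
# confirmed by this card -- the small checkpoint's id is a placeholder.
MODEL_PATHS = {
    "arc-codet5-660m": "mindware/arc-codet5-660m",
    "arc-codet5-small": "path/to/this-checkpoint",  # replace with this repo's id
}

def select_checkpoint(prefer_small: bool = True) -> str:
    """Pick a checkpoint id based on the available GPU memory budget."""
    key = "arc-codet5-small" if prefer_small else "arc-codet5-660m"
    return MODEL_PATHS[key]
```

Keeping both checkpoints behind one registry key makes it easy to swap the small model in for memory-constrained runs without touching the rest of the runtime.
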
📚 **ARC-Related Datasets & Frameworks**

- [RE-ARC](https://github.com/michaelhodel/re-arc) — procedurally generates examples for the 400 ARC training tasks (we also include RE-ARC eval + ARC 1.5).
- [ConceptARC](https://github.com/victorvikram/ConceptARC)
- [1D-ARC](https://khalil-research.github.io/LLM4ARC/)
- ARC_gym, Sort-of-ARC
- Andreas Koepf’s generator suites (includes RE-ARC-style grids, code generation targets, and solution graphs).
- Jack Cole’s custom generators covering ~70 tasks plus larger concept sets (cellular automata, math-derived boards, etc.).

Several auxiliary datasets predict task metadata (graphs, heuristics, explanations) rather than final boards; they are part of the broader instruction mixture this model saw during pretraining.

## ARC Data Formatting

- ARC tasks ship as JSON where each `task_id` contains `train` pairs and `test` inputs; grids are rectangular lists of ints `0-9`, typically 1×1–30×30, but we accept up to 50×50.
- Example payload:
  ```json
  {
    "task_id": {
      "train": [
        {"input": [[0,0],[1,1]], "output": [[1,1],[1,1]]}
      ],
      "test": [
        {"input": [[0,0,0],[0,1,0],[0,0,0]]}
      ]
    }
  }
  ```
- Prompts serialized during training/TTT/inference follow the same `solve:` template as our other releases. Grids are flattened via `grid_to_string` so that rows become digit sequences separated by spaces; multiple training examples increment the index (`input2`, `output2`, ...).
- Example prompt snippet:
  ```text
  solve: train input1 000 010 000 output1 11 3 3 10 111 101 111. input2 00 02 output2 5 2 2 20 22 20. test tinput1 0000 0300 0000 0000 toutput1
  ```
- Decoder targets (`correct_answer`) consist of the `output_prefix` followed by ` {total_chars} {height} {width} {symbols} {row_strings}.` Example donut target:
  ```text
  11 3 3 10 111 101 111.
  ```
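
The serialization rules above can be sketched end-to-end. `grid_to_string` is named in this card; `format_target` and `build_prompt` are hypothetical helper names, and the symbol ordering (first appearance in row-major order) is inferred from the worked examples rather than stated explicitly.

```python
def grid_to_string(grid):
    """Flatten a grid: each row becomes a digit string; rows are joined by spaces."""
    return " ".join("".join(str(cell) for cell in row) for row in grid)

def format_target(grid):
    """Serialize an output grid as '{total_chars} {height} {width} {symbols} {rows}.'
    total_chars is the length of the flattened row string; symbols are listed
    in order of first appearance (inferred from the card's examples)."""
    rows = grid_to_string(grid)
    symbols = "".join(dict.fromkeys(str(c) for row in grid for c in row))
    return f"{len(rows)} {len(grid)} {len(grid[0])} {symbols} {rows}."

def build_prompt(train_pairs, test_inputs):
    """Assemble the 'solve:' template used for training/TTT/inference."""
    parts = ["solve: train"]
    for i, (inp, out) in enumerate(train_pairs, start=1):
        parts.append(f"input{i} {grid_to_string(inp)} output{i} {format_target(out)}")
    parts.append("test")
    for i, inp in enumerate(test_inputs, start=1):
        parts.append(f"tinput{i} {grid_to_string(inp)} toutput{i}")
    return " ".join(parts)
```

Running this on the two training pairs and one test grid from the example prompt snippet reproduces that prompt exactly, which is a useful sanity check when preparing TTT data.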

The weights here are best suited for development, ablations, and Kaggle submissions where GPU memory is constrained. For the strongest results, pair this checkpoint with the larger 660M or SCR variants in an ensemble, but the small model alone remains a strong baseline.