---
license: mit
language:
- en
base_model:
- Salesforce/codet5-small
tags:
- ARC-AGI
- ARC
- code
datasets:
- mindware/arc-mega
- Open-Orca/SlimOrca
- camel-ai/math
- skeskinen/TinyStories-GPT4
- rajpurkar/squad_v2
- garage-bAInd/Open-Platypus
- Sharathhebbar24/arxiv-math-instruct-50k
- AlgorithmicResearchGroup/arxiv-physics-instruct-tune-30k
- TIGER-Lab/MathInstruct
- neoneye/histogram-comparisons-small-v1
- ise-uiuc/Magicoder-Evol-Instruct-110K
- PrimeIntellect/INTELLECT-MATH-SFT-Data
- PrimeIntellect/verifiable-math-problems
- sethapun/arithmetic_2md_1to1000
- EleutherAI/proof-pile-2
- MMInstruction/M3IT
- stingning/ultrachat
- timdettmers/openassistant-guanaco
- Dahoas/instruct-synthetic-prompt-responses
- pankajmathur/WizardLM_Orca
---

This checkpoint is the compact CodeT5-based solver we shipped with the MindsAI @ Tufa Labs ARC Prize 2025 entry. It was trained with the same curriculum and TPU-v4 setup as [`mindware/arc-codet5-660m`](https://huggingface.co/mindware/arc-codet5-660m), but starts from the `Salesforce/codet5-small` initialization, so it can be deployed on modest GPUs while still supporting our Test-Time-Finetuning (TTT) workflow.

Key details:

- **Shared training recipe** – same long-horizon span-corruption + instruction + ARC fine-tuning schedule we used for the 660M model; only the base capacity differs.
- **Encoder-preserving pruning** – we retain the full encoder but prune the decoder depth, after observing that shallow decoders converge faster for smaller checkpoints.
- **TTT + AIRV ready** – the configuration matches the refactored runtime (self-ensemble, AIRV voting, optional span-corruption refinement hooks), so you can drop it directly into `MODEL_PATHS`.

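The `MODEL_PATHS` hook mentioned above can be sketched as follows. This is a hypothetical illustration: the `MODEL_PATHS` name comes from this card, but its exact shape, the registry keys, and the small checkpoint's repository id are assumptions.

```python
# Hypothetical sketch of a MODEL_PATHS registry; only the 660M id below is
# confirmed by this card -- the small checkpoint's id is a placeholder.
MODEL_PATHS = {
    "arc-codet5-660m": "mindware/arc-codet5-660m",
    "arc-codet5-small": "path/to/this-checkpoint",  # replace with this repo's id
}

def select_checkpoint(prefer_small: bool = True) -> str:
    """Pick a checkpoint id based on the available GPU memory budget."""
    key = "arc-codet5-small" if prefer_small else "arc-codet5-660m"
    return MODEL_PATHS[key]
```

Keeping both checkpoints behind one registry key makes it easy to swap the small model in for memory-constrained runs without touching the rest of the runtime.
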
📚 **ARC-Related Datasets & Frameworks**

- [RE-ARC](https://github.com/michaelhodel/re-arc) — procedurally generates examples for the 400 ARC training tasks (we also include RE-ARC eval + ARC 1.5).
- [ConceptARC](https://github.com/victorvikram/ConceptARC)
- [1D-ARC](https://khalil-research.github.io/LLM4ARC/)
- ARC_gym, Sort-of-ARC
- Andreas Koepf’s generator suites (includes RE-ARC-style grids, code generation targets, and solution graphs).
- Jack Cole’s custom generators covering ~70 tasks plus larger concept sets (cellular automata, math-derived boards, etc.).

Several auxiliary datasets predict task metadata (graphs, heuristics, explanations) rather than final boards; they are part of the broader instruction mixture this model saw during pretraining.

## ARC Data Formatting

- ARC tasks ship as JSON where each `task_id` contains `train` pairs and `test` inputs; grids are rectangular lists of ints `0-9`, typically 1×1–30×30, but we accept up to 50×50.
- Example payload:
  ```json
  {
    "task_id": {
      "train": [
        {"input": [[0,0],[1,1]], "output": [[1,1],[1,1]]}
      ],
      "test": [
        {"input": [[0,0,0],[0,1,0],[0,0,0]]}
      ]
    }
  }
  ```
- Prompts serialized during training/TTT/inference follow the same `solve:` template as our other releases. Grids are flattened via `grid_to_string` so that rows become digit sequences separated by spaces; multiple training examples increment the index (`input2`, `output2`, ...).
- Example prompt snippet:
  ```text
  solve: train input1 000 010 000 output1 11 3 3 10 111 101 111. input2 00 02 output2 5 2 2 20 22 20. test tinput1 0000 0300 0000 0000 toutput1
  ```
- Decoder targets (`correct_answer`) consist of the `output_prefix` followed by ` {total_chars} {height} {width} {symbols} {row_strings}.` Example donut target:
  ```text
  11 3 3 10 111 101 111.
  ```
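
The serialization rules above can be sketched end-to-end. `grid_to_string` is named in this card; `format_target` and `build_prompt` are hypothetical helper names, and the symbol ordering (first appearance in row-major order) is inferred from the worked examples rather than stated explicitly.

```python
def grid_to_string(grid):
    """Flatten a grid: each row becomes a digit string; rows are joined by spaces."""
    return " ".join("".join(str(cell) for cell in row) for row in grid)

def format_target(grid):
    """Serialize an output grid as '{total_chars} {height} {width} {symbols} {rows}.'
    total_chars is the length of the flattened row string; symbols are listed
    in order of first appearance (inferred from the card's examples)."""
    rows = grid_to_string(grid)
    symbols = "".join(dict.fromkeys(str(c) for row in grid for c in row))
    return f"{len(rows)} {len(grid)} {len(grid[0])} {symbols} {rows}."

def build_prompt(train_pairs, test_inputs):
    """Assemble the 'solve:' template used for training/TTT/inference."""
    parts = ["solve: train"]
    for i, (inp, out) in enumerate(train_pairs, start=1):
        parts.append(f"input{i} {grid_to_string(inp)} output{i} {format_target(out)}")
    parts.append("test")
    for i, inp in enumerate(test_inputs, start=1):
        parts.append(f"tinput{i} {grid_to_string(inp)} toutput{i}")
    return " ".join(parts)
```

Running this on the two training pairs and one test grid from the example prompt snippet reproduces that prompt exactly, which is a useful sanity check when preparing TTT data.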

The weights here are best suited for development, ablations, and Kaggle submissions where GPU memory is constrained. For the strongest results, pair this checkpoint with the larger 660M or SCR variants in an ensemble, but the small model alone remains a strong baseline.