# CTM Experiments - Continuous Thought Machine Models
Experimental checkpoints trained on the [Continuous Thought Machine](https://github.com/SakanaAI/continuous-thought-machines) architecture by Sakana AI.
**These are community experiments built on the original work - not official Sakana AI models.**
## Paper Reference
> **Continuous Thought Machines**
>
> Sakana AI
>
> [arXiv:2505.05522](https://arxiv.org/abs/2505.05522)
>
> [Interactive Demo](https://pub.sakana.ai/ctm/) | [Blog Post](https://sakana.ai/ctm/)
```bibtex
@article{sakana2025ctm,
  title={Continuous Thought Machines},
  author={Sakana AI},
  journal={arXiv preprint arXiv:2505.05522},
  year={2025}
}
```
## Core Insight
The CTM's key property: **accuracy improves with more internal iterations** (thought "ticks"). Given more ticks, the model "thinks longer" and reaches better answers, which lets it learn algorithmic reasoning that fixed-depth feedforward networks struggle with.
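A sketch of how one might measure that effect is below. It assumes the original repository is importable, that the checkpoint's weights load regardless of the tick count (the CTM shares weights across internal ticks), and uses `extract_final_logits` as a hypothetical helper for pulling class logits out of the model's output, whose exact structure depends on the repository version.
```python
import torch
from models.ctm import ContinuousThoughtMachine  # from the SakanaAI repo

def accuracy_vs_ticks(base_config, checkpoint, eval_loader, ticks=(5, 15, 50)):
    """Evaluate the same weights while varying the internal iteration count."""
    results = {}
    for t in ticks:
        model = ContinuousThoughtMachine(**{**base_config, "iterations": t})
        # Assumption: the state dict is independent of the tick count
        model.load_state_dict(checkpoint["model_state_dict"])
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for x, y in eval_loader:
                output = model(x)
                # extract_final_logits is a hypothetical helper
                preds = extract_final_logits(output).argmax(dim=-1)
                correct += (preds == y).sum().item()
                total += y.numel()
        results[t] = correct / total
    return results
```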
## Models
| Model | File | Size | Task | Accuracy | Description |
|-------|------|------|------|----------|-------------|
| MNIST | `ctm-mnist.pt` | 1.3M | Digit classification | 97.9% | 10-class MNIST |
| Parity-16 | `ctm-parity-16.pt` | 2.5M | Cumulative parity | 99.0% | 16-bit sequences |
| Parity-64 | `ctm-parity-64.pt` | 66M | Cumulative parity | 75.0% | 64-bit sequences |
| QAMNIST | `ctm-qamnist.pt` | 39M | Multi-step arithmetic | 100% | 3-5 digits, 3-5 ops |
| Brackets | `ctm-brackets.pt` | 6.1M | Bracket matching | 94.7% | Valid/invalid `(()[])` |
| Tracking-Quadrant | `ctm-tracking-quadrant.pt` | 6.7M | Motion quadrant | 100% | 4-class prediction |
| Tracking-Position | `ctm-tracking-position.pt` | 6.7M | Exact position | 93.8% | 256-class (16x16 grid) |
| Transfer | `ctm-transfer-parity-brackets.pt` | 2.5M | Transfer learning | 94.5% | Parity-trained core transferred to brackets |
## Model Configurations
### MNIST CTM
```python
config = {
    "iterations": 15,
    "memory_length": 10,
    "d_model": 128,
    "d_input": 128,
    "heads": 2,
    "n_synch_out": 16,
    "n_synch_action": 16,
    "memory_hidden_dims": 8,
    "out_dims": 10,
    "synapse_depth": 1,
}
```
### Parity-16 CTM
```python
config = {
    "iterations": 50,
    "memory_length": 25,
    "d_model": 256,
    "d_input": 32,
    "heads": 8,
    "synapse_depth": 8,
    "out_dims": 16,  # one cumulative-parity target per bit position
}
```
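For context on the target format: with `out_dims` equal to the sequence length, the model predicts the running parity at every bit position. Below is a minimal, self-contained sketch of how such targets can be built; the official generator lives in the original repo, and the ±1 input encoding here is an assumption.
```python
import torch

def cumulative_parity_batch(batch_size: int, seq_len: int = 16):
    """Random bit sequences paired with the running parity at every position."""
    bits = torch.randint(0, 2, (batch_size, seq_len))
    targets = bits.cumsum(dim=1) % 2   # parity of each prefix
    inputs = bits.float() * 2 - 1      # {0,1} -> {-1,+1} (encoding is an assumption)
    return inputs, targets

x, y = cumulative_parity_batch(4)
print(x.shape, y.shape)  # torch.Size([4, 16]) torch.Size([4, 16])
```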
### QAMNIST CTM
```python
config = {
    "iterations": 10,
    "memory_length": 30,
    "d_model": 1024,
    "d_input": 64,
    "synapse_depth": 1,
    "heads": 4,
    "n_synch_out": 32,
    "n_synch_action": 32,
}
```
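To make the task concrete, here is a toy symbolic stand-in for a QAMNIST question. The official task renders actual MNIST images; how operands are chosen and whether answers are reduced modulo 10 follow the original repo's generator, and both are assumptions here.
```python
import random

def make_toy_qamnist(n_digits=(3, 5), n_ops=(3, 5)):
    """Symbolic stand-in for a QAMNIST question (images omitted)."""
    digits = [random.randint(0, 9) for _ in range(random.randint(*n_digits))]
    result = digits[0]
    steps = []
    for _ in range(random.randint(*n_ops)):
        op = random.choice("+-")
        operand = random.choice(digits)  # assumption: operands drawn from the shown digits
        result = result + operand if op == "+" else result - operand
        steps.append(f"{op}{operand}")
    return digits, steps, result % 10    # assumption: answers reduced mod 10

digits, steps, answer = make_toy_qamnist()
print(digits, steps, answer)
```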
### Brackets CTM
```python
config = {
    "iterations": 30,
    "memory_length": 15,
    "d_model": 256,
    "d_input": 64,
    "heads": 4,
    "n_synch_out": 32,
    "n_synch_action": 32,
    "out_dims": 2,  # valid/invalid
}
```
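The ground-truth labels come from classic stack-based bracket matching; a self-contained reference implementation:
```python
def is_valid_brackets(s: str) -> bool:
    """Ground-truth label for bracket matching over () and []."""
    pairs = {")": "(", "]": "["}
    stack = []
    for ch in s:
        if ch in "([":
            stack.append(ch)
        elif ch in ")]":
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack

assert is_valid_brackets("(()[])") is True
assert is_valid_brackets("([)]") is False
```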
### Tracking CTM
```python
config = {
    "iterations": 20,
    "memory_length": 15,
    "d_model": 256,
    "d_input": 64,
    "heads": 4,
    "n_synch_out": 32,
    "n_synch_action": 32,
}
```
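Both tracking models share this core config and differ only in `out_dims` (4 for the quadrant task, 256 for exact position). The 256-class target is a flattened 16x16 grid cell; a sketch of one such mapping, where the row-major flattening and normalized coordinates are assumptions rather than the repo's exact scheme:
```python
def position_to_class(x: float, y: float, grid: int = 16) -> int:
    """Map a normalized position in [0, 1)^2 to one of grid*grid cells.

    Row-major flattening is an assumption; the original repo defines
    the actual target encoding.
    """
    col = min(int(x * grid), grid - 1)
    row = min(int(y * grid), grid - 1)
    return row * grid + col

def position_to_quadrant(x: float, y: float) -> int:
    """4-class quadrant label for the quadrant variant."""
    return (1 if y >= 0.5 else 0) * 2 + (1 if x >= 0.5 else 0)

assert position_to_class(0.0, 0.0) == 0
assert position_to_class(0.99, 0.99) == 255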
## Usage
```python
import torch
from huggingface_hub import hf_hub_download

# The CTM class lives in the original SakanaAI repository; clone it and
# make sure its root is on the Python path before this import.
from models.ctm import ContinuousThoughtMachine

# Download a checkpoint
model_path = hf_hub_download(
    repo_id="vincentoh/ctm-experiments",
    filename="ctm-mnist.pt",
)

# Load the checkpoint
checkpoint = torch.load(model_path, map_location="cpu")

# Initialize a CTM with the config matching the checkpoint
# (see "Model Configurations" above)
model = ContinuousThoughtMachine(**config)
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()

# Inference; for MNIST a batch of 28x28 images (the exact expected
# shape follows the original repo's data pipeline)
input_tensor = torch.randn(1, 1, 28, 28)
with torch.no_grad():
    output = model(input_tensor)
```
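To sanity-check a download, you can inspect a checkpoint directly. Only the `model_state_dict` key is documented here, so anything else this prints is checkpoint-specific:
```python
checkpoint = torch.load(model_path, map_location="cpu")
print(list(checkpoint.keys()))
n_params = sum(p.numel() for p in checkpoint["model_state_dict"].values())
print(f"{n_params / 1e6:.1f}M parameters")
```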
## Training Details
- **Hardware**: NVIDIA RTX 4070 Ti SUPER
- **Framework**: PyTorch
- **Optimizer**: AdamW
- **Training time**: 5 minutes (MNIST) to 17 hours (QAMNIST)
## Key Findings
1. **Architecture > Scale**: Small synchronization dimensions (`n_synch_* = 32`) with shallow linear synapses outperform larger or deeper variants
2. **"Thinking longer" = higher accuracy**: CTM accuracy improves as the number of internal iterations grows
3. **Transfer learning works**: A parity-trained core transfers to the brackets task with 94.5% accuracy (see the sketch below)
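A hedged sketch of one way to reuse a trained core on a new task: copy everything except the task head from the parity checkpoint, then load with `strict=False`. The `"out"` substring used to identify head parameters is a hypothetical placeholder; check the actual key names in the state dict (e.g., with the inspection snippet above).
```python
# Hypothetical transfer recipe: reuse a parity-trained core for brackets.
# The key-name filter below is an assumption - inspect the real state dict
# to see how the output head's parameters are actually named.
parity_ckpt = torch.load("ctm-parity-16.pt", map_location="cpu")
core_weights = {
    k: v for k, v in parity_ckpt["model_state_dict"].items()
    if "out" not in k  # drop task-head parameters (placeholder filter)
}
brackets_model = ContinuousThoughtMachine(**config)  # brackets config from above
missing, unexpected = brackets_model.load_state_dict(core_weights, strict=False)
print("randomly initialized:", missing)
```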
## License
MIT License (same as the original CTM repository)
## Acknowledgments
- [Sakana AI](https://sakana.ai/) for the Continuous Thought Machine architecture
- Original [CTM Repository](https://github.com/SakanaAI/continuous-thought-machines)
## Links
- [Experiment Repository](https://github.com/bigsnarfdude/ctm-experiments)
- [Original Paper](https://arxiv.org/abs/2505.05522)
- [Interactive Demo](https://pub.sakana.ai/ctm/)