Mistral-7B continually pretrained with Quiet-STaR (https://arxiv.org/abs/2403.09629) to generate 8 thought tokens before each output token.
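As a rough illustration of what "8 thought tokens before each output token" means, here is a conceptual sketch of Quiet-STaR-style generation. This is not the released implementation: the marker token strings and the `next_token_fn` interface are illustrative placeholders, and the real model mixes thought-conditioned and base predictions with a learned head rather than simply swapping contexts.

```python
# Conceptual sketch (illustrative, not the released implementation):
# before committing each visible token, generate a fixed-length hidden
# "thought" bracketed by start/end-of-thought markers, condition the next
# visible token on it, then discard the thought from the visible text.
NUM_THOUGHT_TOKENS = 8  # the "-8-ahead" in this checkpoint's name

def generate_with_thoughts(next_token_fn, prompt_tokens, num_output_tokens):
    """next_token_fn(context) -> next token; a stand-in for a real LM."""
    visible = list(prompt_tokens)
    for _ in range(num_output_tokens):
        # Hidden rationale conditioned on the visible context so far.
        context = visible + ["<|startofthought|>"]  # marker names are placeholders
        for _ in range(NUM_THOUGHT_TOKENS):
            context.append(next_token_fn(context))
        context.append("<|endofthought|>")
        # Only the next visible token is kept; the thought is discarded.
        visible.append(next_token_fn(context))
    return visible

# Toy stand-in "model" that just emits a counter, to show the control flow.
counter = iter(range(1000))
toy_lm = lambda ctx: f"t{next(counter)}"
out = generate_with_thoughts(toy_lm, ["Hello"], 2)
# Each visible token cost NUM_THOUGHT_TOKENS + 1 model calls.
```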

Model size: 7B params
Tensor type: BF16
Format: Safetensors

Model tree for ezelikman/quietstar-8-ahead
Merges: 3 models
Quantizations: 1 model

Dataset used to train ezelikman/quietstar-8-ahead

Spaces using ezelikman/quietstar-8-ahead: 9