DumbLLM
"This AI was literally made on my knees in a day. It is extremely weak."
This project is an experiment in training a language model from scratch, built with PyTorch and the Hugging Face transformers library purely for educational purposes.
The model was trained on only ~117-150 million tokens, just enough for a test run to confirm that the pipeline works end to end. As the name suggests, this model is mega dumb.
On the plus side, at ~12M parameters it's one of the most compact LLMs that works in any capacity. The main idea was to create a proof of concept, not a useful tool.
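For reference, here is a minimal sketch of what such a from-scratch pipeline can look like with transformers. Everything in it is illustrative, not the exact setup used here: the GPT-2 architecture, the wikitext dataset, and all hyperparameters are assumptions.

```python
# Illustrative from-scratch training sketch (NOT this repo's actual recipe).
# Assumptions: GPT-2 architecture, wikitext data, placeholder hyperparameters.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    GPT2Config,
    GPT2LMHeadModel,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# A deliberately tiny config in the spirit of a ~12M-class model.
config = GPT2Config(
    vocab_size=tokenizer.vocab_size,
    n_positions=256,
    n_embd=256,
    n_layer=4,
    n_head=4,
)
model = GPT2LMHeadModel(config)  # random init: trained from scratch, not fine-tuned

dataset = load_dataset("wikitext", "wikitext-103-raw-v1", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
tokenized = tokenized.filter(lambda x: len(x["input_ids"]) > 0)  # drop empty rows

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="dumb-llm",
        per_device_train_batch_size=8,
        num_train_epochs=1,
    ),
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    train_dataset=tokenized,
)
trainer.train()
```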
Key points:
- Trained from scratch: Not a fine-tuned model.
- Minimalistic Training Data: Learned from a tiny dataset, so its knowledge is extremely limited.
- Honest Naming: It's called "DumbLLM" for a reason. Set your expectations low.
- Educational: Its only purpose is to demonstrate the training pipeline.
WARNING: Do not use this model for any real-world tasks. It will provide nonsensical and incorrect information.
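If you still want to poke at it anyway, the standard transformers loading pattern should apply (assuming the repo follows the usual AutoModel layout; the prompt and generation settings below are just examples):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("kurumikz/DumbLLM-12M")
model = AutoModelForCausalLM.from_pretrained("kurumikz/DumbLLM-12M")

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```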