YALM-80M

YALM (Yet Another Language Model) is a family of experimental small language models developed through my ongoing exploration of language modeling and LLM architectures.

YALM-80M is the first model in this family. It is trained on a diverse corpus of English, Hindi, math, and Python code to test its capacity for multilingual and technical reasoning.

Note: There is a bug in the tokenizer that may cause errors during generation for certain inputs.

Model Overview

  • Architecture: Llama
  • Pretraining steps: 34k
  • Pretraining tokens: 36B
  • Precision: bfloat16
  • Number of Parameters: 79.7M
  • Number of Parameters (Non-Embedding): 62.9M
  • Number of Layers: 16
  • Number of Attention Heads (GQA): 8 for Q and 4 for KV
  • Context Length: 2048
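
These settings can be sanity-checked against the repository configuration. A minimal sketch, assuming the standard Hugging Face LlamaConfig field names (the field names are an assumption, not something stated in this card):

>>> from transformers import AutoConfig
>>> config = AutoConfig.from_pretrained("kp7742/YALM-80M")  # downloads config.json only
>>> config.num_hidden_layers        # expected: 16
>>> config.num_attention_heads      # expected: 8 query heads
>>> config.num_key_value_heads      # expected: 4 KV heads (GQA)
>>> config.max_position_embeddings  # expected: 2048 context length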

Usage

>>> from transformers import AutoTokenizer, AutoModelForCausalLM
>>> # Load the tokenizer and model weights from the Hugging Face Hub
>>> tokenizer = AutoTokenizer.from_pretrained("kp7742/YALM-80M")
>>> model = AutoModelForCausalLM.from_pretrained("kp7742/YALM-80M")
>>> # Tokenize a prompt and generate up to 100 new tokens
>>> inputs = tokenizer("Hey how are you doing?", return_tensors="pt")
>>> out = model.generate(**inputs, max_new_tokens=100)
>>> print(tokenizer.batch_decode(out))
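
By default, generate uses greedy decoding. For more varied output you can enable sampling; the values below are illustrative and not settings recommended or tuned for this model:

>>> out = model.generate(
...     **inputs,
...     max_new_tokens=100,
...     do_sample=True,   # sample from the predicted distribution instead of greedy decoding
...     temperature=0.7,  # illustrative value, not tuned for YALM-80M
...     top_p=0.9,        # nucleus sampling cutoff (also illustrative)
... )
>>> print(tokenizer.batch_decode(out, skip_special_tokens=True))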

Training

Data

This model is pre-trained on the YALM-pretrain5-60M dataset.

Hyperparameters

  • learning_rate: 0.007812
  • train_batch_size: 16
  • eval_batch_size: 16
  • distributed_type: multi-GPU DDP
  • num_devices: 8
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 512
  • total_eval_batch_size: 128
  • optimizer: AdamW with betas=(0.9, 0.95) and epsilon=1e-08
  • lr_scheduler_type: warmup_stable_decay
  • lr_scheduler_warmup_steps: 3400
  • training_steps: 34000
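
The effective train batch size follows from 16 per device × 8 GPUs × 4 gradient-accumulation steps = 512 (and 16 × 8 = 128 for eval). As a rough illustration of how the settings above would map onto Hugging Face TrainingArguments (the actual training script is not published in this card, so treat the mapping as an assumption):

>>> from transformers import TrainingArguments
>>> args = TrainingArguments(
...     output_dir="yalm-80m",                     # placeholder path, not from the card
...     learning_rate=0.007812,
...     per_device_train_batch_size=16,
...     per_device_eval_batch_size=16,
...     gradient_accumulation_steps=4,             # 16 * 8 GPUs * 4 = 512 effective
...     max_steps=34_000,
...     lr_scheduler_type="warmup_stable_decay",
...     warmup_steps=3_400,
...     adam_beta1=0.9,
...     adam_beta2=0.95,
...     adam_epsilon=1e-8,
...     bf16=True,                                 # matches the bfloat16 precision above
... )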

Hardware

  • GPUs: 8 x RTX 4090

Framework versions

  • Transformers 4.53.1
  • Pytorch 2.7.1+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.2

Evaluation

All evaluations are zero-shot unless stated otherwise, and I used lighteval to run them.

It achieves the following results on the test set:

  • Loss: 2.78
  • Perplexity: 16.10
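
The reported perplexity is the exponential of the test loss (perplexity = exp(cross-entropy)); the small gap comes from the loss being rounded to two decimals:

>>> import math
>>> round(math.exp(2.78), 2)  # perplexity = exp(loss)
16.12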

Base pre-trained model

Metric                YALM-80M
MMLU (cloze)          27.33
MMLU Pro              8.72
BBH (5-shot)          12.61
ARC (Average)         29.87
HellaSwag             32.16
PIQA                  62.89
SCIQ                  69.50
CommonsenseQA         28.75
Winogrande            50.59
OpenBookQA            29.60
TruthfulQA            22.78
TriviaQA              0.17
GSM8K (5-shot)        0.83

Limitations

YALM models primarily understand and generate content in English and Hindi. They can produce text on a variety of topics, but because their world knowledge is limited, the generated content may not always be factually accurate, logically consistent, or free from biases present in the training data.
