# Marathi_SmolLM_135M
A compact LLaMA‑style Marathi language model (~134.5M parameters) finetuned for Marathi text generation and creative tasks (e.g., haiku). The model uses a native Marathi tokenizer (49,152 BPE vocab) and a 30‑layer, 9‑head decoder‑only transformer with GQA, RMSNorm, and RoPE.
- Base model: HuggingFaceTB/SmolLM-135M
- Language: Marathi (mr)
- License: Apache-2.0
- Intended use: Marathi text generation (short form), experimentation, education
## How to Use
Using `transformers`:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "skolvankar/Marathi_SmolLM_135M"  # replace with your namespace if different

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
)
if torch.cuda.is_available():
    model = model.to("cuda")  # move to GPU when using half precision
model.eval()

prompt = "आजचा दिवस"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=50)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Using the `pipeline` API:

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="skolvankar/Marathi_SmolLM_135M")
print(pipe("आजचा दिवस", max_new_tokens=50)[0]["generated_text"])  # adjust key per transformers version
```
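Greedy decoding tends to produce repetitive text for creative prompts; sampling usually works better. The values below are illustrative defaults, not settings tuned for this model:

```python
# Sampling-based generation; temperature/top_p are illustrative, not tuned values.
out = pipe(
    "आजचा दिवस",
    max_new_tokens=50,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
)
print(out[0]["generated_text"])
```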
## Model Details
- Architecture: LLaMA‑style transformer decoder (causal LM)
- Layers: 30
- Hidden size: 576
- Attention heads: 9 (head_dim=64), KV heads: 3 (GQA)
- FFN dim: 1536 (SwiGLU)
- Vocab size: 49,152 (Marathi tokenizer)
- Positional encoding: RoPE (θ=10,000)
- Normalization: RMSNorm (ε=1e‑5)
- Tie embeddings: True
- Parameters: ~134,515,008
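The parameter count can be reproduced from these hyperparameters. The sketch below assumes the standard LLaMA layout: bias-free Q/K/V/O projections, a SwiGLU FFN with gate/up/down projections, two RMSNorm weight vectors per layer, a final RMSNorm, and an embedding matrix shared with the output head.

```python
# Back-of-the-envelope parameter count from the hyperparameters listed above
# (assumes bias-free projections and tied input/output embeddings).
vocab, d, layers, ffn, kv_heads, head_dim = 49_152, 576, 30, 1_536, 3, 64

attn  = 2 * d * d + 2 * d * (kv_heads * head_dim)  # Q and O; K and V with GQA
mlp   = 2 * d * ffn + ffn * d                      # gate, up, down (SwiGLU)
norms = 2 * d                                      # pre-attention and pre-FFN RMSNorm
per_layer = attn + mlp + norms

total = layers * per_layer + vocab * d + d         # + tied embeddings + final RMSNorm
print(f"{total:,}")  # 134,515,008
```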
## Training Data
- 8.2GB Marathi corpus (news, literature, public web text). The dataset was pre-tokenized with a Marathi-native BPE tokenizer and memory-mapped for efficient shuffling (see the loading sketch after this list).
- Data cleaning: UTF‑8 normalization, basic deduplication, and heuristic filtering.
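The preprocessing code is not published with this card; the sketch below only illustrates the memory-mapped access pattern, assuming the corpus was serialized as a flat stream of token IDs in a hypothetical `tokens.bin` file.

```python
import numpy as np
import torch

# Hypothetical file: a flat stream of uint16 token IDs produced by the Marathi BPE
# tokenizer (the 49,152-entry vocabulary fits in uint16).
tokens = np.memmap("tokens.bin", dtype=np.uint16, mode="r")

def sample_batch(batch_size=32, seq_len=128):
    # Random 128-token windows over the memory-mapped stream; the 8.2GB corpus
    # never has to be loaded into RAM at once.
    starts = np.random.randint(0, len(tokens) - seq_len, size=batch_size)
    batch = [torch.from_numpy(tokens[s : s + seq_len].astype(np.int64)) for s in starts]
    return torch.stack(batch)  # shape: (batch_size, seq_len)
```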
## Training Procedure
- Optimizer: AdamW
- Learning rate: 1e‑4 (warmup/decay tuned per run)
- Sequence length: 128 (experiments planned for 512/1024)
- Batch size: GPU‑dependent (A100 used for main runs)
- Checkpoints: saved every N steps and at the final step (see the loop sketch below)
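The exact training script is not included in this card; the loop below is a minimal sketch under the stated settings (AdamW, learning rate 1e-4, sequence length 128). The `sample_batch` loader is the hypothetical helper sketched under Training Data, and the checkpoint file names are illustrative.

```python
import torch
from torch.optim import AdamW

def train(model, sample_batch, steps, save_every, device="cuda"):
    # save_every corresponds to the "N" in the list above.
    model.to(device).train()
    optimizer = AdamW(model.parameters(), lr=1e-4)
    for step in range(1, steps + 1):
        input_ids = sample_batch().to(device)
        # Hugging Face causal LMs shift labels internally, so labels == input_ids.
        loss = model(input_ids=input_ids, labels=input_ids).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        if step % save_every == 0 or step == steps:
            torch.save(
                {"model": model.state_dict(),
                 "optimizer": optimizer.state_dict(),
                 "step": step},
                f"checkpoint_{step}.pt",
            )
```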
## Resuming from Step 5001+
Training can be resumed from the final checkpoint (e.g., step 5000 → 5050). See the repository logs and `next_50.py` for a minimal resume script that loads `checkpoint_final.pt`, runs exactly 50 additional steps, and saves `checkpoint_5050.pt`.
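Resuming amounts to restoring the model and optimizer state and continuing the step counter. The sketch below illustrates this under the assumption that the checkpoint dictionary stores `model`, `optimizer`, and `step` entries as in the training sketch above; it is not a copy of `next_50.py`.

```python
import torch
from torch.optim import AdamW

def resume(model, sample_batch, path="checkpoint_final.pt", extra_steps=50, device="cuda"):
    # Restore weights, optimizer state, and step counter saved by the training sketch.
    ckpt = torch.load(path, map_location=device)
    model.load_state_dict(ckpt["model"])
    model.to(device).train()
    optimizer = AdamW(model.parameters(), lr=1e-4)
    optimizer.load_state_dict(ckpt["optimizer"])

    start = ckpt["step"]  # e.g., 5000
    for step in range(start + 1, start + extra_steps + 1):
        input_ids = sample_batch().to(device)
        loss = model(input_ids=input_ids, labels=input_ids).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    torch.save(
        {"model": model.state_dict(), "optimizer": optimizer.state_dict(), "step": step},
        f"checkpoint_{step}.pt",  # e.g., checkpoint_5050.pt for start=5000, extra_steps=50
    )
```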
## Evaluation
- Primary metric: qualitative text quality; early runs tracked basic metrics (e.g., an accuracy proxy on next-token prediction). A more comprehensive Marathi-specific evaluation suite is planned.
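As an illustration of the next-token accuracy proxy mentioned above (not the original evaluation script), the snippet below scores a piece of held-out Marathi text with the `model` and `tokenizer` from the usage example; the commented sentence is only a placeholder.

```python
import torch

def next_token_accuracy(model, tokenizer, text):
    # Fraction of positions where the greedy (argmax) prediction matches the actual next token.
    ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        logits = model(ids).logits
    preds = logits[:, :-1].argmax(dim=-1)
    return (preds == ids[:, 1:]).float().mean().item()

# Example with a placeholder sentence (not from any evaluation set):
# print(next_token_accuracy(model, tokenizer, "आजचा दिवस खूप छान होता."))
```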
## Limitations and Biases
- Short context (~128 tokens) in the published checkpoint; long-context performance may be limited.
- The model may reflect biases present in the web‑scale Marathi data.
- Not suitable for critical decision‑making without human review.
## Safety
- The model can generate incorrect, biased, or inappropriate content. Use with care.
- Add filtering and guardrails at the application level as needed.
## Intended Uses & Misuses
- Intended: research, education, creative Marathi text generation (short form), prototyping.
- Not intended: toxic content generation, disinformation, medical/legal advice.
## Acknowledgements
- Base model: HuggingFaceTB/SmolLM-135M
- Thanks to the open-source community for tokenizer, training, and evaluation tooling.
## Citation
If you use this model, please cite the base model and this repository/model card.
```bibtex
@misc{Marathi_SmolLM_135M,
  title  = {Marathi SmolLM 135M},
  author = {Shivranjan Kolvankar},
  year   = {2025},
  url    = {https://huggingface.co/skolvankar/Marathi_SmolLM_135M}
}
```