# Marathi_SmolLM_135M
A compact LLaMA‑style Marathi language model (~134.5M parameters) finetuned for Marathi text generation and creative tasks (e.g., haiku). The model uses a native Marathi tokenizer (49,152 BPE vocab) and a 30‑layer, 9‑head decoder‑only transformer with GQA, RMSNorm, and RoPE.
- Base model: HuggingFaceTB/SmolLM-135M
- Language: Marathi (mr)
- License: Apache-2.0
- Intended use: Marathi text generation (short form), experimentation, education
## How to Use
Using `transformers`:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "skolvankar/Marathi_SmolLM_135M"  # replace with your namespace if different

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
)
if torch.cuda.is_available():
    model = model.to("cuda")  # move to GPU when using half precision
model.eval()

prompt = "आजचा दिवस"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=50)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Using the `pipeline` API:

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="skolvankar/Marathi_SmolLM_135M")
print(pipe("आजचा दिवस", max_new_tokens=50)[0]["generated_text"])  # adjust key per transformers version
```
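Greedy decoding tends to produce repetitive text for creative prompts; sampling usually works better. The values below are illustrative defaults, not settings tuned for this model:

```python
# Sampling-based generation; temperature/top_p are illustrative, not tuned values.
out = pipe(
    "आजचा दिवस",
    max_new_tokens=50,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
)
print(out[0]["generated_text"])
```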
## Model Details
- Architecture: LLaMA‑style transformer decoder (causal LM)
- Layers: 30
- Hidden size: 576
- Attention heads: 9 (head_dim=64), KV heads: 3 (GQA)
- FFN dim: 1536 (SwiGLU)
- Vocab size: 49,152 (Marathi tokenizer)
- Positional encoding: RoPE (θ=10,000)
- Normalization: RMSNorm (ε=1e‑5)
- Tie embeddings: True
- Parameters: ~134,515,008
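The parameter count can be reproduced from these hyperparameters. The sketch below assumes the standard LLaMA layout: bias-free Q/K/V/O projections, a SwiGLU FFN with gate/up/down projections, two RMSNorm weight vectors per layer, a final RMSNorm, and an embedding matrix shared with the output head.

```python
# Back-of-the-envelope parameter count from the hyperparameters listed above
# (assumes bias-free projections and tied input/output embeddings).
vocab, d, layers, ffn, kv_heads, head_dim = 49_152, 576, 30, 1_536, 3, 64

attn  = 2 * d * d + 2 * d * (kv_heads * head_dim)  # Q and O; K and V with GQA
mlp   = 2 * d * ffn + ffn * d                      # gate, up, down (SwiGLU)
norms = 2 * d                                      # pre-attention and pre-FFN RMSNorm
per_layer = attn + mlp + norms

total = layers * per_layer + vocab * d + d         # + tied embeddings + final RMSNorm
print(f"{total:,}")  # 134,515,008
```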
## Training Data
- 8.2GB Marathi corpus (news, literature, public web text). The dataset was pre-tokenized with a Marathi-native BPE tokenizer and memory-mapped for efficient shuffling (see the loading sketch after this list).
- Data cleaning: UTF‑8 normalization, basic deduplication, and heuristic filtering.
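The preprocessing code is not published with this card; the sketch below only illustrates the memory-mapped access pattern, assuming the corpus was serialized as a flat stream of token IDs in a hypothetical `tokens.bin` file.

```python
import numpy as np
import torch

# Hypothetical file: a flat stream of uint16 token IDs produced by the Marathi BPE
# tokenizer (the 49,152-entry vocabulary fits in uint16).
tokens = np.memmap("tokens.bin", dtype=np.uint16, mode="r")

def sample_batch(batch_size=32, seq_len=128):
    # Random 128-token windows over the memory-mapped stream; the 8.2GB corpus
    # never has to be loaded into RAM at once.
    starts = np.random.randint(0, len(tokens) - seq_len, size=batch_size)
    batch = [torch.from_numpy(tokens[s : s + seq_len].astype(np.int64)) for s in starts]
    return torch.stack(batch)  # shape: (batch_size, seq_len)
```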
## Training Procedure
- Optimizer: AdamW
- Learning rate: 1e‑4 (warmup/decay tuned per run)
- Sequence length: 128 (experiments planned for 512/1024)
- Batch size: GPU‑dependent (A100 used for main runs)
- Checkpoints: saved every N steps and at the final step (see the loop sketch below)
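The exact training script is not included in this card; the loop below is a minimal sketch under the stated settings (AdamW, learning rate 1e-4, sequence length 128). The `sample_batch` loader is the hypothetical helper sketched under Training Data, and the checkpoint file names are illustrative.

```python
import torch
from torch.optim import AdamW

def train(model, sample_batch, steps, save_every, device="cuda"):
    # save_every corresponds to the "N" in the list above.
    model.to(device).train()
    optimizer = AdamW(model.parameters(), lr=1e-4)
    for step in range(1, steps + 1):
        input_ids = sample_batch().to(device)
        # Hugging Face causal LMs shift labels internally, so labels == input_ids.
        loss = model(input_ids=input_ids, labels=input_ids).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        if step % save_every == 0 or step == steps:
            torch.save(
                {"model": model.state_dict(),
                 "optimizer": optimizer.state_dict(),
                 "step": step},
                f"checkpoint_{step}.pt",
            )
```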
## Resuming from Step 5001+
Training can be resumed from the final checkpoint (e.g., step 5000 → 5050). See the repository logs and `next_50.py` for a minimal resume script that loads `checkpoint_final.pt`, runs exactly 50 additional steps, and saves `checkpoint_5050.pt`.
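Resuming amounts to restoring the model and optimizer state and continuing the step counter. The sketch below illustrates this under the assumption that the checkpoint dictionary stores `model`, `optimizer`, and `step` entries as in the training sketch above; it is not a copy of `next_50.py`.

```python
import torch
from torch.optim import AdamW

def resume(model, sample_batch, path="checkpoint_final.pt", extra_steps=50, device="cuda"):
    # Restore weights, optimizer state, and step counter saved by the training sketch.
    ckpt = torch.load(path, map_location=device)
    model.load_state_dict(ckpt["model"])
    model.to(device).train()
    optimizer = AdamW(model.parameters(), lr=1e-4)
    optimizer.load_state_dict(ckpt["optimizer"])

    start = ckpt["step"]  # e.g., 5000
    for step in range(start + 1, start + extra_steps + 1):
        input_ids = sample_batch().to(device)
        loss = model(input_ids=input_ids, labels=input_ids).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    torch.save(
        {"model": model.state_dict(), "optimizer": optimizer.state_dict(), "step": step},
        f"checkpoint_{step}.pt",  # e.g., checkpoint_5050.pt for start=5000, extra_steps=50
    )
```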
## Evaluation
- Primary metric: qualitative text quality; early runs tracked basic metrics (e.g., an accuracy proxy on next-token prediction). A more comprehensive Marathi-specific evaluation suite is planned.
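As an illustration of the next-token accuracy proxy mentioned above (not the original evaluation script), the snippet below scores a piece of held-out Marathi text with the `model` and `tokenizer` from the usage example; the commented sentence is only a placeholder.

```python
import torch

def next_token_accuracy(model, tokenizer, text):
    # Fraction of positions where the greedy (argmax) prediction matches the actual next token.
    ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        logits = model(ids).logits
    preds = logits[:, :-1].argmax(dim=-1)
    return (preds == ids[:, 1:]).float().mean().item()

# Example with a placeholder sentence (not from any evaluation set):
# print(next_token_accuracy(model, tokenizer, "आजचा दिवस खूप छान होता."))
```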
## Limitations and Biases
- Short context (~128 tokens) in the published checkpoint; long-context performance may be limited.
- The model may reflect biases present in the web‑scale Marathi data.
- Not suitable for critical decision‑making without human review.
## Safety
- The model can generate incorrect, biased, or inappropriate content. Use with care.
- Add filtering and guardrails at the application level as needed.
## Intended Uses & Misuses
- Intended: research, education, creative Marathi text generation (short form), prototyping.
- Not intended: toxic content generation, disinformation, medical/legal advice.
## Acknowledgements
- Base model: HuggingFaceTB/SmolLM-135M
- Thanks to the open-source community for tokenizer, training, and evaluation tooling.
## Citation
If you use this model, please cite the base model and this repository/model card.
```bibtex
@misc{Marathi_SmolLM_135M,
  title  = {Marathi SmolLM 135M},
  author = {Shivranjan Kolvankar},
  year   = {2025},
  url    = {https://huggingface.co/skolvankar/Marathi_SmolLM_135M}
}
```