
Marathi_SmolLM_135M

A compact LLaMA‑style Marathi language model (~134.5M parameters) finetuned for Marathi text generation and creative tasks (e.g., haiku). The model uses a native Marathi tokenizer (49,152 BPE vocab) and a 30‑layer, 9‑head decoder‑only transformer with GQA, RMSNorm, and RoPE.

  • Base model: HuggingFaceTB/SmolLM-135M
  • Language: Marathi (mr)
  • License: Apache‑2.0
  • Intended use: Marathi text generation (short form), experimentation, education

How to Use

Using transformers:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "skolvankar/Marathi_SmolLM_135M"  # replace with your namespace if different

tokenizer = AutoTokenizer.from_pretrained(model_id)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)
model.eval()

prompt = "आजचा दिवस"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
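
For creative generation (e.g., the haiku-style prompts mentioned above), sampling usually produces more varied output than the greedy decoding shown above. The settings below are illustrative defaults, not tuned values from this repository:

with torch.no_grad():
    sampled = model.generate(
        **inputs,
        max_new_tokens=50,
        do_sample=True,          # enable sampling instead of greedy decoding
        temperature=0.8,         # illustrative value; tune for your use case
        top_p=0.95,              # nucleus sampling cutoff
        repetition_penalty=1.1,
    )
print(tokenizer.decode(sampled[0], skip_special_tokens=True))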

Using the pipeline API:

from transformers import pipeline

pipe = pipeline("text-generation", model="skolvankar/Marathi_SmolLM_135M")
print(pipe("आजचा दिवस", max_new_tokens=50)[0]["generated_text"])  # adjust key per transformers version

Model Details

  • Architecture: LLaMA‑style transformer decoder (causal LM)
  • Layers: 30
  • Hidden size: 576
  • Attention heads: 9 (head_dim=64), KV heads: 3 (GQA)
  • FFN dim: 1536 (SwiGLU)
  • Vocab size: 49,152 (Marathi tokenizer)
  • Positional encoding: RoPE (θ=10,000)
  • Normalization: RMSNorm (ε=1e‑5)
  • Tie embeddings: True
  • Parameters: ~134,515,008 (consistent with the config sketch below)
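
As a sanity check, the hyperparameters above can be expressed as a standard transformers LlamaConfig. This is a sketch assuming the checkpoint follows the usual LLaMA layout; with tied embeddings it implies roughly 134.5M parameters, matching the figure listed above.

from transformers import LlamaConfig

config = LlamaConfig(
    vocab_size=49152,
    hidden_size=576,
    intermediate_size=1536,    # SwiGLU FFN dimension
    num_hidden_layers=30,
    num_attention_heads=9,     # head_dim = 576 / 9 = 64
    num_key_value_heads=3,     # grouped-query attention
    rms_norm_eps=1e-5,
    rope_theta=10000.0,
    tie_word_embeddings=True,
)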

Training Data

  • 8.2GB Marathi corpus (news, literature, public web text). The dataset was pre‑tokenized with a Marathi‑native BPE tokenizer and memory‑mapped for efficient shuffling (see the sketch after this list).
  • Data cleaning: UTF‑8 normalization, basic deduplication, and heuristic filtering.
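
A minimal sketch of how a pre-tokenized, memory-mapped corpus can be sampled during training. The file name, dtype, and flat-array layout are assumptions for illustration, not the exact preprocessing used here:

import numpy as np
import torch

SEQ_LEN = 128
# Assumed layout: one flat uint16 array of Marathi BPE token IDs (vocab 49,152 < 65,536).
tokens = np.memmap("marathi_corpus_tokens.bin", dtype=np.uint16, mode="r")

def sample_batch(batch_size):
    # Draw random 128-token windows without loading the corpus into RAM.
    starts = np.random.randint(0, len(tokens) - SEQ_LEN - 1, size=batch_size)
    x = np.stack([tokens[s : s + SEQ_LEN] for s in starts]).astype(np.int64)
    y = np.stack([tokens[s + 1 : s + SEQ_LEN + 1] for s in starts]).astype(np.int64)
    return torch.from_numpy(x), torch.from_numpy(y)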

Training Procedure

  • Optimizer: AdamW
  • Learning rate: 1e‑4 (warmup/decay tuned per run; see the sketch after this list)
  • Sequence length: 128 (experiments planned for 512/1024)
  • Batch size: GPU‑dependent (A100 used for main runs)
  • Checkpoints: saved every N steps and at final step
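
A minimal sketch of the optimizer and schedule setup implied by the bullets above (AdamW, 1e-4 peak learning rate, warmup then decay). The warmup steps, weight decay, and cosine decay choice are assumptions; it reuses model and sample_batch from the sketches above:

import torch
from transformers import get_cosine_schedule_with_warmup

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.1)  # weight_decay assumed
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=500,      # assumed; "tuned per run" in the original notes
    num_training_steps=5000,   # matches the checkpoint numbering mentioned below
)

model.train()
for step in range(1, 5001):
    x, y = sample_batch(8)                           # batch size is GPU-dependent
    logits = model(x.to(model.device)).logits
    loss = torch.nn.functional.cross_entropy(
        logits.view(-1, logits.size(-1)), y.to(model.device).view(-1)
    )
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()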

Resuming Training (step 5001 onward)

Training can be resumed from the final checkpoint (e.g., step 5000 → 5050), as demonstrated in the repository logs. See next_50.py for a minimal resume script that loads checkpoint_final.pt, runs exactly 50 additional steps, and saves checkpoint_5050.pt.
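
For illustration, a resume loop along these lines (the actual next_50.py may differ; the checkpoint keys assumed here are hypothetical, and model, optimizer, and sample_batch come from the sketches above):

import torch

# Assumed checkpoint format: a dict with "model", "optimizer", and "step" entries.
ckpt = torch.load("checkpoint_final.pt", map_location="cpu")
model.load_state_dict(ckpt["model"])
optimizer.load_state_dict(ckpt["optimizer"])
start_step = ckpt["step"]          # e.g., 5000

for step in range(start_step + 1, start_step + 51):   # exactly 50 additional steps
    x, y = sample_batch(8)
    loss = torch.nn.functional.cross_entropy(
        model(x.to(model.device)).logits.view(-1, model.config.vocab_size),
        y.to(model.device).view(-1),
    )
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

torch.save(
    {"model": model.state_dict(), "optimizer": optimizer.state_dict(), "step": step},
    "checkpoint_5050.pt",
)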

Evaluation

  • Primary metric: qualitative text quality; early runs also tracked basic quantitative signals (e.g., next‑token prediction accuracy as a proxy). A more comprehensive Marathi‑specific evaluation suite is planned; a minimal perplexity check is sketched below.
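
For a quick quantitative check alongside qualitative review, held-out loss and perplexity can be computed as follows; the sample sentence is only a placeholder, not part of any official evaluation set:

import math
import torch

eval_text = "मराठी ही महाराष्ट्राची राजभाषा आहे."   # placeholder held-out text
enc = tokenizer(eval_text, return_tensors="pt").to(model.device)
with torch.no_grad():
    # Passing labels makes the model return the mean next-token cross-entropy.
    out = model(**enc, labels=enc["input_ids"])
print(f"loss = {out.loss.item():.3f}, perplexity = {math.exp(out.loss.item()):.1f}")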

Limitations and Biases

  • Short context (~128) in the published checkpoint; long‑context performance may be limited.
  • The model may reflect biases present in the web‑scale Marathi data.
  • Not suitable for critical decision‑making without human review.

Safety

  • The model can generate incorrect, biased, or inappropriate content. Use with care.
  • Add filtering and guardrails at the application level as needed (a minimal example is sketched below).
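
One minimal form of application-level guardrail is a post-generation blocklist check; the terms and policy below are placeholders, not a vetted safety solution:

BLOCKED_TERMS = {"<term1>", "<term2>"}   # placeholder list; curate for your application

def is_safe(text: str) -> bool:
    # Reject output containing any blocked term; real deployments need stronger checks.
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)

generated = pipe("आजचा दिवस", max_new_tokens=50)[0]["generated_text"]
print(generated if is_safe(generated) else "[filtered]")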

Intended Uses & Misuses

  • Intended: research, education, creative Marathi text generation (short form), prototyping.
  • Not intended: toxic content generation, disinformation, medical/legal advice.

Acknowledgements

  • Base model: HuggingFaceTB/SmolLM-135M
  • Thanks to the open‑source community for tokenizer, training, and evaluation tooling.

Citation

If you use this model, please cite the base model and this repository/model card.

@misc{Marathi_SmolLM_135M,
  title  = {Marathi SmolLM 135M},
  author = {Shivranjan Kolvankar},
  year   = {2025},
  url    = {https://huggingface.co/skolvankar/Marathi_SmolLM_135M}
}