Model Description
grojeda/distilbert-sentiment-imdb fine-tunes distilbert-base-uncased for binary sentiment classification on the IMDB Large Movie Review dataset. Reviews are tokenized with the base model's WordPiece tokenizer, truncated to 384 tokens, and dynamically padded via DataCollatorWithPadding. Training uses Hugging Face Transformers (PyTorch); the resulting weights inherit the Apache 2.0 license of the base checkpoint and the constraints of the dataset license.
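A minimal sketch of that preprocessing pipeline (the sample strings are illustrative, not from the training data):

```python
from transformers import AutoTokenizer, DataCollatorWithPadding

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
collator = DataCollatorWithPadding(tokenizer=tokenizer)

# Truncate each review to 384 WordPiece tokens; padding is deferred to the
# collator, which pads each batch to its longest sequence.
features = [
    tokenizer(text, truncation=True, max_length=384)
    for text in ["A quiet triumph.", "Dull, overlong, and forgettable."]
]
batch = collator(features)
print(batch["input_ids"].shape)  # (2, longest_sequence_in_this_batch)
```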
Intended Uses & Limitations
- Recommended for English sentiment analysis of movie reviews and similar review-style text, benchmarking compact BERT-family encoders, and serving as a starting point for domain adaptation.
- Not suitable for multilingual sentiment, sarcasm detection, fine-grained emotion tagging, or high-stakes moderation without human review.
- Known limitations include potential degradation on inputs longer than 384 tokens, sensitivity to domain shifts (legal/medical jargon), and lack of calibration for imbalanced datasets.
- Ethical considerations: the model may reproduce societal biases present in IMDB reviews; avoid using outputs for demographic inference or automated enforcement without auditing.
Training Details
- Dataset: imdb from Hugging Face Datasets. The 25k-example training split was partitioned 90/10 (seed 42) into train (≈22.5k) and validation (≈2.5k); the official 25k test split remained untouched until evaluation.
- Hyperparameters: AdamW optimizer, learning rate 2e-5 (default schedule), weight decay 0.01, max length 384, per-device batch sizes 16 (train) / 32 (eval), no gradient accumulation, dropout at DistilBERT defaults. A matching Trainer setup is sketched after this list.
- Schedule: 2 epochs (~2,814 optimizer steps). Evaluation and checkpointing occur at each epoch with best-checkpoint reloading enabled.
- Compute: Trained on a single consumer GPU (RTX 3050 4GB). Environment: Transformers 4.57.3 and PyTorch 2.9.1.
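The following is a minimal sketch of a Trainer setup consistent with the details above, not the original training script; output_dir is a hypothetical path, and best-checkpoint selection is assumed to use the default evaluation loss.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    # Truncate to 384 tokens; dynamic padding happens in the collator.
    return tokenizer(batch["text"], truncation=True, max_length=384)

# 90/10 split of the 25k-example train split with seed 42;
# the official test split is reserved for final evaluation.
splits = load_dataset("imdb", split="train").train_test_split(test_size=0.1, seed=42)
train_ds = splits["train"].map(tokenize, batched=True)
val_ds = splits["test"].map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

args = TrainingArguments(
    output_dir="distilbert-sentiment-imdb",  # hypothetical path
    num_train_epochs=2,
    learning_rate=2e-5,
    weight_decay=0.01,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,  # best-checkpoint reloading, by eval loss
    seed=42,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
)
trainer.train()
```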
Evaluation Results
```json
{
  "task": { "type": "text-classification", "name": "Sentiment Analysis" },
  "dataset": { "name": "imdb", "type": "imdb", "split": "test" },
  "metrics": [
    { "type": "accuracy", "name": "Accuracy", "value": 0.9256 },
    { "type": "loss", "name": "CrossEntropyLoss", "value": 0.2435 }
  ]
}
```
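These numbers can be spot-checked against the public test split. The sketch below is not the original evaluation script; it assumes the predicted label strings round-trip through model.config.label2id and that label ids follow the dataset's 0 = negative, 1 = positive convention.

```python
from datasets import load_dataset
from transformers import pipeline
import evaluate

clf = pipeline("text-classification", model="grojeda/distilbert-sentiment-imdb")
test = load_dataset("imdb", split="test")

# Map predicted label strings back to ids via the model config, assuming
# label ids follow the dataset convention (0 = negative, 1 = positive).
outputs = clf(test["text"], truncation=True, max_length=384, batch_size=32)
preds = [clf.model.config.label2id[o["label"]] for o in outputs]

accuracy = evaluate.load("accuracy")
print(accuracy.compute(predictions=preds, references=test["label"]))
```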
How to Use
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "grojeda/distilbert-sentiment-imdb"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

text = "The pacing drags, but the performances are heartfelt."
# Match the training setup: truncate long reviews to 384 tokens.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=384)

with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)

label_id = int(probs.argmax())
print(model.config.id2label[label_id], float(probs[0, label_id]))
```
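For quick experiments, the same checkpoint also works with the high-level pipeline API; the label names printed come from the checkpoint's config.

```python
from transformers import pipeline

clf = pipeline("text-classification", model="grojeda/distilbert-sentiment-imdb")
print(clf("A gorgeous, slow-burning character study.", truncation=True, max_length=384))
```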
Limitations & Biases
- Domain bias toward movie reviews; expect weaker performance on other domains without fine-tuning.
- Only English data was used; multilingual inputs are unsupported.
- Dataset balance (50/50) can lead to overconfidence when deployed on skewed class distributions; a prior-shift correction is sketched after this list.
- User-generated content may embed stereotypes or offensive language that the model can echo.
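As one illustration of handling skewed deployment distributions (not something this model implements), a standard prior-shift correction can reweight the 50/50-trained probabilities when the deployment class prior is known; the priors below are hypothetical.

```python
import numpy as np

def adjust_for_prior(probs, train_prior=(0.5, 0.5), deploy_prior=(0.9, 0.1)):
    """Reweight softmax outputs by deploy_prior / train_prior and renormalize.

    The priors here are hypothetical; in practice, estimate deploy_prior
    from the target domain.
    """
    w = np.asarray(deploy_prior) / np.asarray(train_prior)
    adjusted = np.asarray(probs) * w
    return adjusted / adjusted.sum(axis=-1, keepdims=True)

# A 70% "positive" score drops sharply under a 90%-negative deployment prior.
print(adjust_for_prior([[0.30, 0.70]]))  # ≈ [[0.794, 0.206]]
```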
Citation
```bibtex
@article{sanh2019distilbert,
  title   = {DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter},
  author  = {Victor Sanh and Lysandre Debut and Julien Chaumond and Thomas Wolf},
  journal = {NeurIPS EMC2 Workshop},
  year    = {2019}
}

@inproceedings{maas2011learning,
  title     = {Learning Word Vectors for Sentiment Analysis},
  author    = {Andrew L. Maas and Raymond E. Daly and Peter T. Pham and Dan Huang and Andrew Y. Ng and Christopher Potts},
  booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
  year      = {2011}
}
```