EAI6010, Week 5: Microservice (Text Classification from Week 3)
Author: Steven
Class: EAI6010
Institution: Northeastern University
Model Type: TF-IDF + Logistic Regression
Hosting Environment: Hugging Face Spaces (Docker-based FastAPI microservice)
Model Overview
This model performs text classification using a TF-IDF vectorizer combined with a Logistic Regression classifier. It was developed as part of the EAI6010 Week 5 assignment to demonstrate how a machine learning model can be deployed as a production-ready microservice using FastAPI and Docker on Hugging Face Spaces.
The model classifies input text into three sentiment categories:
- positive
- negative
- neutral
It builds upon the learning objectives from Week 3 (Natural Language Processing), focusing on traditional NLP modeling pipelines using scikit-learn.
Training Data
The model was trained on a small custom dataset of 15 labeled examples of user feedback text. The table below shows three representative rows:
| Text | Label |
|---|---|
| "I love this product, it works perfectly." | positive |
| "Terrible service, I'm very disappointed." | negative |
| "This is okay, not great but acceptable." | neutral |
Dataset file: data/fallback_text_data.csv
To use your own data, replace it with a custom dataset (text_data.csv) that follows the same schema: two columns, text and label.
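A minimal sketch of how the dataset could be located and validated. It assumes text_data.csv sits in the same data/ directory as the fallback file and takes priority when present. The helpers resolve_dataset_path and load_dataset are hypothetical; the project's actual loading code may differ.

```python
import csv
import os

# Assumption: both files live in the same data directory, and the custom
# dataset takes priority over the bundled fallback when it exists.
PREFERRED_FILE = "text_data.csv"          # optional custom dataset
FALLBACK_FILE = "fallback_text_data.csv"  # bundled default dataset
ALLOWED_LABELS = {"positive", "negative", "neutral"}


def resolve_dataset_path(data_dir: str) -> str:
    """Return the custom dataset path if it exists, else the fallback path."""
    custom = os.path.join(data_dir, PREFERRED_FILE)
    if os.path.exists(custom):
        return custom
    return os.path.join(data_dir, FALLBACK_FILE)


def load_dataset(path: str) -> tuple[list[str], list[str]]:
    """Read a CSV with 'text' and 'label' columns and check every label."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        missing = {"text", "label"} - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        texts, labels = [], []
        for row in reader:
            label = row["label"].strip()
            if label not in ALLOWED_LABELS:
                raise ValueError(f"unexpected label: {row['label']!r}")
            texts.append(row["text"])
            labels.append(label)
    return texts, labels
```

Usage: `texts, labels = load_dataset(resolve_dataset_path("data"))`. Failing fast on an unknown label keeps a typo in a custom CSV from silently creating a fourth class.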
Model Details
Architecture:
TfidfVectorizer(unigrams + bigrams, max_features=20,000) → LogisticRegression(max_iter=1000)
Pipeline:
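from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

model = Pipeline([
    ('tfidf', TfidfVectorizer(ngram_range=(1, 2), max_features=20000)),  # unigrams + bigrams, vocabulary capped at 20k
    ('clf', LogisticRegression(max_iter=1000)),  # raised iteration limit helps the solver converge
])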
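An end-to-end sketch that fits the pipeline above on the three sample rows from the Training Data table and classifies two new inputs. The inputs in new_texts are hypothetical, and with only three training rows the predictions are illustrative only; the deployed service's code may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# The three sample rows shown in the Training Data table
# (the full dataset described above has 15 labeled examples).
texts = [
    "I love this product, it works perfectly.",
    "Terrible service, I'm very disappointed.",
    "This is okay, not great but acceptable.",
]
labels = ["positive", "negative", "neutral"]

model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), max_features=20000)),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(texts, labels)

# Hypothetical inputs for illustration.
new_texts = ["I love it", "very disappointed"]
print(model.predict(new_texts))        # one label per input
print(model.classes_)                  # class order used by predict_proba
print(model.predict_proba(new_texts))  # per-class probabilities, rows sum to 1
```

Because the vectorizer and classifier are wrapped in one Pipeline, raw strings go in and labels come out, so a serving endpoint only needs to call predict on the request text.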