---
license: mit
tags:
- text-classification
- cheese
- texture
- distilbert
- transformers
- fine-tuned
datasets:
- aslan-ng/cheese-text
metrics:
- accuracy
model-index:
- name: Cheese Texture Classifier (DistilBERT)
  results:
  - task:
      type: text-classification
      name: Cheese Texture Classification
    dataset:
      type: aslan-ng/cheese-text
      name: Cheese Text Dataset
    metrics:
    - type: accuracy
      value: 0.400
      name: Test Accuracy
---

# Cheese Texture Classifier (DistilBERT)

**Model Creator**: Rumi Loghmani (@rlogh)

**Original Dataset**: aslan-ng/cheese-text (by Aslan Noorghasemi)

This model performs 4-class texture classification on cheese descriptions using fine-tuned DistilBERT.

## Model Description

- **Architecture**: DistilBERT-base-uncased fine-tuned for sequence classification
- **Task**: 4-class texture classification (hard, semi-hard, semi-soft, soft)
- **Input**: Cheese description text (up to 512 tokens)
- **Output**: 4-class probability distribution

## Training Details

### Data

- **Dataset**: [aslan-ng/cheese-text](https://huggingface.co/datasets/aslan-ng/cheese-text) (original split: 100 samples)
- **Train/Val/Test Split**: 70/15/15 (stratified)
- **Text Source**: Cheese descriptions from the dataset
- **Labels**: Texture categories (hard, semi-hard, semi-soft, soft)

### Preprocessing

- **Tokenization**: DistilBERT tokenizer with 512 max length
- **Padding**: Max length padding
- **Truncation**: Long descriptions truncated to 512 tokens

### Training Setup

- **Model**: distilbert-base-uncased
- **Epochs**: 10
- **Batch Size**: 8 (train/val)
- **Learning Rate**: 2e-5
- **Warmup Steps**: 10
- **Weight Decay**: 0.01
- **Optimizer**: AdamW
- **Scheduler**: Linear warmup + linear decay
- **Mixed Precision**: FP16 (if GPU available)
- **Seed**: 42 (for reproducibility)

### Hardware/Compute

- **Training Device**: CPU
- **Training Time**: ~5-10 minutes on GPU
- **Model Size**: ~67M parameters
- **Memory Usage**: ~2-4GB GPU memory

## Performance

- **Test Accuracy**: 0.400
- **Test Loss**: 1.290

### Class-wise Performance

| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| hard | 0.50 | 0.33 | 0.40 | 3 |
| semi-hard | 0.29 | 0.50 | 0.36 | 4 |
| semi-soft | 0.40 | 0.50 | 0.44 | 4 |
| soft | 1.00 | 0.25 | 0.40 | 4 |
| accuracy | | | 0.40 | 15 |
| macro avg | 0.55 | 0.40 | 0.40 | 15 |
| weighted avg | 0.55 | 0.40 | 0.40 | 15 |

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer from the Hugging Face Hub
model_name = "rlogh/cheese-texture-classifier-distilbert"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example prediction
text = "Feta is a crumbly, tangy Greek cheese with a salty bite and creamy undertones."
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)
    # Convert logits to class probabilities and take the most likely class index
    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(probabilities, dim=-1).item()

# Index order of the four texture classes
class_names = ["hard", "semi-hard", "semi-soft", "soft"]
print(f"Predicted texture: {class_names[predicted_class]}")
```

## Class Definitions

- **Hard**: Firm, aged cheeses that are dense and can be grated (e.g., Parmesan, Cheddar)
- **Semi-hard**: Moderately firm cheeses with some flexibility (e.g., Gouda, Swiss)
- **Semi-soft**: Cheeses with some give but maintain shape (e.g., Mozzarella, Blue cheese)
- **Soft**: Creamy, spreadable cheeses (e.g., Brie, Camembert, Cottage cheese)

## Limitations and Ethics

### Limitations

- **Small Dataset**: Only 100 labeled samples in total (70 used for training), which limits generalization; with a 15-example test set, each misclassification shifts test accuracy by about 6.7 percentage points
- **Text Quality**: Performance depends on the quality and consistency of the descriptions
- **Subjective Labels**: Texture classification has inherent subjectivity
- **Domain Specific**: Only applicable to cheese texture classification
- **Language**: English-only model

### Ethical Considerations

- **Bias**: Model may reflect biases in the original dataset
- **Cultural Context**: Cheese descriptions may be culturally specific
- **Commercial Use**: Not intended for commercial cheese production decisions
- **Accuracy**: With 40% test accuracy, the model should not be used for food safety or other critical applications

### Recommendations

- Use for educational/research purposes only
- Validate predictions with domain experts
- Consider cultural context when applying to different regions
- Retrain with larger, more diverse datasets for production use

## AI Usage Disclosure

This model was developed using:

- **Base Model**: DistilBERT (distilbert-base-uncased)
- **Training Framework**: Hugging Face Transformers
- **Fine-tuning**: Standard BERT-style fine-tuning techniques

An AI assistant acted as a collaborative partner throughout the development process, accelerating the coding workflow and providing guidance.

## Citation

**Model Citation:**

```bibtex
@misc{rlogh/cheese-texture-classifier-distilbert,
  title={Cheese Texture Classifier (DistilBERT)},
  author={Rumi Loghmani},
  year={2024},
  url={https://huggingface.co/rlogh/cheese-texture-classifier-distilbert}
}
```

**Dataset Citation:**

```bibtex
@dataset{aslan-ng/cheese-text,
  title={Cheese Text Dataset},
  author={Aslan Noorghasemi},
  year={2024},
  url={https://huggingface.co/datasets/aslan-ng/cheese-text}
}
```

## License

MIT License. See the LICENSE file for details.
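## Worked Example: Stratified 70/15/15 Split

The Data section states a stratified 70/15/15 split of 100 samples but does not say how it was produced. The sketch below shows one simple way to do it in plain Python: shuffle each class, give the items of each class evenly spaced random keys, and slice the merged ordering, so every slice receives roughly its share of each texture. It is not necessarily the procedure used for this model; `stratified_split` and the synthetic 25-per-class label list are hypothetical, and scikit-learn's `train_test_split(..., stratify=...)` is a common alternative.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, n_val, n_test, seed=42):
    """Hypothetical helper: return (train, val, test) index lists with
    class proportions approximately preserved in every slice."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    keyed = []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n = len(idxs)
        # The i-th shuffled item of a class gets a random key in [i/n, (i+1)/n),
        # so each class is spread evenly across the merged ordering.
        keyed.extend(((i + rng.random()) / n, idx) for i, idx in enumerate(idxs))

    order = [idx for _, idx in sorted(keyed)]
    n_train = len(labels) - n_val - n_test
    return (order[:n_train],
            order[n_train:n_train + n_val],
            order[n_train + n_val:])


# Hypothetical labels: 100 samples, 25 per texture class
labels = ["hard", "semi-hard", "semi-soft", "soft"] * 25
train_idx, val_idx, test_idx = stratified_split(labels, n_val=15, n_test=15)
print(len(train_idx), len(val_idx), len(test_idx))
print(Counter(labels[i] for i in test_idx))
```

With equal class sizes, the 15-example test slice ends up with 3 or 4 examples per class; real class counts in the dataset may differ.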
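## Worked Example: Learning-Rate Schedule

The Training Setup lists a linear warmup followed by linear decay. The sketch below reproduces that multiplier in plain Python (the same shape as the `get_linear_schedule_with_warmup` schedule in Transformers) and applies it to the listed hyperparameters. The step count is an assumption: 70 training examples at batch size 8 on a single device, with no gradient accumulation and the last partial batch kept, gives ceil(70 / 8) = 9 steps per epoch, or 90 optimizer steps over 10 epochs.

```python
import math

BASE_LR = 2e-5       # Learning Rate from Training Setup
WARMUP_STEPS = 10    # Warmup Steps from Training Setup
EPOCHS = 10
TRAIN_EXAMPLES = 70  # 70% of the 100 samples
BATCH_SIZE = 8

# Assumes one device, no gradient accumulation, and the final partial batch kept
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS


def lr_multiplier(step, warmup_steps, total_steps):
    """Linear warmup from 0 to 1, then linear decay back to 0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


for step in (0, 5, 10, 50, total_steps):
    lr = BASE_LR * lr_multiplier(step, WARMUP_STEPS, total_steps)
    print(f"step {step:3d}: lr = {lr:.2e}")
```

Under these assumptions the learning rate peaks at 2e-5 after step 10 and is back to half that by step 50.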
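## Worked Example: Parameter Count

The ~67M figure under Hardware/Compute can be checked by hand. The sketch below counts the weights of the standard `distilbert-base-uncased` configuration (vocabulary 30,522, hidden size 768, 6 layers, feed-forward size 3,072, 512 positions) plus the `pre_classifier` and 4-way `classifier` layers that `DistilBertForSequenceClassification` adds on top. These configuration values come from the public base model, not from this repository's files.

```python
def linear(n_in, n_out):
    """Parameters of a dense layer: weight matrix plus bias."""
    return n_in * n_out + n_out


def layer_norm(dim):
    """LayerNorm has one scale and one shift vector."""
    return 2 * dim


VOCAB, DIM, LAYERS, FFN, MAX_POS, NUM_LABELS = 30522, 768, 6, 3072, 512, 4

# Word + position embeddings (DistilBERT has no token-type embeddings) and their LayerNorm
embeddings = VOCAB * DIM + MAX_POS * DIM + layer_norm(DIM)

# One transformer block: Q, K, V and output projections, two LayerNorms, two-layer FFN
block = (4 * linear(DIM, DIM) + layer_norm(DIM)
         + linear(DIM, FFN) + linear(FFN, DIM) + layer_norm(DIM))

encoder = embeddings + LAYERS * block
head = linear(DIM, DIM) + linear(DIM, NUM_LABELS)  # pre_classifier + classifier

total = encoder + head
print(f"encoder: {encoder:,}  head: {head:,}  total: {total:,}")
```

At 4 bytes per FP32 weight this is roughly 268 MB of weights; AdamW training also keeps gradients and two moment buffers, so parameter-related state is about four times that before activations are counted.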
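## Worked Example: Metric Averages

The aggregate rows of the Class-wise Performance table can be re-derived from the per-class rows: the macro average weights each class equally, and the weighted average weights each class by its support. The check below uses true-positive and predicted counts inferred from the reported precision, recall, and support (for example, soft has recall 0.25 on 4 examples, so 1 true positive, and precision 1.00, so 1 prediction). These counts are reconstructed from the table, not taken from a published confusion matrix.

```python
# (true positives, predicted count, support), inferred from the class-wise table
counts = {
    "hard": (1, 2, 3),
    "semi-hard": (2, 7, 4),
    "semi-soft": (2, 5, 4),
    "soft": (1, 1, 4),
}


def prf(tp, pred, support):
    """Precision, recall and F1 for one class."""
    p = tp / pred if pred else 0.0
    r = tp / support if support else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


per_class = {name: prf(*c) for name, c in counts.items()}
total = sum(support for _, _, support in counts.values())
accuracy = sum(tp for tp, _, _ in counts.values()) / total

# Macro: unweighted mean over classes; weighted: mean weighted by support
macro = [sum(m[k] for m in per_class.values()) / len(per_class) for k in range(3)]
weighted = [sum(per_class[n][k] * counts[n][2] for n in counts) / total for k in range(3)]

print("accuracy", round(accuracy, 2))
print("macro P/R/F1", [round(x, 2) for x in macro])
print("weighted P/R/F1", [round(x, 2) for x in weighted])
```

The single correct "soft" prediction explains why that class shows perfect precision but only 0.25 recall.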