SetFit with microsoft/Multilingual-MiniLM-L12-H384
This is a SetFit model that can be used for Text Classification. It uses microsoft/Multilingual-MiniLM-L12-H384 as the Sentence Transformer embedding model and a SetFitHead instance for classification.
The model has been trained using an efficient few-shot learning technique that involves the following two phases (a minimal code sketch follows the list):
- Fine-tuning a Sentence Transformer with contrastive learning.
- Training a classification head with features from the fine-tuned Sentence Transformer.
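Below is a minimal sketch of this two-phase recipe with the setfit library. The tiny inline dataset, the number of epochs, and the head settings are illustrative assumptions, not the script that was used to train this model.

```python
# Illustrative two-phase SetFit training sketch (not this model's original training script).
from datasets import Dataset
from setfit import SetFitModel, Trainer, TrainingArguments

# Placeholder few-shot dataset with two examples per label.
train_ds = Dataset.from_dict({
    "text": ["a neutral comment", "another neutral comment",
             "an abusive comment", "another abusive comment"],
    "label": [0, 0, 1, 1],
})

# use_differentiable_head=True attaches a SetFitHead; out_features=2 matches labels 0 and 1.
model = SetFitModel.from_pretrained(
    "microsoft/Multilingual-MiniLM-L12-H384",
    use_differentiable_head=True,
    head_params={"out_features": 2},
)

# Phase 1: contrastive fine-tuning of the Sentence Transformer body.
# Phase 2: training the classification head on the fine-tuned embeddings.
trainer = Trainer(
    model=model,
    args=TrainingArguments(batch_size=16, num_epochs=1),
    train_dataset=train_ds,
)
trainer.train()
```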
Model Details
Model Description
Model Sources
Model Labels
| Label | Examples |
|:------|:---------|
| 0 | <ul><li>'Madam divyaக்கு 1கிலோ colgate paste வாங்கி கொடுங்க videoவில் வாய் நாற்றம் தாங்கல'</li><li>'ഇനി ഇതുപോലുള്ള സാദനം ആയി വന്നാൽ ഞാൻ ഡിസ്ക്രൈബ് ചെയ്യും'</li><li>'ஏன்பா behindwoods தயவு செய்து இப்படி கேவலமான programme ஐ telecast பண்ணாதீங்க ராஜா'</li></ul> |
| 1 | <ul><li>'கம்பிய பழுக்க வச்சு சூத்துல வைங்க சார்'</li><li>'ഇനി റെഡ് സ്ട്രീറ്റ്റിലും കൂടി പോയി ഇന്റർവ്യൂ എടുക്ക് ചേച്ചി'</li><li>'നിങ്ങൾ പണ്ടേ വിവരക്കേടാണ്. ബോധം ഇല്ലായ്മ കാണിക്കാതെ സ്ത്രീ. മറ്റുള്ളവരുടെ കിഡ്ണി കളയിപ്പിച്ചിട്ടുവേണോ നിന്റെ കഞ്ഞി കുടിക്കൽ.'</li></ul> |
Evaluation
Metrics
| Label | Accuracy |
|:------|:---------|
| all   | 0.6875   |
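An accuracy of this kind can be recomputed along the following lines; this is a rough sketch, and the held-out texts and labels shown are placeholders rather than the actual evaluation split.

```python
# Rough sketch: recomputing accuracy on a held-out split (placeholder data).
from setfit import SetFitModel

model = SetFitModel.from_pretrained("livinNector/m-minilm-l12-h384-dra-tam-mal-aw-setfit-finetune")

# Placeholder evaluation examples; substitute the real labeled split.
texts = ["a held-out comment", "another held-out comment"]
labels = [0, 1]

preds = model.predict(texts)
accuracy = sum(int(p) == y for p, y in zip(preds, labels)) / len(labels)
print(f"accuracy: {accuracy:.4f}")
```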
Uses
Direct Use for Inference
First install the SetFit library:
pip install setfit
Then you can load this model and run inference.
from setfit import SetFitModel

# Download the trained model from the Hugging Face Hub
model = SetFitModel.from_pretrained("livinNector/m-minilm-l12-h384-dra-tam-mal-aw-setfit-finetune")
# Run inference on a single text
preds = model("\"ഒരുപാട് ഇഷ്ട്ടപെട്ട പോലെ ഒരുപാട് വെറുത്ത് പോയി, ഡോക്ടറെ കിട്ടാനുള്ള ഭാഗ്യം ഇല്ല\"")
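The model also accepts a batch of texts, and class probabilities can be obtained with predict_proba. The inputs below are placeholders:

```python
# Batch prediction and class probabilities (placeholder inputs).
batch_preds = model(["first comment to classify", "second comment to classify"])
batch_probs = model.predict_proba(["first comment to classify", "second comment to classify"])
print(batch_preds, batch_probs)
```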
Training Details
Training Set Metrics
| Training set | Min | Median  | Max |
|:-------------|:----|:--------|:----|
| Word count   | 2   | 15.4375 | 123 |

| Label | Training Sample Count |
|:------|:----------------------|
| 0     | 132                   |
| 1     | 124                   |
Training Hyperparameters
- batch_size: (64, 64)
- num_epochs: (10, 10)
- max_steps: -1
- sampling_strategy: oversampling
- num_iterations: 2
- body_learning_rate: (2e-05, 1e-05)
- head_learning_rate: 0.01
- loss: CosineSimilarityLoss
- distance_metric: cosine_distance
- margin: 0.25
- end_to_end: True
- use_amp: False
- warmup_proportion: 0.1
- l2_weight: 0.01
- seed: 42
- eval_max_steps: -1
- load_best_model_at_end: True
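These values correspond to setfit's TrainingArguments. A rough reconstruction is shown below as a sketch; the original training script is not part of this card, so treat it as an approximation.

```python
# Approximate reconstruction of the listed hyperparameters as setfit TrainingArguments.
from sentence_transformers.losses import CosineSimilarityLoss
from setfit import TrainingArguments

args = TrainingArguments(
    batch_size=(64, 64),               # (embedding phase, classifier phase)
    num_epochs=(10, 10),
    max_steps=-1,
    sampling_strategy="oversampling",
    num_iterations=2,
    body_learning_rate=(2e-05, 1e-05),
    head_learning_rate=0.01,
    loss=CosineSimilarityLoss,
    margin=0.25,
    end_to_end=True,
    use_amp=False,
    warmup_proportion=0.1,
    l2_weight=0.01,
    seed=42,
    eval_max_steps=-1,
    load_best_model_at_end=True,
)
```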
Training Results
| Epoch  | Step | Training Loss | Validation Loss |
|:-------|:-----|:--------------|:----------------|
| 0.0625 | 1    | 0.422         | -               |
| 0.625  | 10   | -             | 0.4029          |
| 1.25   | 20   | -             | 0.2799          |
| 1.875  | 30   | -             | 0.2464          |
| 2.5    | 40   | -             | 0.2480          |
| 3.125  | 50   | 0.2964        | 0.2451          |
| 3.75   | 60   | -             | 0.2368          |
| 4.375  | 70   | -             | 0.2444          |
| 5.0    | 80   | -             | 0.2393          |
| 5.625  | 90   | -             | 0.2382          |
| 6.25   | 100  | 0.1825        | 0.2395          |
| 6.875  | 110  | -             | 0.2405          |
| 7.5    | 120  | -             | 0.2424          |
| 8.125  | 130  | -             | 0.2468          |
| 8.75   | 140  | -             | 0.2432          |
| 9.375  | 150  | 0.1308        | 0.2451          |
| 10.0   | 160  | -             | 0.2454          |
Framework Versions
- Python: 3.10.12
- SetFit: 1.1.0
- Sentence Transformers: 3.3.1
- Transformers: 4.45.2
- PyTorch: 2.5.1+cu121
- Datasets: 3.2.0
- Tokenizers: 0.20.3
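To approximate this environment, the library versions above can be pinned at install time (a sketch; the exact PyTorch/CUDA build may differ on your system):

pip install "setfit==1.1.0" "sentence-transformers==3.3.1" "transformers==4.45.2" "torch==2.5.1" "datasets==3.2.0" "tokenizers==0.20.3"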
Citation
BibTeX
@article{https://doi.org/10.48550/arxiv.2209.11055,
doi = {10.48550/ARXIV.2209.11055},
url = {https://arxiv.org/abs/2209.11055},
author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
title = {Efficient Few-Shot Learning Without Prompts},
publisher = {arXiv},
year = {2022},
copyright = {Creative Commons Attribution 4.0 International}
}