prompt-router / README.md


tags:
  - setfit
  - sentence-transformers
  - text-classification
  - generated_from_setfit_trainer
widget:
  - text: Solicite um relatório financeiro trimestral via ERP conectado.
  - text: >-
      If you save $200 monthly, how much money will you have saved after 18
      months?
  - text: Get the stock price history of Tesla for the last month.
  - text: >-
      Given a historical archive of economic indicators, build a forecasting
      model that predicts recessions, incorporating leading, lagging, and
      coincident indicators with explainable outputs.
  - text: Narrate the experience of a character born without the ability to dream.
metrics:
  - accuracy
pipeline_tag: text-classification
library_name: setfit
inference: true
base_model: ibm-granite/granite-embedding-107m-multilingual
model-index:
  - name: SetFit with ibm-granite/granite-embedding-107m-multilingual
    results:
      - task:
          type: text-classification
          name: Text Classification
        dataset:
          name: Unknown
          type: unknown
          split: test
        metrics:
          - type: accuracy
            value: 0.9966555183946488
            name: Accuracy

SetFit with ibm-granite/granite-embedding-107m-multilingual

This is a SetFit model for text classification. It uses ibm-granite/granite-embedding-107m-multilingual as the Sentence Transformer embedding model and a LogisticRegression instance as the classification head.

The model was trained with an efficient few-shot learning technique that involves two steps:

1. Fine-tuning a Sentence Transformer with contrastive learning.
2. Training a classification head with features from the fine-tuned Sentence Transformer.
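
To make the contrastive fine-tuning step more concrete, the sketch below shows one simplified way labeled examples can be turned into sentence pairs with similarity targets (1.0 for the same label, 0.0 for different labels). The function name `make_pairs` is hypothetical, and SetFit's own pair sampler differs in details such as how many pairs it draws, so this is an illustration of the idea rather than the library's implementation.

```python
from itertools import combinations


def make_pairs(examples):
    """Build (text_a, text_b, target) triples from (text, label) examples.

    Pairs that share a label get target 1.0, all other pairs get 0.0.
    """
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(examples, 2):
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    return pairs


# Toy examples taken from the label table of this card.
examples = [
    ("Quantos centímetros tem 1 metro?", "basic_reasoning"),
    ("Se um carro consome 12 litros de gasolina para 100 km, "
     "quantos litros usará para 150 km?", "basic_reasoning"),
    ("Retrieve country-wise COVID-19 vaccination rates from an "
     "authoritative source.", "tool"),
]

pairs = make_pairs(examples)
for text_a, text_b, target in pairs:
    print(target, "|", text_a[:30], "|", text_b[:30])
```

Three examples yield three pairs: one positive pair (the two basic_reasoning prompts) and two negative pairs. The embedding model is then trained so that the similarity of each pair moves toward its target.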

Model Details

Model Description

Model Type: SetFit
Sentence Transformer body: ibm-granite/granite-embedding-107m-multilingual
Classification head: a LogisticRegression instance
Maximum Sequence Length: 512 tokens
Number of Classes: 8 classes

Model Sources

Repository: SetFit on GitHub
Paper: Efficient Few-Shot Learning Without Prompts
Blogpost: SetFit: Efficient Few-Shot Learning Without Prompts

Model Labels

Label	Examples
summarization	'Resuma um texto acadêmico sobre psicologia do comportamento.' 'Summarize the timeline and outcomes of a historical event based on multiple eyewitness accounts.' 'Extract and summarize the key lessons learned from multiple post-project reviews.'
general_knowledge	'Qual é a importância da agricultura para a economia brasileira?' 'Quais são os principais países membros da Organização dos Países Exportadores de Petróleo (OPEP)?' 'What is the mechanism by which vaccines provide immunity?'
roleplay	'Personifique um chef pâtissier criando uma sobremesa para um júri exigente.' 'You are a software tester devising scenarios to uncover bugs in a complex system.' 'Simule uma reunião de conselho editorial decidindo o rumo de uma grande publicação.'
creativity	'Write a thriller in which the protagonist communicates only through artwork.' 'Imagine um poema narrativo sobre a relação entre o sertão e a poesia de uma geração esquecida.' 'Write a story from the perspective of a shadow that gains independence.'
complex_reasoning	'Analise as implicações do uso de drones autônomos para entregas em áreas urbanas densas.' 'Proponha um sistema para avaliação automatizada e justa de currículos em processos seletivos corporativos.' 'Proponha um modelo para prever o crescimento urbano sustentável considerando variáveis ambientais e sociais.'
coding	'Implemente uma função para decompor números inteiros em fatores primos eficientemente para valores grandes.' 'Create an integration that consumes streaming data from an external message broker and processes events in real-time with backpressure management.' 'Escreva um algoritmo para encontrar os pontos de articulação (cut vertices) em um grafo não direcionado.'
basic_reasoning	'Se um carro consome 12 litros de gasolina para 100 km, quantos litros usará para 150 km?' 'If a ladder leans against a wall forming a 60-degree angle and the ladder length is 10 feet, how high does it reach on the wall?' 'Quantos centímetros tem 1 metro?'
tool	'Fetch comprehensive user reviews and ratings for a mobile app across platforms.' 'Analyze sentiment of a tweet and classify it as positive, neutral, or negative.' 'Retrieve country-wise COVID-19 vaccination rates from an authoritative source.'

Evaluation

Metrics

Label	Accuracy
all	0.9967
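
As a small worked example of the metric, the sketch below defines accuracy as the share of correct predictions and checks which simple fraction matches the value reported in the card metadata. The helper `accuracy` is hypothetical, and the test set size is not stated in this card, so the recovered fraction is only an inference from the reported number.

```python
from fractions import Fraction


def accuracy(predictions, references):
    """Share of predictions that equal the reference labels."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Test accuracy as reported in the card metadata.
reported = 0.9966555183946488

# Simplest fraction with a denominator of at most 1000 that matches it.
ratio = Fraction(reported).limit_denominator(1000)
print(ratio)               # 298/299
print(round(reported, 4))  # 0.9967
```

The reported value is therefore consistent with 298 correct predictions out of 299 test examples (or a multiple of those counts).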

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("cnmoro/prompt-router")
# Run inference
preds = model("Get the stock price history of Tesla for the last month.")
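
Since the model is named prompt-router, a typical next step is to dispatch each prompt according to its predicted label. The following is a minimal sketch of that idea. The backend names in `ROUTES`, the `route` helper, and the `fake_classify` stand-in are all hypothetical and only serve to keep the example runnable without downloading the model.

```python
from typing import Callable, Tuple

# Hypothetical mapping from the eight labels of this card to backend names.
ROUTES = {
    "summarization": "small-llm",
    "general_knowledge": "small-llm",
    "basic_reasoning": "small-llm",
    "roleplay": "creative-llm",
    "creativity": "creative-llm",
    "complex_reasoning": "reasoning-llm",
    "coding": "code-llm",
    "tool": "tool-calling-agent",
}
FALLBACK_BACKEND = "general-llm"  # used when the label is not in ROUTES


def route(prompt: str, classify: Callable[[str], object]) -> Tuple[str, str]:
    """Classify the prompt and return (label, backend name)."""
    label = str(classify(prompt))
    return label, ROUTES.get(label, FALLBACK_BACKEND)


def fake_classify(prompt: str) -> str:
    """Stand-in classifier (not the real model), based on the first word."""
    starts_like_tool = prompt.lower().startswith(("get ", "fetch ", "retrieve "))
    return "tool" if starts_like_tool else "general_knowledge"


label, backend = route(
    "Get the stock price history of Tesla for the last month.", fake_classify
)
print(label, backend)  # tool tool-calling-agent
```

With the real model, `model` from the snippet above could be passed as `classify`, assuming that calling it on a single string returns the label name.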

Training Details

Training Set Metrics

Training set	Min	Median	Max
Word count	5	13.6792	38

Label	Training Sample Count
summarization	160
tool	144
general_knowledge	154
roleplay	145
complex_reasoning	130
creativity	164
coding	152
basic_reasoning	148
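
The per-label counts above can be summarized with a few lines of arithmetic. This sketch only uses the numbers from the table and shows the total training set size and how balanced the classes are.

```python
# Training sample counts copied from the table above.
counts = {
    "summarization": 160,
    "tool": 144,
    "general_knowledge": 154,
    "roleplay": 145,
    "complex_reasoning": 130,
    "creativity": 164,
    "coding": 152,
    "basic_reasoning": 148,
}

total = sum(counts.values())
largest = max(counts, key=counts.get)
smallest = min(counts, key=counts.get)
imbalance = counts[largest] / counts[smallest]

print(total)                          # 1197
print(largest, smallest)              # creativity complex_reasoning
print(round(imbalance, 2))            # 1.26

# Share of the training set per label, largest first.
for name, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{name:<18} {n:>4} {n / total:6.1%}")
```

The training set holds 1197 examples in total, and the largest class is only about 1.26 times the size of the smallest, so the labels are fairly evenly represented.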

Training Hyperparameters

batch_size: (8, 8)
num_epochs: (1, 16)
max_steps: 2400
sampling_strategy: oversampling
body_learning_rate: (2e-05, 1e-05)
head_learning_rate: 0.01
loss: CosineSimilarityLoss
distance_metric: cosine_distance
margin: 0.25
end_to_end: False
use_amp: False
warmup_proportion: 0.1
l2_weight: 0.01
seed: 42
evaluation_strategy: steps
eval_max_steps: -1
load_best_model_at_end: True
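
Several of the hyperparameters above are tuples. According to the SetFit documentation for `TrainingArguments`, the first element applies to the embedding fine-tuning phase and the second to the classifier training phase. The sketch below splits the tuple-valued entries accordingly. The helper `split_phases` is hypothetical, and with a LogisticRegression head some classifier-phase values may not be used at all.

```python
# Tuple-valued hyperparameters copied from the list above.
phased = {
    "batch_size": (8, 8),
    "num_epochs": (1, 16),
    "body_learning_rate": (2e-05, 1e-05),
}


def split_phases(params):
    """Split (embedding, classifier) tuples into one dict per phase."""
    embedding = {name: value[0] for name, value in params.items()}
    classifier = {name: value[1] for name, value in params.items()}
    return embedding, classifier


embedding_phase, classifier_phase = split_phases(phased)
print("embedding phase: ", embedding_phase)
print("classifier phase:", classifier_phase)
```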

Training Results

Epoch	Step	Training Loss	Validation Loss
0.0004	1	0.1954	-
0.0208	50	0.2125	-
0.0417	100	0.2131	-
0.0625	150	0.2072	-
0.0833	200	0.2029	0.1902
0.1042	250	0.1925	-
0.125	300	0.1764	-
0.1458	350	0.1512	-
0.1667	400	0.1229	0.1072
0.1875	450	0.1015	-
0.2083	500	0.0862	-
0.2292	550	0.065	-
0.25	600	0.0505	0.0504
0.2708	650	0.0532	-
0.2917	700	0.0427	-
0.3125	750	0.0378	-
0.3333	800	0.0357	0.0322
0.3542	850	0.0286	-
0.375	900	0.0381	-
0.3958	950	0.0333	-
0.4167	1000	0.0307	0.0235
0.4375	1050	0.0245	-
0.4583	1100	0.0245	-
0.4792	1150	0.0217	-
0.5	1200	0.0193	0.0168
0.5208	1250	0.0167	-
0.5417	1300	0.0158	-
0.5625	1350	0.02	-
0.5833	1400	0.0167	0.0120
0.6042	1450	0.0176	-
0.625	1500	0.0159	-
0.6458	1550	0.0141	-
0.6667	1600	0.0131	0.0094
0.6875	1650	0.0097	-
0.7083	1700	0.0109	-
0.7292	1750	0.0126	-
0.75	1800	0.0115	0.0079
0.7708	1850	0.0122	-
0.7917	1900	0.0104	-
0.8125	1950	0.0111	-
0.8333	2000	0.011	0.0071
0.8542	2050	0.0095	-
0.875	2100	0.009	-
0.8958	2150	0.0107	-
0.9167	2200	0.0099	0.0067
0.9375	2250	0.0084	-
0.9583	2300	0.0086	-
0.9792	2350	0.0089	-
1.0	2400	0.0098	0.0066
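
Because training used load_best_model_at_end: True, it can be useful to see which evaluated step had the lowest validation loss. The sketch below does this for the validation values in the table above. It assumes that "best" is judged by validation loss, which this card does not state explicitly.

```python
# (step, validation loss) rows copied from the table above.
validation_log = [
    (200, 0.1902), (400, 0.1072), (600, 0.0504), (800, 0.0322),
    (1000, 0.0235), (1200, 0.0168), (1400, 0.0120), (1600, 0.0094),
    (1800, 0.0079), (2000, 0.0071), (2200, 0.0067), (2400, 0.0066),
]

# Step with the lowest validation loss.
best_step, best_loss = min(validation_log, key=lambda row: row[1])
print(best_step, best_loss)  # 2400 0.0066

# True when every evaluation improved on the previous one.
strictly_decreasing = all(
    earlier[1] > later[1]
    for earlier, later in zip(validation_log, validation_log[1:])
)
print(strictly_decreasing)  # True
```

Validation loss fell at every evaluation, so under this assumption the best checkpoint is the final one at step 2400.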

Framework Versions

Python: 3.11.11
SetFit: 1.2.0.dev0
Sentence Transformers: 4.0.2
Transformers: 4.51.3
PyTorch: 2.6.0+cu124
Datasets: 3.5.0
Tokenizers: 0.21.1

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}