SetFit with sentence-transformers/paraphrase-mpnet-base-v2
This is a SetFit model that can be used for Text Classification. It uses sentence-transformers/paraphrase-mpnet-base-v2 as the Sentence Transformer embedding model, with a LogisticRegression instance as the classification head.
The model has been trained using an efficient few-shot learning technique that involves:
- Fine-tuning a Sentence Transformer with contrastive learning.
- Training a classification head with features from the fine-tuned Sentence Transformer (see the sketch after this list).
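As a rough illustration of these two steps, the snippet below uses SetFit's Trainer API (a minimal sketch, assuming SetFit >= 1.0); the tiny dataset shown is a hypothetical placeholder, not the data this model was actually trained on.

```python
from datasets import Dataset
from setfit import SetFitModel, Trainer, TrainingArguments

# Hypothetical few-shot dataset: a handful of labeled texts per class (placeholder data).
train_dataset = Dataset.from_dict({
    "text": [
        "Stocks are soaring and I'm feeling bullish about the future.",
        "The market is going to tank soon.",
        "Holding off on trades until a clear signal appears.",
    ],
    "label": ["Bullish", "Bearish", "Neutral"],
})

# The Sentence Transformer body that will be fine-tuned with contrastive learning;
# the default head is a LogisticRegression instance.
model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")

args = TrainingArguments(batch_size=16, num_epochs=5)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
)

# train() first fine-tunes the embedding body on contrastive pairs,
# then fits the classification head on the resulting embeddings.
trainer.train()
```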
Model Details
Model Description
Model Sources
Model Labels
| Label | Examples |
|:------|:---------|
| Neutral | <ul><li>"I'm trying to optimize my investment portfolio and was wondering if anyone has any tips on how to maximize tax efficiency in a taxable brokerage account. I've heard that tax-loss harvesting can be a good strategy, but I'm not sure how to implement it or if it's worth the effort."</li><li>"I've been following the trend of the S&P 500 and it seems like it's consolidating within a tight range. I'm not seeing any strong buy or sell signals, so I'm going to hold off on making any trades for now. Anyone else noticing this? I'm thinking of waiting for a breakout or a clear reversal before entering a position."</li><li>"I've been using Fidelity for my brokerage needs and I'm generally happy with their services. They have a user-friendly interface and their customer support is responsive. That being said, I do wish they had more investment options available, but overall I'd say they're a solid choice for beginners and experienced investors alike."</li></ul> |
| Bullish | <ul><li>'The US labor market continues to show signs of strength, with the latest jobs report revealing a 3.5% unemployment rate, the lowest in nearly 50 years. This is a major boost for the economy, and investors are taking notice. The Dow Jones surged 200 points in response, with many analysts attributing the gains to the improving job market. As a result, stocks in the tech and healthcare sectors are seeing significant gains, with many experts predicting a continued upward trend in the coming weeks. The low unemployment rate is a clear indication that the economy is on the right track, and investors are feeling optimistic about the future.'</li><li>"Just closed out my Q2 with a 20% gain on my portfolio! The market is on fire and I'm loving every minute of it. Stocks are soaring and I'm feeling bullish about the future. #stockmarket #investing #bullrun"</li><li>"Just heard that the new government is planning to reduce corporate taxes to 20% from 30%! This is a huge boost for the economy and I'm feeling very bullish on the stock market right now. #Bullish #Finance #Economy"</li></ul> |
| Bearish | <ul><li>'Economic growth is slowing down and the Fed is raising interest rates again. This is a recipe for disaster. The market is going to tank soon. #BearMarket #EconomicDownturn'</li><li>"Just got my latest paycheck and I'm shocked to see how much of it is going towards groceries and rent due to this OUT. OF. CONTROL inflation. The economy is a joke. #inflation #bearmarket"</li><li>'The latest inflation rate data has sent shockwaves through the market, with the Consumer Price Index (CPI) rising 3.5% in the past 12 months. This is the highest rate in nearly a decade, and economists are warning that it could lead to a recession. The Federal Reserve is expected to raise interest rates again in an effort to combat inflation, but this could have a negative impact on the stock market. As a result, investors are bracing for a potential bear market, with many analysts predicting a 20% drop in the S&P 500 by the end of the year.'</li></ul> |
Evaluation
Metrics
Uses
Direct Use for Inference
First install the SetFit library:
pip install setfit
Then you can load this model and run inference.
from setfit import SetFitModel
# Download the model from the Hugging Face Hub ("setfit_model_id" is a placeholder for the model's Hub id)
model = SetFitModel.from_pretrained("setfit_model_id")
# Run inference on a single text; the result is the predicted label
preds = model("Inflation is out of control! Just got my electricity bill and it's up 25% from last year. No wonder the Fed is raising rates, but will it be enough to stop the bleeding? #inflation #economy")
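The model also supports batched prediction and per-class probabilities. The snippet below is a minimal sketch using the standard SetFitModel API; the example texts are illustrative only, and "setfit_model_id" remains a placeholder for the model's Hub id.

```python
from setfit import SetFitModel

# Reload the model (same placeholder id as above).
model = SetFitModel.from_pretrained("setfit_model_id")

# Batch inference: one predicted label per input text (example texts are illustrative).
texts = [
    "Earnings beat expectations across the board and futures are pointing up.",
    "Layoffs are accelerating and consumer confidence keeps sliding.",
]
preds = model.predict(texts)

# Per-class probabilities from the LogisticRegression head, one row per text.
probs = model.predict_proba(texts)
print(preds)
print(probs)
```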
Training Details
Training Set Metrics
| Training set | Min | Median  | Max |
|:-------------|:----|:--------|:----|
| Word count   | 17  | 62.6531 | 119 |

| Label   | Training Sample Count |
|:--------|:----------------------|
| Bearish | 16                    |
| Bullish | 18                    |
| Neutral | 15                    |
Training Hyperparameters
- batch_size: (16, 16)
- num_epochs: (5, 5)
- max_steps: -1
- sampling_strategy: oversampling
- body_learning_rate: (2e-05, 1e-05)
- head_learning_rate: 0.01
- loss: CosineSimilarityLoss
- distance_metric: cosine_distance
- margin: 0.25
- end_to_end: False
- use_amp: False
- warmup_proportion: 0.1
- seed: 42
- eval_max_steps: -1
- load_best_model_at_end: True
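These values correspond to fields of setfit's TrainingArguments. The sketch below shows how they could be reproduced (an assumption-laden sketch, based on SetFit 1.x; distance_metric is left at its default, which is already cosine distance, and margin only affects triplet-style losses).

```python
from sentence_transformers.losses import CosineSimilarityLoss
from setfit import TrainingArguments

# Sketch of the listed hyperparameters expressed as setfit TrainingArguments.
# batch_size, num_epochs, and body_learning_rate are (embedding phase, classifier phase) pairs.
args = TrainingArguments(
    batch_size=(16, 16),
    num_epochs=(5, 5),
    max_steps=-1,
    sampling_strategy="oversampling",
    body_learning_rate=(2e-05, 1e-05),
    head_learning_rate=0.01,
    loss=CosineSimilarityLoss,
    margin=0.25,
    end_to_end=False,
    use_amp=False,
    warmup_proportion=0.1,
    seed=42,
    eval_max_steps=-1,
    load_best_model_at_end=True,
)
```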
Training Results
| Epoch   | Step    | Training Loss | Validation Loss |
|:-------:|:-------:|:-------------:|:---------------:|
| 0.01    | 1       | 0.235         | -               |
| 0.5     | 50      | 0.0307        | -               |
| 1.0     | 100     | 0.0008        | 0.0357          |
| 1.5     | 150     | 0.0006        | -               |
| 2.0     | 200     | 0.0002        | 0.0303          |
| 2.5     | 250     | 0.0001        | -               |
| 3.0     | 300     | 0.0001        | 0.0295          |
| 3.5     | 350     | 0.0001        | -               |
| **4.0** | **400** | **0.0001**    | **0.0281**      |
| 4.5     | 450     | 0.0001        | -               |
| 5.0     | 500     | 0.0001        | 0.0287          |
- The bold row denotes the saved checkpoint.
Framework Versions
- Python: 3.9.19
- SetFit: 1.1.0.dev0
- Sentence Transformers: 3.0.1
- Transformers: 4.39.0
- PyTorch: 2.4.0
- Datasets: 2.20.0
- Tokenizers: 0.15.2
Citation
BibTeX
@article{https://doi.org/10.48550/arxiv.2209.11055,
doi = {10.48550/ARXIV.2209.11055},
url = {https://arxiv.org/abs/2209.11055},
author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
title = {Efficient Few-Shot Learning Without Prompts},
publisher = {arXiv},
year = {2022},
copyright = {Creative Commons Attribution 4.0 International}
}