---
base_model: intfloat/e5-large-v2
datasets:
- sentence-transformers/all-nli
language:
- en
library_name: sentence-transformers
metrics:
- pearson_cosine
- spearman_cosine
- pearson_manhattan
- spearman_manhattan
- pearson_euclidean
- spearman_euclidean
- pearson_dot
- spearman_dot
- pearson_max
- spearman_max
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:10000
- loss:SoftmaxLoss
widget:
- source_sentence: >-
    A man selling donuts to a customer during a world exhibition event held in
    the city of Angeles
  sentences:
  - The man is doing tricks.
  - A woman drinks her coffee in a small cafe.
  - The building is made of logs.
- source_sentence: A group of people prepare hot air balloons for takeoff.
  sentences:
  - There are hot air balloons on the ground and air.
  - A man is in an art museum.
  - People watch another person do a trick.
- source_sentence: Three workers are trimming down trees.
  sentences:
  - The goalie is sleeping at home.
  - There are three workers
  - The girl has brown hair.
- source_sentence: >-
    Two brown-haired men wearing short-sleeved shirts and shorts are climbing
    stairs.
  sentences:
  - The men have blonde hair.
  - A bicyclist passes an esthetically beautiful building on a sunny day
  - Two men are dancing.
- source_sentence: A man is sitting in on the side of the street with brass pots.
  sentences:
  - a younger boy looks at his father
  - Children are at the beach.
  - a man does not have brass pots
model-index:
- name: SentenceTransformer based on intfloat/e5-large-v2
  results:
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts dev
      type: sts-dev
    metrics:
    - type: pearson_cosine
      value: 0.25153764364319275
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.3291921844406249
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.2966881773862295
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.32789142408327193
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.29957914563527244
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.3291921844406249
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.2515376443724997
      name: Pearson Dot
    - type: spearman_dot
      value: 0.3291921844406249
      name: Spearman Dot
    - type: pearson_max
      value: 0.29957914563527244
      name: Pearson Max
    - type: spearman_max
      value: 0.3291921844406249
      name: Spearman Max
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts test
      type: sts-test
    metrics:
    - type: pearson_cosine
      value: 0.27914347241714155
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.30504478158921217
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.3034422953603654
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.30482947439377617
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.30503064655519824
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.30504478158921217
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.2791434684526028
      name: Pearson Dot
    - type: spearman_dot
      value: 0.30504478158921217
      name: Spearman Dot
    - type: pearson_max
      value: 0.30503064655519824
      name: Pearson Max
    - type: spearman_max
      value: 0.30504478158921217
      name: Spearman Max
---
# SentenceTransformer based on intfloat/e5-large-v2
This is a [sentence-transformers](https://www.sbert.net) model finetuned from [intfloat/e5-large-v2](https://huggingface.co/intfloat/e5-large-v2) on the [sentence-transformers/all-nli](https://huggingface.co/datasets/sentence-transformers/all-nli) dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [intfloat/e5-large-v2](https://huggingface.co/intfloat/e5-large-v2)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:** [sentence-transformers/all-nli](https://huggingface.co/datasets/sentence-transformers/all-nli)
- **Language:** en
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
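The three modules above (a BertModel encoder, attention-mask-aware mean pooling, and L2 normalization) can also be reproduced with plain `transformers`. This is a minimal sketch, assuming the transformer weights sit at the repository root as is standard for Sentence Transformers checkpoints:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("hongming/e5-large-v2-nli-v1")
encoder = AutoModel.from_pretrained("hongming/e5-large-v2-nli-v1")

batch = tokenizer(
    ["A man is sitting in on the side of the street with brass pots."],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)
with torch.no_grad():
    token_embeddings = encoder(**batch).last_hidden_state  # (batch, seq_len, 1024)

# (1) Pooling: mean over non-padding tokens only, per the attention mask
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
# (2) Normalize: unit-length vectors, so dot product equals cosine similarity
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings.shape)  # torch.Size([1, 1024])
```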
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("hongming/e5-large-v2-nli-v1")
# Run inference
sentences = [
    'A man is sitting in on the side of the street with brass pots.',
    'a man does not have brass pots',
    'Children are at the beach.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
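Note that the base model intfloat/e5-large-v2 was trained with `"query: "` and `"passage: "` prefixes on every input text. The NLI pairs shown in this card omit them, so whether prefixing helps with this fine-tune is untested here; if you want to follow the base model's convention, a hypothetical retrieval-style call would look like:

```python
# Hypothetical E5-style prefixing; validate whether it helps for this fine-tune.
query_embeddings = model.encode(["query: men climbing stairs"])
passage_embeddings = model.encode([
    "passage: Two brown-haired men wearing short-sleeved shirts are climbing stairs.",
    "passage: A woman drinks her coffee in a small cafe.",
])
print(model.similarity(query_embeddings, passage_embeddings))  # (1, 2) score matrix
```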
## Evaluation

### Metrics

#### Semantic Similarity
- Dataset: `sts-dev`
- Evaluated with [`EmbeddingSimilarityEvaluator`](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentencetransformers.evaluation.EmbeddingSimilarityEvaluator)
| Metric | Value |
|---|---|
| pearson_cosine | 0.2515 |
| spearman_cosine | 0.3292 |
| pearson_manhattan | 0.2967 |
| spearman_manhattan | 0.3279 |
| pearson_euclidean | 0.2996 |
| spearman_euclidean | 0.3292 |
| pearson_dot | 0.2515 |
| spearman_dot | 0.3292 |
| pearson_max | 0.2996 |
| spearman_max | 0.3292 |
#### Semantic Similarity
- Dataset: `sts-test`
- Evaluated with [`EmbeddingSimilarityEvaluator`](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentencetransformers.evaluation.EmbeddingSimilarityEvaluator)
| Metric | Value |
|---|---|
| pearson_cosine | 0.2791 |
| spearman_cosine | 0.305 |
| pearson_manhattan | 0.3034 |
| spearman_manhattan | 0.3048 |
| pearson_euclidean | 0.305 |
| spearman_euclidean | 0.305 |
| pearson_dot | 0.2791 |
| spearman_dot | 0.305 |
| pearson_max | 0.305 |
| spearman_max | 0.305 |
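Both tables were produced with `EmbeddingSimilarityEvaluator`. A minimal sketch of re-running such an evaluation, assuming the STS Benchmark copy at `sentence-transformers/stsb` (columns `sentence1`, `sentence2`, and `score` scaled to [0, 1]) matches the dev split used here:

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("hongming/e5-large-v2-nli-v1")

# Assumption: sentence-transformers/stsb mirrors the STS dev split used in this card.
stsb = load_dataset("sentence-transformers/stsb", split="validation")
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=stsb["sentence1"],
    sentences2=stsb["sentence2"],
    scores=stsb["score"],
    name="sts-dev",
)
print(evaluator(model))  # Pearson/Spearman correlations per similarity function
```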
## Training Details

### Training Dataset

#### sentence-transformers/all-nli
- Dataset: [sentence-transformers/all-nli](https://huggingface.co/datasets/sentence-transformers/all-nli) at d482672
- Size: 10,000 training samples
- Columns: `premise`, `hypothesis`, and `label`
- Approximate statistics based on the first 1000 samples:
|   | premise | hypothesis | label |
|---|---|---|---|
| type | string | string | int |
| details | min: 6 tokens<br>mean: 17.38 tokens<br>max: 52 tokens | min: 4 tokens<br>mean: 10.7 tokens<br>max: 31 tokens | 0: ~33.40%<br>1: ~33.30%<br>2: ~33.30% |
- Samples:

| premise | hypothesis | label |
|---|---|---|
| A person on a horse jumps over a broken down airplane. | A person is training his horse for a competition. | 1 |
| A person on a horse jumps over a broken down airplane. | A person is at a diner, ordering an omelette. | 2 |
| A person on a horse jumps over a broken down airplane. | A person is outdoors, on a horse. | 0 |

- Loss: `SoftmaxLoss` (see the sketch below)
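`SoftmaxLoss` trains a classifier head on top of the concatenated sentence embeddings (by default [u, v, |u − v|] for a premise/hypothesis pair) to predict the NLI label; as the samples above show, 0 = entailment, 1 = neutral, 2 = contradiction. A minimal sketch of wiring it up:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import SoftmaxLoss

model = SentenceTransformer("intfloat/e5-large-v2")
# The classifier sees [u, v, |u - v|] for each (premise, hypothesis) pair.
loss = SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
    num_labels=3,  # 0 = entailment, 1 = neutral, 2 = contradiction
)
```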
### Evaluation Dataset

#### sentence-transformers/all-nli
- Dataset: [sentence-transformers/all-nli](https://huggingface.co/datasets/sentence-transformers/all-nli) at d482672
- Size: 1,000 evaluation samples
- Columns: `premise`, `hypothesis`, and `label`
- Approximate statistics based on the first 1000 samples:

|   | premise | hypothesis | label |
|---|---|---|---|
| type | string | string | int |
| details | min: 6 tokens<br>mean: 18.44 tokens<br>max: 57 tokens | min: 5 tokens<br>mean: 10.57 tokens<br>max: 25 tokens | 0: ~33.10%<br>1: ~33.30%<br>2: ~33.60% |

- Samples:

| premise | hypothesis | label |
|---|---|---|
| Two women are embracing while holding to go packages. | The sisters are hugging goodbye while holding to go packages after just eating lunch. | 1 |
| Two women are embracing while holding to go packages. | Two woman are holding packages. | 0 |
| Two women are embracing while holding to go packages. | The men are fighting outside a deli. | 2 |

- Loss: `SoftmaxLoss`
### Training Hyperparameters

#### Non-Default Hyperparameters
- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `fp16`: True
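A minimal sketch of expressing the non-default values above with `SentenceTransformerTrainingArguments` (the `output_dir` is a placeholder):

```python
from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="e5-large-v2-nli-v1",  # placeholder path
    eval_strategy="steps",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=1,
    warmup_ratio=0.1,
    fp16=True,
)
```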
#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `eval_use_gather_object`: False
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

</details>
### Training Logs
| Epoch | Step | Training Loss | Validation Loss | sts-dev_spearman_cosine | sts-test_spearman_cosine |
|---|---|---|---|---|---|
| 0 | 0 | - | - | 0.8888 | - |
| 0.16 | 100 | 1.0934 | 1.0656 | 0.5733 | - |
| 0.32 | 200 | 1.0461 | 1.0245 | 0.3466 | - |
| 0.48 | 300 | 1.037 | 1.0152 | 0.3391 | - |
| 0.64 | 400 | 1.0013 | 0.9931 | 0.3333 | - |
| 0.8 | 500 | 1.0014 | 0.9871 | 0.3825 | - |
| 0.96 | 600 | 0.9827 | 0.9705 | 0.3292 | - |
| 1.0 | 625 | - | - | - | 0.3050 |
### Framework Versions
- Python: 3.8.13
- Sentence Transformers: 3.1.0.dev0
- Transformers: 4.43.3
- PyTorch: 2.1.2
- Accelerate: 0.33.0
- Datasets: 2.16.1
- Tokenizers: 0.19.1
## Citation

### BibTeX

#### Sentence Transformers and SoftmaxLoss
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```