SentenceTransformer based on google-bert/bert-base-uncased

This is a sentence-transformers model finetuned from google-bert/bert-base-uncased. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: google-bert/bert-base-uncased
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
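
The maximum sequence length and output dimensionality can be verified directly after loading the model; a minimal sketch (the expected values are the ones listed above):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("gabrielegabellone/bert-base-uncased-snli")
# Both values should match the card: 256 tokens and 768 dimensions.
print(model.max_seq_length)                      # 256
print(model.get_sentence_embedding_dimension())  # 768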

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
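
The same two-module stack (a BERT encoder followed by mean pooling over token embeddings) can be assembled by hand from the standard sentence_transformers.models helpers; a minimal sketch, not taken from this repository's training code:

from sentence_transformers import SentenceTransformer, models

# Transformer module: bert-base-uncased, inputs truncated at 256 tokens.
word_embedding_model = models.Transformer("google-bert/bert-base-uncased", max_seq_length=256)
# Pooling module: mean over token embeddings, keeping the 768-dimensional output.
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),
    pooling_mode="mean",
)
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])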

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("gabrielegabellone/bert-base-uncased-snli")
# Run inference
sentences = [
    'A dog running through tall grass.',
    'Dog lying in an open field.',
    'The woman is outside.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
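
The example above scores every sentence against every other one; for semantic search you would typically rank candidate sentences against a single query instead. A minimal sketch (the query string is illustrative):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("gabrielegabellone/bert-base-uncased-snli")

query = "A dog is playing outdoors."
corpus = [
    "A dog running through tall grass.",
    "Dog lying in an open field.",
    "The woman is outside.",
]

query_embedding = model.encode(query)
corpus_embeddings = model.encode(corpus)

# Rank the corpus sentences by cosine similarity to the query.
scores = model.similarity(query_embedding, corpus_embeddings)[0]
for sentence, score in sorted(zip(corpus, scores), key=lambda pair: float(pair[1]), reverse=True):
    print(f"{float(score):.4f}  {sentence}")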

Training Details

Training Dataset

Unnamed Dataset

  • Size: 549,367 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence_0: string; min: 6 tokens, mean: 16.75 tokens, max: 73 tokens
    • sentence_1: string; min: 4 tokens, mean: 10.57 tokens, max: 29 tokens
    • label: int; 0: ~36.60%, 1: ~33.00%, 2: ~30.40%
  • Samples:
    • sentence_0: A woman in wearing a white hat holding a scythe and a cutting of wheat. | sentence_1: A sad woman in wearing a white hat holding a scythe and a cutting of wheat. | label: 1
    • sentence_0: There is an old man sitting on the edge of a concrete pillar wearing a hat. | sentence_1: A man sits outside. | label: 0
    • sentence_0: A man dressed as a cook enjoys a meal and bottle of wine in an empty restaurant. | sentence_1: There is a chef that is waiting for people to come to his restaurant and is preparing food. | label: 1
  • Loss: SoftmaxLoss
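
SoftmaxLoss feeds the concatenation of the two sentence embeddings and their element-wise difference, (u, v, |u - v|), into a linear classifier over the three labels, so the encoder is trained through a natural-language-inference style objective. A minimal sketch of the loss construction (the exact training script is not part of this card, so the details below are an assumption based on the standard sentence-transformers API):

from sentence_transformers import SentenceTransformer, losses

# Plain BERT checkpoint; Sentence Transformers adds mean pooling automatically.
model = SentenceTransformer("google-bert/bert-base-uncased")
loss = losses.SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),  # 768
    num_labels=3,  # the three label values (0, 1, 2) seen in the samples above
)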

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 1
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
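
Putting the non-default hyperparameters together, a training run of this shape could look roughly as follows. This is a sketch under assumptions: the card does not name the training dataset, so stanfordnlp/snli (whose filtered train split also contains 549,367 pairs) is used here purely for illustration, and the original column handling may have differed.

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

model = SentenceTransformer("google-bert/bert-base-uncased")
model.max_seq_length = 256  # matches the architecture listed above

# Assumed dataset: SNLI premise/hypothesis pairs, dropping unlabeled rows (label == -1).
train_dataset = load_dataset("stanfordnlp/snli", split="train").filter(lambda row: row["label"] != -1)

loss = losses.SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
    num_labels=3,
)

args = SentenceTransformerTrainingArguments(
    output_dir="bert-base-uncased-snli",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=1,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()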

Training Logs

Epoch Step Training Loss
0.0146 500 0.9507
0.0291 1000 0.8171
0.0437 1500 0.7619
0.0582 2000 0.7505
0.0728 2500 0.7221
0.0874 3000 0.6913
0.1019 3500 0.6753
0.1165 4000 0.6658
0.1311 4500 0.6578
0.1456 5000 0.6438
0.1602 5500 0.6348
0.1747 6000 0.6286
0.1893 6500 0.6261
0.2039 7000 0.6154
0.2184 7500 0.6184
0.2330 8000 0.6107
0.2476 8500 0.6189
0.2621 9000 0.6087
0.2767 9500 0.612
0.2912 10000 0.6026
0.3058 10500 0.603
0.3204 11000 0.5987
0.3349 11500 0.5999
0.3495 12000 0.5893
0.3640 12500 0.5767
0.3786 13000 0.59
0.3932 13500 0.5898
0.4077 14000 0.5854
0.4223 14500 0.571
0.4369 15000 0.5811
0.4514 15500 0.569
0.4660 16000 0.5751
0.4805 16500 0.5714
0.4951 17000 0.5605
0.5097 17500 0.5687
0.5242 18000 0.5566
0.5388 18500 0.5642
0.5534 19000 0.5468
0.5679 19500 0.5733
0.5825 20000 0.5582
0.5970 20500 0.5534
0.6116 21000 0.5417
0.6262 21500 0.5483
0.6407 22000 0.5446
0.6553 22500 0.5581
0.6699 23000 0.5353
0.6844 23500 0.5513
0.6990 24000 0.5521
0.7135 24500 0.5402
0.7281 25000 0.5371
0.7427 25500 0.554
0.7572 26000 0.5322
0.7718 26500 0.5404
0.7863 27000 0.5372
0.8009 27500 0.5465
0.8155 28000 0.5387
0.8300 28500 0.5363
0.8446 29000 0.5296
0.8592 29500 0.5356
0.8737 30000 0.5341
0.8883 30500 0.54
0.9028 31000 0.527
0.9174 31500 0.5261
0.9320 32000 0.5424
0.9465 32500 0.5286
0.9611 33000 0.5262
0.9757 33500 0.5255
0.9902 34000 0.5221

Framework Versions

  • Python: 3.11.12
  • Sentence Transformers: 4.1.0
  • Transformers: 4.52.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.7.0
  • Datasets: 3.6.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers and SoftmaxLoss

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}