BGE base Financial Matryoshka

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5 on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json
  • Language: en
  • License: apache-2.0

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
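
For reference, the pipeline above (BERT encoder, CLS-token pooling, L2 normalization) can be reproduced with plain transformers. This is a minimal sketch, assuming torch and transformers are installed; it is not needed for normal use, where SentenceTransformer handles these steps automatically.

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_id = "phucvt0302/bge-base-financial-matryoshka"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

inputs = tokenizer(
    ["How many shares were outstanding at the beginning of 2023?"],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)
with torch.no_grad():
    outputs = model(**inputs)

# CLS pooling: take the hidden state of the first ([CLS]) token,
# then L2-normalize, matching the Pooling and Normalize modules above.
embeddings = F.normalize(outputs.last_hidden_state[:, 0], p=2, dim=1)
print(embeddings.shape)  # torch.Size([1, 768])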

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("phucvt0302/bge-base-financial-matryoshka")
# Run inference
sentences = [
    'How many shares were outstanding at the beginning of 2023 and what was their aggregate intrinsic value?',
    'At the beginning of 2023, there were 355 shares outstanding with an aggregate intrinsic value of $142,916.',
    'In IBM’s 2023 Annual Report to Stockholders, the Financial Statements and Supplementary Data are included on pages 44 through 121.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
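
Because the model was trained with MatryoshkaLoss, its embeddings can be truncated to a smaller dimension (512, 256, 128, or 64) with only a modest drop in retrieval quality (see the evaluation tables below). A minimal sketch using the truncate_dim argument of SentenceTransformer:

from sentence_transformers import SentenceTransformer

# Load the model so that encode() returns 256-dimensional embeddings.
model = SentenceTransformer("phucvt0302/bge-base-financial-matryoshka", truncate_dim=256)

embeddings = model.encode([
    "How many shares were outstanding at the beginning of 2023?",
    "At the beginning of 2023, there were 355 shares outstanding with an aggregate intrinsic value of $142,916.",
])
print(embeddings.shape)
# (2, 256)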

Evaluation

Metrics

The model was evaluated on an information-retrieval task at each Matryoshka output dimension; the five tables below correspond to embedding dimensions 768, 512, 256, 128, and 64, respectively (matching the dim_*_cosine_ndcg@10 columns in the training logs).

Information Retrieval (dimension 768)

| Metric              |  Value |
|:--------------------|-------:|
| cosine_accuracy@1   | 0.7171 |
| cosine_accuracy@3   | 0.8414 |
| cosine_accuracy@5   | 0.87   |
| cosine_accuracy@10  | 0.9071 |
| cosine_precision@1  | 0.7171 |
| cosine_precision@3  | 0.2805 |
| cosine_precision@5  | 0.174  |
| cosine_precision@10 | 0.0907 |
| cosine_recall@1     | 0.7171 |
| cosine_recall@3     | 0.8414 |
| cosine_recall@5     | 0.87   |
| cosine_recall@10    | 0.9071 |
| cosine_ndcg@10      | 0.8148 |
| cosine_mrr@10       | 0.785  |
| cosine_map@100      | 0.7888 |

Information Retrieval (dimension 512)

| Metric              |  Value |
|:--------------------|-------:|
| cosine_accuracy@1   | 0.7171 |
| cosine_accuracy@3   | 0.8471 |
| cosine_accuracy@5   | 0.8671 |
| cosine_accuracy@10  | 0.9086 |
| cosine_precision@1  | 0.7171 |
| cosine_precision@3  | 0.2824 |
| cosine_precision@5  | 0.1734 |
| cosine_precision@10 | 0.0909 |
| cosine_recall@1     | 0.7171 |
| cosine_recall@3     | 0.8471 |
| cosine_recall@5     | 0.8671 |
| cosine_recall@10    | 0.9086 |
| cosine_ndcg@10      | 0.8147 |
| cosine_mrr@10       | 0.7844 |
| cosine_map@100      | 0.788  |

Information Retrieval (dimension 256)

| Metric              |  Value |
|:--------------------|-------:|
| cosine_accuracy@1   | 0.7171 |
| cosine_accuracy@3   | 0.8414 |
| cosine_accuracy@5   | 0.8729 |
| cosine_accuracy@10  | 0.9014 |
| cosine_precision@1  | 0.7171 |
| cosine_precision@3  | 0.2805 |
| cosine_precision@5  | 0.1746 |
| cosine_precision@10 | 0.0901 |
| cosine_recall@1     | 0.7171 |
| cosine_recall@3     | 0.8414 |
| cosine_recall@5     | 0.8729 |
| cosine_recall@10    | 0.9014 |
| cosine_ndcg@10      | 0.8139 |
| cosine_mrr@10       | 0.7854 |
| cosine_map@100      | 0.7895 |

Information Retrieval (dimension 128)

| Metric              |  Value |
|:--------------------|-------:|
| cosine_accuracy@1   | 0.7129 |
| cosine_accuracy@3   | 0.8243 |
| cosine_accuracy@5   | 0.8586 |
| cosine_accuracy@10  | 0.9029 |
| cosine_precision@1  | 0.7129 |
| cosine_precision@3  | 0.2748 |
| cosine_precision@5  | 0.1717 |
| cosine_precision@10 | 0.0903 |
| cosine_recall@1     | 0.7129 |
| cosine_recall@3     | 0.8243 |
| cosine_recall@5     | 0.8586 |
| cosine_recall@10    | 0.9029 |
| cosine_ndcg@10      | 0.8072 |
| cosine_mrr@10       | 0.7766 |
| cosine_map@100      | 0.7803 |

Information Retrieval (dimension 64)

| Metric              |  Value |
|:--------------------|-------:|
| cosine_accuracy@1   | 0.6914 |
| cosine_accuracy@3   | 0.7943 |
| cosine_accuracy@5   | 0.8314 |
| cosine_accuracy@10  | 0.8729 |
| cosine_precision@1  | 0.6914 |
| cosine_precision@3  | 0.2648 |
| cosine_precision@5  | 0.1663 |
| cosine_precision@10 | 0.0873 |
| cosine_recall@1     | 0.6914 |
| cosine_recall@3     | 0.7943 |
| cosine_recall@5     | 0.8314 |
| cosine_recall@10    | 0.8729 |
| cosine_ndcg@10      | 0.7805 |
| cosine_mrr@10       | 0.7512 |
| cosine_map@100      | 0.7554 |
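
The dimension-specific scores above can be reproduced with the InformationRetrievalEvaluator from sentence-transformers, loading the model with a matching truncate_dim. The sketch below uses a tiny hypothetical corpus (the actual evaluation split is not published with this card), so the reported @k values are capped at 1.

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

# Hypothetical toy data; replace with the held-out financial QA split.
queries = {"q1": "How many shares were outstanding at the beginning of 2023?"}
corpus = {
    "d1": "At the beginning of 2023, there were 355 shares outstanding with an aggregate intrinsic value of $142,916.",
    "d2": "GMV consists of the total value of all paid transactions between users on our platforms.",
}
relevant_docs = {"q1": {"d1"}}

# Evaluate at a truncated Matryoshka dimension, e.g. 128.
model = SentenceTransformer("phucvt0302/bge-base-financial-matryoshka", truncate_dim=128)
evaluator = InformationRetrievalEvaluator(
    queries,
    corpus,
    relevant_docs,
    name="dim_128_toy",
    # The card reports metrics up to @10 and @100; the toy corpus only has 2 documents.
    accuracy_at_k=[1],
    precision_recall_at_k=[1],
    mrr_at_k=[1],
    ndcg_at_k=[1],
    map_at_k=[1],
)
print(evaluator(model))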

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 6,300 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:

    |         | anchor                                            | positive                                           |
    |:--------|:--------------------------------------------------|:---------------------------------------------------|
    | type    | string                                            | string                                             |
    | details | min: 7 tokens, mean: 20.53 tokens, max: 46 tokens | min: 8 tokens, mean: 44.95 tokens, max: 512 tokens |
  • Samples:

    | anchor | positive |
    |:-------|:---------|
    | What does the No Surprises Act require providers to develop and disclose? | Under the No Surprises Act, which went into effect January 1, 2022, certain providers, including DaVita, are required to develop and disclose a 'Good Faith Estimate' that details the expected charges for furnishing certain items or services. |
    | What does Gross Merchandise Volume (GMV) represent in financial terms? | GMV consists of the total value of all paid transactions between users on our platforms during the applicable period inclusive of shipping fees and taxes. |
    | What was the pre-tax restructuring charge for the fiscal year 2023 related to the discontinuation of certain R&D programs? | The pre-tax restructuring charge of approximately $0.5 billion in the fiscal year 2023 included the termination of partnered and non-partnered program costs and asset impairments. |
  • Loss: MatryoshkaLoss with these parameters (a construction sketch follows the parameter dump):
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
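
For reference, the parameter dump above corresponds to the following loss construction in sentence-transformers; this is a sketch, with the base checkpoint standing in for the model being fine-tuned.

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("BAAI/bge-base-en-v1.5")

# Apply the in-batch-negatives ranking loss at every Matryoshka dimension;
# equal weights and n_dims_per_step=-1 (use all dimensions each step)
# match the parameters listed above.
loss = MatryoshkaLoss(
    model,
    MultipleNegativesRankingLoss(model),
    matryoshka_dims=[768, 512, 256, 128, 64],
    matryoshka_weights=[1, 1, 1, 1, 1],
    n_dims_per_step=-1,
)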
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
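
These non-default values map onto SentenceTransformerTrainingArguments. Below is a minimal sketch of how the run could be wired together; the output directory and JSON file path are illustrative assumptions, and load_best_model_at_end is omitted because it additionally requires matching eval/save strategies and an evaluation set, which are not shown here.

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("BAAI/bge-base-en-v1.5")

# Illustrative: a local JSON file with "anchor" and "positive" columns.
train_dataset = load_dataset("json", data_files="train.json", split="train")

loss = MatryoshkaLoss(
    model, MultipleNegativesRankingLoss(model), matryoshka_dims=[768, 512, 256, 128, 64]
)

args = SentenceTransformerTrainingArguments(
    output_dir="bge-base-financial-matryoshka",  # illustrative path
    num_train_epochs=4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,   # requires a GPU with bfloat16 support
    tf32=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=model, args=args, train_dataset=train_dataset, loss=loss
)
trainer.train()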

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

| Epoch      | Step   | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
|:----------:|:------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|
| 0.9697     | 6      | -             | 0.7993                 | 0.7967                 | 0.7941                 | 0.7845                 | 0.7431                |
| 1.6162     | 10     | 2.3395        | -                      | -                      | -                      | -                      | -                     |
| 1.9394     | 12     | -             | 0.8089                 | 0.8086                 | 0.8108                 | 0.8007                 | 0.7669                |
| 2.9091     | 18     | -             | 0.8158                 | 0.8134                 | 0.8144                 | 0.8066                 | 0.7761                |
| **3.8788** | **24** | -             | **0.8148**             | **0.8147**             | **0.8139**             | **0.8072**             | **0.7805**            |

  • The bold row (epoch 3.8788) denotes the saved checkpoint; its scores match the evaluation tables above.

Framework Versions

  • Python: 3.12.6
  • Sentence Transformers: 4.1.0
  • Transformers: 4.40.2
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.6.0
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}