Muffakir_Embedding / README.md

mohamed2811

metadata

language:
  - ar
base_model:
  - Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2
tags:
  - sentence-transformers
  - sentence-similarity

Model Summary:

This model is a Sentence Transformer based on Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2, fine-tuned for semantic textual similarity and information retrieval. It maps sentences to dense vectors that can be used for search, clustering, and text classification.
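
As a rough illustration of how dense vectors support search, the sketch below ranks documents by cosine similarity to a query. The toy 3-dimensional vectors stand in for real model output. The `embed` helper is not called here; it shows how embeddings might be obtained with the sentence-transformers library, and its repo ID `mohamed2811/Muffakir_Embedding` is an assumption inferred from the page header rather than something this card states.

```python
import math
from typing import List, Sequence, Tuple


def embed(texts: Sequence[str],
          model_id: str = "mohamed2811/Muffakir_Embedding") -> List[List[float]]:
    """Encode texts with sentence-transformers.

    The default model_id is an assumed repo ID; calling this needs the
    sentence-transformers package and network access.
    """
    from sentence_transformers import SentenceTransformer  # lazy import

    model = SentenceTransformer(model_id)
    return model.encode(list(texts), normalize_embeddings=True).tolist()


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec: Sequence[float],
           doc_vecs: Sequence[Sequence[float]],
           top_k: int = 3) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, best match first."""
    scored = [(i, cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors in place of real embeddings.
docs = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]
query = [0.8, 0.6, 0.0]
ranking = search(query, docs, top_k=2)
```

With real data, `docs` and `query` would be replaced by the output of `embed` on the corpus passages and the user question.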

Dataset:

The dataset used for training is derived from Egyptian law books.
It consists of synthetic data generated using a Large Language Model (LLM).
The dataset contains 20,252 samples, formatted as question-answer pairs.

Key Features:

Vector Representation: 768-dimensional embeddings.
Training Loss: MatryoshkaLoss & MultipleNegativesRankingLoss.
Evaluation Metrics: Cosine similarity-based metrics (Accuracy, Precision, Recall, NDCG).
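
To make the two training losses more concrete, here is a minimal pure-Python sketch of the general idea, not the sentence-transformers implementation. The in-batch ranking loss treats each question's paired answer as the positive and every other answer in the batch as a negative. The Matryoshka wrapper repeats that loss on truncated prefixes of the embeddings. The scale of 20.0, the truncation sizes, and the toy 4-dimensional vectors are all assumptions chosen for illustration; the card does not state the values used in training.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def in_batch_ranking_loss(anchors: Sequence[Vector],
                          positives: Sequence[Vector],
                          scale: float = 20.0) -> float:
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_z - logits[i]
    return total / len(anchors)


def matryoshka_loss(anchors: Sequence[Vector],
                    positives: Sequence[Vector],
                    dims: Sequence[int],
                    weights: Optional[Sequence[float]] = None) -> float:
    """Weighted sum of the ranking loss over truncated embedding prefixes."""
    if weights is None:
        weights = [1.0] * len(dims)
    return sum(
        w * in_batch_ranking_loss([a[:d] for a in anchors],
                                  [p[:d] for p in positives])
        for d, w in zip(dims, weights)
    )


# Toy batch: two question vectors and their two answer vectors.
questions: List[Vector] = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
answers: List[Vector] = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]

aligned = matryoshka_loss(questions, answers, dims=[4, 2])
mismatched = matryoshka_loss(questions, answers[::-1], dims=[4, 2])
```

When each question sits close to its own answer the loss is near zero, and when the pairs are swapped it is large. Truncating to a prefix encourages the leading dimensions to carry most of the signal, which is what lets shorter embeddings remain usable.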

🏆 Leaderboard Performance

The Muffakir_Embedding model has achieved notable rankings on the Arabic RAG Leaderboard, securing:

🥇 1st place on the Islamic Dataset

These results underscore the model's effectiveness in both retrieving relevant information and accurately ranking it within Arabic Retrieval-Augmented Generation (RAG) systems.
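
Ranking quality of this kind is what NDCG, one of the evaluation metrics listed under Key Features, is meant to capture. The sketch below shows one common formulation (linear gain, log2 discount). It is an illustration only; the exact variant and cutoff used by the leaderboard or during evaluation are not stated in this card.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k results."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg(relevances: Sequence[float], k: int) -> float:
    """DCG divided by the DCG of the ideal ordering; 0.0 if nothing is relevant."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    if ideal == 0.0:
        return 0.0
    return dcg(relevances, k) / ideal


# Relevance labels in the order the retriever returned the documents.
perfect = ndcg([1, 0, 0], k=3)   # relevant document ranked first
second = ndcg([0, 1, 0], k=3)    # relevant document ranked second
```

A relevant passage ranked first gives the maximum score, and the score drops as the same passage slides further down the list.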

This model is optimized for Arabic legal document retrieval and can also be applied to other Arabic NLP tasks.