Metadata:
language:
  - ar
base_model:
  - Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2
tags:
  - sentence-transformers
  - sentence-similarity


Model Summary:

This model is a Sentence Transformer based on Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2, fine-tuned for semantic textual similarity and information retrieval. It maps Arabic sentences to dense vector representations that can be used for search, clustering, and text classification.
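To illustrate how the dense vectors support search, the following is a minimal pure-Python sketch that ranks documents by cosine similarity to a query. The short toy vectors stand in for the model's real embeddings, and the helpers `cosine_similarity` and `rank_documents` are hypothetical, not part of any library. With the actual model, the vectors would come from the sentence-transformers library, e.g. `SentenceTransformer(<repository id>).encode(texts)`, where the repository ID is left as a placeholder.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def rank_documents(
    query: Sequence[float], docs: List[Sequence[float]], top_k: int = 3
) -> List[Tuple[int, float]]:
    """Return (doc_index, score) pairs for the top_k most similar documents."""
    scored = [(i, cosine_similarity(query, d)) for i, d in enumerate(docs)]
    # Highest similarity first; ties keep their original order (sort is stable).
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 4-dimensional vectors standing in for real model embeddings.
query_vec = [0.9, 0.1, 0.0, 0.2]
doc_vecs = [
    [0.0, 1.0, 0.3, 0.0],  # unrelated document
    [0.8, 0.2, 0.1, 0.1],  # close paraphrase of the query
    [0.5, 0.5, 0.5, 0.5],  # loosely related document
]

for idx, score in rank_documents(query_vec, doc_vecs, top_k=2):
    print(idx, round(score, 3))  # document 1 first, then document 2
```

Cosine similarity ignores vector length, so only the direction of each embedding affects the ranking.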

Dataset:

  • The dataset used for training is derived from Egyptian law books.
  • It consists of synthetic data generated using a Large Language Model (LLM).
  • The dataset contains 20,252 samples, formatted as question-answer pairs.
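Question-answer pairs fit MultipleNegativesRankingLoss (listed under Key Features) naturally: each question's own answer is the positive, and the other answers in the same batch act as negatives. The pure-Python sketch below illustrates that idea on a toy batch, assuming cosine similarity multiplied by a scale of 20 (the default `scale` of `MultipleNegativesRankingLoss` in sentence-transformers). The function `mnrl_loss` is hypothetical and is not the library's implementation.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mnrl_loss(
    questions: List[Sequence[float]],
    answers: List[Sequence[float]],
    scale: float = 20.0,
) -> float:
    """In-batch negatives ranking loss (sketch of the MNRL idea).

    answers[i] is the positive for questions[i]; every other answer in the
    batch serves as a negative. Returns the mean cross-entropy over the batch.
    """
    if len(questions) != len(answers) or not questions:
        raise ValueError("need a non-empty batch with one answer per question")
    total = 0.0
    for i, q in enumerate(questions):
        # Scaled similarity of this question to every answer in the batch.
        logits = [scale * cosine_similarity(q, a) for a in answers]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy = -log softmax probability of the correct answer.
        total += log_sum_exp - logits[i]
    return total / len(questions)


# Toy 2-dimensional "embeddings" of two questions and their answers.
questions = [[1.0, 0.0], [0.0, 1.0]]
aligned_answers = [[0.9, 0.1], [0.1, 0.9]]  # answer i belongs to question i
swapped_answers = [[0.1, 0.9], [0.9, 0.1]]  # answers paired with the wrong question

print(mnrl_loss(questions, aligned_answers))  # close to 0
print(mnrl_loss(questions, swapped_answers))  # much larger
```

Larger batches supply more negatives per question at no extra labeling cost, which is one reason this loss is popular for question-answer data.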

Key Features:

  • Vector Representation: 768-dimensional embeddings.
  • Training Loss: MatryoshkaLoss & MultipleNegativesRankingLoss.
  • Evaluation Metrics: Cosine similarity-based metrics (Accuracy, Precision, Recall, NDCG).
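MatryoshkaLoss trains the embedding so that its leading dimensions carry most of the information, which allows a 768-dimensional vector to be shortened to a prefix for cheaper storage and faster search. The card does not state which truncation sizes were used in training, so the sizes below are an assumption for illustration only, and `truncate_embedding` is a hypothetical helper. Recent versions of sentence-transformers also accept a `truncate_dim` argument when loading a model.

```python
import math
from typing import List, Sequence

# Assumed Matryoshka sizes for illustration; the card does not list them.
MATRYOSHKA_DIMS = (768, 512, 256, 128, 64)


def truncate_embedding(vec: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components and re-normalize to unit length."""
    if dim <= 0 or dim > len(vec):
        raise ValueError(f"dim must be in 1..{len(vec)}, got {dim}")
    prefix = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0.0:
        raise ValueError("prefix is all zeros; cannot normalize")
    return [x / norm for x in prefix]


# A fake 768-dimensional embedding (deterministic values, not model output).
full = [math.sin(i + 1) for i in range(768)]

for d in MATRYOSHKA_DIMS:
    short = truncate_embedding(full, d)
    # Each truncated vector has length d and (up to rounding) unit norm.
    print(d, len(short), round(sum(x * x for x in short), 6))
```

Re-normalizing after truncation keeps dot products equal to cosine similarities, so the truncated vectors can be searched the same way as the full ones.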

🏆 Leaderboard Performance

The Muffakir_Embedding model has achieved the following ranking on the Arabic RAG Leaderboard:

🥇 1st place on the Islamic Dataset

This result underscores the model's effectiveness at both retrieving relevant information and ranking it accurately within Arabic Retrieval-Augmented Generation (RAG) systems.
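Ranking quality of this kind is usually measured with metrics such as Recall and NDCG, both listed under Key Features. The sketch below computes Recall@k and NDCG@k with binary relevance for a single query, using the common textbook formulation (gain 1 for a relevant document, discount `1 / log2(rank + 1)`). It is not necessarily the exact implementation used by the Arabic RAG Leaderboard, and the function names and document IDs are hypothetical.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked_ids: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked_ids: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    # Ranks are 1-based, so the first position gets discount 1 / log2(2) = 1.
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in relevant
    )
    # Ideal ranking: all relevant documents (up to k) placed at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical retrieval result for one query with two relevant documents.
ranking = ["d3", "d1", "d7", "d2"]
relevant_docs = {"d1", "d2"}
print(recall_at_k(ranking, relevant_docs, k=3))  # 0.5: only d1 is in the top 3
print(round(ndcg_at_k(ranking, relevant_docs, k=3), 4))
```

Recall only checks whether relevant documents were retrieved, while NDCG also rewards placing them near the top, which is why both are reported for RAG retrieval.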


This model is optimized for Arabic legal document retrieval and can also be applied to other Arabic NLP tasks.