---
pipeline_tag: text-ranking
tags:
- gguf
- reranker
- qwen3
- llama-cpp
language:
- multilingual
base_model: jinaai/jina-reranker-v3
base_model_relation: quantized
inference: false
license: cc-by-nc-4.0
library_name: llama.cpp
---

# jina-reranker-v3-GGUF

GGUF quantizations of [jina-reranker-v3](https://huggingface.co/jinaai/jina-reranker-v3) using llama.cpp. A 0.6B-parameter multilingual listwise reranker quantized for efficient inference.

## Requirements

- Python 3.8+
- llama.cpp binaries (`llama-embedding` and `llama-tokenize`)
- Hanxiao's llama.cpp fork is recommended: https://github.com/hanxiao/llama.cpp

## Installation

```bash
pip install numpy safetensors
```

## Files

- `jina-reranker-v3-BF16.gguf` - Quantized model weights (BF16, 1.1 GB)
- `projector.safetensors` - MLP projector weights (3 MB)
- `rerank.py` - Reranker implementation

## Usage

```python
from rerank import GGUFReranker

# Initialize the reranker
reranker = GGUFReranker(
    model_path="jina-reranker-v3-BF16.gguf",
    projector_path="projector.safetensors",
    llama_embedding_path="/path/to/llama-embedding"
)

# Rerank documents against a query
query = "What is the capital of France?"
documents = [
    "Paris is the capital and largest city of France.",
    "Berlin is the capital of Germany.",
    "The Eiffel Tower is located in Paris."
]

results = reranker.rerank(query, documents)
for result in results:
    print(f"Score: {result['relevance_score']:.4f}, Doc: {result['document'][:50]}...")
```

## API

### `GGUFReranker.rerank(query, documents, top_n=None, return_embeddings=False, instruction=None)`

**Arguments:**

- `query` (str): Search query
- `documents` (List[str]): Documents to rerank
- `top_n` (int, optional): Return only the top N results
- `return_embeddings` (bool): Include embeddings in the output
- `instruction` (str, optional): Custom ranking instruction

**Returns:** List of dicts with keys `index`, `relevance_score`, `document`, and optionally `embedding`

An extended example using the optional arguments appears at the end of this card.

## Citation

If you find `jina-reranker-v3` useful in your research, please cite the [original paper](https://arxiv.org/abs/2509.25085):

```bibtex
@misc{wang2025jinarerankerv3lateinteractiondocument,
      title={jina-reranker-v3: Last but Not Late Interaction for Document Reranking},
      author={Feng Wang and Yuqing Li and Han Xiao},
      year={2025},
      eprint={2509.25085},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.25085},
}
```

## License

This GGUF quantization follows the same CC BY-NC 4.0 license as the original model. For commercial usage inquiries, please [contact Jina AI](https://jina.ai/contact-sales/).
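
## Extended example

A minimal sketch of the optional `rerank()` arguments documented in the API section above. The instruction string is purely illustrative, and the printed values will depend on the model and quantization; this is not a guaranteed output.

```python
from rerank import GGUFReranker

reranker = GGUFReranker(
    model_path="jina-reranker-v3-BF16.gguf",
    projector_path="projector.safetensors",
    llama_embedding_path="/path/to/llama-embedding"
)

query = "What is the capital of France?"
documents = [
    "Paris is the capital and largest city of France.",
    "Berlin is the capital of Germany.",
    "The Eiffel Tower is located in Paris."
]

# Keep only the single best match and request embeddings alongside the scores.
# The instruction below is a hypothetical example of a custom ranking instruction.
results = reranker.rerank(
    query,
    documents,
    top_n=1,
    return_embeddings=True,
    instruction="Rank documents by how directly they answer the question.",
)

best = results[0]
print(best["index"], round(best["relevance_score"], 4))
print(len(best["embedding"]))  # present only because return_embeddings=True
```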