---
language:
  - en
license: apache-2.0
library_name: gguf
tags:
  - reranker
  - gguf
  - llama.cpp
base_model: mixedbread-ai/mxbai-rerank-large-v2
---

# mxbai-rerank-large-v2-F16-GGUF

This model was converted to GGUF format from [mixedbread-ai/mxbai-rerank-large-v2](https://huggingface.co/mixedbread-ai/mxbai-rerank-large-v2) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.

Refer to the [original model card](https://huggingface.co/mixedbread-ai/mxbai-rerank-large-v2) for more details on the model.

## Model Information

### Quantization Details

This repository contains the F16 conversion of the original model. For reference, common GGUF precision levels are:

- **F16**: full 16-bit floating point; highest quality, largest size
- **Q8_0**: 8-bit quantization; high quality, good balance of size and speed
- **Q4_K_M**: 4-bit quantization (medium variant); smaller size, faster inference
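
Should you need a smaller file, the F16 GGUF can be re-quantized locally with llama.cpp's `llama-quantize` tool. A minimal sketch, assuming a built llama.cpp checkout; the output filename is illustrative:

```bash
# Re-quantize the F16 GGUF down to 4-bit Q4_K_M
# (output filename is illustrative; pick any name you like)
./llama-quantize mxbai-rerank-large-v2-F16.gguf mxbai-rerank-large-v2-Q4_K_M.gguf Q4_K_M
```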

## Usage

This model can be used with llama.cpp and other GGUF-compatible inference engines. The example below serves the model for reranking with `llama-server`; the `--reranking` flag assumes a llama.cpp build recent enough to include reranking support:

```bash
# Start the llama.cpp HTTP server in reranking mode
./llama-server -m mxbai-rerank-large-v2-F16.gguf --reranking
```
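
Once the server is up, scores can be requested over HTTP. A hedged example, assuming the default `localhost:8080` address and the `/v1/rerank` endpoint exposed by llama-server's reranking mode; the query and documents are placeholders:

```bash
# Rank two candidate passages against a query (illustrative payload)
curl http://localhost:8080/v1/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the capital of France?",
    "documents": [
      "Paris is the capital and largest city of France.",
      "Berlin is the capital of Germany."
    ]
  }'
```

The response contains one relevance score per document, which can be used to reorder retrieval results.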

## Model Files

| Quantization | Use Case |
|--------------|----------|
| F16 | Maximum quality, largest size |
| Q8_0 | High quality, good balance of size/performance |
| Q4_K_M | Good quality, smallest size, fastest inference |
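
The F16 file can be fetched with the Hugging Face CLI. A minimal sketch; the repo id is assumed from this model card's title and the filename from the table above:

```bash
# Download the F16 GGUF file (repo id and filename are assumptions based on this card)
huggingface-cli download sinjab/mxbai-rerank-large-v2-F16-GGUF mxbai-rerank-large-v2-F16.gguf --local-dir .
```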

## Citation

If you use this model, please cite the original model; see the [original model card](https://huggingface.co/mixedbread-ai/mxbai-rerank-large-v2) for citation information.

## License

This model inherits the Apache 2.0 license from the original model. Please refer to the [original model card](https://huggingface.co/mixedbread-ai/mxbai-rerank-large-v2) for license details.

## Acknowledgements