---
language:
  - en
license: apache-2.0
library_name: gguf
tags:
  - reranker
  - gguf
  - llama.cpp
base_model: mixedbread-ai/mxbai-rerank-large-v2
---

# mxbai-rerank-large-v2-F16-GGUF

This model was converted to GGUF format from [mixedbread-ai/mxbai-rerank-large-v2](https://huggingface.co/mixedbread-ai/mxbai-rerank-large-v2) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.

Refer to the [original model card](https://huggingface.co/mixedbread-ai/mxbai-rerank-large-v2) for more details on the model.

## Model Information

### Quantization Details

This repository contains the F16 conversion of the original model. For reference, common GGUF precision levels are:

- **F16**: full 16-bit floating point; highest quality, largest size
- **Q8_0**: 8-bit quantization; high quality, good balance of size and speed
- **Q4_K_M**: 4-bit quantization (medium variant); smaller size, faster inference
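
Should you need a smaller file, the F16 GGUF can be re-quantized locally with llama.cpp's `llama-quantize` tool. A minimal sketch, assuming a built llama.cpp checkout; the output filename is illustrative:

```bash
# Re-quantize the F16 GGUF down to 4-bit Q4_K_M
# (output filename is illustrative; pick any name you like)
./llama-quantize mxbai-rerank-large-v2-F16.gguf mxbai-rerank-large-v2-Q4_K_M.gguf Q4_K_M
```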

## Usage

This model can be used with llama.cpp and other GGUF-compatible inference engines. The example below serves the model for reranking with `llama-server`; the `--reranking` flag assumes a llama.cpp build recent enough to include reranking support:

```bash
# Start the llama.cpp HTTP server in reranking mode
./llama-server -m mxbai-rerank-large-v2-F16.gguf --reranking
```
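
Once the server is up, scores can be requested over HTTP. A hedged example, assuming the default `localhost:8080` address and the `/v1/rerank` endpoint exposed by llama-server's reranking mode; the query and documents are placeholders:

```bash
# Rank two candidate passages against a query (illustrative payload)
curl http://localhost:8080/v1/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the capital of France?",
    "documents": [
      "Paris is the capital and largest city of France.",
      "Berlin is the capital of Germany."
    ]
  }'
```

The response contains one relevance score per document, which can be used to reorder retrieval results.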

## Model Files

| Quantization | Use Case |
|--------------|----------|
| F16 | Maximum quality, largest size |
| Q8_0 | High quality, good balance of size/performance |
| Q4_K_M | Good quality, smallest size, fastest inference |
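
The F16 file can be fetched with the Hugging Face CLI. A minimal sketch; the repo id is assumed from this model card's title and the filename from the table above:

```bash
# Download the F16 GGUF file (repo id and filename are assumptions based on this card)
huggingface-cli download sinjab/mxbai-rerank-large-v2-F16-GGUF mxbai-rerank-large-v2-F16.gguf --local-dir .
```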

## Citation

If you use this model, please cite the original model; see the [original model card](https://huggingface.co/mixedbread-ai/mxbai-rerank-large-v2) for citation information.

## License

This model inherits the Apache 2.0 license from the original model. Please refer to the [original model card](https://huggingface.co/mixedbread-ai/mxbai-rerank-large-v2) for license details.

## Acknowledgements