This is FBL's una-cybertron-7b-v2, converted to GGUF. No other changes were made.

Two files are available here:

  • una-cybertron-7b-v2-fp16.gguf: the original model converted to GGUF without quantization
  • una-cybertron-7b-v2-q8_0-LOT.gguf: the original model converted to GGUF with q8_0 quantization using the --leave-output-tensor command-line option
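
Either file can be run with llama.cpp or any of its bindings. As a minimal sketch, here is how the q8_0 file might be loaded with the llama-cpp-python bindings (assumptions: llama-cpp-python is installed and the GGUF file has been downloaded locally; the path, prompt, and parameters are placeholders):

```python
# Minimal sketch: run the q8_0 GGUF with the llama-cpp-python bindings.
# Assumes `pip install llama-cpp-python` and a locally downloaded file.
from llama_cpp import Llama

llm = Llama(
    model_path="una-cybertron-7b-v2-q8_0-LOT.gguf",  # local path to the downloaded file
    n_ctx=4096,  # context window; adjust to available memory
)

output = llm(
    "Explain what GGUF quantization does in one sentence.",
    max_tokens=128,
    temperature=0.7,
)
print(output["choices"][0]["text"])
```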

From llama.cpp/quantize --help:

--leave-output-tensor: Will leave output.weight un(re)quantized. Increases model size but may also increase quality, especially when requantizing

The model was converted using convert.py from Georgi Gerganov's llama.cpp repo, release b1620.

All credit belongs to FBL for fine-tuning and releasing this model. Thank you!
