Python script to convert and quantize models to gguf

#4
by ankmaury - opened

Hi @calcuis ,
Can you please share the script you used to convert and quantize these models to GGUF?

Owner

it's a simple conversion; you could convert it with gguf-connector or gguf-node, then further quantize it with gguf-cutter; if that doesn't work for you, please refer to the script here for building your own quantizer
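
as a side note, you can always sanity-check whatever a converter produced with the official gguf package from llama.cpp (pip install gguf); a tiny sketch, where the filename is just an example:

```python
# dump the metadata keys and tensor list of a GGUF file
from gguf import GGUFReader

reader = GGUFReader("t3_cfg-f16.gguf")  # example filename

# metadata keys (architecture, quantization version, file type, ...)
for key in reader.fields:
    print(key)

# every tensor with its quantization type and shape
for t in reader.tensors:
    print(t.name, t.tensor_type.name, tuple(t.shape))
```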

Hi @calcuis ,
How can I use the gguf-convertor to convert my PyTorch FP32 files to GGUF FP16?

Owner

simple method: you could use the convertor zero from gguf-node (pypi|repo|pack)
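
if you'd rather script the conversion yourself, a minimal sketch with the official gguf package looks like this; the input filename is hypothetical, and the arch string and three metadata keys simply mirror the dump further down this thread:

```python
# minimal FP32 -> F16 GGUF converter sketch (pip install gguf torch)
import torch
from gguf import GGUFWriter

# assumes the checkpoint holds a plain state dict; "t3_cfg.pt" is hypothetical
state_dict = torch.load("t3_cfg.pt", map_location="cpu")

writer = GGUFWriter("t3_cfg-f16.gguf", arch="pig")
writer.add_quantization_version(2)
writer.add_file_type(1)  # 1 = F16 in llama.cpp's file-type numbering

for name, t in state_dict.items():
    if t.dim() == 1:
        # keep 1-D tensors (norms, biases) in f32, matching the f32/f16 split in the log below
        writer.add_tensor(name, t.to(torch.float32).numpy())
    else:
        writer.add_tensor(name, t.to(torch.float16).numpy())

writer.write_header_to_file()
writer.write_kv_data_to_file()
writer.write_tensors_to_file()
writer.close()
```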

Hi @calcuis ,
I tried to quantize your t3_cfg-f16 model using llama-quantize, but I get this error:

./llama-quantize C:\Users\anmaurya\Downloads\t3_cfg-f16.gguf C:\Users\anmaurya\IGI\ggml\examples\tts\models_quant\T3-532M-Q2_K.gguf Q2_K
main: build = 6569 (e7890955)
main: built with MSVC 19.44.35213.0 for x64
main: quantizing 'C:\Users\anmaurya\Downloads\t3_cfg-f16.gguf' to 'C:\Users\anmaurya\IGI\ggml\examples\tts\models_quant\T3-532M-Q2_K.gguf' as Q2_K
llama_model_loader: loaded meta data with 3 key-value pairs and 292 tensors from C:\Users\anmaurya\Downloads\t3_cfg-f16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = pig
llama_model_loader: - kv 1: general.quantization_version u32 = 2
llama_model_loader: - kv 2: general.file_type u32 = 1
llama_model_loader: - type f32: 70 tensors
llama_model_loader: - type f16: 222 tensors
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA RTX 3500 Ada Generation Laptop GPU, compute capability 8.9, VMM: yes
register_backend: registered backend CUDA (1 devices)
register_device: registered device CUDA0 (NVIDIA RTX 3500 Ada Generation Laptop GPU)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (Intel(R) Core(TM) Ultra 7 165H)
llama_model_quantize: failed to quantize: unknown model architecture: 'pig'
main: failed to quantize model from 'C:\Users\anmaurya\Downloads\t3_cfg-f16.gguf'
(base) PS C:\Users\anmaurya\IGI\llama.cpp\build\bin\Debug>

How do I go ahead with quantization?

Owner

you can't use the normal llama-quantize since this architecture isn't on their list; it will just return "unknown model architecture"; either use gguf-cutter or build a custom quantizer for that task
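
for a rough idea of what such a custom quantizer could look like, here is a sketch built on the gguf package's pure-Python quants module; Q8_0 is used because its Python encoder is definitely available there, whereas K-quant (e.g. Q2_K) encoding support depends on your gguf version; it also assumes a recent gguf where reader tensors carry their full shape:

```python
# sketch of a standalone quantizer for architectures llama-quantize rejects;
# requantizes the f16 tensors of a GGUF to Q8_0 in pure Python
import numpy as np
from gguf import GGUFReader, GGUFWriter, GGMLQuantizationType
from gguf.quants import quantize

reader = GGUFReader("t3_cfg-f16.gguf")
writer = GGUFWriter("t3_cfg-q8_0.gguf", arch="pig")
writer.add_quantization_version(2)
writer.add_file_type(7)  # 7 = MOSTLY_Q8_0 in llama.cpp's file-type numbering

for t in reader.tensors:
    if t.tensor_type == GGMLQuantizationType.F16:
        # requantize the f16 weights; quantize() returns the raw Q8_0 blocks
        data = quantize(t.data.astype(np.float32), GGMLQuantizationType.Q8_0)
        writer.add_tensor(t.name, data, raw_dtype=GGMLQuantizationType.Q8_0)
    else:
        # pass the f32 tensors (norms, biases) through untouched
        writer.add_tensor(t.name, t.data)

writer.write_header_to_file()
writer.write_kv_data_to_file()
writer.write_tensors_to_file()
writer.close()
```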
