Getting an error while loading model_basename = "gptq_model-8bit-128g"
I am using the code below:
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
model_name_or_path = "TheBloke/Llama-2-13B-chat-GPTQ"
model_basename = "gptq_model-8bit-128g"
use_triton = False
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
                                           model_basename=model_basename,
                                           use_safetensors=True,
                                           trust_remote_code=True,
                                           device="cuda:0",
                                           use_triton=use_triton,
                                           quantize_config=None)
But I am getting this error:
FileNotFoundError Traceback (most recent call last)
in <cell line: 11>()
9 tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
10
---> 11 model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
12 model_basename=model_basename,
13 use_safetensors=True,
1 frames
/usr/local/lib/python3.10/dist-packages/auto_gptq/modeling/_base.py in from_quantized(cls, model_name_or_path, save_dir, device_map, max_memory, device, low_cpu_mem_usage, use_triton, torch_dtype, inject_fused_attention, inject_fused_mlp, use_cuda_fp16, quantize_config, model_basename, use_safetensors, trust_remote_code, warmup_triton, trainable, **kwargs)
712
713 if resolved_archive_file is None: # Could not find a model file to use
--> 714 raise FileNotFoundError(f"Could not find model in {model_name_or_path}")
715
716 model_save_name = resolved_archive_file
FileNotFoundError: Could not find model in TheBloke/Llama-2-13B-chat-GPTQ
Please update to AutoGPTQ 0.3.2, released yesterday. In AutoGPTQ 0.3.0 and 0.2.2 there was a bug where the revision parameter was not followed. This is now fixed.
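Upgrading is just a pip install, for example (assuming you installed auto-gptq from PyPI rather than built it from source):
pip install --upgrade auto-gptq==0.3.2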
OK, I will try this one.
Is the below code correct if I want to load the model from a particular branch (i.e. gptq-8bit-128g-actorder_True):
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
model_name_or_path = "TheBloke/Llama-2-13B-chat-GPTQ"
model_basename = "gptq_model-4bit-128g"
use_triton = False
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
                                           revision="gptq-8bit-128g-actorder_True",
                                           model_basename=model_basename,
                                           use_safetensors=True,
                                           trust_remote_code=True,
                                           device="cuda:0",
                                           quantize_config=None)
Can you please provide me Python code to load the 8-bit 128g model?
Yes, just saw that one - presumably some subtle basename thing that changed?
The required model_basename changed yesterday (August 20th). It is now model_basename = "model" - or you can just leave that line out completely, as it's now configured automatically by quantize_config.json. You no longer need to specify model_basename in the .from_quantized() call. But if you do specify it, set it to "model".
This change was made to support an upcoming change in Transformers, which will allow loading GPTQ models directly from Transformers.
I did automatically update the README to reflect the model_basename change, but haven't mentioned the changes in more detail yet. I will be updating all GPTQ READMEs in the next 48 hours to make this clearer.
OK, thanks for that - so this is the main branch model? What is suggested for the others, similar?
Same for all of them. They're all called model.safetensors now, and each branch's respective quantize_config.json includes that, so you don't need to specify model_basename any more.
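For example, to load the 8-bit 128g branch, something like this should work - a minimal sketch, assuming AutoGPTQ 0.3.2+ and that the branch's quantize_config.json supplies the basename:
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/Llama-2-13B-chat-GPTQ"
revision = "gptq-8bit-128g-actorder_True"  # branch holding the 8-bit 128g quantization

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# model_basename is omitted: it is now read from the branch's quantize_config.json
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
                                           revision=revision,
                                           use_safetensors=True,
                                           trust_remote_code=True,
                                           device="cuda:0",
                                           use_triton=False,
                                           quantize_config=None)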
