Getting an error while loading model_basename = "gptq_model-8bit-128g"
I am using the code below:
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
model_name_or_path = "TheBloke/Llama-2-13B-chat-GPTQ"
model_basename = "gptq_model-8bit-128g"
use_triton = False
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
                                           model_basename=model_basename,
                                           use_safetensors=True,
                                           trust_remote_code=True,
                                           device="cuda:0",
                                           use_triton=use_triton,
                                           quantize_config=None)
But I am getting this error:
FileNotFoundError Traceback (most recent call last)
in <cell line: 11>()
9 tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
10
---> 11 model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
12 model_basename=model_basename,
13 use_safetensors=True,
1 frames
/usr/local/lib/python3.10/dist-packages/auto_gptq/modeling/_base.py in from_quantized(cls, model_name_or_path, save_dir, device_map, max_memory, device, low_cpu_mem_usage, use_triton, torch_dtype, inject_fused_attention, inject_fused_mlp, use_cuda_fp16, quantize_config, model_basename, use_safetensors, trust_remote_code, warmup_triton, trainable, **kwargs)
712
713 if resolved_archive_file is None: # Could not find a model file to use
--> 714 raise FileNotFoundError(f"Could not find model in {model_name_or_path}")
715
716 model_save_name = resolved_archive_file
FileNotFoundError: Could not find model in TheBloke/Llama-2-13B-chat-GPTQ
Please update to AutoGPTQ 0.3.2, released yesterday. In AutoGPTQ 0.3.0 and 0.2.2 there was a bug where the revision parameter was not followed. This is now fixed.
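Upgrading is just a pip install, for example (assuming you installed auto-gptq from PyPI rather than built it from source):
pip install --upgrade auto-gptq==0.3.2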
OK, I will try this one.
Is the below code correct if I want to load the model from a particular branch (i.e. gptq-8bit-128g-actorder_True):
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
model_name_or_path = "TheBloke/Llama-2-13B-chat-GPTQ"
model_basename = "gptq_model-4bit-128g"
use_triton = False
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
                                           revision="gptq-8bit-128g-actorder_True",
                                           model_basename=model_basename,
                                           use_safetensors=True,
                                           trust_remote_code=True,
                                           device="cuda:0",
                                           quantize_config=None)
Can you please provide me Python code to load the 8-bit 128g model?
Yes, just saw that one - presumably some subtle basename thing that changed?
The required model_basename changed yesterday (August 20th). It is now model_basename = "model" - or you can just leave that line out completely, as it's now configured automatically by quantize_config.json. You no longer need to specify model_basename in the .from_quantized() call. But if you do specify it, set it to "model".
This change was made to support an upcoming change in Transformers, which will allow loading GPTQ models directly from Transformers.
I did automatically update the README to reflect the model_basename change, but haven't mentioned the changes in more detail yet. I will be updating all GPTQ READMEs in the next 48 hours to make this clearer.
OK, thanks for that - so this is the main branch model? What is suggested for the others, similar?
Same for all of them. They're all called model.safetensors now, and each branch's respective quantize_config.json includes that, so you don't need to specify model_basename any more.
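For example, to load the 8-bit 128g branch, something like this should work - a minimal sketch, assuming AutoGPTQ 0.3.2+ and that the branch's quantize_config.json supplies the basename:
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/Llama-2-13B-chat-GPTQ"
revision = "gptq-8bit-128g-actorder_True"  # branch holding the 8-bit 128g quantization

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# model_basename is omitted: it is now read from the branch's quantize_config.json
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
                                           revision=revision,
                                           use_safetensors=True,
                                           trust_remote_code=True,
                                           device="cuda:0",
                                           use_triton=False,
                                           quantize_config=None)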
