TFree model only runs with Transformers ≤4.46.3, crashes on newer versions (e.g. 4.57.1)
I’m hitting version compatibility issues when trying to run the TFree model with a newer version of transformers.
Here’s the error output:
Traceback (most recent call last):
  File "test.py", line 28, in <module>
    chat()
  File "firewall.py", line 20, in __init__
    self.tfree = TFreeFirewall()
  File "tfree.py", line 19, in __init__
    self.model = AutoModelForCausalLM.from_pretrained(...)
  File ".../transformer_backbone.py", line 679, in __init__
    self.self_attn = LLAMA_ATTENTION_CLASSES[config._attn_implementation](config=config, layer_idx=layer_idx)
KeyError: None
Is there any other way to run it with the new version?
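For anyone else hitting the same KeyError: None, it usually means config._attn_implementation was never resolved by the remote modeling code. A minimal, untested workaround sketch is to force an explicit attention backend at load time; the model id below is a placeholder, and whether the HAT/TFree remote code honours the kwarg is an assumption:

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "org/tfree-model-id",          # placeholder, replace with the actual checkpoint
    trust_remote_code=True,        # the TFree/HAT modeling code lives in the repo
    attn_implementation="eager",   # force an explicit backend so the class lookup key is not None
)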
Hey @NEWWWWWbie,
We originally pinned the version to ensure numerical consistency between the Rotary implementation in our training codebase and the one in transformers.
We’ve now updated the modeling code in all HATified LLaMA models, as well as our own pretrained HAT models, to be compatible again with the latest transformers package (4.57.1).
Let us know if it works for you now!
Side note: I’d encourage you to try our vLLM implementation of the HAT architecture. It’s significantly faster and supports batched inference.
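In case it helps, plain vLLM usage looks roughly like the sketch below; the model id is a placeholder, and whether the HAT implementation needs a dedicated fork, extra flags, or trust_remote_code is an assumption on my side:

from vllm import LLM, SamplingParams

# Load the model (placeholder id); trust_remote_code is assumed for custom architectures.
llm = LLM(model="org/hat-model-id", trust_remote_code=True)

# Batched inference: vLLM schedules all prompts in a single generate() call.
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["First prompt", "Second prompt"], params)

for out in outputs:
    print(out.outputs[0].text)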
Hi, after you changed the model, it won’t run with either the newer or the older version of transformers now.
It just gets killed when running with the newest version of transformers, and it shows an error when running with the older versions.
Other than that, is it possible to quantize this model to something like INT4 or another format to make it run faster?
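On the INT4 question, the usual transformers route is bitsandbytes 4-bit loading (bitsandbytes would need to be installed separately). A minimal sketch, assuming the custom HAT layers tolerate quantized linear layers, which is not confirmed here; the model id is a placeholder:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 4-bit quantization config; compute in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "org/tfree-model-id",           # placeholder
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",              # needs accelerate
)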
I just ran everything again on my end with a new environment and fresh cache and everything seems to run as expected.
Could you provide the stack trace you are seeing on your end, and also post the exact package versions via e.g. pip freeze?
Hi, here is the pip freeze output and the error I get when I run the code:
torch_dtype is deprecated! Use dtype instead!
Loading checkpoint shards: 33%|███████████████████████████████████████████████████████████████████████████▋ | 1/3 [00:34<01:08, 34.10s/it]
Killed
accelerate==1.11.0
certifi==2025.10.5
charset-normalizer==3.4.4
einops==0.8.1
filelock==3.20.0
flash_attn==2.8.3
fsspec==2025.9.0
hat-splitter==0.1.10
hf-xet==1.1.10
huggingface-hub==0.35.3
idna==3.11
Jinja2==3.1.6
MarkupSafe==2.1.5
mpmath==1.3.0
networkx==3.3
numpy==2.2.6
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.21.5
nvidia-nvjitlink-cu12==12.9.86
nvidia-nvtx-cu12==12.1.105
packaging==25.0
pillow==11.3.0
psutil==7.1.1
PyYAML==6.0.3
regex==2025.10.22
requests==2.32.5
safetensors==0.6.2
sympy==1.13.1
tokenizers==0.22.1
torch==2.5.1+cu121
torchaudio==2.5.1+cu121
torchvision==0.20.1+cu121
tqdm==4.67.1
transformers==4.57.1
triton==3.1.0
typing_extensions==4.15.0
urllib3==2.5.0
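For reference, a bare "Killed" with no Python traceback during "Loading checkpoint shards" usually means the process was terminated by the OS out-of-memory killer (host RAM exhausted), not a transformers error. A lower-memory load sketch, assuming the HAT remote code accepts these standard kwargs; the model id is a placeholder:

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "org/tfree-model-id",        # placeholder
    trust_remote_code=True,
    dtype=torch.bfloat16,        # 4.57.x kwarg; older versions use torch_dtype
    low_cpu_mem_usage=True,      # avoid materializing an extra full copy of the weights in RAM
    device_map="auto",           # needs accelerate (present in the pip freeze above)
)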
Any news on this?