TFree model only runs with Transformers ≤4.46.3, crashes on newer versions (e.g. 4.57.1)
I’m hitting version compatibility issues when trying to run the TFree model with a newer version of transformers.
Here’s the error output:
Traceback (most recent call last):
  File "test.py", line 28, in <module>
    chat()
  File "firewall.py", line 20, in __init__
    self.tfree = TFreeFirewall()
  File "tfree.py", line 19, in __init__
    self.model = AutoModelForCausalLM.from_pretrained(...)
  File ".../transformer_backbone.py", line 679, in __init__
    self.self_attn = LLAMA_ATTENTION_CLASSES[config._attn_implementation](config=config, layer_idx=layer_idx)
KeyError: None
Is there any other way to run it with the new version?
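For anyone else hitting the same KeyError: None, it usually means config._attn_implementation was never resolved by the remote modeling code. A minimal, untested workaround sketch is to force an explicit attention backend at load time; the model id below is a placeholder, and whether the HAT/TFree remote code honours the kwarg is an assumption:

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "org/tfree-model-id",          # placeholder, replace with the actual checkpoint
    trust_remote_code=True,        # the TFree/HAT modeling code lives in the repo
    attn_implementation="eager",   # force an explicit backend so the class lookup key is not None
)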
Hey @NEWWWWWbie,
We originally pinned the version to ensure numerical consistency between the Rotary implementation in our training codebase and the one in transformers.
We’ve now updated the modeling code in all HATified LLaMA models, as well as our own pretrained HAT models, to be compatible again with the latest transformers package (4.57.1).
Let us know if it works for you now!
Side note: I’d encourage you to try our vLLM implementation of the HAT architecture. It’s significantly faster and supports batched inference.
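In case it helps, plain vLLM usage looks roughly like the sketch below; the model id is a placeholder, and whether the HAT implementation needs a dedicated fork, extra flags, or trust_remote_code is an assumption on my side:

from vllm import LLM, SamplingParams

# Load the model (placeholder id); trust_remote_code is assumed for custom architectures.
llm = LLM(model="org/hat-model-id", trust_remote_code=True)

# Batched inference: vLLM schedules all prompts in a single generate() call.
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["First prompt", "Second prompt"], params)

for out in outputs:
    print(out.outputs[0].text)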
Hi, after you changed the model, it won’t run with either the newer or the older version of transformers now.
It just gets killed when running with the newest version of transformers, and it shows an error when running with the older versions.
Other than that, is it possible to quantize this model to something like INT4 or another format to make it run faster?
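On the INT4 question, the usual transformers route is bitsandbytes 4-bit loading (bitsandbytes would need to be installed separately). A minimal sketch, assuming the custom HAT layers tolerate quantized linear layers, which is not confirmed here; the model id is a placeholder:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 4-bit quantization config; compute in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "org/tfree-model-id",           # placeholder
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",              # needs accelerate
)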
I just ran everything again on my end with a new environment and fresh cache and everything seems to run as expected.
Could you provide the stack trace you are seeing on your end, and also post the exact package versions via e.g. pip freeze?
Hi, here is the pip freeze output and the error I get when I run the code:
torch_dtype is deprecated! Use dtype instead!
Loading checkpoint shards: 33%|███████████████████████████████████████████████████████████████████████████▋ | 1/3 [00:34<01:08, 34.10s/it]
Killed
accelerate==1.11.0
certifi==2025.10.5
charset-normalizer==3.4.4
einops==0.8.1
filelock==3.20.0
flash_attn==2.8.3
fsspec==2025.9.0
hat-splitter==0.1.10
hf-xet==1.1.10
huggingface-hub==0.35.3
idna==3.11
Jinja2==3.1.6
MarkupSafe==2.1.5
mpmath==1.3.0
networkx==3.3
numpy==2.2.6
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.21.5
nvidia-nvjitlink-cu12==12.9.86
nvidia-nvtx-cu12==12.1.105
packaging==25.0
pillow==11.3.0
psutil==7.1.1
PyYAML==6.0.3
regex==2025.10.22
requests==2.32.5
safetensors==0.6.2
sympy==1.13.1
tokenizers==0.22.1
torch==2.5.1+cu121
torchaudio==2.5.1+cu121
torchvision==0.20.1+cu121
tqdm==4.67.1
transformers==4.57.1
triton==3.1.0
typing_extensions==4.15.0
urllib3==2.5.0
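For reference, a bare "Killed" with no Python traceback during "Loading checkpoint shards" usually means the process was terminated by the OS out-of-memory killer (host RAM exhausted), not a transformers error. A lower-memory load sketch, assuming the HAT remote code accepts these standard kwargs; the model id is a placeholder:

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "org/tfree-model-id",        # placeholder
    trust_remote_code=True,
    dtype=torch.bfloat16,        # 4.57.x kwarg; older versions use torch_dtype
    low_cpu_mem_usage=True,      # avoid materializing an extra full copy of the weights in RAM
    device_map="auto",           # needs accelerate (present in the pip freeze above)
)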
Any news on this?