RuntimeError
input_conditioner.py", line 32, in forward
y = (x - self.norm_mean) / self.norm_std
RuntimeError: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 1
Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False.
Floating point exception (core dumped)
Hi, thank you for your interest.
We've pushed a fix for the RGBA input:
RuntimeError: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 1
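For anyone still on a revision from before that fix, a possible workaround is to drop the alpha channel before the image reaches the processor. This is a minimal sketch, assuming the input is a PIL image like the one loaded with Image.open in the example in this thread; ensure_rgb is a hypothetical helper name and is not part of the model repository.

```python
def ensure_rgb(image):
    """Return a 3-channel version of a PIL-style image.

    Hypothetical helper: it relies only on the `.mode` attribute and the
    `.convert()` method that PIL.Image.Image provides.
    """
    if image.mode == "RGB":
        # Already 3 channels, nothing to do.
        return image
    # For RGBA input, convert("RGB") discards the alpha channel.
    return image.convert("RGB")


# Usage (requires Pillow; the path is a placeholder):
#   from PIL import Image
#   image = ensure_rgb(Image.open("page.png"))
```

Note that convert("RGB") simply drops alpha, so transparent regions keep whatever color values are stored underneath them. Compositing onto a white background first may be preferable for document images.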
Could you provide more details or logs regarding the second issue?
Floating point exception (core dumped)
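A floating point exception terminates the process before Python can print a traceback, which is why the log simply stops. One way to collect more detail, sketched here with the standard-library faulthandler module, is to enable it at the very top of the script so that the Python stack is dumped when the fatal signal arrives. The function name enable_crash_traces is an illustrative assumption.

```python
import faulthandler
import sys


def enable_crash_traces():
    """Dump the Python stack of all threads on SIGFPE, SIGSEGV, SIGABRT, etc."""
    # Prefer the original stderr, which has a real file descriptor even if
    # sys.stderr has been replaced by a capturing wrapper.
    stream = sys.__stderr__ if sys.__stderr__ is not None else sys.stderr
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()


# Call this before importing torch/transformers and loading the model.
enable_crash_traces()
```

The same effect is available without code changes by running the script with `python -X faulthandler`.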
A new version of the following files was downloaded from https://huggingface.co/nvidia/NVIDIA-Nemotron-Parse-v1.1:
- hf_nemotron_parse_config.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
/root/autodl-tmp/conda/envs/nvidia-ocr/lib/python3.10/site-packages/timm/models/registry.py:4: FutureWarning: Importing from timm.models.registry is deprecated, please import via timm.models
warnings.warn(f"Importing from {name} is deprecated, please import via timm.models", FutureWarning)
/root/autodl-tmp/conda/envs/nvidia-ocr/lib/python3.10/site-packages/timm/models/layers/__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
warnings.warn(f"Importing from {name} is deprecated, please import via timm.layers", FutureWarning)
A new version of the following files was downloaded from https://huggingface.co/nvidia/NVIDIA-Nemotron-Parse-v1.1:
- hf_nemotron_parse_modeling.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
No pretrained configuration specified for vit_huge_patch16_224 model. Using a default. Please add a config to the model pretrained_cfg registry or pass explicitly.
Some weights of the model checkpoint at nvidia/NVIDIA-Nemotron-Parse-v1.1 were not used when initializing NemotronParseForConditionalGeneration: ['decoder.embed_positions.weight']
- This IS expected if you are initializing NemotronParseForConditionalGeneration from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing NemotronParseForConditionalGeneration from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
hf_nemotron_parse_processor.py: 15.1kB [00:00, 5.43MB/s]
A new version of the following files was downloaded from https://huggingface.co/nvidia/NVIDIA-Nemotron-Parse-v1.1:
- hf_nemotron_parse_processor.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False.
Floating point exception (core dumped)
This is the complete log output.
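The log's own advice about pinning a revision can be followed by passing revision= to from_pretrained, so that the remote modeling code does not change between runs. This is a minimal sketch: load_pinned is a hypothetical wrapper, and the hash below is simply the one that appears in the cache path of a traceback in this thread, so substitute whichever commit you have actually reviewed.

```python
MODEL_ID = "nvidia/NVIDIA-Nemotron-Parse-v1.1"
# Assumption: commit hash copied from a cache path in this thread.
REVISION = "ea21e1a91137ebb93cea097b5bb5d5e6448df391"


def load_pinned(loader, model_id=MODEL_ID, revision=REVISION, **kwargs):
    """Call a from_pretrained-style loader with a fixed repository revision.

    `loader` is any callable such as AutoModel.from_pretrained or
    AutoProcessor.from_pretrained; extra keyword arguments are passed through.
    """
    return loader(model_id, trust_remote_code=True, revision=revision, **kwargs)


# Usage (requires transformers and network access):
#   from transformers import AutoModel, AutoProcessor
#   model = load_pinned(AutoModel.from_pretrained)
#   processor = load_pinned(AutoProcessor.from_pretrained)
```

Pinning also makes reports easier to compare, since everyone in the thread can state which revision of the remote code they ran.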
I see:
A.PadIfNeeded(
Traceback (most recent call last):
File "/home/XXX/nemotron-test/nemotron_minimal.py", line 37, in <module>
outputs = model.generate(**inputs, generation_config=generation_config)
File "/opt/venv/lib/python3.13/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
File "/opt/venv/lib/python3.13/site-packages/transformers/generation/utils.py", line 2564, in generate
result = decoding_method(
self,
...<5 lines>...
**model_kwargs,
)
File "/opt/venv/lib/python3.13/site-packages/transformers/generation/utils.py", line 2784, in _sample
outputs = self(**model_inputs, return_dict=True)
File "/opt/venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
File "/home/XXX/.cache/huggingface/modules/transformers_modules/nvidia/NVIDIA_hyphen_Nemotron_hyphen_Parse_hyphen_v1_dot_1/ea21e1a91137ebb93cea097b5bb5d5e6448df391/hf_nemotron_parse_modeling.py", line 471, in forward
decoder_outputs = self.decoder(
input_ids=decoder_input_ids,
...<9 lines>...
**kwargs_decoder,
)
File "/opt/venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
File "/home/XXX/.cache/huggingface/modules/transformers_modules/nvidia/NVIDIA_hyphen_Nemotron_hyphen_Parse_hyphen_v1_dot_1/ea21e1a91137ebb93cea097b5bb5d5e6448df391/hf_nemotron_parse_modeling.py", line 167, in forward
when running a minimal example:
import torch
from PIL import Image
from transformers import (
    AutoModel,
    AutoProcessor,
    AutoTokenizer,
    GenerationConfig,
)

model_path = "nvidia/NVIDIA-Nemotron-Parse-v1.1"
device = "cuda:0"

model = (
    AutoModel.from_pretrained(
        model_path, trust_remote_code=True, torch_dtype=torch.bfloat16
    )
    .to(device)
    .eval()
)
tokenizer = AutoTokenizer.from_pretrained(model_path)
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)

image = Image.open("/home/XXX/nemotron-test/dog.jpg")
task_prompt = "</s><s><predict_bbox><predict_classes><output_markdown>"

inputs = processor(images=[image], text=task_prompt, return_tensors="pt").to(device)
prompt_ids = processor.tokenizer.encode(
    task_prompt, return_tensors="pt", add_special_tokens=False
).cuda()

generation_config = GenerationConfig.from_pretrained(model_path, trust_remote_code=True)
outputs = model.generate(**inputs, generation_config=generation_config)
print(outputs)
Any idea how to solve this?
Hi @mrcrm9494, could you please check whether your environment matches the versions in requirements.txt (especially transformers and albumentations)? If that does not resolve it, could you please paste the whole error, including the exception message?
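To make that comparison easier, the installed versions can be printed with the standard library and pasted next to the pins in requirements.txt. This is a rough sketch: the package list is only the set of names mentioned in this thread, not the actual contents of requirements.txt, and installed_versions is an illustrative helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption: distribution names taken from this thread, not from requirements.txt.
PACKAGES = ("transformers", "albumentations", "torch", "timm", "open_clip_torch")


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}=={ver}" if ver else f"{name}: not installed")
```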
Hi, I ran into this independently (first with the -tc, then with this one, and only then found this issue). Could you please test your code on an independent GPU machine, for example https://colab.research.google.com/notebooks/empty.ipynb with the runtime set to a GPU such as an L4 or A100? There are numerous issues with your modeling code and documentation that would have surfaced if inference had been run from scratch on a different machine.
After correcting your requirements.txt (open_clip_torch is missing) and loading the model successfully with transformers/torch, I get an AttributeError on a .shape access in your code in hf_nemotron_parse_modeling.py:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/tmp/ipython-input-3397576642.py in <cell line: 0>()
----> 1 text = image_to_markdown(image, model, processor, gen_config)
2 print(text)
10 frames
~/.cache/huggingface/modules/transformers_modules/nvidia/NVIDIA_hyphen_Nemotron_hyphen_Parse_hyphen_v1_dot_1/34a1a10bd0868b85d92ab82c564bb339d6f85d1c/hf_nemotron_parse_modeling.py in forward(self, input_ids, attention_mask, encoder_hidden_states, encoder_attention_mask, head_mask, cross_attn_head_mask, past_key_values, inputs_embeds, use_cache, output_attentions, output_hidden_states, return_dict)
165
166 # past_key_values_length
--> 167 past_key_values_length = past_key_values[0][0].shape[2] if past_key_values is not None else 0
168
169 if inputs_embeds is None:
AttributeError: 'NoneType' object has no attribute 'shape'
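The traceback shows that past_key_values itself is not None, but its first entry holds None where a key tensor is expected, so the `.shape` access fails. One plausible reading, consistent with the version check requested earlier in the thread, is that the installed transformers passes an empty cache object on the first decoding step. The sketch below shows a more defensive length computation. It is not the maintainers' fix; past_length is a hypothetical helper, and the get_seq_length() branch assumes a transformers-style cache object.

```python
from types import SimpleNamespace


def past_length(past_key_values):
    """Return the cached sequence length, or 0 when no usable cache exists."""
    if past_key_values is None:
        return 0
    # Assumption: cache objects from recent transformers expose get_seq_length().
    get_len = getattr(past_key_values, "get_seq_length", None)
    if callable(get_len):
        return int(get_len())
    # Legacy layout: a sequence of (key, value) pairs per layer.
    try:
        first_key = past_key_values[0][0]
    except (IndexError, KeyError, TypeError):
        return 0
    if first_key is None:
        # Cache exists but has not been filled yet (the case in the traceback).
        return 0
    return first_key.shape[2]


# Stand-in for a key tensor of shape (batch, heads, seq_len, head_dim).
fake_key = SimpleNamespace(shape=(1, 16, 5, 64))
print(past_length(None))                    # 0
print(past_length([(None, None)]))          # 0
print(past_length([(fake_key, fake_key)]))  # 5
```

Matching the pinned transformers version is likely the simpler route, since the remote modeling code was presumably written against that cache format.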