RuntimeError

#1
by shqu - opened

input_conditioner.py", line 32, in forward
y = (x - self.norm_mean) / self.norm_std
RuntimeError: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 1

Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False.
Floating point exception (core dumped)

NVIDIA org

Hi, thank you for your interest.
We've pushed a fix for the RGBA input:

RuntimeError: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 1
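
Until you pull the updated files, a client-side workaround is to drop the alpha channel before handing the image to the processor (a minimal sketch, not the pushed fix itself; "page.png" is a placeholder path):

from PIL import Image

# The mismatch (tensor a has 4 channels, tensor b has 3) comes from an RGBA
# image being normalized against 3-channel mean/std values.
image = Image.open("page.png").convert("RGB")  # drops the alpha channel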

Could you provide more details or logs regarding the second issue?

Floating point exception (core dumped)

A new version of the following files was downloaded from https://huggingface.co/nvidia/NVIDIA-Nemotron-Parse-v1.1:
- hf_nemotron_parse_config.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
/root/autodl-tmp/conda/envs/nvidia-ocr/lib/python3.10/site-packages/timm/models/registry.py:4: FutureWarning: Importing from timm.models.registry is deprecated, please import via timm.models
  warnings.warn(f"Importing from {name} is deprecated, please import via timm.models", FutureWarning)
/root/autodl-tmp/conda/envs/nvidia-ocr/lib/python3.10/site-packages/timm/models/layers/__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
  warnings.warn(f"Importing from {name} is deprecated, please import via timm.layers", FutureWarning)
A new version of the following files was downloaded from https://huggingface.co/nvidia/NVIDIA-Nemotron-Parse-v1.1:
- hf_nemotron_parse_modeling.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
No pretrained configuration specified for vit_huge_patch16_224 model. Using a default. Please add a config to the model pretrained_cfg registry or pass explicitly.
Some weights of the model checkpoint at nvidia/NVIDIA-Nemotron-Parse-v1.1 were not used when initializing NemotronParseForConditionalGeneration: ['decoder.embed_positions.weight']
- This IS expected if you are initializing NemotronParseForConditionalGeneration from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing NemotronParseForConditionalGeneration from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
hf_nemotron_parse_processor.py: 15.1kB [00:00, 5.43MB/s]
A new version of the following files was downloaded from https://huggingface.co/nvidia/NVIDIA-Nemotron-Parse-v1.1:
- hf_nemotron_parse_processor.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False.
Floating point exception (core dumped)

This is the complete log output.
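
As the downloader warning notes, you can pin the remote code to a fixed revision so newly pushed versions of the *.py files are not fetched automatically (a minimal sketch; the commit hash is the one appearing in the traceback below, used here purely for illustration):

from transformers import AutoModel

# Pin the repository to a specific commit you have audited, instead of
# always pulling the latest remote code.
model = AutoModel.from_pretrained(
    "nvidia/NVIDIA-Nemotron-Parse-v1.1",
    trust_remote_code=True,
    revision="ea21e1a91137ebb93cea097b5bb5d5e6448df391",
)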

I see the following traceback:

Traceback (most recent call last):
  File "/home/XXX/nemotron-test/nemotron_minimal.py", line 37, in <module>
    outputs = model.generate(**inputs, generation_config=generation_config)
  File "/opt/venv/lib/python3.13/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/opt/venv/lib/python3.13/site-packages/transformers/generation/utils.py", line 2564, in generate
    result = decoding_method(
        self,
    ...<5 lines>...
        **model_kwargs,
    )
  File "/opt/venv/lib/python3.13/site-packages/transformers/generation/utils.py", line 2784, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "/opt/venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/XXX/.cache/huggingface/modules/transformers_modules/nvidia/NVIDIA_hyphen_Nemotron_hyphen_Parse_hyphen_v1_dot_1/ea21e1a91137ebb93cea097b5bb5d5e6448df391/hf_nemotron_parse_modeling.py", line 471, in forward
    decoder_outputs = self.decoder(
        input_ids=decoder_input_ids,
    ...<9 lines>...
        **kwargs_decoder,
    )
  File "/opt/venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/XXX/.cache/huggingface/modules/transformers_modules/nvidia/NVIDIA_hyphen_Nemotron_hyphen_Parse_hyphen_v1_dot_1/ea21e1a91137ebb93cea097b5bb5d5e6448df391/hf_nemotron_parse_modeling.py", line 167, in forward

when running a minimal example:

import torch
from PIL import Image
from transformers import (
    AutoModel,
    AutoProcessor,
    AutoTokenizer,
    GenerationConfig,
)

model_path = "nvidia/NVIDIA-Nemotron-Parse-v1.1"
device = "cuda:0"

# Load the model with its remote code, in bfloat16, on the GPU.
model = (
    AutoModel.from_pretrained(
        model_path, trust_remote_code=True, torch_dtype=torch.bfloat16
    )
    .to(device)
    .eval()
)
tokenizer = AutoTokenizer.from_pretrained(model_path)
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)

image = Image.open("/home/XXX/nemotron-test/dog.jpg")
task_prompt = "</s><s><predict_bbox><predict_classes><output_markdown>"

inputs = processor(images=[image], text=task_prompt, return_tensors="pt").to(device)
# Encoded prompt ids; not actually used by generate() below.
prompt_ids = processor.tokenizer.encode(
    task_prompt, return_tensors="pt", add_special_tokens=False
).to(device)

generation_config = GenerationConfig.from_pretrained(model_path, trust_remote_code=True)
outputs = model.generate(**inputs, generation_config=generation_config)
print(outputs)
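
For reference, outputs here is a tensor of token ids; turning it into text would be something like processor.batch_decode(outputs, skip_special_tokens=True) (assuming the processor forwards batch_decode to its tokenizer, as ProcessorMixin normally does), though in this case generation crashes before getting that far.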

Any idea how to solve this?

NVIDIA org

Hi @mrcrm9494, could you please check whether your environment matches the versions in requirements.txt (especially transformers and albumentations)? If that does not resolve it, could you please paste the whole error, including the exception message?
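
One quick way to compare installed versions against requirements.txt (a minimal sketch; the package names are simply the ones discussed in this thread):

from importlib.metadata import PackageNotFoundError, version

# Print the installed version of each package so it can be checked against
# the pins in requirements.txt.
for pkg in ("transformers", "albumentations", "torch", "timm", "open_clip_torch"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")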

Hi, I ran into this independently (first with the -tc, then this, then found this issue). Could you please test running your code on an independent GPU machine, e.g. https://colab.research.google.com/notebooks/empty.ipynb with the runtime set to an L4 or A100 GPU? There are numerous issues with your modeling code and documentation that would have been caught if you had tried running inference from scratch on a different machine.

After correcting your requirements.txt (open_clip_torch was missing) and loading the model successfully with transformers/torch, I get an AttributeError from your code in hf_nemotron_parse_modeling.py:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipython-input-3397576642.py in <cell line: 0>()
----> 1 text = image_to_markdown(image, model, processor, gen_config)
      2 print(text)

10 frames
~/.cache/huggingface/modules/transformers_modules/nvidia/NVIDIA_hyphen_Nemotron_hyphen_Parse_hyphen_v1_dot_1/34a1a10bd0868b85d92ab82c564bb339d6f85d1c/hf_nemotron_parse_modeling.py in forward(self, input_ids, attention_mask, encoder_hidden_states, encoder_attention_mask, head_mask, cross_attn_head_mask, past_key_values, inputs_embeds, use_cache, output_attentions, output_hidden_states, return_dict)
    165 
    166         # past_key_values_length
--> 167         past_key_values_length = past_key_values[0][0].shape[2] if past_key_values is not None else 0
    168 
    169         if inputs_embeds is None:

AttributeError: 'NoneType' object has no attribute 'shape'
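
The failing line assumes the legacy tuple-of-tuples cache format; with recent transformers releases, generate can hand the decoder a Cache object, or None entries on the first decoding step, so past_key_values[0][0] can be None. A defensive rewrite of that line might look like this (a sketch using the variable names from the traceback, not the maintainers' actual fix):

# Replacement sketch for line 167 of hf_nemotron_parse_modeling.py.
past_key_values_length = 0
if past_key_values is not None:
    if hasattr(past_key_values, "get_seq_length"):
        # New-style transformers cache (DynamicCache, EncoderDecoderCache, ...)
        past_key_values_length = past_key_values.get_seq_length()
    elif past_key_values[0] is not None and past_key_values[0][0] is not None:
        # Legacy cache: tuple of per-layer (key, value) tensor pairs
        past_key_values_length = past_key_values[0][0].shape[2]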
