nvidia/NVIDIA-Nemotron-Parse-v1.1

@@ -11,7 +11,7 @@ tags:
 - OCR
 ---
-# NVIDIA Nemotron Parse 1.1 Overview
+# NVIDIA Nemotron Parse v1.1 Overview
 NVIDIA Nemotron Parse v1.1 is designed to understand document semantics and extract text and tables elements with spatial grounding. Given an image, NVIDIA Nemotron Parse v1.1 produces structured annotations, including formatted text, bounding-boxes and the corresponding semantic classes, ordered according to the document's reading flow. It overcomes the shortcomings of traditional OCR technologies that struggle with complex document layouts with structural variability, and helps transform unstructured documents into actionable and machine-usable representations. This has several downstream benefits such as increasing the availability of training-data for Large Language Models (LLMs), improving the accuracy of extractor, curator, retriever and AI agentic applications, and enhancing document understanding pipelines.
@@ -25,8 +25,8 @@ GOVERNING TERMS: The NIM container is governed by the [NVIDIA Software License A
 Global
 ## Use Case:
-NVIDIA Nemotron Parse 1.1 will be capable of comprehensive text understanding and document structure understanding. It will be used in retriever and curator solutions. Its text extraction datasets and capabilities will help with LLM and VLM training, as well as improve run-time inference accuracy of VLMs.
-The NVIDIA Nemotron Parse 1.1 model will perform text extraction from PDF and PPT documents. The NVIDIA Nemotron Parse 1.1 can classify the objects (title, section, caption, index, footnote, lists, tables, bibliography, image) in a given document, and provide bounding boxes with coordinates.
+NVIDIA Nemotron Parse v1.1 will be capable of comprehensive text understanding and document structure understanding. It will be used in retriever and curator solutions. Its text extraction datasets and capabilities will help with LLM and VLM training, as well as improve run-time inference accuracy of VLMs.
+The NVIDIA Nemotron Parse v1.1 model will perform text extraction from PDF and PPT documents. The NVIDIA Nemotron Parse v1.1 can classify the objects (title, section, caption, index, footnote, lists, tables, bibliography, image) in a given document, and provide bounding boxes with coordinates.
 ## Release Date:
@@ -70,7 +70,7 @@ Carbon Emissions: 3.21 tCO2e <br>
 * Output Format: String
 * Output Parameters: 1D
 - Other Properties Related to Output:
-  - NVIDIA Nemotron Parse 1.1 output format is a string which encodes text content (formatted or not) as well as bounding boxes and class attributes.<br>
+  - NVIDIA Nemotron Parse v1.1 output format is a string which encodes text content (formatted or not) as well as bounding boxes and class attributes.<br>
  In the default prompt setting, text content is represented as markdown, and math expressions as LaTeX, enclosed in \[..\] or \(..\). If a mathematical expression does not require LaTeX formatting to be represented (e.g., consisting only of characters and subscripts/superscripts), it is represented as markdown. Tables are represented as LaTeX.
   Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.<br>
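
The output-format note above says that, in the default prompt setting, math expressions are emitted as LaTeX enclosed in \[..\] or \(..\). As a minimal sketch of how a consumer might pull those spans out of the text content, the snippet below uses only Python's standard `re` module. The function name `extract_math` and the sample string are hypothetical illustrations, not part of the model card and not real model output. The encoding of bounding boxes and class attributes is not described in this diff, so the sketch does not attempt to parse them.

```python
import re

# Matches display math \[ ... \] or inline math \( ... \).
# re.DOTALL lets a span cross line breaks; .+? keeps each match as short as possible.
MATH_SPAN = re.compile(r"\\\[(.+?)\\\]|\\\((.+?)\\\)", re.DOTALL)


def extract_math(text: str) -> list[tuple[str, str]]:
    """Return (kind, latex) pairs in order of appearance.

    kind is "display" for \\[..\\] spans and "inline" for \\(..\\) spans.
    """
    spans = []
    for match in MATH_SPAN.finditer(text):
        if match.group(1) is not None:
            spans.append(("display", match.group(1).strip()))
        else:
            spans.append(("inline", match.group(2).strip()))
    return spans


# Hypothetical text content (assumption for illustration only).
sample = r"Energy is \(E = mc^2\). The sum is \[\sum_{i=1}^{n} i = \frac{n(n+1)}{2}\]"

for kind, latex in extract_math(sample):
    print(kind, latex)
```

Expressions that the model renders as plain markdown (characters with subscripts or superscripts only) carry no such delimiters, so a delimiter-based scan like this one would not see them.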