nvidia/NVIDIA-Nemotron-Parse-v1.1

@@ -11,15 +11,12 @@ tags:
 - OCR
 ---
-# nemotron-parse Overview
-nemotron-parse is a general purpose text-extraction model, specifically designed to handle documents. Given an image, nemotron-parse is able to extract formatted-text, with bounding-boxes and the corresponding semantic class. This has downstream benefits for several tasks such as increasing the availability of training-data for Large Language Models (LLMs), improving the accuracy of retriever systems, and enhancing document understanding pipelines.
 This model is ready for commercial use.
 ## License
 GOVERNING TERMS: The NIM container is governed by the [NVIDIA Software License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-software-license-agreement/) and [Product-Specific Terms for NVIDIA AI Products](https://www.nvidia.com/en-us/agreements/enterprise-software/product-specific-terms-for-ai-products/). Use of this model is governed by the [NVIDIA Community Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-community-models-license/). Use of the tokenizer included in this model is governed by the [CC-BY-4.0 license](https://creativecommons.org/licenses/by/4.0/).
@@ -28,8 +25,8 @@ GOVERNING TERMS: The NIM container is governed by the [NVIDIA Software License A
 Global
 ## Use Case:
-nemotron-parse will be capable of comprehensive text understanding and document structure understanding. It will be used in retriever and curator solutions. Its text extraction datasets and capabilities will help with LLM and VLM training, as well as improve run-time inference accuracy of VLMs.
-The nemotron-parse model will perform text extraction from PDF and PPT documents. The nemotron-parse can classify the objects (title, section, caption, index, footnote, lists, tables, bibliography, image) in a given document, and provide bounding boxes with coordinates.
 ## Release Date:
@@ -73,7 +70,7 @@ Carbon Emissions: 3.21 tCO2e <br>
 * Output Format: String
 * Output Parameters: 1D
 - Other Properties Related to Output:
-  - nemotron-parse output format is a string which encodes text content (formatted or not) as well as bounding boxes and class attributes.<br>
  In the default prompt setting, text content is represented as markdown, and math expressions as LaTeX, enclosed in \[..\] or \(..\). If a mathematical expression does not require LaTeX formatting to be represented (e.g., consisting only of characters and subscripts/superscripts), it is represented as markdown. Tables are represented as LaTeX.
   Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.<br>
@@ -228,7 +225,7 @@ Nemotron-Parse-v1.1 is also available as an [optimized NIM container](https://bu
 ### Training Dataset
-nemotron-parse is first pre-trained on our internal datasets: human, synthetic and automated.
 Data Modality:
 *Text
 *Image<br>
@@ -237,7 +234,7 @@ Labeling Method by Dataset: Hybrid: Human, Synthetic, Automated
 ### Testing and Evaluation Dataset:
-nemotron-parse is evaluated on multiple datasets for robustness, including public and internal datasets.
 Data Collection Method by Dataset: Hybrid: Human, Synthetic, Automated
 Labeling Method by Dataset: Hybrid: Human, Synthetic, Automated

 - OCR
 ---
+# NVIDIA Nemotron Parse 1.1 Overview
+NVIDIA Nemotron Parse 1.1 is a general purpose text-extraction model, specifically designed to handle documents by extracting text and tables and understanding document semantics. Given an image, NVIDIA Nemotron Parse 1.1 is able to extract formatted text, bounding-boxes and the corresponding semantic classes. This has downstream benefits for several tasks such as increasing the availability of training-data for Large Language Models (LLMs), improving the accuracy of retriever systems, and enhancing document understanding pipelines.
 This model is ready for commercial use.
 ## License
 GOVERNING TERMS: The NIM container is governed by the [NVIDIA Software License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-software-license-agreement/) and [Product-Specific Terms for NVIDIA AI Products](https://www.nvidia.com/en-us/agreements/enterprise-software/product-specific-terms-for-ai-products/). Use of this model is governed by the [NVIDIA Community Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-community-models-license/). Use of the tokenizer included in this model is governed by the [CC-BY-4.0 license](https://creativecommons.org/licenses/by/4.0/).
 Global
 ## Use Case:
+NVIDIA Nemotron Parse 1.1 will be capable of comprehensive text understanding and document structure understanding. It will be used in retriever and curator solutions. Its text extraction datasets and capabilities will help with LLM and VLM training, as well as improve run-time inference accuracy of VLMs.
+The NVIDIA Nemotron Parse 1.1 model will perform text extraction from PDF and PPT documents. The NVIDIA Nemotron Parse 1.1 can classify the objects (title, section, caption, index, footnote, lists, tables, bibliography, image) in a given document, and provide bounding boxes with coordinates.
 ## Release Date:
 * Output Format: String
 * Output Parameters: 1D
 - Other Properties Related to Output:
+  - NVIDIA Nemotron Parse 1.1 output format is a string which encodes text content (formatted or not) as well as bounding boxes and class attributes.<br>
  In the default prompt setting, text content is represented as markdown, and math expressions as LaTeX, enclosed in \[..\] or \(..\). If a mathematical expression does not require LaTeX formatting to be represented (e.g., consisting only of characters and subscripts/superscripts), it is represented as markdown. Tables are represented as LaTeX.
   Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.<br>
 ### Training Dataset
+NVIDIA Nemotron Parse 1.1 is first pre-trained on our internal datasets: human, synthetic and automated.
 Data Modality:
 *Text
 *Image<br>
 ### Testing and Evaluation Dataset:
+NVIDIA Nemotron Parse 1.1 is evaluated on multiple datasets for robustness, including public and internal datasets.
 Data Collection Method by Dataset: Hybrid: Human, Synthetic, Automated
 Labeling Method by Dataset: Hybrid: Human, Synthetic, Automated