---
base_model: facebook/nougat-small
library_name: transformers
license: cc-by-4.0
tags:
- generated_from_trainer
model-index:
- name: dhivehi-nougat-small-dv01-01
  results: []
datasets:
- alakxender/dhivehi-image-text
language:
- dv
---

# DHIVEHI NOUGAT SMALL (IMAGE-TO-TEXT)

This model is a fine-tuned version of [facebook/nougat-small](https://huggingface.co/facebook/nougat-small) on the Dhivehi text-image dataset [alakxender/dhivehi-image-text](https://huggingface.co/datasets/alakxender/dhivehi-image-text).
It achieves the following results on the evaluation set:
- Loss: 0.0300

## Model description

Fine-tuned on the Dhivehi text-image dataset, using the `dv-01-01` config only.

## Usage

```python
from PIL import Image
import torch
from transformers import NougatProcessor, VisionEncoderDecoderModel

# Load the model and processor
processor = NougatProcessor.from_pretrained("alakxender/dhivehi-nougat-small-dv01-01")
model = VisionEncoderDecoderModel.from_pretrained(
    "alakxender/dhivehi-nougat-small-dv01-01",
    torch_dtype=torch.bfloat16,  # Optional: load in BF16 for faster inference and lower memory usage
    attn_implementation={        # Optional: choose the attention kernel per sub-model
        "decoder": "flash_attention_2",  # FlashAttention-2 for the decoder (requires the flash-attn package and a CUDA GPU)
        "encoder": "eager",              # default ("eager") attention for the encoder
    },
)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

context_length = 128


def predict(img_path):
    # Ensure the image is in RGB format
    image = Image.open(img_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values.to(torch.bfloat16)

    # Generate the prediction
    outputs = model.generate(
        pixel_values.to(device),
        min_length=1,
        max_new_tokens=context_length,
        repetition_penalty=1.5,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
        eos_token_id=processor.tokenizer.eos_token_id,
    )

    page_sequence = processor.batch_decode(outputs, skip_special_tokens=True)[0]
    return page_sequence


print(predict("DV01-04_31.jpg"))
```

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a rough `Seq2SeqTrainingArguments` equivalent is sketched after the list):
- learning_rate: 0.0001
- train_batch_size: 3
- eval_batch_size: 3
- seed: 42
- gradient_accumulation_steps: 6
- total_train_batch_size: 18
- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 100
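For reference, these values map roughly onto a `Seq2SeqTrainingArguments` configuration like the sketch below. The `output_dir`, the BF16 flag, and the 100-step evaluation cadence (inferred from the results table) are assumptions, not details taken from the original training script:

```python
from transformers import Seq2SeqTrainingArguments

# Minimal sketch of the hyperparameters listed above; not the original training script.
training_args = Seq2SeqTrainingArguments(
    output_dir="dhivehi-nougat-small-dv01-01",  # assumed output directory
    learning_rate=1e-4,
    per_device_train_batch_size=3,
    per_device_eval_batch_size=3,
    gradient_accumulation_steps=6,   # 3 x 6 = 18 total train batch size
    num_train_epochs=100,
    lr_scheduler_type="linear",
    optim="adamw_torch",             # AdamW with default betas=(0.9, 0.999), eps=1e-08
    seed=42,
    eval_strategy="steps",
    eval_steps=100,                  # assumed from the 100-step cadence in the results table
    bf16=True,                       # assumption, consistent with the BF16 usage example
)
```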
### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 7.1462 | 0.0567 | 100 | 1.1326 |
| 6.5572 | 0.1135 | 200 | 1.0543 |
| 6.1831 | 0.1702 | 300 | 0.9868 |
| 6.0022 | 0.2269 | 400 | 0.9323 |
| 5.6527 | 0.2837 | 500 | 0.8896 |
| 5.5004 | 0.3404 | 600 | 0.8478 |
| 5.2741 | 0.3971 | 700 | 0.8168 |
| 4.9927 | 0.4539 | 800 | 0.7466 |
| 4.3776 | 0.5106 | 900 | 0.6724 |
| 2.816 | 0.5673 | 1000 | 0.4038 |
| 1.8526 | 0.6241 | 1100 | 0.2720 |
| 1.5099 | 0.6808 | 1200 | 0.2064 |
| 1.3084 | 0.7375 | 1300 | 0.1696 |
| 1.1449 | 0.7943 | 1400 | 0.1516 |
| 0.8819 | 0.8510 | 1500 | 0.1331 |
| 0.7947 | 0.9077 | 1600 | 0.1194 |
| 0.9857 | 0.9644 | 1700 | 0.1091 |
| 0.7097 | 1.0210 | 1800 | 0.1023 |
| 0.5212 | 1.0777 | 1900 | 0.0953 |
| 0.6396 | 1.1345 | 2000 | 0.0882 |
| 0.6073 | 1.1912 | 2100 | 0.0863 |
| 0.5683 | 1.2479 | 2200 | 0.0815 |
| 0.5399 | 1.3047 | 2300 | 0.0770 |
| 0.5433 | 1.3614 | 2400 | 0.0740 |
| 0.5824 | 1.4181 | 2500 | 0.0688 |
| 0.447 | 1.4748 | 2600 | 0.0665 |
| 0.4875 | 1.5316 | 2700 | 0.0633 |
| 0.4694 | 1.5883 | 2800 | 0.0616 |
| 0.4001 | 1.6450 | 2900 | 0.0580 |
| 0.3971 | 1.7018 | 3000 | 0.0585 |
| 0.3889 | 1.7585 | 3100 | 0.0556 |
| 0.3088 | 1.8152 | 3200 | 0.0546 |
| 0.3476 | 1.8720 | 3300 | 0.0522 |
| 0.4569 | 1.9287 | 3400 | 0.0513 |
| 0.3979 | 1.9854 | 3500 | 0.0502 |
| 0.2847 | 2.0420 | 3600 | 0.0486 |
| 0.4332 | 2.0987 | 3700 | 0.0465 |
| 0.3647 | 2.1554 | 3800 | 0.0469 |
| 0.3791 | 2.2122 | 3900 | 0.0459 |
| 0.2982 | 2.2689 | 4000 | 0.0450 |
| 0.3294 | 2.3256 | 4100 | 0.0447 |
| 0.2839 | 2.3824 | 4200 | 0.0434 |
| 0.3094 | 2.4391 | 4300 | 0.0433 |
| 0.3062 | 2.4958 | 4400 | 0.0422 |
| 0.2723 | 2.5526 | 4500 | 0.0412 |
| 0.2348 | 2.6093 | 4600 | 0.0406 |
| 0.2125 | 2.6660 | 4700 | 0.0403 |
| 0.3172 | 2.7228 | 4800 | 0.0385 |
| 0.2315 | 2.7795 | 4900 | 0.0382 |
| 0.2707 | 2.8362 | 5000 | 0.0385 |
| 0.2391 | 2.8930 | 5100 | 0.0373 |
| 0.2979 | 2.9497 | 5200 | 0.0372 |
| 0.2933 | 3.0062 | 5300 | 0.0362 |
| 0.2388 | 3.0630 | 5400 | 0.0357 |
| 0.2525 | 3.1197 | 5500 | 0.0364 |
| 0.2563 | 3.1764 | 5600 | 0.0359 |
| 0.2534 | 3.2332 | 5700 | 0.0354 |
| 0.2401 | 3.2899 | 5800 | 0.0344 |
| 0.2116 | 3.3466 | 5900 | 0.0340 |
| 0.2713 | 3.4034 | 6000 | 0.0340 |
| 0.2351 | 3.4601 | 6100 | 0.0333 |
| 0.1471 | 3.5168 | 6200 | 0.0335 |
| 0.2209 | 3.5736 | 6300 | 0.0326 |
| 0.2206 | 3.6303 | 6400 | 0.0324 |
| 0.2208 | 3.6870 | 6500 | 0.0316 |
| 0.2329 | 3.7438 | 6600 | 0.0316 |
| 0.1439 | 3.8005 | 6700 | 0.0312 |
| 0.2335 | 3.8572 | 6800 | 0.0315 |
| 0.1582 | 3.9140 | 6900 | 0.0312 |
| 0.2298 | 3.9707 | 7000 | 0.0305 |
| 0.1649 | 4.0272 | 7100 | 0.0309 |
| 0.1489 | 4.0840 | 7200 | 0.0304 |
| 0.1729 | 4.1407 | 7300 | 0.0304 |
| 0.1907 | 4.1974 | 7400 | 0.0297 |
| 0.2 | 4.2542 | 7500 | 0.0298 |
| 0.1776 | 4.3109 | 7600 | 0.0296 |
| 0.1955 | 4.3676 | 7700 | 0.0292 |
| 0.1838 | 4.4244 | 7800 | 0.0295 |
| 0.1685 | 4.4811 | 7900 | 0.0292 |
| 0.161 | 4.5378 | 8000 | 0.0300 |

### Framework versions

- Transformers 4.47.0
- PyTorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0
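If inference behaves differently from what the card describes, a quick sanity check is to compare the runtime against the versions listed above; a minimal snippet:

```python
import datasets
import tokenizers
import torch
import transformers

# Versions reported above for this model's training run
print("transformers:", transformers.__version__)  # 4.47.0
print("torch:", torch.__version__)                # 2.6.0+cu124
print("datasets:", datasets.__version__)          # 3.2.0
print("tokenizers:", tokenizers.__version__)      # 0.21.0
```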