Update README.md
README.md
CHANGED
|
@@ -1,17 +1,19 @@
|
|
| 1 |
---
|
| 2 |
base_model: facebook/nougat-base
|
| 3 |
library_name: transformers
|
| 4 |
tags:
|
| 5 |
- generated_from_trainer
|
| 6 |
model-index:
|
| 7 |
- name: dhivehi-nougat-base
|
| 8 |
results: []
|
| 9 |
---
|
| 10 |
|
| 11 |
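-
<!-- This model card has been generated automatically according to the information the Trainer had access to. You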
|
| 12 |
-
should probably proofread and complete it, then remove this comment. -->
|
| 13 |
-
|
| 14 |
-
# dhivehi-nougat-base
|
| 15 |
|
| 16 |
This model is a fine-tuned version of [facebook/nougat-base](https://huggingface.co/facebook/nougat-base) on the None dataset.
|
| 17 |
It achieves the following results on the evaluation set:
|
|
@@ -19,15 +21,52 @@ It achieves the following results on the evaluation set:
|
|
| 19 |
|
| 20 |
## Model description
|
| 21 |
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
##
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
|
| 30 |
-
|
| 31 |
|
| 32 |
## Training procedure
|
| 33 |
|
|
@@ -144,4 +183,4 @@ The following hyperparameters were used during training:
|
|
| 144 |
- Transformers 4.47.0
|
| 145 |
- Pytorch 2.6.0+cu124
|
| 146 |
- Datasets 3.2.0
|
| 147 |
-
- Tokenizers 0.21.0
|
| 1 |
---
|
| 2 |
base_model: facebook/nougat-base
|
| 3 |
library_name: transformers
|
| 4 |
+
license: cc-by-4.0
|
| 5 |
tags:
|
| 6 |
- generated_from_trainer
|
| 7 |
model-index:
|
| 8 |
- name: dhivehi-nougat-base
|
| 9 |
results: []
|
| 10 |
+
datasets:
|
| 11 |
+
- alakxender/dhivehi-image-text
|
| 12 |
+
language:
|
| 13 |
+
- dv
|
| 14 |
---
|
| 15 |
|
| 16 |
+
# DHIVEHI NOUGAT BASE (IMAGE-TO-TEXT)
|
| 17 |
|
| 18 |
This model is a fine-tuned version of [facebook/nougat-base](https://huggingface.co/facebook/nougat-base) on the None dataset.
|
| 19 |
It achieves the following results on the evaluation set:
|
|
|
|
| 21 |
|
| 22 |
## Model description
|
| 23 |
|
| 24 |
+
Fine-tuned for Dhivehi on the `alakxender/dhivehi-image-text` image-text dataset, using the `all` config.
|
| 25 |
+
|
| 26 |
+
## Usage
|
| 27 |
+
|
| 28 |
+
```python
from PIL import Image
import torch
from transformers import NougatProcessor, VisionEncoderDecoderModel
from pathlib import Path

# Load the model and processor
processor = NougatProcessor.from_pretrained("alakxender/dhivehi-nougat-base")
model = VisionEncoderDecoderModel.from_pretrained(
    "alakxender/dhivehi-nougat-base",
    # Optional: load the model in BF16 for faster inference and lower memory usage
    torch_dtype=torch.bfloat16,
    # Optional: specify the attention kernel implementation for each part of the model
    attn_implementation={
        "decoder": "flash_attention_2",  # Use FlashAttention-2 for the decoder for improved performance
        "encoder": "eager",  # Use the default ("eager") attention implementation for the encoder
    },
)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

context_length = 128


def predict(img_path):
    # Ensure the image is in RGB format
    image = Image.open(img_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values.to(torch.bfloat16)

    # Generate the prediction
    outputs = model.generate(
        pixel_values.to(device),
        min_length=1,
        max_new_tokens=context_length,
        repetition_penalty=1.5,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
        eos_token_id=processor.tokenizer.eos_token_id,
    )

    page_sequence = processor.batch_decode(outputs, skip_special_tokens=True)[0]
    return page_sequence


print(predict("DV01-04_31.jpg"))
```
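
The snippet above requests FlashAttention-2 for the decoder, which generally needs a CUDA GPU and the separately installed `flash-attn` package. The following is a minimal sketch, not part of the original README, of how the loading options could be chosen so the example also runs without them. `pick_load_kwargs` and `flash_attn_installed` are hypothetical helper names, and the float32 fallback on CPU is an assumption rather than a recommendation from the model author.

```python
import importlib.util


def flash_attn_installed() -> bool:
    """Return True if the optional flash_attn package can be imported."""
    return importlib.util.find_spec("flash_attn") is not None


def pick_load_kwargs(has_cuda: bool, has_flash_attn: bool) -> dict:
    """Choose a dtype name and attention setting (hypothetical helper).

    The dtype is returned as a plain string under "dtype_name" so that this
    function does not need torch; the caller converts it with getattr(torch, ...).
    """
    if has_cuda and has_flash_attn:
        # Same configuration as the README example.
        return {
            "dtype_name": "bfloat16",
            "attn_implementation": {"decoder": "flash_attention_2", "encoder": "eager"},
        }
    if has_cuda:
        # GPU available but no flash-attn: keep BF16, use eager attention everywhere.
        return {"dtype_name": "bfloat16", "attn_implementation": "eager"}
    # Assumed CPU fallback: full precision with eager attention.
    return {"dtype_name": "float32", "attn_implementation": "eager"}


# Possible usage with the README example (left as comments because it downloads the model):
# kwargs = pick_load_kwargs(torch.cuda.is_available(), flash_attn_installed())
# dtype = getattr(torch, kwargs.pop("dtype_name"))
# model = VisionEncoderDecoderModel.from_pretrained(
#     "alakxender/dhivehi-nougat-base", torch_dtype=dtype, **kwargs
# )
```

If the float32 fallback is used, the `pixel_values` cast in `predict` should follow the model's dtype (for example `.to(model.dtype)`) instead of the hard-coded `torch.bfloat16`.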
|
| 70 |
|
| 71 |
## Training procedure
|
| 72 |
|
|
|
|
| 183 |
- Transformers 4.47.0
|
| 184 |
- Pytorch 2.6.0+cu124
|
| 185 |
- Datasets 3.2.0
|
| 186 |
+
- Tokenizers 0.21.0
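
To compare a local environment against the framework versions listed above, a small check along these lines may help. It is only a sketch: the mapping from the listed names to PyPI distribution names (for example "Pytorch" to `torch`) is an assumption, and `installed_versions` and `find_mismatches` are hypothetical helpers. A local build suffix such as `+cu124` is ignored when comparing.

```python
from importlib import metadata

# Versions from the list above, keyed by assumed PyPI distribution name.
EXPECTED = {
    "transformers": "4.47.0",
    "torch": "2.6.0",  # listed as 2.6.0+cu124; the suffix is a CUDA build tag
    "datasets": "3.2.0",
    "tokenizers": "0.21.0",
}


def installed_versions(names):
    """Return {name: installed version string, or None if not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(expected, installed):
    """Return {name: (expected, installed)} for packages whose versions differ."""
    mismatches = {}
    for name, want in expected.items():
        have = installed.get(name)
        # Drop a local build suffix such as "+cu124" before comparing.
        base = have.split("+")[0] if have else None
        if base != want:
            mismatches[name] = (want, have)
    return mismatches


# Example: find_mismatches(EXPECTED, installed_versions(EXPECTED))
```

A mismatch does not necessarily mean the model will fail to load; it only flags differences from the environment recorded in this README.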
|