jaycha committed on
Commit ed09f37 · verified · 1 Parent(s): 4499391

Update README.md

Files changed (1):
  1. README.md +20 -18

README.md CHANGED
@@ -96,30 +96,32 @@ Please note that for certain benchmarks involving LLM-based evaluation (e.g., LL
 | InfoVQA_TEST | 66.9 | **71.7** | 65.0 |
 | ***AVERAGE*** | *70.5* | **71.7** | 68.3 |
 
-### Cultural Benchmark
+### Text-only Benchmark
+| Benchmark | InternVL3-2B | Ovis2-2B | VARCO-VISION-2.0-1.7B |
+| :-------------: | :----------: | :------: | :-------------------: |
+| MMLU | **59.9** | 12.9 | *55.3* |
+| MT-Bench | *62.8* | 61.4 | **72.3** |
+| KMMLU | **38.0** | *31.1* | 10.4 |
+| KoMT-Bench | 29.1 | *34.4* | **59.1** |
+| LogicKor | 25.6 | *31.2* | **53.7** |
+| ***AVERAGE*** | *43.1* | 34.2 | **50.2** |
+
+> **Note:** Some models show unusually low performance on the MMLU benchmark. This is primarily due to their failure to correctly follow the expected output format when only few-shot exemplars are provided in the prompts. Please take this into consideration when interpreting the results.
+
+### Korean Cultural Benchmark
 | Benchmark | InternVL3-2B | Ovis2-2B | VARCO-VISION-2.0-1.7B |
 | :--------------: | :----------: | :------: | :-------------------: |
 | K-Viscuit | *60.0* | **64.1** | 57.7 |
 | PangeaBench (ko) | **66.2** | 63.1 | *63.8* |
-| PangeaBench | *58.4* | **59.2** | 56.3 |
-
-### Text-only Benchmark
-| Benchmark | InternVL3-2B | Ovis2-2B | VARCO-VISION-2.0-1.7B |
-| :--------: | :----------: | :------: | :-------------------: |
-| MMLU | **59.9** | 12.9 | *55.3* |
-| MT-Bench | *6.28* | 6.14 | **7.23** |
-| KMMLU | **38.0** | *31.1* | 10.4 |
-| KoMT-Bench | 2.91 | *3.44* | **5.91** |
-| LogicKor | 2.56 | *3.12* | **5.37** |
-
-> **Note:** Some models show unusually low performance on the MMLU benchmark. This is primarily due to their failure to correctly follow the expected output format when only few-shot exemplars are provided in the prompts. Please take this into consideration when interpreting the results.
+| ***AVERAGE*** | *63.1* | **63.6** | 60.8 |
 
 ### OCR Benchmark
-| Benchmark | PaddleOCR | EasyOCR | VARCO-VISION-2.0-1.7B |
-| :-------: | :-------: | :-----: | :-------------------: |
-| CORD | *91.4* | 77.8 | **96.2** |
-| ICDAR2013 | *92.0* | 85.0 | **95.9** |
-| ICDAR2015 | **73.7** | 57.9 | **73.7** |
+| Benchmark | PaddleOCR | EasyOCR | VARCO-VISION-2.0-1.7B |
+| :-----------: | :-------: | :-----: | :-------------------: |
+| CORD | *91.4* | 77.8 | **96.2** |
+| ICDAR2013 | *92.0* | 85.0 | **95.9** |
+| ICDAR2015 | **73.7** | 57.9 | **73.7** |
+| ***AVERAGE*** | *85.7* | 73.6 | **88.6** |
 
 ## Usage
 To use this model, we recommend installing `transformers` version **4.53.1 or higher**. While it may work with earlier versions, using **4.53.1 or above is strongly recommended**, especially to ensure optimal performance for the **multi-image feature**.
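The version requirement above can be checked programmatically before loading the model. Below is a minimal sketch, not part of the model card: it compares plain `X.Y.Z` version strings as integer tuples (pre-release suffixes like `4.53.1.dev0` would need a real parser such as `packaging.version`), and the helper names `parse_version` and `meets_requirement` are illustrative.

```python
# Minimal sketch: check that the installed transformers version meets the
# README's recommendation (>= 4.53.1). Handles plain "X.Y.Z" strings only.

def parse_version(v: str) -> tuple[int, ...]:
    """Split a dotted version string into a comparable tuple of ints."""
    return tuple(int(part) for part in v.split("."))

def meets_requirement(installed: str, minimum: str = "4.53.1") -> bool:
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)
```

In practice you would pass `transformers.__version__` as the `installed` argument and warn (or upgrade) when the check fails.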