shizhediao2 committed on
Commit 0699dc2 · 1 Parent(s): e3d777f

update README

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -11,14 +11,14 @@
 
  Orchestrator-8B is a state-of-the-art 8B parameter orchestration model designed to solve complex, multi-turn agentic tasks by coordinating a diverse set of expert models and tools.
  <p align="center">
- <img src="./assets/method.png" width="100%"/>
+ <img src="https://raw.githubusercontent.com/NVlabs/ToolOrchestra/main/assets/method.png" width="100%"/>
  <p>
 
 
  On the Humanity's Last Exam (HLE) benchmark, ToolOrchestrator-8B achieves a score of 37.1%, outperforming GPT-5 (35.1%) while being approximately 2.5x more efficient.
 
  <p align="center">
- <img src="./assets/HLE_benchmark.png" width="80%"/>
+ <img src="https://raw.githubusercontent.com/NVlabs/ToolOrchestra/main/assets/HLE_benchmark.png" width="80%"/>
  <p>
 
  This model is for research and development only.
@@ -35,12 +35,12 @@ This model is for research and development only.
  On Humanity’s Last Exam, Orchestrator-8B achieves 37.1%, surpassing GPT-5 (35.1%) with only 30% monetary cost and 2.5x faster. On FRAMES and τ²-Bench, Orchestrator-8B consistently outperforms strong monolithic systems, demonstrating versatile reasoning and robust tool orchestration.
 
  <p align="center">
- <img src="./assets/results.png" width="100%"/>
+ <img src="https://raw.githubusercontent.com/NVlabs/ToolOrchestra/main/assets/results.png" width="100%"/>
  <p>
 
  Orchestrator-8B consistently outperforms GPT-5, Claude Opus 4.1 and Qwen3-235B-A22B on HLE with substantially lower cost.
  <p align="center">
- <img src="./assets/cost-performance.png" width="100%"/>
+ <img src="https://raw.githubusercontent.com/NVlabs/ToolOrchestra/main/assets/cost_performance.png" width="100%"/>
  <p>
 
 