Commit 0699dc2
1 Parent(s): e3d777f
update README
README.md
CHANGED
@@ -11,14 +11,14 @@
 
 Orchestrator-8B is a state-of-the-art 8B parameter orchestration model designed to solve complex, multi-turn agentic tasks by coordinating a diverse set of expert models and tools.
 <p align="center">
-<img src="
+<img src="https://raw.githubusercontent.com/NVlabs/ToolOrchestra/main/assets/method.png" width="100%"/>
 <p>
 
 
 On the Humanity's Last Exam (HLE) benchmark, ToolOrchestrator-8B achieves a score of 37.1%, outperforming GPT-5 (35.1%) while being approximately 2.5x more efficient.
 
 <p align="center">
-<img src="
+<img src="https://raw.githubusercontent.com/NVlabs/ToolOrchestra/main/assets/HLE_benchmark.png" width="80%"/>
 <p>
 
 This model is for research and development only.
@@ -35,12 +35,12 @@ This model is for research and development only.
 On Humanity’s Last Exam, Orchestrator-8B achieves 37.1%, surpassing GPT-5 (35.1%) with only 30% monetary cost and 2.5x faster. On FRAMES and τ²-Bench, Orchestrator-8B consistently outperforms strong monolithic systems, demonstrating versatile reasoning and robust tool orchestration.
 
 <p align="center">
-<img src="
+<img src="https://raw.githubusercontent.com/NVlabs/ToolOrchestra/main/assets/results.png" width="100%"/>
 <p>
 
 Orchestrator-8B consistently outperforms GPT-5, Claude Opus 4.1 and Qwen3-235B-A22B on HLE with substantially lower cost.
 <p align="center">
-<img src="
+<img src="https://raw.githubusercontent.com/NVlabs/ToolOrchestra/main/assets/cost_performance.png" width="100%"/>
 <p>
 
 