Open-Orca/OpenOrca-Platypus2-13B

bleysg committed on Aug 13, 2023

Commit 66cd022 · 1 Parent(s): 9724828

Update README.md

Files changed (1)

README.md +16 -1

@@ -34,6 +34,8 @@ https://AlignmentLab.ai
 # Evaluation
 ![HF Leaderboard](https://huggingface.co/Open-Orca/OpenOrca-Platypus2-13B/resolve/main/Images/OrcaPlatypus13BHFLeaderboard.webp)
 | Metric                | Value |
@@ -47,6 +49,19 @@ https://AlignmentLab.ai
 We use [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmark tests above, using the same version as the HuggingFace LLM Leaderboard. Please see below for detailed instructions on reproducing benchmark results.
 # Model Details
 * **Trained by**: **Platypus2-13B** trained by Cole Hunter & Ariel Lee; **OpenOrcaxOpenChat-Preview2-13B** trained by Open-Orca
@@ -87,7 +102,7 @@ Please see our [paper](https://platypus-llm.github.io/Platypus.pdf) and [project
 For training details and inference instructions please see the [Platypus](https://github.com/arielnlee/Platypus) GitHub repo.
-# Reproducing Evaluation Results
 Install LM Evaluation Harness:
 ```

 # Evaluation
+## HuggingFace Leaderboard Performance
 ![HF Leaderboard](https://huggingface.co/Open-Orca/OpenOrca-Platypus2-13B/resolve/main/Images/OrcaPlatypus13BHFLeaderboard.webp)
 | Metric                | Value |
 We use [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmark tests above, using the same version as the HuggingFace LLM Leaderboard. Please see below for detailed instructions on reproducing benchmark results.
+## AGIEval Performance
+We compare our results to our base Preview2 model, and find **112%** of the base model's performance on AGI Eval, averaging **0.463**.
+![OpenOrca-Platypus2-13B AGIEval Performance](https://huggingface.co/Open-Orca/OpenOrca-Platypus2-13B/resolve/main/Images/OrcaPlatypus13BAGIEval.webp "AGIEval Performance")
+## BigBench-Hard Performance
+We compare our results to our base Preview2 model, and find **105%** of the base model's performance on BigBench-Hard, averaging **0.442**.
+![OpenOrca-Platypus2-13B BigBench-Hard Performance](https://huggingface.co/Open-Orca/OpenOrca-Platypus2-13B/resolve/main/Images/OrcaPlatypus13BBigBenchHard.webp "BigBench-Hard Performance")
 # Model Details
 * **Trained by**: **Platypus2-13B** trained by Cole Hunter & Ariel Lee; **OpenOrcaxOpenChat-Preview2-13B** trained by Open-Orca
 For training details and inference instructions please see the [Platypus](https://github.com/arielnlee/Platypus) GitHub repo.
+# Reproducing Evaluation Results (for HuggingFace Leaderboard Eval)
 Install LM Evaluation Harness:
 ```