SeerAttention committed on
Commit a0449f9 · verified · 1 Parent(s): 17ca565

Update README.md

Files changed (1)
  1. README.md +24 -1
README.md CHANGED
@@ -62,4 +62,27 @@ With threshold set to 2e-3.
  | 16k | 92.01 | 92.02 | 0.56 |
  | 32k | 87.63 | 88.49 | 0.46 |
  | 64k | 84.39 | 83.48 | 0.32 |
- | 128k | 76.26 | 73.37 | 0.17 |
+ | 128k | 76.26 | 73.37 | 0.17 |
+
+
+
+ ## LongBenchV2 CoT Benchmark
+
+ All SeerAttention models are run with threshold=5e-4.
+
+ For R1-Distilled models, we drop the two-pass generation setup (think + summary) and directly ask the models to output the answer after thinking. The maximum generation length is set to 10240.
+
+
+
+ | Model | Overall | Easy | Hard | Short | Medium | Long |
+ |:---|:---:|:---:|:---:|:---:|:---:|:---:|
+ | [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | 30.4 | 31.2 | 29.9 | 37.8 | 24.7 | 29.6 |
+ | [SeerAttention-Llama-3.1-8B](https://huggingface.co/SeerAttention/SeerAttention-Llama-3.1-8B-AttnGates) | 31.6 | 33.3 | 30.5 | 33.9 | 31.6 | 27.8 |
+ | [Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) | 34.8 | 37.5 | 33.1 | 44.4 | 32.1 | 24.1 |
+ | [SeerAttention-Qwen2.5-14B](https://huggingface.co/SeerAttention/SeerAttention-Qwen2.5-14B-AttnGates) | 32.8 | 38.0 | 29.6 | 45.0 | 30.2 | 17.6 |
+ | [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) | 36.4 | 42.2 | 32.8 | 47.8 | 29.8 | 30.6 |
+ | [SeerAttention-Qwen2.5-32B](https://huggingface.co/SeerAttention/SeerAttention-Qwen2.5-32B-AttnGates) | 36.4 | 41.1 | 33.4 | 49.4 | 29.8 | 27.8 |
+ | [DeepSeek-R1-Distill-Qwen-14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B) | 34.2 | 43.2 | 28.6 | 45.0 | 27.9 | 28.7 |
+ | [SeerAttention-DeepSeek-R1-Distill-Qwen-14B](https://huggingface.co/SeerAttention/SeerAttention-DeepSeek-R1-Distill-Qwen-14B-AttnGates) | 31.6 | 35.9 | 28.9 | 41.7 | 26.0 | 25.9 |
+ | [DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) | 37.2 | 42.7 | 33.8 | 47.2 | 35.8 | 23.1 |
+ | [SeerAttention-DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/SeerAttention/SeerAttention-DeepSeek-R1-Distill-Qwen-32B-AttnGates) | 37.0 | 42.2 | 33.8 | 49.4 | 31.6 | 26.9 |
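
To make the dense-versus-sparse comparison in the table above easier to read, the following is a minimal sketch (not part of the commit) that computes the change in the Overall score for each base model. The scores are copied from the table; the names `overall` and `overall_deltas` are hypothetical and chosen only for illustration.

```python
# Overall LongBenchV2 CoT scores copied from the table above, stored as
# (dense baseline, SeerAttention variant) for each base model.
overall = {
    "Llama-3.1-8B-Instruct": (30.4, 31.6),
    "Qwen2.5-14B-Instruct": (34.8, 32.8),
    "Qwen2.5-32B-Instruct": (36.4, 36.4),
    "DeepSeek-R1-Distill-Qwen-14B": (34.2, 31.6),
    "DeepSeek-R1-Distill-Qwen-32B": (37.2, 37.0),
}


def overall_deltas(scores):
    """Return SeerAttention minus dense baseline, rounded to one decimal."""
    return {
        name: round(sparse - dense, 1)
        for name, (dense, sparse) in scores.items()
    }


deltas = overall_deltas(overall)
for name, delta in deltas.items():
    # A positive delta means the SeerAttention variant scored higher.
    print(f"{name}: {delta:+.1f}")
```

The same pattern could be applied to any other column (Easy, Hard, Short, Medium, Long) by swapping in that column's values.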