ubergarm committed on
Commit b12ad53 · 1 Parent(s): 2531618

initial commit

Files changed (2):
  1. .gitattributes +3 -0
  2. README.md +54 -3
.gitattributes CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ imatrix-*.dat filter=lfs diff=lfs merge=lfs -text
+ *.gguf filter=lfs diff=lfs merge=lfs -text
+ *.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,54 @@
- ---
- license: mit
- ---
+ ---
+ quantized_by: ubergarm
+ pipeline_tag: text-generation
+ base_model: zai-org/GLM-4.5
+ license: mit
+ base_model_relation: quantized
+ tags:
+ - imatrix
+ - conversational
+ - ik_llama.cpp
+ ---
+
+ This is an experimental placeholder carrying an imatrix that is not yet intended for general-purpose use. I'm not releasing any quants for this until the various PRs are in place and better tested.
+
+ Check the References below for the BF16 GGUF I used and a GitHub discussion showing the methodology used to collect this imatrix, prepared specifically for @Thireus.
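As a rough illustration of the methodology referenced above, an imatrix is collected by running the model over a calibration corpus with the `llama-imatrix` tool. The sketch below uses standard llama.cpp/ik_llama.cpp options; the model path and thread count are placeholders, and the exact invocation used for this repo is documented in the linked GitHub discussion.

```shell
# Sketch only: collect an importance matrix from the BF16 GGUF using the
# calibration corpus linked in the References. Paths/threads are placeholders.
./build/bin/llama-imatrix \
  -m /models/GLM-4.5-BF16.gguf \
  -f ubergarm-imatrix-calibration-corpus-v02.txt \
  -o imatrix-GLM-4.5.dat \
  --ctx-size 512 \
  --threads 32
```

Note the output name matches the `imatrix-*.dat` LFS pattern added to `.gitattributes` in this commit.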
+
+ Keep an eye out for the new PR and follow along; once this is tested and considered working correctly, I hope to release some quants for both this model and the smaller Air model.
+
+ ## `ik_llama.cpp` imatrix Quantizations of zai-org/GLM-4.5
+ This quant collection **REQUIRES** the [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp/) fork to support ik's latest SOTA quants and optimizations! Do **not** download these big files and expect them to run on mainline vanilla llama.cpp, ollama, LM Studio, KoboldCpp, etc!
+
+ *NOTE* `ik_llama.cpp` can also run your existing GGUFs from bartowski, unsloth, mradermacher, etc., if you want to try it out before downloading my quants.
+
+ Some of ik's new quants are supported by the [Nexesenex/croco.cpp](https://github.com/Nexesenex/croco.cpp) fork of KoboldCpp.
+
+ These quants provide best-in-class perplexity for a given memory footprint.
+
+ ## Big Thanks
+ Shout out to Wendell and the **Level1Techs** crew, the community [Forums](https://forum.level1techs.com/t/deepseek-deep-dive-r1-at-home/225826), and [YouTube Channel](https://www.youtube.com/@Level1Techs)! **BIG thanks** for providing **BIG hardware** expertise and access to run these experiments and make these great quants available to the community!!!
+
+ Also thanks to all the folks in the quanting and inferencing community on the [BeaverAI Club Discord](https://huggingface.co/BeaverAI) and on [r/LocalLLaMA](https://www.reddit.com/r/LocalLLaMA/) for tips and tricks, helping each other run, test, and benchmark all the fun new models!
+
+ ## Quant Collection
+ Perplexity computed against *wiki.test.raw*.
+
+ ![Perplexity Chart](images/perplexity.png "Chart showing Perplexity improving as BPW increases.")
+
+ These first two are just test quants for baseline perplexity comparison:
+ * `BF16` 670.586 GiB (16.075 BPW)
+   - Final estimate: PPL = TODO
+ * `Q8_0` 359.183 GiB (8.610 BPW)
+   - Final estimate: PPL = TODO
+
+ TODO
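For reference, perplexity figures like those above are typically measured with the `llama-perplexity` tool against *wiki.test.raw*. This is a hedged sketch with placeholder paths and thread count, not the exact command used for this table.

```shell
# Sketch: measure perplexity of a quant against wiki.test.raw.
# Model path and --threads are placeholders; adjust for your hardware.
./build/bin/llama-perplexity \
  -m /models/GLM-4.5-Q8_0.gguf \
  -f wiki.test.raw \
  --ctx-size 512 \
  --threads 32
```

When it finishes, the tool prints a line of the form `Final estimate: PPL = ...`, which is the number reported per quant above.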
+
+ ## Quick Start
+ #### CPU-Only
+
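This section is still a stub; until it's filled in, here is a minimal CPU-only sketch using the standard `llama-server` invocation shared by llama.cpp and ik_llama.cpp. Model path, context size, and thread count are assumptions to adjust for your system; ik_llama.cpp also offers extra optimization flags covered in its Getting Started Guide linked below.

```shell
# Minimal CPU-only server sketch (placeholder path/flags; tune for your box).
./build/bin/llama-server \
  -m /models/GLM-4.5-Q8_0.gguf \
  --ctx-size 8192 \
  --threads 32 \
  --host 127.0.0.1 \
  --port 8080
```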
+
+ ## References
+ * [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp)
+ * [Getting Started Guide (already out of date lol)](https://github.com/ikawrakow/ik_llama.cpp/discussions/258)
+ * [ubergarm-imatrix-calibration-corpus-v02.txt](https://gist.github.com/ubergarm/edfeb3ff9c6ec8b49e88cdf627b0711a?permalink_comment_id=5682584#gistcomment-5682584)
+ * [Thireus/GLM-4.5-THIREUS-BF16-SPECIAL_SPLIT](https://huggingface.co/Thireus/GLM-4.5-THIREUS-BF16-SPECIAL_SPLIT/tree/main)
+ * [Experimental ik_llama.cpp PR discussion](https://github.com/ikawrakow/ik_llama.cpp/pull/662#issuecomment-3145001132)