omitakahiro committed
Commit f3564c8 · verified · 1 Parent(s): f952bd2

Update README.md

Files changed (1): README.md +33 -1
README.md CHANGED
@@ -5,4 +5,36 @@ base_model:
  - stockmark/Stockmark-2-100B-Instruct-beta
  ---
 
- This repo contains the AWQ-quantized 4-bit version of [Stockmark-2-100B-Instruct-beta](https://huggingface.co/stockmark/Stockmark-2-100B-Instruct-beta)
+ # Stockmark-2-100B-Instruct-beta-AWQ
+
+ This repo contains the AWQ-quantized 4-bit version of [Stockmark-2-100B-Instruct-beta](https://huggingface.co/stockmark/Stockmark-2-100B-Instruct-beta).
+
+ ## Example
+
+ **Please use the float16 data type when loading the model. The bfloat16 data type is not supported by this model.**
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ # Load the tokenizer and the 4-bit AWQ model in float16 (bfloat16 is unsupported).
+ tokenizer = AutoTokenizer.from_pretrained("stockmark/Stockmark-2-100B-Instruct-beta-AWQ")
+ model = AutoModelForCausalLM.from_pretrained("stockmark/Stockmark-2-100B-Instruct-beta-AWQ", device_map="auto", torch_dtype=torch.float16)
+
+ # Build the chat prompt ("自然言語処理とは?" means "What is natural language processing?").
+ instruction = "自然言語処理とは?"
+ input_ids = tokenizer.apply_chat_template(
+     [{"role": "user", "content": instruction}], add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+
+ # Sample a response, then decode it without special tokens.
+ with torch.inference_mode():
+     tokens = model.generate(
+         input_ids,
+         max_new_tokens=512,
+         do_sample=True,
+         temperature=0.7,
+         top_p=0.95,
+         repetition_penalty=1.05,
+     )
+
+ output = tokenizer.decode(tokens[0], skip_special_tokens=True)
+ print(output)
+ ```
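
Beyond the transformers snippet in the diff, AWQ checkpoints like this one are often served with vLLM. The following is a minimal sketch, not part of the original README: it assumes a recent vLLM with built-in AWQ support and its `LLM.chat` API, the same repo id as above, and enough GPU memory for a ~100B 4-bit model (the `tensor_parallel_size` value is an illustrative assumption, not a recommendation).

```python
from vllm import LLM, SamplingParams

# Hypothetical serving sketch (not from the README): vLLM reads the AWQ
# quantization config from the checkpoint; dtype="float16" mirrors the
# README's float16 requirement.
llm = LLM(
    model="stockmark/Stockmark-2-100B-Instruct-beta-AWQ",
    quantization="awq",
    dtype="float16",
    tensor_parallel_size=4,  # assumption: shard across 4 GPUs; tune for your hardware
)

# Same sampling settings as the transformers example above.
params = SamplingParams(temperature=0.7, top_p=0.95, repetition_penalty=1.05, max_tokens=512)

# chat() applies the model's chat template, like apply_chat_template above.
outputs = llm.chat([{"role": "user", "content": "自然言語処理とは?"}], params)
print(outputs[0].outputs[0].text)
```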