Update README.md
README.md (changed)
@@ -103,6 +103,7 @@ output_ids = generated_ids[0][len(model_inputs.input_ids[0]) :]
 print(tokenizer.decode(output_ids, skip_special_tokens=True))
 
 ```
+> You can optionally compile the model's forward pass using torch.compile for a significant speed boost. Note that the first run takes longer while PyTorch compiles optimized kernels; subsequent runs are much faster.
 
 <details>
 <summary><span style="font-size:1.1em; font-weight:bold;">🧩 Quantization Process</span></summary>
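
For context, a minimal sketch of the pattern the new note describes, assuming the `model`, `tokenizer`, and `model_inputs` objects already built in the README's generation example (the `max_new_tokens` value is a placeholder):

```python
import torch

# Sketch only: `model`, `tokenizer`, and `model_inputs` are assumed to come
# from the generation example earlier in the README.
# torch.compile traces and optimizes the forward pass: the first generate()
# call is slower while kernels are compiled, subsequent calls reuse them.
model.forward = torch.compile(model.forward)

generated_ids = model.generate(**model_inputs, max_new_tokens=512)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]) :]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
```

Compiling `model.forward` rather than the whole model keeps the Python-side control flow of `generate()` untouched while still accelerating the per-token forward computation.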