MaxJeblick/llama2-0b-unit-test


MaxJeblick committed on Aug 26, 2024

Commit 7581874 · verified · 1 Parent(s): ba8a171

Update README.md

Files changed (1)

README.md +40 -13


@@ -27,17 +27,44 @@ tokenizer.push_to_hub(repo_name, private=False)
 config.push_to_hub(repo_name, private=False)
 ```
-Use the following configuration in [H2O LLM Studio](https://github.com/h2oai/h2o-llmstudio) to run a complete experiment in **5 seconds** using the default dataset and default settings otherwise:
-```yaml
-Validation Size: 0.1
-Data Sample: 0.1
-Max Length Prompt: 32
-Max Length Answer: 32
-Max Length: 64
-Backbone Dtype: float16
-Gradient Checkpointing: False
-Batch Size: 8
-Max Length Inference: 16
-```

+Below is a small example that runs in about 1 second.
+```python
+import torch
+from transformers import AutoModelForCausalLM
+def test_manual_greedy_generate():
+    max_new_tokens = 10
+    # Note: the model is loaded and run on CPU.
+    model = AutoModelForCausalLM.from_pretrained("MaxJeblick/llama2-0b-unit-test").eval()
+    input_ids = model.dummy_inputs["input_ids"]
+    y = model.generate(input_ids, max_new_tokens=max_new_tokens)
+    assert y.shape == (3, input_ids.shape[1] + max_new_tokens)
+    for _ in range(max_new_tokens):
+        with torch.no_grad():
+            outputs = model(input_ids)
+        next_token_logits = outputs.logits[:, -1, :]
+        next_token_id = torch.argmax(next_token_logits, dim=-1).unsqueeze(-1)
+        input_ids = torch.cat([input_ids, next_token_id], dim=-1)
+    assert torch.equal(y, input_ids)
+```
+Tip:
+Use a session-scoped pytest fixture to load the model only once. This reduces test runtime further.
+```python
+import pytest
+from transformers import AutoModelForCausalLM
+@pytest.fixture(scope="session")
+def model():
+    return AutoModelForCausalLM.from_pretrained("MaxJeblick/llama2-0b-unit-test").eval()
+```
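
The test added in this commit checks `generate` against a hand-written greedy loop: at each step, take the argmax over the last-position logits and append it to the input. A minimal, dependency-free sketch of that loop follows. The scoring function `toy_logits` and the helper `greedy_decode` are hypothetical stand-ins for illustration only; they are not part of the README or of `transformers`.

```python
def toy_logits(tokens, vocab_size=5):
    """Hypothetical stand-in for a model forward pass.

    Returns one score per vocabulary entry for the next token.
    The highest score always goes to (last token + 1) % vocab_size.
    """
    target = (tokens[-1] + 1) % vocab_size
    return [-abs(target - v) for v in range(vocab_size)]


def greedy_decode(prompt, max_new_tokens, logits_fn=toy_logits):
    """Append the highest-scoring token max_new_tokens times."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = logits_fn(tokens)
        # argmax over the vocabulary, as torch.argmax does in the README test
        next_token = max(range(len(scores)), key=scores.__getitem__)
        tokens.append(next_token)
    return tokens


result = greedy_decode([0, 1], max_new_tokens=4)
print(result)
```

As in the README test, the output length equals the prompt length plus `max_new_tokens`, because this sketch has no end-of-sequence handling.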