ybian-umd committed
Commit 6488771 · verified · 1 Parent(s): e024d6e

Update README.md

Files changed (1): README.md +51 -1
README.md CHANGED
@@ -32,7 +32,57 @@ evaluation settings:
  **Note**: The 4B, 8B, and 30B models are coming soon. Performance results for these models will be released in the near future.

  ## Inference
- The inference code will come soon
+
+ ### Using the tailored inference engine [JetEngine](https://github.com/Labman42/JetEngine)
+
+ JetEngine enables more efficient inference than the built-in implementation.
+
+ ```bash
+ git clone https://github.com/Labman42/JetEngine.git
+ cd JetEngine
+ pip install .
+ ```
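+
+ As a quick sanity check that the build succeeded, the imports used in the example below should resolve:
+
+ ```bash
+ python -c "from jetengine import LLM, SamplingParams"
+ ```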
+
+ The following example shows how to quickly load a model with JetEngine and run a prompt end-to-end.
+
+ ```python
+ import os
+ from jetengine import LLM, SamplingParams
+ from transformers import AutoTokenizer
+
+ model_path = os.path.expanduser("/path/to/your/sdar-model")
+ tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
+
+ # Initialize the LLM
+ llm = LLM(
+     model_path,
+     enforce_eager=True,
+     tensor_parallel_size=1,
+     mask_token_id=151669,  # Optional: only needed for masked/diffusion models
+     block_length=4
+ )
+
+ # Set sampling/generation parameters
+ sampling_params = SamplingParams(
+     temperature=1.0,
+     topk=0,
+     topp=1.0,
+     max_tokens=256,
+     remasking_strategy="low_confidence_dynamic",
+     block_length=4,
+     denoising_steps=4,
+     dynamic_threshold=0.9
+ )
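+
+ # Note (an assumption read off the parameter names, not from JetEngine docs):
+ # with block_length=4 and denoising_steps=4, each 4-token block appears to be
+ # filled in over 4 denoising iterations, and "low_confidence_dynamic" remasking
+ # re-masks low-confidence tokens at each step, accepting tokens whose
+ # confidence exceeds dynamic_threshold=0.9.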
+
+ # Prepare a simple chat-style prompt
+ prompt = tokenizer.apply_chat_template(
+     [{"role": "user", "content": "Explain what reinforcement learning is in simple terms."}],
+     tokenize=False,
+     add_generation_prompt=True
+ )
+
+ # Generate text
+ outputs = llm.generate_streaming([prompt], sampling_params)
+ ```
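+
+ The exact structure of `outputs` is not documented in this README; as a minimal sketch, assuming `generate_streaming` returns one completion per input prompt (check the JetEngine repository for the actual return type):
+
+ ```python
+ # Assumption: outputs is aligned one-to-one with the input prompts.
+ print(outputs[0])
+ ```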

  ## Highlights
  - **Performance**: SDAR-1.7B-Chat achieves state-of-the-art results.