fedebotu committed
Commit eb11a8f · 1 Parent(s): 83dd916

[Docs] backends, better examples

Files changed (2):
  1. README.md +31 -18
  2. rn-logo-animated.svg +0 -173
README.md CHANGED
@@ -9,15 +9,13 @@ pipeline_tag: text-generation
 <center>
 <div style="text-align: center;">
 <img
-  src="https://huggingface.co/radicalnumerics/RND1-Base-0910/raw/main/rn-logo-animated.svg"
+  src="https://raw.githubusercontent.com/RadicalNumerics/assets/refs/heads/main/svg/rn-logo-desktop-vector-animated.svg"
   alt="Radical Numerics"
   style="width: 100%; max-width: 66%; height: auto; display: inline-block; margin-bottom: 0.5em; margin-top: 0.5em;"
 />
 </div>
 </center>

-<br>
-
 # RND1-Base-0910

 RND1 is an experimental diffusion language model with 30B parameters and 3B active parameters per token (sparse Mixture-of-Experts). This model was converted from a pretrained autoregressive base to enable diffusion-based text generation.
@@ -49,33 +47,44 @@ For faster inference with optimized MoE kernels:
 ```bash
 pip install flashinfer-python
 pip install sglang[all]
+pip install vllm
 ```

-**Warning:** selecting a non-Huggingface backend is highly encouraged. When using `flashinfer-python`, JIT compilation may take a while.
+> [!WARNING]
+> Selecting a non-Huggingface MoE backend is highly encouraged for faster generation. Note, however, that non-HF backends currently support a single GPU only, so you need to set e.g. `export CUDA_VISIBLE_DEVICES=0` before running the script. If you use `flashinfer-python`, JIT compilation the first time the code is run may take a while unless `flashinfer-jit-cache` is installed.
+

 ## Quick Start

 ```python
-from transformers import AutoModelForMaskedLM, AutoTokenizer
-import torch
+from transformers import AutoTokenizer, AutoModelForMaskedLM
+
+# Load tokenizer
+tokenizer = AutoTokenizer.from_pretrained("radicalnumerics/RND1-Base-0910", trust_remote_code=True)

 # Load model
 model = AutoModelForMaskedLM.from_pretrained(
     "radicalnumerics/RND1-Base-0910",
+    dtype="bfloat16",
+    device_map="auto",
     trust_remote_code=True,
-    torch_dtype=torch.bfloat16,
-    device_map="auto"
+    moe_backend="vllm",  # hf, sglang, vllm, flashinfer
 )
-tokenizer = AutoTokenizer.from_pretrained("radicalnumerics/RND1-Base-0910")

 # Generate - Task mode (for instructions and questions)
-prompt = "Write a Python function that finds the longest common subsequence."
-inputs = tokenizer(f"Question: {prompt}", return_tensors="pt").to(model.device)
+prompt = "Write a Python function that finds the longest common subsequence of two strings. Include comments explaining the algorithm."
+inputs = tokenizer(f"Question: {prompt}\nAnswer:", return_tensors="pt")
+input_ids = inputs.input_ids.to(model.device)
+
+# Generate
 output = model.generate(
-    inputs=inputs.input_ids,
+    inputs=input_ids,
     max_new_tokens=256,
     num_diffusion_steps=256,
+    temperature=0.01,
 )
+
+# Decode the output
 text = tokenizer.decode(output[0], skip_special_tokens=True)
 print(text)
 ```
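Putting the warning above into practice: below is a minimal editorial sketch (not part of the commit) of pinning a single GPU from Python before loading the model with a non-HF MoE backend. The `from_pretrained` arguments mirror the Quick Start in the diff; the only addition is setting `CUDA_VISIBLE_DEVICES`, which must happen before CUDA is initialized.

```python
# Sketch: pin one GPU, since non-HF MoE backends are single-GPU only (per the warning).
# Must run before torch/CUDA is initialized.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # same effect as `export CUDA_VISIBLE_DEVICES=0`

from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("radicalnumerics/RND1-Base-0910", trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained(
    "radicalnumerics/RND1-Base-0910",
    dtype="bfloat16",
    device_map="auto",      # resolves to the single visible GPU
    trust_remote_code=True,
    moe_backend="sglang",   # any non-HF backend from the list above: sglang, vllm, flashinfer
)
```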
@@ -104,20 +113,24 @@ output = model.generate(
     inputs=inputs.input_ids,
     max_new_tokens=256,
     num_diffusion_steps=256,
+    temperature=0.01,
 )
 ```

 ## Command-Line Interface

+Following the GitHub repo's demo script [demo_rnd_generation.py](https://github.com/RadicalNumerics/RND1/blob/main/demo_rnd_generation.py):
+
 ```bash
-# Task mode
-python demo_rnd_generation.py --prompt "Explain neural networks in simple terms"
+# Task mode (default) - for instructions, questions, or requests
+python demo_rnd_generation.py --prompt "Write a Python function that finds the longest common subsequence of two strings. Include comments explaining the algorithm." --moe_backend hf

-# Completion mode
-python demo_rnd_generation.py --mode completion --prompt "The future of AI"
+# Completion mode - for text continuation
+python demo_rnd_generation.py --mode completion --prompt "The key to understanding quantum computing lies in" --moe_backend hf

-# With sampling parameters
-python demo_rnd_generation.py --top_k 50 --temperature 0.7 --prompt "Your prompt"
+# Sampling parameters
+python demo_rnd_generation.py --top_k 50 --temperature 0.7 --prompt "Explain how neural networks learn in simple terms" --moe_backend hf
 ```

 ## Technical Details
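The diff shows task mode in Python but only a CLI flag for completion mode. As a hedged editorial sketch (not part of the commit), completion mode might look like the following in Python, reusing `tokenizer` and `model` from the Quick Start. Whether `generate` returns the prompt tokens along with the continuation is an assumption here, so the final slice is illustrative.

```python
# Sketch: completion mode, mirroring the CLI's `--mode completion` example.
# The raw prompt is tokenized directly, without the "Question: ... Answer:" template.
prompt = "The key to understanding quantum computing lies in"
inputs = tokenizer(prompt, return_tensors="pt")
input_ids = inputs.input_ids.to(model.device)

output = model.generate(
    inputs=input_ids,
    max_new_tokens=256,
    num_diffusion_steps=256,
    temperature=0.7,  # sampling, as in the CLI example (--top_k is a demo-script flag)
)

# Assumption: the returned sequence includes the prompt tokens; slice them off
# to keep only the generated continuation.
continuation = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
print(continuation)
```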
 
rn-logo-animated.svg DELETED