fedebotu committed
Commit eb11a8f · 1 Parent(s): 83dd916

[Docs] backends, better examples

Files changed (2):
  1. README.md +31 -18
  2. rn-logo-animated.svg +0 -173
README.md CHANGED
@@ -9,15 +9,13 @@ pipeline_tag: text-generation
 <center>
 <div style="text-align: center;">
 <img
-  src="https://huggingface.co/radicalnumerics/RND1-Base-0910/raw/main/rn-logo-animated.svg"
+  src="https://raw.githubusercontent.com/RadicalNumerics/assets/refs/heads/main/svg/rn-logo-desktop-vector-animated.svg"
   alt="Radical Numerics"
   style="width: 100%; max-width: 66%; height: auto; display: inline-block; margin-bottom: 0.5em; margin-top: 0.5em;"
 />
 </div>
 </center>

-<br>
-
 # RND1-Base-0910

 RND1 is an experimental diffusion language model with 30B parameters and 3B active parameters per token (sparse Mixture-of-Experts). This model was converted from a pretrained autoregressive base to enable diffusion-based text generation.
@@ -49,33 +47,44 @@ For faster inference with optimized MoE kernels:
 ```bash
 pip install flashinfer-python
 pip install sglang[all]
+pip install vllm
 ```

-**Warning:** selecting a non-Huggingface backend is highly encouraged. When using `flashinfer-python`, JIT compilation may take a while.
+> [!WARNING]
+> Selecting a non-Huggingface MoE backend is highly encouraged for faster generation. Note, however, that non-HF backends currently support a single GPU only, so you need to set e.g. `export CUDA_VISIBLE_DEVICES=0` before running the script. If you use `flashinfer-python`, JIT compilation the first time the code is run may take a while unless `flashinfer-jit-cache` is installed.
+

 ## Quick Start

 ```python
-from transformers import AutoModelForMaskedLM, AutoTokenizer
-import torch
+from transformers import AutoTokenizer, AutoModelForMaskedLM
+
+# Load tokenizer
+tokenizer = AutoTokenizer.from_pretrained("radicalnumerics/RND1-Base-0910", trust_remote_code=True)

 # Load model
 model = AutoModelForMaskedLM.from_pretrained(
     "radicalnumerics/RND1-Base-0910",
+    dtype="bfloat16",
+    device_map="auto",
     trust_remote_code=True,
-    torch_dtype=torch.bfloat16,
-    device_map="auto"
+    moe_backend="vllm",  # hf, sglang, vllm, flashinfer
 )
-tokenizer = AutoTokenizer.from_pretrained("radicalnumerics/RND1-Base-0910")

 # Generate - Task mode (for instructions and questions)
-prompt = "Write a Python function that finds the longest common subsequence."
-inputs = tokenizer(f"Question: {prompt}", return_tensors="pt").to(model.device)
+prompt = "Write a Python function that finds the longest common subsequence of two strings. Include comments explaining the algorithm."
+inputs = tokenizer(f"Question: {prompt}\nAnswer:", return_tensors="pt")
+input_ids = inputs.input_ids.to(model.device)
+
+# Generate
 output = model.generate(
-    inputs=inputs.input_ids,
+    inputs=input_ids,
     max_new_tokens=256,
     num_diffusion_steps=256,
+    temperature=0.01,
 )
+
+# Decode the output
 text = tokenizer.decode(output[0], skip_special_tokens=True)
 print(text)
 ```
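Putting the warning above into practice: below is a minimal editorial sketch (not part of the commit) of pinning a single GPU from Python before loading the model with a non-HF MoE backend. The `from_pretrained` arguments mirror the Quick Start in the diff; the only addition is setting `CUDA_VISIBLE_DEVICES`, which must happen before CUDA is initialized.

```python
# Sketch: pin one GPU, since non-HF MoE backends are single-GPU only (per the warning).
# Must run before torch/CUDA is initialized.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # same effect as `export CUDA_VISIBLE_DEVICES=0`

from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("radicalnumerics/RND1-Base-0910", trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained(
    "radicalnumerics/RND1-Base-0910",
    dtype="bfloat16",
    device_map="auto",      # resolves to the single visible GPU
    trust_remote_code=True,
    moe_backend="sglang",   # any non-HF backend from the list above: sglang, vllm, flashinfer
)
```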
@@ -104,20 +113,24 @@ output = model.generate(
     inputs=inputs.input_ids,
     max_new_tokens=256,
     num_diffusion_steps=256,
+    temperature=0.01,
 )
 ```

 ## Command-Line Interface

+Following the GitHub repo's demo script [demo_rnd_generation.py](https://github.com/RadicalNumerics/RND1/blob/main/demo_rnd_generation.py):
+
 ```bash
-# Task mode
-python demo_rnd_generation.py --prompt "Explain neural networks in simple terms"
+# Task mode (default) - for instructions, questions, or requests
+python demo_rnd_generation.py --prompt "Write a Python function that finds the longest common subsequence of two strings. Include comments explaining the algorithm." --moe_backend hf

-# Completion mode
-python demo_rnd_generation.py --mode completion --prompt "The future of AI"
+# Completion mode - for text continuation
+python demo_rnd_generation.py --mode completion --prompt "The key to understanding quantum computing lies in" --moe_backend hf

-# With sampling parameters
-python demo_rnd_generation.py --top_k 50 --temperature 0.7 --prompt "Your prompt"
+# Sampling parameters
+python demo_rnd_generation.py --top_k 50 --temperature 0.7 --prompt "Explain how neural networks learn in simple terms" --moe_backend hf
 ```

 ## Technical Details
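The diff shows task mode in Python but only a CLI flag for completion mode. As a hedged editorial sketch (not part of the commit), completion mode might look like the following in Python, reusing `tokenizer` and `model` from the Quick Start. Whether `generate` returns the prompt tokens along with the continuation is an assumption here, so the final slice is illustrative.

```python
# Sketch: completion mode, mirroring the CLI's `--mode completion` example.
# The raw prompt is tokenized directly, without the "Question: ... Answer:" template.
prompt = "The key to understanding quantum computing lies in"
inputs = tokenizer(prompt, return_tensors="pt")
input_ids = inputs.input_ids.to(model.device)

output = model.generate(
    inputs=input_ids,
    max_new_tokens=256,
    num_diffusion_steps=256,
    temperature=0.7,  # sampling, as in the CLI example (--top_k is a demo-script flag)
)

# Assumption: the returned sequence includes the prompt tokens; slice them off
# to keep only the generated continuation.
continuation = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
print(continuation)
```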
 
rn-logo-animated.svg DELETED