danielhanchen committed
Commit 1688f5f · verified · 1 Parent(s): 4def9cd

Update README.md

Files changed (1): README.md +408 -14

README.md CHANGED
@@ -15,10 +15,7 @@ language:
15
  license: apache-2.0
16
  inference: false
17
  base_model:
18
- - mistralai/Ministral-3-3B-Base-2512
19
- extra_gated_description: >-
20
- If you want to learn more about how we process your personal data, please read
21
- our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
22
  tags:
23
  - mistral-common
24
  ---
@@ -26,11 +23,7 @@ tags:
26
  # Ministral 3 3B Instruct 2512
27
  The smallest model in the Ministral 3 family, **Ministral 3 3B** is a powerful, efficient tiny language model with vision capabilities.
28
 
29
- This model is the instruct post-trained version, fine-tuned for instruction tasks, making it ideal for chat and instruction based use cases.
30
-
31
- The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. Ministral 3 3B can even be deployed locally, capable of fitting in 16GB of VRAM in BF16, and less than 8GB of RAM/VRAM when quantized.
32
-
33
- We provide a no-loss FP8 version [here](https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512-FP8), you can find other formats and quantizations in the [Ministral 3 - Quants](https://huggingface.co/collections/mistralai/ministral-3-quants) collection.
34
 
35
  ## Key Features
36
  Ministral 3 3B consists of two main architectural components:
@@ -63,16 +56,16 @@ Bringing advanced AI capabilities to edge and distributed environments for embed
63
  | Model Name | Type | Precision | Link |
64
  |--------------------------------|--------------------|-----------|------------------------------------------------------------------------------------------|
65
  | Ministral 3 3B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Base-2512) |
66
- | **Ministral 3 3B Instruct 2512** | **Instruct post-trained** | **BF16** | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512) |
67
  | Ministral 3 3B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Reasoning-2512) |
68
  | Ministral 3 8B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Base-2512) |
69
- | Ministral 3 8B Instruct 2512 | Instruct post-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512) |
70
  | Ministral 3 8B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Reasoning-2512) |
71
- | Ministral 3 14B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Base-2512) |
72
- | Ministral 3 14B Instruct 2512 | Instruct post-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512) |
73
  | Ministral 3 14B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Reasoning-2512) |
74
 
75
- Other formats available [here](https://huggingface.co/collections/mistralai/ministral-3-quants).
76
 
77
  ## Benchmark Results
78
 
@@ -122,6 +115,407 @@ We compare Ministral 3 to similar sized models.
122
  | Qwen 3 4B Base | <u>0.677</u> | 0.405 | <u>0.570</u> | <u>0.759</u> | <u>0.713</u>| 0.530 |
123
  | Gemma 3 4B Base | 0.516 | 0.294 | 0.430 | 0.626 | 0.589 | <u>0.640</u> |
124
 
 
125
  ## License
126
 
127
  This model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).
 
15
  license: apache-2.0
16
  inference: false
17
  base_model:
18
+ - mistralai/Ministral-3-3B-Instruct-2512
 
 
 
19
  tags:
20
  - mistral-common
21
  ---
 
23
  # Ministral 3 3B Instruct 2512
24
  The smallest model in the Ministral 3 family, **Ministral 3 3B** is a powerful, efficient tiny language model with vision capabilities.
25
 
26
+ The Ministral 3 family is designed for edge deployment and runs on a wide range of hardware. Ministral 3 3B can even be deployed locally, fitting in 8GB of VRAM in FP8 and in even less memory when further quantized.
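+
+ As a rough back-of-the-envelope check (a sketch only: the parameter count below is an assumption, and KV-cache and activation overhead vary with context length):
+
+ ```python
+ # Rough weight-memory estimate for a ~3B-parameter model (hypothetical count).
+ params = 3.4e9        # assumed number of parameters
+ bytes_per_param = 1   # FP8 stores one byte per weight
+ print(f"~{params * bytes_per_param / 1024**3:.1f} GiB for weights alone")
+ ```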
 
 
 
 
27
 
28
  ## Key Features
29
  Ministral 3 3B consists of two main architectural components:
 
56
  | Model Name | Type | Precision | Link |
57
  |--------------------------------|--------------------|-----------|------------------------------------------------------------------------------------------|
58
  | Ministral 3 3B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Base-2512) |
59
+ | Ministral 3 3B Instruct 2512 | Instruct post-trained | FP8 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512) |
60
  | Ministral 3 3B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Reasoning-2512) |
61
  | Ministral 3 8B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Base-2512) |
62
+ | Ministral 3 8B Instruct 2512 | Instruct post-trained | FP8 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512) |
63
  | Ministral 3 8B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Reasoning-2512) |
64
+ | Ministral 3 14B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Base-2512) |
65
+ | Ministral 3 14B Instruct 2512 | Instruct post-trained | FP8 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512) |
66
  | Ministral 3 14B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Reasoning-2512) |
67
 
68
+ Other formats available [here](https://huggingface.co/collections/mistralai/ministral-3-more).
69
 
70
  ## Benchmark Results
71
 
 
115
  | Qwen 3 4B Base | <u>0.677</u> | 0.405 | <u>0.570</u> | <u>0.759</u> | <u>0.713</u>| 0.530 |
116
  | Gemma 3 4B Base | 0.516 | 0.294 | 0.430 | 0.626 | 0.589 | <u>0.640</u> |
117
 
118
+ ## Usage
119
+
120
+ The model can be used with the following frameworks:
121
+ - [`vllm`](https://github.com/vllm-project/vllm): See [here](#vllm)
122
+ - [`transformers`](https://github.com/huggingface/transformers): See [here](#transformers)
123
+
124
+ ### vLLM
125
+
126
+ We recommend using this model with [vLLM](https://github.com/vllm-project/vllm).
127
+
128
+ #### Installation
129
+
130
+ Make sure to install [`vLLM >= 0.12.0`](https://github.com/vllm-project/vllm/releases/tag/v0.12.0):
131
+
132
+ ```sh
133
+ pip install vllm --upgrade
134
+ ```
135
+
136
+ Doing so should automatically install [`mistral_common >= 1.8.6`](https://github.com/mistralai/mistral-common/releases/tag/v1.8.6).
137
+
138
+ To check:
139
+ ```sh
140
+ python -c "import mistral_common; print(mistral_common.__version__)"
141
+ ```
142
+
143
+ You can also use the ready-to-go [docker image](https://github.com/vllm-project/vllm/blob/main/Dockerfile), or pull one from the [docker hub](https://hub.docker.com/layers/vllm/vllm-openai/latest/images/sha256-de9032a92ffea7b5c007dad80b38fd44aac11eddc31c435f8e52f3b7404bbf39).
144
+
145
+ #### Serve
146
+
147
+ Due to their size and the FP8 format of their weights, `Ministral-3-3B-Instruct-2512`, `Ministral-3-8B-Instruct-2512` and `Ministral-3-14B-Instruct-2512` can each run on a single H200 GPU.
148
+
149
+ A simple launch command is:
150
+
151
+ ```bash
152
+ vllm serve mistralai/Ministral-3-3B-Instruct-2512 \
153
+     --enable-auto-tool-choice --tool-call-parser mistral
154
+ ```
155
+
156
+ Key parameter notes:
157
+
158
+ * `--enable-auto-tool-choice`: required when enabling tool usage.
159
+ * `--tool-call-parser mistral`: required when enabling tool usage.
160
+
161
+
162
+ Additional flags:
163
+
164
+ * You can set `--max-model-len` to save memory. It defaults to `262144`, which is quite large and unnecessary for most scenarios (see the offline sketch below).
165
+ * You can set `--max-num-batched-tokens` to trade throughput against latency: higher values increase throughput but also latency.
166
+
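+ These limits can also be exercised through vLLM's offline Python API. Below is a minimal sketch (the `max_model_len` value and the prompt are illustrative assumptions; `max_model_len` mirrors the `--max-model-len` server flag):
+
+ ```python
+ from vllm import LLM, SamplingParams
+
+ # Load the model with a reduced context window to shrink the KV-cache reservation.
+ llm = LLM(model="mistralai/Ministral-3-3B-Instruct-2512", max_model_len=32768)
+
+ out = llm.chat(
+     [{"role": "user", "content": "Give three uses for a small edge-deployed language model."}],
+     SamplingParams(temperature=0.15, max_tokens=256),
+ )
+ print(out[0].outputs[0].text)
+ ```
+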
167
+ #### Usage of the model
168
+
169
+ Here we assume that the model `mistralai/Ministral-3-3B-Instruct-2512` is served and reachable at the domain `localhost` on port `8000`, vLLM's default.
170
+
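+ As a quick sanity check (a minimal sketch, assuming the default host and port), you can first list the served models:
+
+ ```python
+ from openai import OpenAI
+
+ # Point the OpenAI client at the local vLLM server.
+ client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")
+ print([m.id for m in client.models.list().data])  # should include the served model id
+ ```
+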
171
+ <details>
172
+ <summary>Vision Reasoning</summary>
173
+
174
+ Let's see if Ministral 3 knows when to pick a fight!
175
+
176
+ ```python
+ from datetime import datetime, timedelta
+
+ from openai import OpenAI
+ from huggingface_hub import hf_hub_download
+
+ # Modify OpenAI's API key and API base to use vLLM's API server.
+ openai_api_key = "EMPTY"
+ openai_api_base = "http://localhost:8000/v1"
+
+ TEMP = 0.15
+ MAX_TOK = 262144
+
+ client = OpenAI(
+     api_key=openai_api_key,
+     base_url=openai_api_base,
+ )
+
+ models = client.models.list()
+ model = models.data[0].id
+
+
+ def load_system_prompt(repo_id: str, filename: str) -> str:
+     file_path = hf_hub_download(repo_id=repo_id, filename=filename)
+     with open(file_path, "r") as file:
+         system_prompt = file.read()
+     today = datetime.today().strftime("%Y-%m-%d")
+     yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
+     model_name = repo_id.split("/")[-1]
+     return system_prompt.format(name=model_name, today=today, yesterday=yesterday)
+
+
+ SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")
+ image_url = "https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438"
+
+ messages = [
+     {"role": "system", "content": SYSTEM_PROMPT},
+     {
+         "role": "user",
+         "content": [
+             {
+                 "type": "text",
+                 "text": "What action do you think I should take in this situation? List all the possible actions and explain why you think they are good or bad.",
+             },
+             {"type": "image_url", "image_url": {"url": image_url}},
+         ],
+     },
+ ]
+
+ response = client.chat.completions.create(
+     model=model,
+     messages=messages,
+     temperature=TEMP,
+     max_tokens=MAX_TOK,
+ )
+
+ print(response.choices[0].message.content)
+ ```
237
+
238
+ </details>
239
+
240
+ <details>
241
+ <summary>Function Calling</summary>
242
+
243
+ Let's solve some equations with the help of our simple Python calculator tool.
244
+
245
+ ```python
+ import json
+ from openai import OpenAI
+ from huggingface_hub import hf_hub_download
+
+ # Modify OpenAI's API key and API base to use vLLM's API server.
+ openai_api_key = "EMPTY"
+ openai_api_base = "http://localhost:8000/v1"
+
+ TEMP = 0.15
+ MAX_TOK = 262144
+
+ client = OpenAI(
+     api_key=openai_api_key,
+     base_url=openai_api_base,
+ )
+
+ models = client.models.list()
+ model = models.data[0].id
+
+
+ def load_system_prompt(repo_id: str, filename: str) -> str:
+     file_path = hf_hub_download(repo_id=repo_id, filename=filename)
+     with open(file_path, "r") as file:
+         system_prompt = file.read()
+     return system_prompt
+
+
+ SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")
+
+ image_url = "https://math-coaching.com/img/fiche/46/expressions-mathematiques.jpg"
+
+
+ def my_calculator(expression: str) -> str:
+     # Demo only: eval is unsafe on untrusted input.
+     return str(eval(expression))
+
+
+ tools = [
+     {
+         "type": "function",
+         "function": {
+             "name": "my_calculator",
+             "description": "A calculator that can evaluate a mathematical expression.",
+             "parameters": {
+                 "type": "object",
+                 "properties": {
+                     "expression": {
+                         "type": "string",
+                         "description": "The mathematical expression to evaluate.",
+                     },
+                 },
+                 "required": ["expression"],
+             },
+         },
+     },
+     {
+         "type": "function",
+         "function": {
+             "name": "rewrite",
+             "description": "Rewrite a given text for improved clarity",
+             "parameters": {
+                 "type": "object",
+                 "properties": {
+                     "text": {
+                         "type": "string",
+                         "description": "The input text to rewrite",
+                     }
+                 },
+             },
+         },
+     },
+ ]
+
+ messages = [
+     {"role": "system", "content": SYSTEM_PROMPT},
+     {
+         "role": "user",
+         "content": [
+             {
+                 "type": "text",
+                 "text": "Thanks to your calculator, compute the results for the equations that involve numbers displayed in the image.",
+             },
+             {
+                 "type": "image_url",
+                 "image_url": {
+                     "url": image_url,
+                 },
+             },
+         ],
+     },
+ ]
+
+ response = client.chat.completions.create(
+     model=model,
+     messages=messages,
+     temperature=TEMP,
+     max_tokens=MAX_TOK,
+     tools=tools,
+     tool_choice="auto",
+ )
+
+ tool_calls = response.choices[0].message.tool_calls
+
+ results = []
+ for tool_call in tool_calls:
+     function_name = tool_call.function.name
+     function_args = tool_call.function.arguments
+     if function_name == "my_calculator":
+         result = my_calculator(**json.loads(function_args))
+         results.append(result)
+
+ messages.append({"role": "assistant", "tool_calls": tool_calls})
+ for tool_call, result in zip(tool_calls, results):
+     messages.append(
+         {
+             "role": "tool",
+             "tool_call_id": tool_call.id,
+             "name": tool_call.function.name,
+             "content": result,
+         }
+     )
+
+ response = client.chat.completions.create(
+     model=model,
+     messages=messages,
+     temperature=TEMP,
+     max_tokens=MAX_TOK,
+ )
+
+ print(response.choices[0].message.content)
+ ```
377
+
378
+ </details>
379
+
380
+ <details>
381
+ <summary>Text-Only Request</summary>
382
+
383
+ Ministral 3 can follow your instructions to the letter.
384
+
385
+ ```python
+ from openai import OpenAI
+ from huggingface_hub import hf_hub_download
+
+ # Modify OpenAI's API key and API base to use vLLM's API server.
+ openai_api_key = "EMPTY"
+ openai_api_base = "http://localhost:8000/v1"
+
+ TEMP = 0.15
+ MAX_TOK = 262144
+
+ client = OpenAI(
+     api_key=openai_api_key,
+     base_url=openai_api_base,
+ )
+
+ models = client.models.list()
+ model = models.data[0].id
+
+
+ def load_system_prompt(repo_id: str, filename: str) -> str:
+     file_path = hf_hub_download(repo_id=repo_id, filename=filename)
+     with open(file_path, "r") as file:
+         system_prompt = file.read()
+     return system_prompt
+
+
+ SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")
+
+ messages = [
+     {"role": "system", "content": SYSTEM_PROMPT},
+     {
+         "role": "user",
+         "content": "Write me a sentence where every word starts with the next letter in the alphabet - start with 'a' and end with 'z'.",
+     },
+ ]
+
+ response = client.chat.completions.create(
+     model=model,
+     messages=messages,
+     temperature=TEMP,
+     max_tokens=MAX_TOK,
+ )
+
+ assistant_message = response.choices[0].message.content
+ print(assistant_message)
+ ```
432
+
433
+ </details>
434
+
435
+ ### Transformers
436
+
437
+ You can also use Ministral 3 3B Instruct 2512 with `Transformers`!
438
+
439
+ Transformers very recently added preliminary support for FP8, so please make sure to install from main:
440
+
441
+ ```sh
442
+ uv pip install git+https://github.com/huggingface/transformers
443
+ ```
444
+
445
+ To make the best use of our model with `Transformers`, make sure to have [`mistral-common >= 1.8.6`](https://github.com/mistralai/mistral-common) installed to use our tokenizer.
446
+
447
+ ```bash
448
+ pip install mistral-common --upgrade
449
+ ```
450
+
451
+ Try it out by running the following snippet.
452
+
453
+ > [!Tip]
454
+ > By default Transformers will load the checkpoint in FP8 and dequantize it to BF16 on the fly,
455
+ > which means the model currently does not make use of accelerated FP8 kernels.
456
+ > Compatibility with accelerated FP8 kernels is currently being worked on and will be available in a couple of weeks.
457
+ > Stay tuned!
458
+
459
+ <details>
460
+ <summary>Python snippet</summary>
461
+
462
+ ```python
+ import torch
+ from transformers import Mistral3ForConditionalGeneration, MistralCommonBackend
+
+ model_id = "mistralai/Ministral-3-3B-Instruct-2512"
+
+ tokenizer = MistralCommonBackend.from_pretrained(model_id)
+ model = Mistral3ForConditionalGeneration.from_pretrained(model_id, device_map="auto")
+
+ image_url = "https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438"
+
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {
+                 "type": "text",
+                 "text": "What action do you think I should take in this situation? List all the possible actions and explain why you think they are good or bad.",
+             },
+             {"type": "image_url", "image_url": {"url": image_url}},
+         ],
+     },
+ ]
+
+ tokenized = tokenizer.apply_chat_template(messages, return_tensors="pt", return_dict=True)
+
+ tokenized["input_ids"] = tokenized["input_ids"].to(device="cuda")
+ tokenized["pixel_values"] = tokenized["pixel_values"].to(dtype=torch.bfloat16, device="cuda")
+ image_sizes = [tokenized["pixel_values"].shape[-2:]]
+
+ output = model.generate(
+     **tokenized,
+     image_sizes=image_sizes,
+     max_new_tokens=512,
+ )[0]
+
+ decoded_output = tokenizer.decode(output[len(tokenized["input_ids"][0]):])
+ print(decoded_output)
+ ```
501
+
502
+ **Note:**
503
+
504
+ Transformers allows you to automatically convert the checkpoint to BF16. To do so, simply load the model as follows:
505
+
506
+ ```py
+ from transformers import Mistral3ForConditionalGeneration, FineGrainedFP8Config
+
+ model_id = "mistralai/Ministral-3-3B-Instruct-2512"
+ model = Mistral3ForConditionalGeneration.from_pretrained(
+     model_id,
+     device_map="auto",
+     quantization_config=FineGrainedFP8Config(dequantize=True),
+ )
+ ```
516
+
517
+ </details>
518
+
519
  ## License
520
 
521
  This model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).