danielhanchen committed on
Commit 1c8f2ba · verified · 1 Parent(s): 5bbf056

Update README.md

Files changed (1)
  1. README.md +410 -14
README.md CHANGED
@@ -15,10 +15,7 @@ language:
15
  license: apache-2.0
16
  inference: false
17
  base_model:
18
- - mistralai/Ministral-3-14B-Base-2512
19
- extra_gated_description: >-
20
- If you want to learn more about how we process your personal data, please read
21
- our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
22
  tags:
23
  - mistral-common
24
  ---
@@ -26,11 +23,7 @@ tags:
26
  # Ministral 3 14B Instruct 2512
27
  The largest model in the Ministral 3 family, **Ministral 3 14B** offers frontier capabilities and performance comparable to its larger [Mistral Small 3.2 24B](https://huggingface.co/mistralai/Mistral-Small-3.2-Instruct-2506) counterpart. A powerful and efficient language model with vision capabilities.
28
 
29
- This model is the instruct post-trained version, fine-tuned for instruction tasks, making it ideal for chat and instruction based use cases.
30
-
31
- The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. Ministral 3 14B can even be deployed locally, capable of fitting in 32GB of VRAM in BF16, and less than 24GB of RAM/VRAM when quantized.
32
-
33
- We provide a no-loss FP8 version [here](https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512-FP8), you can find other formats and quantizations in the [Ministral 3 - Quants](https://huggingface.co/collections/mistralai/ministral-3-quants) collection.
34
 
35
  ## Key Features
36
  Ministral 3 14B consists of two main architectural components:
@@ -60,16 +53,16 @@ Bringing advanced AI capabilities to most environments.
60
  | Model Name | Type | Precision | Link |
61
  |--------------------------------|--------------------|-----------|------------------------------------------------------------------------------------------|
62
  | Ministral 3 3B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Base-2512) |
63
- | Ministral 3 3B Instruct 2512 | Instruct post-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512) |
64
  | Ministral 3 3B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Reasoning-2512) |
65
  | Ministral 3 8B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Base-2512) |
66
- | Ministral 3 8B Instruct 2512 | Instruct post-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512) |
67
  | Ministral 3 8B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Reasoning-2512) |
68
- | Ministral 3 14B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Base-2512) |
69
- | **Ministral 3 14B Instruct 2512** | **Instruct post-trained** | **BF16** | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512) |
70
  | Ministral 3 14B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Reasoning-2512) |
71
 
72
- Other formats available [here](https://huggingface.co/collections/mistralai/ministral-3-quants).
73
 
74
  ## Benchmark Results
75
 
@@ -119,6 +112,409 @@ We compare Ministral 3 to similar sized models.
119
  | Qwen 3 4B Base | <u>0.677</u> | 0.405 | <u>0.570</u> | <u>0.759</u> | <u>0.713</u>| 0.530 |
120
  | Gemma 3 4B Base | 0.516 | 0.294 | 0.430 | 0.626 | 0.589 | <u>0.640</u> |
121
122
  ## License
123
 
124
  This model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).
 
15
  license: apache-2.0
16
  inference: false
17
  base_model:
18
+ - mistralai/Ministral-3-14B-Instruct-2512
 
 
 
19
  tags:
20
  - mistral-common
21
  ---
 
23
  # Ministral 3 14B Instruct 2512
24
  The largest model in the Ministral 3 family, **Ministral 3 14B** offers frontier capabilities and performance comparable to its larger [Mistral Small 3.2 24B](https://huggingface.co/mistralai/Mistral-Small-3.2-Instruct-2506) counterpart. A powerful and efficient language model with vision capabilities.
25
 
26
+ The Ministral 3 family is designed for edge deployment and runs on a wide range of hardware. Ministral 3 14B can even be deployed locally, fitting in 24GB of VRAM in FP8, and less if further quantized.
 
 
 
 
27
 
28
  ## Key Features
29
  Ministral 3 14B consists of two main architectural components:
 
53
  | Model Name | Type | Precision | Link |
54
  |--------------------------------|--------------------|-----------|------------------------------------------------------------------------------------------|
55
  | Ministral 3 3B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Base-2512) |
56
+ | Ministral 3 3B Instruct 2512 | Instruct post-trained | FP8 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512) |
57
  | Ministral 3 3B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Reasoning-2512) |
58
  | Ministral 3 8B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Base-2512) |
59
+ | Ministral 3 8B Instruct 2512 | Instruct post-trained | FP8 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512) |
60
  | Ministral 3 8B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Reasoning-2512) |
61
+ | Ministral 3 14B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Base-2512) |
62
+ | Ministral 3 14B Instruct 2512 | Instruct post-trained | FP8 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512) |
63
  | Ministral 3 14B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Reasoning-2512) |
64
 
65
+ Other formats are available [here](https://huggingface.co/collections/mistralai/ministral-3-more).
66
 
67
  ## Benchmark Results
68
 
 
112
  | Qwen 3 4B Base | <u>0.677</u> | 0.405 | <u>0.570</u> | <u>0.759</u> | <u>0.713</u>| 0.530 |
113
  | Gemma 3 4B Base | 0.516 | 0.294 | 0.430 | 0.626 | 0.589 | <u>0.640</u> |
114
 
115
+ ## Usage
116
+
117
+ The model can be used with the following frameworks:
118
+ - [`vllm`](https://github.com/vllm-project/vllm): See [here](#vllm)
119
+ - [`transformers`](https://github.com/huggingface/transformers): See [here](#transformers)
120
+
121
+ ### vLLM
122
+
123
+ We recommend using this model with [vLLM](https://github.com/vllm-project/vllm).
124
+
125
+ #### Installation
126
+
127
+ Make sure to install the most recent vLLM:
128
+
129
+ ```
130
+ uv pip install -U vllm \
131
+ --torch-backend=auto \
132
+ --extra-index-url https://wheels.vllm.ai/nightly
133
+ ```
134
+
135
+ Doing so should automatically install [`mistral_common >= 1.8.6`](https://github.com/mistralai/mistral-common/releases/tag/v1.8.6).
136
+
137
+ To check:
138
+ ```
139
+ python -c "import mistral_common; print(mistral_common.__version__)"
140
+ ```
141
+
142
+ You can also make use of a ready-to-go [Docker image](https://github.com/vllm-project/vllm/blob/main/Dockerfile) or one from [Docker Hub](https://hub.docker.com/layers/vllm/vllm-openai/latest/images/sha256-de9032a92ffea7b5c007dad80b38fd44aac11eddc31c435f8e52f3b7404bbf39).
143
+
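A minimal sketch of serving the model from that container instead of a local install; it assumes the `vllm/vllm-openai:latest` image and that arguments after the image name are forwarded to the OpenAI-compatible server:

```bash
docker run --gpus all --ipc=host -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:latest \
  --model mistralai/Ministral-3-14B-Instruct-2512 \
  --enable-auto-tool-choice --tool-call-parser mistral
```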
144
+ #### Serve
145
+
146
+ Due to their size and the FP8 format of their weights, `Ministral-3-3B-Instruct-2512`, `Ministral-3-8B-Instruct-2512` and `Ministral-3-14B-Instruct-2512` can each run on a single H200 GPU.
147
+
148
+ A simple launch command is:
149
+
150
+ ```bash
151
+ vllm serve mistralai/Ministral-3-14B-Instruct-2512 \
152
+ --enable-auto-tool-choice --tool-call-parser mistral
153
+ ```
154
+
155
+ Key parameter notes:
156
+
157
+ * `--enable-auto-tool-choice`: required when enabling tool usage.
158
+ * `--tool-call-parser mistral`: required when enabling tool usage.
159
+
160
+
161
+ Additional flags:
162
+
163
+ * You can set `--max-model-len` to save memory. By default it is set to `262144`, which is quite large and not necessary for most scenarios (see the example below).
164
+ * You can set `--max-num-batched-tokens` to balance throughput and latency: higher values mean higher throughput but also higher latency.
165
+
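For illustration, here is the launch command with a reduced context window; the `32768` value is only an example, not a recommendation:

```bash
vllm serve mistralai/Ministral-3-14B-Instruct-2512 \
  --enable-auto-tool-choice --tool-call-parser mistral \
  --max-model-len 32768
```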
166
+ #### Usage of the model
167
+
168
+ Here we assume that the model `mistralai/Ministral-3-14B-Instruct-2512` is being served and reachable at `localhost` on port `8000`, the default for vLLM.
169
+
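As a quick sanity check that the server is reachable (assuming the default host and port), you can list the served models through the OpenAI-compatible endpoint:

```bash
curl http://localhost:8000/v1/models
```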
170
+ <details>
171
+ <summary>Vision Reasoning</summary>
172
+
173
+ Let's see if Ministral 3 knows when to pick a fight!
174
+
175
+ ```python
176
+ from datetime import datetime, timedelta
177
+
178
+ from openai import OpenAI
179
+ from huggingface_hub import hf_hub_download
180
+
181
+ # Modify OpenAI's API key and API base to use vLLM's API server.
182
+ openai_api_key = "EMPTY"
183
+ openai_api_base = "http://localhost:8000/v1"
184
+
185
+ TEMP = 0.15
186
+ MAX_TOK = 262144
187
+
188
+ client = OpenAI(
189
+ api_key=openai_api_key,
190
+ base_url=openai_api_base,
191
+ )
192
+
193
+ models = client.models.list()
194
+ model = models.data[0].id
195
+
196
+
197
+ def load_system_prompt(repo_id: str, filename: str) -> str:
198
+ file_path = hf_hub_download(repo_id=repo_id, filename=filename)
199
+ with open(file_path, "r") as file:
200
+ system_prompt = file.read()
201
+ today = datetime.today().strftime("%Y-%m-%d")
202
+ yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
203
+ model_name = repo_id.split("/")[-1]
204
+ return system_prompt.format(name=model_name, today=today, yesterday=yesterday)
205
+
206
+
207
+ SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")
208
+ image_url = "https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438"
209
+
210
+ messages = [
211
+ {"role": "system", "content": SYSTEM_PROMPT},
212
+ {
213
+ "role": "user",
214
+ "content": [
215
+ {
216
+ "type": "text",
217
+ "text": "What action do you think I should take in this situation? List all the possible actions and explain why you think they are good or bad.",
218
+ },
219
+ {"type": "image_url", "image_url": {"url": image_url}},
220
+ ],
221
+ },
222
+ ]
223
+
224
+ print(messages)
225
+
226
+
227
+ response = client.chat.completions.create(
228
+ model=model,
229
+ messages=messages,
230
+ temperature=TEMP,
231
+ max_tokens=MAX_TOK,
232
+ )
233
+
234
+ print(response.choices[0].message.content)
235
+ ```
236
+
237
+ </details>
238
+
239
+ <details>
240
+ <summary>Function Calling</summary>
241
+
242
+ Let's solve some equations using our simple Python calculator tool.
243
+
244
+ ```python
245
+ import json
246
+ from openai import OpenAI
247
+ from huggingface_hub import hf_hub_download
248
+
249
+ # Modify OpenAI's API key and API base to use vLLM's API server.
250
+ openai_api_key = "EMPTY"
251
+ openai_api_base = "http://localhost:8000/v1"
252
+
253
+ TEMP = 0.15
254
+ MAX_TOK = 262144
255
+
256
+ client = OpenAI(
257
+ api_key=openai_api_key,
258
+ base_url=openai_api_base,
259
+ )
260
+
261
+ models = client.models.list()
262
+ model = models.data[0].id
263
+
264
+
265
+ def load_system_prompt(repo_id: str, filename: str) -> str:
266
+ file_path = hf_hub_download(repo_id=repo_id, filename=filename)
267
+ with open(file_path, "r") as file:
268
+ system_prompt = file.read()
269
+ return system_prompt
270
+
271
+
272
+ SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")
273
+
274
+ image_url = "https://math-coaching.com/img/fiche/46/expressions-mathematiques.jpg"
275
+
276
+
277
+ def my_calculator(expression: str) -> str:
278
+ return str(eval(expression))
279
+
280
+
281
+ tools = [
282
+ {
283
+ "type": "function",
284
+ "function": {
285
+ "name": "my_calculator",
286
+ "description": "A calculator that can evaluate a mathematical expression.",
287
+ "parameters": {
288
+ "type": "object",
289
+ "properties": {
290
+ "expression": {
291
+ "type": "string",
292
+ "description": "The mathematical expression to evaluate.",
293
+ },
294
+ },
295
+ "required": ["expression"],
296
+ },
297
+ },
298
+ },
299
+ {
300
+ "type": "function",
301
+ "function": {
302
+ "name": "rewrite",
303
+ "description": "Rewrite a given text for improved clarity",
304
+ "parameters": {
305
+ "type": "object",
306
+ "properties": {
307
+ "text": {
308
+ "type": "string",
309
+ "description": "The input text to rewrite",
310
+ }
311
+ },
312
+ },
313
+ },
314
+ },
315
+ ]
316
+
317
+ messages = [
318
+ {"role": "system", "content": SYSTEM_PROMPT},
319
+ {
320
+ "role": "user",
321
+ "content": [
322
+ {
323
+ "type": "text",
324
+ "text": "Thanks to your calculator, compute the results for the equations that involve numbers displayed in the image.",
325
+ },
326
+ {
327
+ "type": "image_url",
328
+ "image_url": {
329
+ "url": image_url,
330
+ },
331
+ },
332
+ ],
333
+ },
334
+ ]
335
+
336
+ response = client.chat.completions.create(
337
+ model=model,
338
+ messages=messages,
339
+ temperature=TEMP,
340
+ max_tokens=MAX_TOK,
341
+ tools=tools,
342
+ tool_choice="auto",
343
+ )
344
+
345
+ tool_calls = response.choices[0].message.tool_calls
346
+
347
+ results = []
348
+ for tool_call in tool_calls:
349
+ function_name = tool_call.function.name
350
+ function_args = tool_call.function.arguments
351
+ if function_name == "my_calculator":
352
+ result = my_calculator(**json.loads(function_args))
353
+ results.append(result)
354
+
355
+ messages.append({"role": "assistant", "tool_calls": tool_calls})
356
+ for tool_call, result in zip(tool_calls, results):
357
+ messages.append(
358
+ {
359
+ "role": "tool",
360
+ "tool_call_id": tool_call.id,
361
+ "name": tool_call.function.name,
362
+ "content": result,
363
+ }
364
+ )
365
+
366
+
367
+ response = client.chat.completions.create(
368
+ model=model,
369
+ messages=messages,
370
+ temperature=TEMP,
371
+ max_tokens=MAX_TOK,
372
+ )
373
+
374
+ print(response.choices[0].message.content)
375
+ ```
376
+
377
+ </details>
378
+
379
+ <details>
380
+ <summary>Text-Only Request</summary>
381
+
382
+ Ministral 3 can follow your instructions to the letter.
383
+
384
+ ```python
385
+ from openai import OpenAI
386
+ from huggingface_hub import hf_hub_download
387
+
388
+ # Modify OpenAI's API key and API base to use vLLM's API server.
389
+ openai_api_key = "EMPTY"
390
+ openai_api_base = "http://localhost:8000/v1"
391
+
392
+ TEMP = 0.15
393
+ MAX_TOK = 262144
394
+
395
+ client = OpenAI(
396
+ api_key=openai_api_key,
397
+ base_url=openai_api_base,
398
+ )
399
+
400
+ models = client.models.list()
401
+ model = models.data[0].id
402
+
403
+
404
+ def load_system_prompt(repo_id: str, filename: str) -> str:
405
+ file_path = hf_hub_download(repo_id=repo_id, filename=filename)
406
+ with open(file_path, "r") as file:
407
+ system_prompt = file.read()
408
+ return system_prompt
409
+
410
+
411
+ SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")
412
+
413
+ messages = [
414
+ {"role": "system", "content": SYSTEM_PROMPT},
415
+ {
416
+ "role": "user",
417
+ "content": "Write me a sentence where every word starts with the next letter in the alphabet - start with 'a' and end with 'z'.",
418
+ },
419
+ ]
420
+
421
+ response = client.chat.completions.create(
422
+ model=model,
423
+ messages=messages,
424
+ temperature=TEMP,
425
+ max_tokens=MAX_TOK,
426
+ )
427
+
428
+ assistant_message = response.choices[0].message.content
429
+ print(assistant_message)
430
+ ```
431
+
432
+ </details>
433
+
434
+ ### Transformers
435
+
436
+ You can also use Ministral 3 14B Instruct 2512 with `Transformers`!
437
+
438
+ Transformers very recently added preliminary support for FP8, so please make sure to install from main:
439
+
440
+ ```sh
441
+ uv pip install git+https://github.com/huggingface/transformers
442
+ ```
443
+
444
+ To make the best use of our model with `Transformers`, make sure to have [installed](https://github.com/mistralai/mistral-common) `mistral-common >= 1.8.6` to use our tokenizer.
445
+
446
+ ```bash
447
+ pip install mistral-common --upgrade
448
+ ```
449
+
450
+ Try it out by running the following snippet.
451
+
452
+ > [!Tip]
453
+ > By default Transformers will load the checkpoint in FP8 and dequantize it to BF16 on the fly,
454
+ > which means the model currently does not make use of accelerated FP8 kernels.
455
+ > Compatibility with accelerated FP8 kernels is currently being worked on and will be available in a couple of weeks.
456
+ > Stay tuned!
457
+
458
+ <details>
459
+ <summary>Python snippet</summary>
460
+
461
+ ```python
462
+ import torch
463
+ from transformers import Mistral3ForConditionalGeneration, MistralCommonBackend
464
+
465
+ model_id = "mistralai/Ministral-3-14B-Instruct-2512"
466
+
467
+ tokenizer = MistralCommonBackend.from_pretrained(model_id)
468
+ model = Mistral3ForConditionalGeneration.from_pretrained(model_id, device_map="auto")
469
+
470
+ image_url = "https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438"
471
+
472
+ messages = [
473
+ {
474
+ "role": "user",
475
+ "content": [
476
+ {
477
+ "type": "text",
478
+ "text": "What action do you think I should take in this situation? List all the possible actions and explain why you think they are good or bad.",
479
+ },
480
+ {"type": "image_url", "image_url": {"url": image_url}},
481
+ ],
482
+ },
483
+ ]
484
+
485
+ tokenized = tokenizer.apply_chat_template(messages, return_tensors="pt", return_dict=True)
486
+
487
+ tokenized["input_ids"] = tokenized["input_ids"].to(device="cuda")
488
+ tokenized["pixel_values"] = tokenized["pixel_values"].to(dtype=torch.bfloat16, device="cuda")
489
+ image_sizes = [tokenized["pixel_values"].shape[-2:]]
490
+
491
+ output = model.generate(
492
+ **tokenized,
493
+ image_sizes=image_sizes,
494
+ max_new_tokens=512,
495
+ )[0]
496
+
497
+ decoded_output = tokenizer.decode(output[len(tokenized["input_ids"][0]):])
498
+ print(decoded_output)
499
+ ```
500
+
501
+ **Note:**
502
+
503
+ Transformers allows you to automatically convert the checkpoint to BF16. To do so, simply load the model as follows:
504
+
505
+ ```py
506
+ from transformers import Mistral3ForConditionalGeneration, FineGrainedFP8Config
507
+
508
+ model_id = "mistralai/Ministral-3-14B-Instruct-2512"
509
+ model = Mistral3ForConditionalGeneration.from_pretrained(
510
+ model_id,
511
+ device_map="auto",
512
+ quantization_config=FineGrainedFP8Config(dequantize=True)
513
+ )
514
+ ```
515
+
516
+ </details>
517
+
518
  ## License
519
 
520
  This model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).