danielhanchen committed (verified)

Commit 736827c · 1 Parent(s): 4a83c31

Upload folder using huggingface_hub
README.md CHANGED
@@ -15,46 +15,23 @@ language:
15
  license: apache-2.0
16
  inference: false
17
  base_model:
18
- - mistralai/Ministral-3-3B-Instruct-2512
 
 
 
19
  tags:
20
  - mistral-common
21
- - mistral
22
- - unsloth
23
- ---
24
- <div>
25
- <p style="margin-bottom: 0; margin-top: 0;">
26
- <strong>See our <a href="https://huggingface.co/collections/unsloth/ministral-3">Ministral 3 collection</a> for all versions including GGUF, 4-bit & FP8 formats.</strong>
27
- </p>
28
- <p style="margin-bottom: 0;">
29
- <em>Learn to run Ministral correctly - <a href="https://docs.unsloth.ai/new/ministral-3">Read our Guide</a>.</em>
30
- </p>
31
- <p style="margin-top: 0;margin-bottom: 0;">
32
- <em>See <a href="https://docs.unsloth.ai/basics/unsloth-dynamic-v2.0-gguf">Unsloth Dynamic 2.0 GGUFs</a> for our quantization benchmarks.</em>
33
- </p>
34
- <div style="display: flex; gap: 5px; align-items: center; ">
35
- <a href="https://github.com/unslothai/unsloth/">
36
- <img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="133">
37
- </a>
38
- <a href="https://discord.gg/unsloth">
39
- <img src="https://github.com/unslothai/unsloth/raw/main/images/Discord%20button.png" width="173">
40
- </a>
41
- <a href="https://docs.unsloth.ai/new/ministral-3">
42
- <img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="143">
43
- </a>
44
- </div>
45
- <h1 style="margin-top: 0rem;">✨ Read our Ministral 3 Guide <a href="https://docs.unsloth.ai/new/ministral-3">here</a>!</h1>
46
- </div>
47
-
48
- - Fine-tune Ministral 3 for free using our [Google Colab notebook](https://docs.unsloth.ai/new/ministral-3#fine-tuning)
49
- - Or train Ministral 3 with reinforcement learning (GSPO) with our [free notebook](https://docs.unsloth.ai/new/ministral-3#reinforcement-learning-grpo).
50
- - View the rest of our notebooks in our [docs here](https://docs.unsloth.ai/get-started/unsloth-notebooks).
51
  ---
52
 
53
  # Ministral 3 3B Instruct 2512
54
  The smallest model in the Ministral 3 family, **Ministral 3 3B** is a powerful, efficient tiny language model with vision capabilities.
55
 
 
 
56
  The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. Ministral 3 3B can even be deployed locally, capable of fitting in 8GB of VRAM in FP8, and less if further quantized.
57
 
 
 
58
  ## Key Features
59
  Ministral 3 3B consists of two main architectural components:
60
  - **3.4B Language Model**
@@ -81,12 +58,24 @@ Ideal for lightweight, real-time applications on edge or low-resource devices, s
81
 
82
  Bringing advanced AI capabilities to edge and distributed environments for embedded systems.
83

84
  ## Ministral 3 Family
85
 
86
  | Model Name | Type | Precision | Link |
87
  |--------------------------------|--------------------|-----------|------------------------------------------------------------------------------------------|
88
  | Ministral 3 3B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Base-2512) |
89
- | Ministral 3 3B Instruct 2512 | Instruct post-trained | FP8 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512) |
90
  | Ministral 3 3B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Reasoning-2512) |
91
  | Ministral 3 8B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Base-2512) |
92
  | Ministral 3 8B Instruct 2512 | Instruct post-trained | FP8 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512) |
@@ -95,7 +84,7 @@ Bringing advanced AI capabilities to edge and distributed environments for embed
95
  | Ministral 3 14B Instruct 2512 | Instruct post-trained | FP8 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512) |
96
  | Ministral 3 14B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Reasoning-2512) |
97
 
98
- Other formats available [here](https://huggingface.co/collections/mistralai/ministral-3-more).
99
 
100
  ## Benchmark Results
101
 
@@ -157,7 +146,7 @@ We recommend using this model with [vLLM](https://github.com/vllm-project/vllm).
157
 
158
  #### Installation
159
 
160
- Make sure to install [`vLLM >= 0.12.0`](https://github.com/vllm-project/vllm/releases/tag/v0.12.0):
161
 
162
  ```
163
  pip install vllm --upgrade
@@ -170,7 +159,7 @@ To check:
170
  python -c "import mistral_common; print(mistral_common.__version__)"
171
  ```
172
 
173
- You can also make use of a ready-to-go [docker image](https://github.com/vllm-project/vllm/blob/main/Dockerfile) or on the [docker hub](https://hub.docker.com/layers/vllm/vllm-openai/latest/images/sha256-de9032a92ffea7b5c007dad80b38fd44aac11eddc31c435f8e52f3b7404bbf39).
174
 
175
  #### Serve
176
 
@@ -180,6 +169,7 @@ A simple launch command is:
180
 
181
  ```bash
182
  vllm serve mistralai/Ministral-3-3B-Instruct-2512 \
 
183
  --enable-auto-tool-choice --tool-call-parser mistral
184
  ```
185
 
@@ -193,10 +183,10 @@ Additional flags:
193
 
194
  * You can set `--max-model-len` to preserve memory. By default it is set to `262144` which is quite large but not necessary for most scenarios.
195
  * You can set `--max-num-batched-tokens` to balance throughput and latency, higher means higher throughput but higher latency.
196
-
197
  #### Usage of the model
198
 
199
- Here we asumme that the model `mistralai/Ministral-3-3B-Instruct-2512` is served and you can ping it to the domain `localhost` with the port `8000` which is the default for vLLM.
200
 
201
  <details>
202
  <summary>Vision Reasoning</summary>
@@ -252,8 +242,6 @@ messages = [
252
  },
253
  ]
254
 
255
- print(messages)
256
-
257
 
258
  response = client.chat.completions.create(
259
  model=model,
@@ -466,7 +454,7 @@ print(assistant_message)
466
 
467
  You can also use Ministral 3 3B Instruct 2512 with `Transformers` !
468
 
469
- Transformers very recently added prelimenary support for FP8, so please make sure to install from main:
470
 
471
  ```sh
472
  uv pip install git+https://github.com/huggingface/transformers
@@ -481,10 +469,11 @@ pip install mistral-common --upgrade
481
  Try it out by running the following snippet.
482
 
483
  > [!Tip]
484
- > By default Transformers will load the checkpoint in FP8 and dequantize it to BF16 on the fly,
485
- > which means the model currently does not make use of accelerated FP8-kernels.
486
- > Compatibility with accelerated FP8-kernels is currently worked on and will be available in a couple of weeks.
487
- > Stay tuned!
 
488
 
489
  <details>
490
  <summary>Python snippet</summary>
@@ -529,9 +518,11 @@ decoded_output = tokenizer.decode(output[len(tokenized["input_ids"][0]):])
529
  print(decoded_output)
530
  ```
531
 
532
- **Note:**
533
 
534
- Transformers allows you to automatically convert the checkpoint to Bfloat16. To so simple load the model as follows:
 
 
535
 
536
  ```py
537
  from transformers import Mistral3ForConditionalGeneration, FineGrainedFP8Config
@@ -544,8 +535,6 @@ model = Mistral3ForConditionalGeneration.from_pretrained(
544
  )
545
  ```
546
 
547
- </details>
548
-
549
  ## License
550
 
551
  This model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).
 
15
  license: apache-2.0
16
  inference: false
17
  base_model:
18
+ - mistralai/Ministral-3-3B-Base-2512
19
+ extra_gated_description: >-
20
+ If you want to learn more about how we process your personal data, please read
21
+ our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
22
  tags:
23
- mistral-common
24
  ---
25
 
26
  # Ministral 3 3B Instruct 2512
27
  The smallest model in the Ministral 3 family, **Ministral 3 3B** is a powerful, efficient tiny language model with vision capabilities.
28
 
29
+ This model is the instruct post-trained version in **FP8**, fine-tuned to follow instructions, making it ideal for chat and instruction-based use cases.
30
+
31
  The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. Ministral 3 3B can even be deployed locally, capable of fitting in 8GB of VRAM in FP8, and less if further quantized.
32
 
33
+ Learn more in our blog post [here](https://mistral.ai/news/mistral-3).
34
+
35
  ## Key Features
36
  Ministral 3 3B consists of two main architectural components:
37
  - **3.4B Language Model**
 
58
 
59
  Bringing advanced AI capabilities to edge and distributed environments for embedded systems.
60
 
61
+ ### Recommended Settings
62
+
63
+ We recommend deploying with the following best practices:
64
+ - System Prompt: Define a clear environment and use case, including guidance on how to effectively leverage tools in agentic systems.
65
+ - Sampling Parameters: Use a **temperature below 0.1** for daily-driver and production environments; higher temperatures may be explored for creative use cases - developers are encouraged to experiment with alternative settings.
66
+ - Tools: Keep the set of tools well-defined and limit their number to the minimum required for the use case - avoid overloading the model with an excessive number of tools.
67
+ - Vision: When deploying with vision capabilities, we recommend maintaining an aspect ratio close to 1:1 (width-to-height) for images. Avoid overly thin or wide images - crop them as needed to ensure optimal performance.
68
+
69
+ ### Recommended Sampling
70
+
71
+ * We recommend starting with a temperature of 0.1 for most use cases, as in the sketch below. Feel free to experiment with different settings to best suit your specific needs.
72
+
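To make these recommendations concrete, here is a minimal sketch that applies them through the OpenAI-compatible API vLLM exposes once the server from the Serve section below is running; the endpoint URL, dummy API key, and example prompts are illustrative assumptions, not part of this repository:

```python
# Sketch only: assumes the vLLM server from the "Serve" section is running
# locally on the default port 8000 with an OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="mistralai/Ministral-3-3B-Instruct-2512",
    temperature=0.1,  # recommended: stay below ~0.1 for production use
    messages=[
        # A clear, use-case-specific system prompt, as recommended above.
        {"role": "system", "content": "You are a concise assistant for a ticketing system."},
        {"role": "user", "content": "Summarize the status of ticket #1234 in one sentence."},
    ],
)
print(response.choices[0].message.content)
```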
73
  ## Ministral 3 Family
74
 
75
  | Model Name | Type | Precision | Link |
76
  |--------------------------------|--------------------|-----------|------------------------------------------------------------------------------------------|
77
  | Ministral 3 3B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Base-2512) |
78
+ | **Ministral 3 3B Instruct 2512** | **Instruct post-trained** | **FP8** | [**Hugging Face**](https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512) |
79
  | Ministral 3 3B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Reasoning-2512) |
80
  | Ministral 3 8B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Base-2512) |
81
  | Ministral 3 8B Instruct 2512 | Instruct post-trained | FP8 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512) |
 
84
  | Ministral 3 14B Instruct 2512 | Instruct post-trained | FP8 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512) |
85
  | Ministral 3 14B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Reasoning-2512) |
86
 
87
+ Other formats available [here](https://huggingface.co/collections/mistralai/ministral-3-additional-checkpoints).
88
 
89
  ## Benchmark Results
90
 
 
146
 
147
  #### Installation
148
 
149
+ Make sure to install **vLLM >= 1.12.0**:
150
 
151
  ```
152
  pip install vllm --upgrade
 
159
  python -c "import mistral_common; print(mistral_common.__version__)"
160
  ```
161
 
162
+ You can also make use of a ready-to-go [Docker image](https://github.com/vllm-project/vllm/blob/main/docker/Dockerfile) or one from [Docker Hub](https://hub.docker.com/layers/vllm/vllm-openai/latest).
163
 
164
  #### Serve
165
 
 
169
 
170
  ```bash
171
  vllm serve mistralai/Ministral-3-3B-Instruct-2512 \
172
+ --tokenizer_mode mistral --config_format mistral --load_format mistral \
173
  --enable-auto-tool-choice --tool-call-parser mistral
174
  ```
175
 
 
183
 
184
  * You can set `--max-model-len` to preserve memory. By default it is set to `262144` which is quite large but not necessary for most scenarios.
185
  * You can set `--max-num-batched-tokens` to balance throughput and latency, higher means higher throughput but higher latency.
186
+
187
  #### Usage of the model
188
 
189
+ Here we assume that the model `mistralai/Ministral-3-3B-Instruct-2512` is served and reachable at `localhost` on port `8000`, the default for vLLM.
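Before sending real requests, a quick way to confirm the endpoint is reachable is to list the served models; this is a sketch assuming the `openai` Python client and the default vLLM address:

```python
# Sketch: verify the vLLM OpenAI-compatible server is up and serving the model.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

served = [m.id for m in client.models.list()]
print(served)  # expect: ['mistralai/Ministral-3-3B-Instruct-2512']
```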
190
 
191
  <details>
192
  <summary>Vision Reasoning</summary>
 
242
  },
243
  ]
244
 
 
 
245
 
246
  response = client.chat.completions.create(
247
  model=model,
 
454
 
455
  You can also use Ministral 3 3B Instruct 2512 with `Transformers` !
456
 
457
+ Transformers recently added support for FP8, so make sure to install from main:
458
 
459
  ```sh
460
  uv pip install git+https://github.com/huggingface/transformers
 
469
  Try it out by running the following snippet.
470
 
471
  > [!Tip]
472
+ > On the latest main as of 05/12/2025, an FP8 Triton kernel
473
+ > for fast accelerated matmuls (`w8a8_block_fp8_matmul_triton`)
474
+ > is used by default, without any degradation in accuracy.
475
+ > However, if you want to run the model in BF16,
476
+ > see [here](#transformers-bf16).
477
 
478
  <details>
479
  <summary>Python snippet</summary>
 
518
  print(decoded_output)
519
  ```
520
 
521
+ </details>
522
 
523
+ #### Transformers BF16
524
+
525
+ Transformers allows you to automatically convert the checkpoint to BF16. To do so, simply load the model as follows:
526
 
527
  ```py
528
  from transformers import Mistral3ForConditionalGeneration, FineGrainedFP8Config
 
535
  )
536
  ```
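The diff above elides the middle of this snippet. As a self-contained sketch (not the repository's exact snippet), such a BF16 load might look like the following; the `dequantize=True` keyword is an assumption inferred from the `dequantize` field in this repo's `config.json` rather than a documented guarantee:

```py
from transformers import Mistral3ForConditionalGeneration, FineGrainedFP8Config

model_id = "mistralai/Ministral-3-3B-Instruct-2512"

# Assumption: FineGrainedFP8Config exposes a `dequantize` flag (mirroring the
# "dequantize" key in config.json) that converts the FP8 weights to BF16 on load.
model = Mistral3ForConditionalGeneration.from_pretrained(
    model_id,
    quantization_config=FineGrainedFP8Config(dequantize=True),
    device_map="auto",
)
```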
537
 
 
 
538
  ## License
539
 
540
  This model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).
chat_template.jinja CHANGED
@@ -1,4 +1,3 @@
1
- {#- Unsloth template fixes #}
2
  {#- Default system message if no system prompt is passed. #}
3
  {%- set default_system_message = 'You are Ministral-3-3B-Instruct-2512, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris.\nYou power an AI assistant called Le Chat.\nYour knowledge base was last updated on 2023-10-01.\nThe current date is {today}.\n\nWhen you\'re not sure about some information or when the user\'s request requires up-to-date or specific data, you must use the available tools to fetch the information. Do not hesitate to use tools whenever they can provide a more accurate or complete response. If no relevant tools are available, then clearly state that you don\'t have the information and avoid making up anything.\nIf the user\'s question is not clear, ambiguous, or does not provide enough context for you to accurately answer the question, you do not try to answer it right away and you rather ask the user to clarify their request (e.g. "What are some good restaurants around me?" => "Where are you?" or "When is the next flight to Tokyo" => "Where do you travel from?").\nYou are always very attentive to dates, in particular you try to resolve dates (e.g. "yesterday" is {yesterday}) and when asked about information at specific dates, you discard information that is at another date.\nYou follow these instructions in all languages, and always respond to the user in the language they use or request.\nNext sections describe the capabilities that you have.\n\n# WEB BROWSING INSTRUCTIONS\n\nYou cannot perform any web search or access internet to open URLs, links etc. If it seems like the user is expecting you to do so, you clarify the situation and ask the user to copy paste the text directly in the chat.\n\n# MULTI-MODAL INSTRUCTIONS\n\nYou have the ability to read images, but you cannot generate images. You also cannot transcribe audio files or videos.\nYou cannot read nor transcribe audio files or videos.\n\n# TOOL CALLING INSTRUCTIONS\n\nYou may have access to tools that you can use to fetch information or perform actions. You must use these tools in the following situations:\n\n1. When the request requires up-to-date information.\n2. When the request requires specific data that you do not have in your knowledge base.\n3. When the request involves actions that you cannot perform without tools.\n\nAlways prioritize using tools to provide the most accurate and helpful response. If tools are not available, inform the user that you cannot perform the requested action at the moment.' %}
4
 
@@ -80,10 +79,13 @@
80
 
81
  {#- Assistant messages supports text content or text and image chunks. #}
82
  {%- elif message['role'] == 'assistant' %}
 
 
 
83
 
84
  {%- if message['content'] is string %}
85
  {{- message['content'] }}
86
- {%- elif message['content'] is iterable and message['content'] | length > 0 %}
87
  {%- for block in message['content'] %}
88
  {%- if block['type'] == 'text' %}
89
  {{- block['text'] }}
@@ -114,8 +116,6 @@
114
 
115
  {#- Raise exception for unsupported roles. #}
116
  {%- else %}
117
- {{- raise_exception('Only user, assistant and tool roles are supported, got ' + message['role']) }}
118
  {%- endif %}
119
  {%- endfor %}
120
-
121
- {#- Copyright 2025-present Unsloth. Apache 2.0 License. #}
 
 
1
  {#- Default system message if no system prompt is passed. #}
2
  {%- set default_system_message = 'You are Ministral-3-3B-Instruct-2512, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris.\nYou power an AI assistant called Le Chat.\nYour knowledge base was last updated on 2023-10-01.\nThe current date is {today}.\n\nWhen you\'re not sure about some information or when the user\'s request requires up-to-date or specific data, you must use the available tools to fetch the information. Do not hesitate to use tools whenever they can provide a more accurate or complete response. If no relevant tools are available, then clearly state that you don\'t have the information and avoid making up anything.\nIf the user\'s question is not clear, ambiguous, or does not provide enough context for you to accurately answer the question, you do not try to answer it right away and you rather ask the user to clarify their request (e.g. "What are some good restaurants around me?" => "Where are you?" or "When is the next flight to Tokyo" => "Where do you travel from?").\nYou are always very attentive to dates, in particular you try to resolve dates (e.g. "yesterday" is {yesterday}) and when asked about information at specific dates, you discard information that is at another date.\nYou follow these instructions in all languages, and always respond to the user in the language they use or request.\nNext sections describe the capabilities that you have.\n\n# WEB BROWSING INSTRUCTIONS\n\nYou cannot perform any web search or access internet to open URLs, links etc. If it seems like the user is expecting you to do so, you clarify the situation and ask the user to copy paste the text directly in the chat.\n\n# MULTI-MODAL INSTRUCTIONS\n\nYou have the ability to read images, but you cannot generate images. You also cannot transcribe audio files or videos.\nYou cannot read nor transcribe audio files or videos.\n\n# TOOL CALLING INSTRUCTIONS\n\nYou may have access to tools that you can use to fetch information or perform actions. You must use these tools in the following situations:\n\n1. When the request requires up-to-date information.\n2. When the request requires specific data that you do not have in your knowledge base.\n3. When the request involves actions that you cannot perform without tools.\n\nAlways prioritize using tools to provide the most accurate and helpful response. If tools are not available, inform the user that you cannot perform the requested action at the moment.' %}
3
 
 
79
 
80
  {#- Assistant messages supports text content or text and image chunks. #}
81
  {%- elif message['role'] == 'assistant' %}
82
+ {%- if (message['content'] is none or message['content'] == '' or message['content']|length == 0) and (message['tool_calls'] is not defined or message['tool_calls'] is none or message['tool_calls']|length == 0) %}
83
+ {{- raise_exception('Assistant message must have a string or a list of chunks in content or a list of tool calls.') }}
84
+ {%- endif %}
85
 
86
  {%- if message['content'] is string %}
87
  {{- message['content'] }}
88
+ {%- elif message['content'] | length > 0 %}
89
  {%- for block in message['content'] %}
90
  {%- if block['type'] == 'text' %}
91
  {{- block['text'] }}
 
116
 
117
  {#- Raise exception for unsupported roles. #}
118
  {%- else %}
119
+ {{- raise_exception('Only user, assistant and tool roles are supported, got ' + message['role'] + '.') }}
120
  {%- endif %}
121
  {%- endfor %}
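The check added above turns an empty assistant turn (no content and no tool calls) into a template error. Here is a minimal sketch of how that surfaces through `apply_chat_template`, assuming the repository ships a Transformers-compatible tokenizer alongside this template:

```python
# Sketch: exercise the stricter assistant-message validation added above.
from transformers import AutoTokenizer
from jinja2.exceptions import TemplateError

tok = AutoTokenizer.from_pretrained("mistralai/Ministral-3-3B-Instruct-2512")

messages = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": ""},  # empty content and no tool_calls
    {"role": "user", "content": "Anyone there?"},
]

try:
    tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
except TemplateError as err:
    print(err)  # "Assistant message must have a string or a list of chunks ..."
```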
 
 
config.json CHANGED
@@ -2,18 +2,28 @@
2
  "architectures": [
3
  "Mistral3ForConditionalGeneration"
4
  ],
5
- "bos_token_id": 1,
6
  "torch_dtype": "bfloat16",
7
- "eos_token_id": 2,
8
  "image_token_index": 10,
9
  "model_type": "mistral3",
10
  "multimodal_projector_bias": false,
11
- "pad_token_id": 11,
12
  "projector_hidden_act": "gelu",
 
 
 
 
 
 
 
 
 
 
 
 
 
 
13
  "spatial_merge_size": 2,
14
  "text_config": {
15
  "attention_dropout": 0.0,
16
- "torch_dtype": "bfloat16",
17
  "head_dim": 128,
18
  "hidden_act": "silu",
19
  "hidden_size": 3072,
@@ -43,10 +53,8 @@
43
  "vocab_size": 131072
44
  },
45
  "transformers_version": "5.0.0.dev0",
46
- "unsloth_fixed": true,
47
  "vision_config": {
48
  "attention_dropout": 0.0,
49
- "torch_dtype": "bfloat16",
50
  "head_dim": 64,
51
  "hidden_act": "silu",
52
  "hidden_size": 1024,
 
2
  "architectures": [
3
  "Mistral3ForConditionalGeneration"
4
  ],
 
5
  "torch_dtype": "bfloat16",
 
6
  "image_token_index": 10,
7
  "model_type": "mistral3",
8
  "multimodal_projector_bias": false,
 
9
  "projector_hidden_act": "gelu",
10
+ "quantization_config": {
11
+ "activation_scheme": "static",
12
+ "dequantize": false,
13
+ "modules_to_not_convert": [
14
+ "model.vision_tower",
15
+ "model.multi_modal_projector",
16
+ "lm_head",
17
+ "model.vision_tower",
18
+ "model.multi_modal_projector",
19
+ "lm_head"
20
+ ],
21
+ "quant_method": "fp8",
22
+ "weight_block_size": null
23
+ },
24
  "spatial_merge_size": 2,
25
  "text_config": {
26
  "attention_dropout": 0.0,
 
27
  "head_dim": 128,
28
  "hidden_act": "silu",
29
  "hidden_size": 3072,
 
53
  "vocab_size": 131072
54
  },
55
  "transformers_version": "5.0.0.dev0",
 
56
  "vision_config": {
57
  "attention_dropout": 0.0,
 
58
  "head_dim": 64,
59
  "hidden_act": "silu",
60
  "hidden_size": 1024,
model-00002-of-00002.safetensors DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:718d087fa591fd4356b7241f293c24219399d86f092d46cf36f765051498033a
3
- size 2730659224

model-00001-of-00002.safetensors → model.safetensors RENAMED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:b3821ebc30884f66e3d26d339e161641f34b91ed916627011e7b08e5f1edd884
3
- size 4967581832
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:728f1826cd0e38191ca7b1379e81f78cf0555c6ffd95882aabd2404632346f86
3
+ size 4672099184
model.safetensors.index.json DELETED
@@ -1,466 +0,0 @@
1
- {
2
- "metadata": {
3
- "total_parameters": 4251743232,
4
- "total_size": 7698180096
5
- },
6
- "weight_map": {
7
- "language_model.model.embed_tokens.weight": "model-00001-of-00002.safetensors",
8
- "language_model.model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
9
- "language_model.model.layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
10
- "language_model.model.layers.0.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
11
- "language_model.model.layers.0.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
12
- "language_model.model.layers.0.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
13
- "language_model.model.layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
14
- "language_model.model.layers.0.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
15
- "language_model.model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
16
- "language_model.model.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
17
- "language_model.model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
18
- "language_model.model.layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
19
- "language_model.model.layers.1.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
20
- "language_model.model.layers.1.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
21
- "language_model.model.layers.1.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
22
- "language_model.model.layers.1.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
23
- "language_model.model.layers.1.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
24
- "language_model.model.layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
25
- "language_model.model.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
26
- "language_model.model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
27
- "language_model.model.layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
28
- "language_model.model.layers.10.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
29
- "language_model.model.layers.10.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
30
- "language_model.model.layers.10.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
31
- "language_model.model.layers.10.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
32
- "language_model.model.layers.10.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
33
- "language_model.model.layers.10.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
34
- "language_model.model.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
35
- "language_model.model.layers.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
36
- "language_model.model.layers.11.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
37
- "language_model.model.layers.11.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
38
- "language_model.model.layers.11.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
39
- "language_model.model.layers.11.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
40
- "language_model.model.layers.11.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
41
- "language_model.model.layers.11.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
42
- "language_model.model.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
43
- "language_model.model.layers.11.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
44
- "language_model.model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
45
- "language_model.model.layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
46
- "language_model.model.layers.12.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
47
- "language_model.model.layers.12.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
48
- "language_model.model.layers.12.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
49
- "language_model.model.layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
50
- "language_model.model.layers.12.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
51
- "language_model.model.layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
52
- "language_model.model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
53
- "language_model.model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
54
- "language_model.model.layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
55
- "language_model.model.layers.13.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
56
- "language_model.model.layers.13.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
57
- "language_model.model.layers.13.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
58
- "language_model.model.layers.13.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
59
- "language_model.model.layers.13.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
60
- "language_model.model.layers.13.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
61
- "language_model.model.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
62
- "language_model.model.layers.14.input_layernorm.weight": "model-00002-of-00002.safetensors",
63
- "language_model.model.layers.14.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
64
- "language_model.model.layers.14.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
65
- "language_model.model.layers.14.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
66
- "language_model.model.layers.14.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
67
- "language_model.model.layers.14.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
68
- "language_model.model.layers.14.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
69
- "language_model.model.layers.14.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
70
- "language_model.model.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
71
- "language_model.model.layers.15.input_layernorm.weight": "model-00002-of-00002.safetensors",
72
- "language_model.model.layers.15.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
73
- "language_model.model.layers.15.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
74
- "language_model.model.layers.15.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
75
- "language_model.model.layers.15.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
76
- "language_model.model.layers.15.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
77
- "language_model.model.layers.15.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
78
- "language_model.model.layers.15.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
79
- "language_model.model.layers.15.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
80
- "language_model.model.layers.16.input_layernorm.weight": "model-00002-of-00002.safetensors",
81
- "language_model.model.layers.16.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
82
- "language_model.model.layers.16.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
83
- "language_model.model.layers.16.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
84
- "language_model.model.layers.16.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
85
- "language_model.model.layers.16.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
86
- "language_model.model.layers.16.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
87
- "language_model.model.layers.16.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
88
- "language_model.model.layers.16.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
89
- "language_model.model.layers.17.input_layernorm.weight": "model-00002-of-00002.safetensors",
90
- "language_model.model.layers.17.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
91
- "language_model.model.layers.17.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
92
- "language_model.model.layers.17.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
93
- "language_model.model.layers.17.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
94
- "language_model.model.layers.17.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
95
- "language_model.model.layers.17.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
96
- "language_model.model.layers.17.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
97
- "language_model.model.layers.17.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
98
- "language_model.model.layers.18.input_layernorm.weight": "model-00002-of-00002.safetensors",
99
- "language_model.model.layers.18.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
100
- "language_model.model.layers.18.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
101
- "language_model.model.layers.18.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
102
- "language_model.model.layers.18.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
103
- "language_model.model.layers.18.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
104
- "language_model.model.layers.18.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
105
- "language_model.model.layers.18.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
106
- "language_model.model.layers.18.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
107
- "language_model.model.layers.19.input_layernorm.weight": "model-00002-of-00002.safetensors",
108
- "language_model.model.layers.19.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
109
- "language_model.model.layers.19.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
110
- "language_model.model.layers.19.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
111
- "language_model.model.layers.19.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
112
- "language_model.model.layers.19.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
113
- "language_model.model.layers.19.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
114
- "language_model.model.layers.19.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
115
- "language_model.model.layers.19.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
116
- "language_model.model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
117
- "language_model.model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
118
- "language_model.model.layers.2.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
119
- "language_model.model.layers.2.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
120
- "language_model.model.layers.2.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
121
- "language_model.model.layers.2.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
122
- "language_model.model.layers.2.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
123
- "language_model.model.layers.2.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
124
- "language_model.model.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
125
- "language_model.model.layers.20.input_layernorm.weight": "model-00002-of-00002.safetensors",
126
- "language_model.model.layers.20.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
127
- "language_model.model.layers.20.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
128
- "language_model.model.layers.20.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
129
- "language_model.model.layers.20.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
130
- "language_model.model.layers.20.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
131
- "language_model.model.layers.20.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
132
- "language_model.model.layers.20.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
133
- "language_model.model.layers.20.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
134
- "language_model.model.layers.21.input_layernorm.weight": "model-00002-of-00002.safetensors",
135
- "language_model.model.layers.21.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
136
- "language_model.model.layers.21.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
137
- "language_model.model.layers.21.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
138
- "language_model.model.layers.21.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
139
- "language_model.model.layers.21.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
140
- "language_model.model.layers.21.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
141
- "language_model.model.layers.21.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
142
- "language_model.model.layers.21.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
143
- "language_model.model.layers.22.input_layernorm.weight": "model-00002-of-00002.safetensors",
144
- "language_model.model.layers.22.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
145
- "language_model.model.layers.22.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
146
- "language_model.model.layers.22.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
147
- "language_model.model.layers.22.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
148
- "language_model.model.layers.22.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
149
- "language_model.model.layers.22.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
150
- "language_model.model.layers.22.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
151
- "language_model.model.layers.22.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
152
- "language_model.model.layers.23.input_layernorm.weight": "model-00002-of-00002.safetensors",
153
- "language_model.model.layers.23.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
154
- "language_model.model.layers.23.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
155
- "language_model.model.layers.23.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
156
- "language_model.model.layers.23.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
157
- "language_model.model.layers.23.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
158
- "language_model.model.layers.23.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
159
- "language_model.model.layers.23.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
160
- "language_model.model.layers.23.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
161
- "language_model.model.layers.24.input_layernorm.weight": "model-00002-of-00002.safetensors",
162
- "language_model.model.layers.24.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
163
- "language_model.model.layers.24.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
164
- "language_model.model.layers.24.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
165
- "language_model.model.layers.24.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
166
- "language_model.model.layers.24.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
167
- "language_model.model.layers.24.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
168
- "language_model.model.layers.24.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
169
- "language_model.model.layers.24.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
170
- "language_model.model.layers.25.input_layernorm.weight": "model-00002-of-00002.safetensors",
171
- "language_model.model.layers.25.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
172
- "language_model.model.layers.25.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
173
- "language_model.model.layers.25.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
174
- "language_model.model.layers.25.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
175
- "language_model.model.layers.25.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
176
- "language_model.model.layers.25.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
177
- "language_model.model.layers.25.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
178
- "language_model.model.layers.25.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
179
- "language_model.model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
180
- "language_model.model.layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
181
- "language_model.model.layers.3.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
182
- "language_model.model.layers.3.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
183
- "language_model.model.layers.3.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
184
- "language_model.model.layers.3.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
185
- "language_model.model.layers.3.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
186
- "language_model.model.layers.3.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
187
- "language_model.model.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
188
- "language_model.model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
189
- "language_model.model.layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
190
- "language_model.model.layers.4.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
191
- "language_model.model.layers.4.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
192
- "language_model.model.layers.4.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
193
- "language_model.model.layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
194
- "language_model.model.layers.4.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
195
- "language_model.model.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
196
- "language_model.model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
197
- "language_model.model.layers.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
198
- "language_model.model.layers.5.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
199
- "language_model.model.layers.5.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
200
- "language_model.model.layers.5.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
201
- "language_model.model.layers.5.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
202
- "language_model.model.layers.5.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
203
- "language_model.model.layers.5.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
204
- "language_model.model.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
205
- "language_model.model.layers.5.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
206
- "language_model.model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
207
- "language_model.model.layers.6.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
208
- "language_model.model.layers.6.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
209
- "language_model.model.layers.6.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
210
- "language_model.model.layers.6.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
211
- "language_model.model.layers.6.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
212
- "language_model.model.layers.6.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
213
- "language_model.model.layers.6.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
214
- "language_model.model.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
215
- "language_model.model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
216
- "language_model.model.layers.7.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
217
- "language_model.model.layers.7.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
218
- "language_model.model.layers.7.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
219
- "language_model.model.layers.7.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
220
- "language_model.model.layers.7.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
221
- "language_model.model.layers.7.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
222
- "language_model.model.layers.7.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
223
- "language_model.model.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
224
- "language_model.model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
225
- "language_model.model.layers.8.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
226
- "language_model.model.layers.8.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
227
- "language_model.model.layers.8.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
228
- "language_model.model.layers.8.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
229
- "language_model.model.layers.8.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
230
- "language_model.model.layers.8.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
231
- "language_model.model.layers.8.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
232
- "language_model.model.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
233
- "language_model.model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
234
- "language_model.model.layers.9.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
235
- "language_model.model.layers.9.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
236
- "language_model.model.layers.9.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
237
- "language_model.model.layers.9.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
238
- "language_model.model.layers.9.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
239
- "language_model.model.layers.9.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
240
- "language_model.model.layers.9.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
241
- "language_model.model.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
242
- "language_model.model.norm.weight": "model-00002-of-00002.safetensors",
243
- "multi_modal_projector.linear_1.weight": "model-00001-of-00002.safetensors",
244
- "multi_modal_projector.linear_2.weight": "model-00001-of-00002.safetensors",
245
- "multi_modal_projector.norm.weight": "model-00001-of-00002.safetensors",
246
- "multi_modal_projector.patch_merger.merging_layer.weight": "model-00001-of-00002.safetensors",
247
- "vision_tower.ln_pre.weight": "model-00001-of-00002.safetensors",
248
- "vision_tower.patch_conv.weight": "model-00001-of-00002.safetensors",
249
- "vision_tower.transformer.layers.0.attention.k_proj.weight": "model-00001-of-00002.safetensors",
250
- "vision_tower.transformer.layers.0.attention.o_proj.weight": "model-00001-of-00002.safetensors",
251
- "vision_tower.transformer.layers.0.attention.q_proj.weight": "model-00001-of-00002.safetensors",
252
- "vision_tower.transformer.layers.0.attention.v_proj.weight": "model-00001-of-00002.safetensors",
253
- "vision_tower.transformer.layers.0.attention_norm.weight": "model-00001-of-00002.safetensors",
254
- "vision_tower.transformer.layers.0.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
255
- "vision_tower.transformer.layers.0.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
256
- "vision_tower.transformer.layers.0.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
257
- "vision_tower.transformer.layers.0.ffn_norm.weight": "model-00001-of-00002.safetensors",
258
- "vision_tower.transformer.layers.1.attention.k_proj.weight": "model-00001-of-00002.safetensors",
259
- "vision_tower.transformer.layers.1.attention.o_proj.weight": "model-00001-of-00002.safetensors",
260
- "vision_tower.transformer.layers.1.attention.q_proj.weight": "model-00001-of-00002.safetensors",
261
- "vision_tower.transformer.layers.1.attention.v_proj.weight": "model-00001-of-00002.safetensors",
262
- "vision_tower.transformer.layers.1.attention_norm.weight": "model-00001-of-00002.safetensors",
263
- "vision_tower.transformer.layers.1.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
264
- "vision_tower.transformer.layers.1.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
265
- "vision_tower.transformer.layers.1.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
266
- "vision_tower.transformer.layers.1.ffn_norm.weight": "model-00001-of-00002.safetensors",
267
- "vision_tower.transformer.layers.10.attention.k_proj.weight": "model-00001-of-00002.safetensors",
268
- "vision_tower.transformer.layers.10.attention.o_proj.weight": "model-00001-of-00002.safetensors",
269
- "vision_tower.transformer.layers.10.attention.q_proj.weight": "model-00001-of-00002.safetensors",
270
- "vision_tower.transformer.layers.10.attention.v_proj.weight": "model-00001-of-00002.safetensors",
271
- "vision_tower.transformer.layers.10.attention_norm.weight": "model-00001-of-00002.safetensors",
272
- "vision_tower.transformer.layers.10.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
273
- "vision_tower.transformer.layers.10.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
274
- "vision_tower.transformer.layers.10.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
275
- "vision_tower.transformer.layers.10.ffn_norm.weight": "model-00001-of-00002.safetensors",
276
- "vision_tower.transformer.layers.11.attention.k_proj.weight": "model-00001-of-00002.safetensors",
277
- "vision_tower.transformer.layers.11.attention.o_proj.weight": "model-00001-of-00002.safetensors",
278
- "vision_tower.transformer.layers.11.attention.q_proj.weight": "model-00001-of-00002.safetensors",
279
- "vision_tower.transformer.layers.11.attention.v_proj.weight": "model-00001-of-00002.safetensors",
280
- "vision_tower.transformer.layers.11.attention_norm.weight": "model-00001-of-00002.safetensors",
281
- "vision_tower.transformer.layers.11.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
282
- "vision_tower.transformer.layers.11.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
283
- "vision_tower.transformer.layers.11.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
284
- "vision_tower.transformer.layers.11.ffn_norm.weight": "model-00001-of-00002.safetensors",
285
- "vision_tower.transformer.layers.12.attention.k_proj.weight": "model-00001-of-00002.safetensors",
286
- "vision_tower.transformer.layers.12.attention.o_proj.weight": "model-00001-of-00002.safetensors",
287
- "vision_tower.transformer.layers.12.attention.q_proj.weight": "model-00001-of-00002.safetensors",
288
- "vision_tower.transformer.layers.12.attention.v_proj.weight": "model-00001-of-00002.safetensors",
289
- "vision_tower.transformer.layers.12.attention_norm.weight": "model-00001-of-00002.safetensors",
290
- "vision_tower.transformer.layers.12.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
291
- "vision_tower.transformer.layers.12.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
292
- "vision_tower.transformer.layers.12.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
293
- "vision_tower.transformer.layers.12.ffn_norm.weight": "model-00001-of-00002.safetensors",
294
- "vision_tower.transformer.layers.13.attention.k_proj.weight": "model-00001-of-00002.safetensors",
295
- "vision_tower.transformer.layers.13.attention.o_proj.weight": "model-00001-of-00002.safetensors",
296
- "vision_tower.transformer.layers.13.attention.q_proj.weight": "model-00001-of-00002.safetensors",
297
- "vision_tower.transformer.layers.13.attention.v_proj.weight": "model-00001-of-00002.safetensors",
298
- "vision_tower.transformer.layers.13.attention_norm.weight": "model-00001-of-00002.safetensors",
299
- "vision_tower.transformer.layers.13.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
300
- "vision_tower.transformer.layers.13.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
301
- "vision_tower.transformer.layers.13.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
302
- "vision_tower.transformer.layers.13.ffn_norm.weight": "model-00001-of-00002.safetensors",
303
- "vision_tower.transformer.layers.14.attention.k_proj.weight": "model-00001-of-00002.safetensors",
304
- "vision_tower.transformer.layers.14.attention.o_proj.weight": "model-00001-of-00002.safetensors",
305
- "vision_tower.transformer.layers.14.attention.q_proj.weight": "model-00001-of-00002.safetensors",
306
- "vision_tower.transformer.layers.14.attention.v_proj.weight": "model-00001-of-00002.safetensors",
307
- "vision_tower.transformer.layers.14.attention_norm.weight": "model-00001-of-00002.safetensors",
308
- "vision_tower.transformer.layers.14.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
309
- "vision_tower.transformer.layers.14.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
310
- "vision_tower.transformer.layers.14.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
311
- "vision_tower.transformer.layers.14.ffn_norm.weight": "model-00001-of-00002.safetensors",
312
- "vision_tower.transformer.layers.15.attention.k_proj.weight": "model-00001-of-00002.safetensors",
313
- "vision_tower.transformer.layers.15.attention.o_proj.weight": "model-00001-of-00002.safetensors",
314
- "vision_tower.transformer.layers.15.attention.q_proj.weight": "model-00001-of-00002.safetensors",
315
- "vision_tower.transformer.layers.15.attention.v_proj.weight": "model-00001-of-00002.safetensors",
316
- "vision_tower.transformer.layers.15.attention_norm.weight": "model-00001-of-00002.safetensors",
317
- "vision_tower.transformer.layers.15.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
318
- "vision_tower.transformer.layers.15.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
319
- "vision_tower.transformer.layers.15.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
320
- "vision_tower.transformer.layers.15.ffn_norm.weight": "model-00001-of-00002.safetensors",
321
- "vision_tower.transformer.layers.16.attention.k_proj.weight": "model-00001-of-00002.safetensors",
322
- "vision_tower.transformer.layers.16.attention.o_proj.weight": "model-00001-of-00002.safetensors",
323
- "vision_tower.transformer.layers.16.attention.q_proj.weight": "model-00001-of-00002.safetensors",
324
- "vision_tower.transformer.layers.16.attention.v_proj.weight": "model-00001-of-00002.safetensors",
325
- "vision_tower.transformer.layers.16.attention_norm.weight": "model-00001-of-00002.safetensors",
326
- "vision_tower.transformer.layers.16.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
327
- "vision_tower.transformer.layers.16.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
328
- "vision_tower.transformer.layers.16.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
329
- "vision_tower.transformer.layers.16.ffn_norm.weight": "model-00001-of-00002.safetensors",
330
- "vision_tower.transformer.layers.17.attention.k_proj.weight": "model-00001-of-00002.safetensors",
331
- "vision_tower.transformer.layers.17.attention.o_proj.weight": "model-00001-of-00002.safetensors",
332
- "vision_tower.transformer.layers.17.attention.q_proj.weight": "model-00001-of-00002.safetensors",
333
- "vision_tower.transformer.layers.17.attention.v_proj.weight": "model-00001-of-00002.safetensors",
334
- "vision_tower.transformer.layers.17.attention_norm.weight": "model-00001-of-00002.safetensors",
335
- "vision_tower.transformer.layers.17.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
336
- "vision_tower.transformer.layers.17.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
337
- "vision_tower.transformer.layers.17.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
338
- "vision_tower.transformer.layers.17.ffn_norm.weight": "model-00001-of-00002.safetensors",
339
- "vision_tower.transformer.layers.18.attention.k_proj.weight": "model-00001-of-00002.safetensors",
340
- "vision_tower.transformer.layers.18.attention.o_proj.weight": "model-00001-of-00002.safetensors",
341
- "vision_tower.transformer.layers.18.attention.q_proj.weight": "model-00001-of-00002.safetensors",
342
- "vision_tower.transformer.layers.18.attention.v_proj.weight": "model-00001-of-00002.safetensors",
343
- "vision_tower.transformer.layers.18.attention_norm.weight": "model-00001-of-00002.safetensors",
344
- "vision_tower.transformer.layers.18.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
345
- "vision_tower.transformer.layers.18.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
346
- "vision_tower.transformer.layers.18.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
347
- "vision_tower.transformer.layers.18.ffn_norm.weight": "model-00001-of-00002.safetensors",
348
- "vision_tower.transformer.layers.19.attention.k_proj.weight": "model-00001-of-00002.safetensors",
349
- "vision_tower.transformer.layers.19.attention.o_proj.weight": "model-00001-of-00002.safetensors",
350
- "vision_tower.transformer.layers.19.attention.q_proj.weight": "model-00001-of-00002.safetensors",
351
- "vision_tower.transformer.layers.19.attention.v_proj.weight": "model-00001-of-00002.safetensors",
352
- "vision_tower.transformer.layers.19.attention_norm.weight": "model-00001-of-00002.safetensors",
353
- "vision_tower.transformer.layers.19.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
354
- "vision_tower.transformer.layers.19.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
355
- "vision_tower.transformer.layers.19.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
356
- "vision_tower.transformer.layers.19.ffn_norm.weight": "model-00001-of-00002.safetensors",
357
- "vision_tower.transformer.layers.2.attention.k_proj.weight": "model-00001-of-00002.safetensors",
358
- "vision_tower.transformer.layers.2.attention.o_proj.weight": "model-00001-of-00002.safetensors",
359
- "vision_tower.transformer.layers.2.attention.q_proj.weight": "model-00001-of-00002.safetensors",
360
- "vision_tower.transformer.layers.2.attention.v_proj.weight": "model-00001-of-00002.safetensors",
361
- "vision_tower.transformer.layers.2.attention_norm.weight": "model-00001-of-00002.safetensors",
362
- "vision_tower.transformer.layers.2.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
363
- "vision_tower.transformer.layers.2.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
364
- "vision_tower.transformer.layers.2.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
365
- "vision_tower.transformer.layers.2.ffn_norm.weight": "model-00001-of-00002.safetensors",
366
- "vision_tower.transformer.layers.20.attention.k_proj.weight": "model-00001-of-00002.safetensors",
367
- "vision_tower.transformer.layers.20.attention.o_proj.weight": "model-00001-of-00002.safetensors",
368
- "vision_tower.transformer.layers.20.attention.q_proj.weight": "model-00001-of-00002.safetensors",
369
- "vision_tower.transformer.layers.20.attention.v_proj.weight": "model-00001-of-00002.safetensors",
370
- "vision_tower.transformer.layers.20.attention_norm.weight": "model-00001-of-00002.safetensors",
371
- "vision_tower.transformer.layers.20.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
372
- "vision_tower.transformer.layers.20.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
373
- "vision_tower.transformer.layers.20.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
374
- "vision_tower.transformer.layers.20.ffn_norm.weight": "model-00001-of-00002.safetensors",
375
- "vision_tower.transformer.layers.21.attention.k_proj.weight": "model-00001-of-00002.safetensors",
376
- "vision_tower.transformer.layers.21.attention.o_proj.weight": "model-00001-of-00002.safetensors",
377
- "vision_tower.transformer.layers.21.attention.q_proj.weight": "model-00001-of-00002.safetensors",
378
- "vision_tower.transformer.layers.21.attention.v_proj.weight": "model-00001-of-00002.safetensors",
379
- "vision_tower.transformer.layers.21.attention_norm.weight": "model-00001-of-00002.safetensors",
380
- "vision_tower.transformer.layers.21.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
381
- "vision_tower.transformer.layers.21.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
382
- "vision_tower.transformer.layers.21.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
383
- "vision_tower.transformer.layers.21.ffn_norm.weight": "model-00001-of-00002.safetensors",
384
- "vision_tower.transformer.layers.22.attention.k_proj.weight": "model-00001-of-00002.safetensors",
385
- "vision_tower.transformer.layers.22.attention.o_proj.weight": "model-00001-of-00002.safetensors",
386
- "vision_tower.transformer.layers.22.attention.q_proj.weight": "model-00001-of-00002.safetensors",
387
- "vision_tower.transformer.layers.22.attention.v_proj.weight": "model-00001-of-00002.safetensors",
388
- "vision_tower.transformer.layers.22.attention_norm.weight": "model-00001-of-00002.safetensors",
389
- "vision_tower.transformer.layers.22.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
390
- "vision_tower.transformer.layers.22.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
391
- "vision_tower.transformer.layers.22.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
392
- "vision_tower.transformer.layers.22.ffn_norm.weight": "model-00001-of-00002.safetensors",
393
- "vision_tower.transformer.layers.23.attention.k_proj.weight": "model-00001-of-00002.safetensors",
394
- "vision_tower.transformer.layers.23.attention.o_proj.weight": "model-00001-of-00002.safetensors",
395
- "vision_tower.transformer.layers.23.attention.q_proj.weight": "model-00001-of-00002.safetensors",
396
- "vision_tower.transformer.layers.23.attention.v_proj.weight": "model-00001-of-00002.safetensors",
397
- "vision_tower.transformer.layers.23.attention_norm.weight": "model-00001-of-00002.safetensors",
398
- "vision_tower.transformer.layers.23.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
399
- "vision_tower.transformer.layers.23.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
400
- "vision_tower.transformer.layers.23.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
401
- "vision_tower.transformer.layers.23.ffn_norm.weight": "model-00001-of-00002.safetensors",
402
- "vision_tower.transformer.layers.3.attention.k_proj.weight": "model-00001-of-00002.safetensors",
403
- "vision_tower.transformer.layers.3.attention.o_proj.weight": "model-00001-of-00002.safetensors",
404
- "vision_tower.transformer.layers.3.attention.q_proj.weight": "model-00001-of-00002.safetensors",
405
- "vision_tower.transformer.layers.3.attention.v_proj.weight": "model-00001-of-00002.safetensors",
406
- "vision_tower.transformer.layers.3.attention_norm.weight": "model-00001-of-00002.safetensors",
407
- "vision_tower.transformer.layers.3.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
408
- "vision_tower.transformer.layers.3.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
409
- "vision_tower.transformer.layers.3.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
410
- "vision_tower.transformer.layers.3.ffn_norm.weight": "model-00001-of-00002.safetensors",
411
- "vision_tower.transformer.layers.4.attention.k_proj.weight": "model-00001-of-00002.safetensors",
412
- "vision_tower.transformer.layers.4.attention.o_proj.weight": "model-00001-of-00002.safetensors",
413
- "vision_tower.transformer.layers.4.attention.q_proj.weight": "model-00001-of-00002.safetensors",
414
- "vision_tower.transformer.layers.4.attention.v_proj.weight": "model-00001-of-00002.safetensors",
415
- "vision_tower.transformer.layers.4.attention_norm.weight": "model-00001-of-00002.safetensors",
416
- "vision_tower.transformer.layers.4.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
417
- "vision_tower.transformer.layers.4.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
418
- "vision_tower.transformer.layers.4.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
419
- "vision_tower.transformer.layers.4.ffn_norm.weight": "model-00001-of-00002.safetensors",
420
- "vision_tower.transformer.layers.5.attention.k_proj.weight": "model-00001-of-00002.safetensors",
421
- "vision_tower.transformer.layers.5.attention.o_proj.weight": "model-00001-of-00002.safetensors",
422
- "vision_tower.transformer.layers.5.attention.q_proj.weight": "model-00001-of-00002.safetensors",
423
- "vision_tower.transformer.layers.5.attention.v_proj.weight": "model-00001-of-00002.safetensors",
424
- "vision_tower.transformer.layers.5.attention_norm.weight": "model-00001-of-00002.safetensors",
425
- "vision_tower.transformer.layers.5.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
426
- "vision_tower.transformer.layers.5.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
427
- "vision_tower.transformer.layers.5.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
428
- "vision_tower.transformer.layers.5.ffn_norm.weight": "model-00001-of-00002.safetensors",
429
- "vision_tower.transformer.layers.6.attention.k_proj.weight": "model-00001-of-00002.safetensors",
430
- "vision_tower.transformer.layers.6.attention.o_proj.weight": "model-00001-of-00002.safetensors",
431
- "vision_tower.transformer.layers.6.attention.q_proj.weight": "model-00001-of-00002.safetensors",
432
- "vision_tower.transformer.layers.6.attention.v_proj.weight": "model-00001-of-00002.safetensors",
433
- "vision_tower.transformer.layers.6.attention_norm.weight": "model-00001-of-00002.safetensors",
434
- "vision_tower.transformer.layers.6.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
435
- "vision_tower.transformer.layers.6.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
436
- "vision_tower.transformer.layers.6.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
437
- "vision_tower.transformer.layers.6.ffn_norm.weight": "model-00001-of-00002.safetensors",
438
- "vision_tower.transformer.layers.7.attention.k_proj.weight": "model-00001-of-00002.safetensors",
439
- "vision_tower.transformer.layers.7.attention.o_proj.weight": "model-00001-of-00002.safetensors",
440
- "vision_tower.transformer.layers.7.attention.q_proj.weight": "model-00001-of-00002.safetensors",
441
- "vision_tower.transformer.layers.7.attention.v_proj.weight": "model-00001-of-00002.safetensors",
442
- "vision_tower.transformer.layers.7.attention_norm.weight": "model-00001-of-00002.safetensors",
443
- "vision_tower.transformer.layers.7.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
444
- "vision_tower.transformer.layers.7.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
445
- "vision_tower.transformer.layers.7.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
446
- "vision_tower.transformer.layers.7.ffn_norm.weight": "model-00001-of-00002.safetensors",
447
- "vision_tower.transformer.layers.8.attention.k_proj.weight": "model-00001-of-00002.safetensors",
448
- "vision_tower.transformer.layers.8.attention.o_proj.weight": "model-00001-of-00002.safetensors",
449
- "vision_tower.transformer.layers.8.attention.q_proj.weight": "model-00001-of-00002.safetensors",
450
- "vision_tower.transformer.layers.8.attention.v_proj.weight": "model-00001-of-00002.safetensors",
451
- "vision_tower.transformer.layers.8.attention_norm.weight": "model-00001-of-00002.safetensors",
452
- "vision_tower.transformer.layers.8.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
453
- "vision_tower.transformer.layers.8.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
454
- "vision_tower.transformer.layers.8.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
455
- "vision_tower.transformer.layers.8.ffn_norm.weight": "model-00001-of-00002.safetensors",
456
- "vision_tower.transformer.layers.9.attention.k_proj.weight": "model-00001-of-00002.safetensors",
457
- "vision_tower.transformer.layers.9.attention.o_proj.weight": "model-00001-of-00002.safetensors",
458
- "vision_tower.transformer.layers.9.attention.q_proj.weight": "model-00001-of-00002.safetensors",
459
- "vision_tower.transformer.layers.9.attention.v_proj.weight": "model-00001-of-00002.safetensors",
460
- "vision_tower.transformer.layers.9.attention_norm.weight": "model-00001-of-00002.safetensors",
461
- "vision_tower.transformer.layers.9.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
462
- "vision_tower.transformer.layers.9.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
463
- "vision_tower.transformer.layers.9.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
464
- "vision_tower.transformer.layers.9.ffn_norm.weight": "model-00001-of-00002.safetensors"
465
- }
466
- }
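
The hunk above removes the old `weight_map` entries of `model.safetensors.index.json`; in a sharded checkpoint this index maps each tensor name to the shard file that stores it. As a minimal sketch (assuming a local download of the repository and the standard Hugging Face sharded-index layout), the shard holding any given tensor can be looked up like this:

```python
import json

# Load the sharded-checkpoint index (assumes the repo files are in the current directory).
with open("model.safetensors.index.json") as f:
    index = json.load(f)

# "weight_map" maps tensor names to the shard file that contains them.
weight_map = index["weight_map"]

# Example lookup for one of the vision-tower tensors listed in the diff above.
name = "vision_tower.transformer.layers.9.ffn_norm.weight"
print(name, "->", weight_map.get(name))  # e.g. model-00001-of-00002.safetensors
```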
params.json CHANGED
@@ -19,6 +19,10 @@
  "qk_nope_head_dim": null,
  "kv_lora_rank": null,
  "v_head_dim": null,
+ "quantization": {
+ "qformat_weight": "fp8_e4m3",
+ "qscheme_act": "TENSOR"
+ },
  "yarn": {
  "original_max_position_embeddings": 16384,
  "factor": 16,
tokenizer.json CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:577575622324b2e099e2648be26bdeb5e5815ffe66d7004e9e3ddbf421db6bf1
- size 17078110
+ oid sha256:286acad9b0e27fce778ac429763536accf618ccb6ed72963b6f94685e531c5c7
+ size 17077402
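
This entry is a Git LFS pointer rather than the tokenizer itself: `oid` is the SHA-256 digest of the real file and `size` its length in bytes. A small sketch (hypothetical helper, standard library only) for checking a downloaded `tokenizer.json` against the updated pointer:

```python
import hashlib

def matches_lfs_pointer(path: str, expected_oid: str, expected_size: int) -> bool:
    """Hypothetical helper: stream-hash a file and compare it to an LFS pointer."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
            size += len(chunk)
    return digest.hexdigest() == expected_oid and size == expected_size

# Values taken from the new pointer in the diff above.
print(matches_lfs_pointer(
    "tokenizer.json",
    "286acad9b0e27fce778ac429763536accf618ccb6ed72963b6f94685e531c5c7",
    17077402,
))
```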
tokenizer_config.json CHANGED
The diff for this file is too large to render. See raw diff