danielhanchen committed on
Commit 2d73dd1 · verified · 1 Parent(s): 7868981

Add files using upload-large-folder tool

README.md CHANGED
@@ -16,20 +16,20 @@ license: apache-2.0
16
  inference: false
17
  base_model:
18
  - mistralai/Ministral-3-3B-Instruct-2512
19
- extra_gated_description: >-
20
- If you want to learn more about how we process your personal data, please read
21
- our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
22
  tags:
23
  - mistral-common
 
24
  - unsloth
25
  ---
26
- > [!NOTE]
27
- > Includes Unsloth **chat template fixes**! <br> For `llama.cpp`, use `--jinja`
28
- >
29
-
30
  <div>
31
  <p style="margin-top: 0;margin-bottom: 0;">
32
- <em><a href="https://docs.unsloth.ai/basics/unsloth-dynamic-v2.0-gguf">Unsloth Dynamic 2.0</a> achieves superior accuracy & outperforms other leading quants.</em>
33
  </p>
34
  <div style="display: flex; gap: 5px; align-items: center; ">
35
  <a href="https://github.com/unslothai/unsloth/">
@@ -38,22 +38,23 @@ tags:
38
  <a href="https://discord.gg/unsloth">
39
  <img src="https://github.com/unslothai/unsloth/raw/main/images/Discord%20button.png" width="173">
40
  </a>
41
- <a href="https://docs.unsloth.ai/">
42
  <img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="143">
43
  </a>
44
  </div>
 
45
  </div>
46
47
 
48
  # Ministral 3 3B Instruct 2512
49
  The smallest model in the Ministral 3 family, **Ministral 3 3B** is a powerful, efficient tiny language model with vision capabilities.
50
 
51
- This model is the instruct post-trained version in **FP8**, fine-tuned for instruction tasks, making it ideal for chat and instruction based use cases.
52
-
53
  The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. Ministral 3 3B can even be deployed locally, fitting in 8GB of VRAM in FP8, and less if further quantized.
54
 
55
- Learn more in our blog post [here](https://mistral.ai/news/mistral-3).
56
-
57
  ## Key Features
58
  Ministral 3 3B consists of two main architectural components:
59
  - **3.4B Language Model**
@@ -80,24 +81,12 @@ Ideal for lightweight, real-time applications on edge or low-resource devices, s
80
 
81
  Bringing advanced AI capabilities to edge and distributed environments for embedded systems.
82
 
83
- ### Recommended Settings
84
-
85
- We recommend deploying with the following best practices:
86
- - System Prompt: Define a clear environment and use case, including guidance on how to effectively leverage tools in agentic systems.
87
- - Sampling Parameters: Use a **temperature below 0.1** for daily-driver and production environments ; Higher temperatures may be explored for creative use cases - developers are encouraged to experiment with alternative settings.
88
- - Tools: Keep the set of tools well-defined and limit their number to the minimum required for the use case - Avoiding overloading the model with an excessive number of tools.
89
- - Vision: When deploying with vision capabilities, we recommend maintaining an aspect ratio close to 1:1 (width-to-height) for images. Avoiding the use of overly thin or wide images - crop them as needed to ensure optimal performance.
90
-
91
- ### Recommended Sampling
92
-
93
- * We recommend starting with a Temperature of 0.1 for most use cases. Feel free to experiment with different settings to best suit your specific needs.
94
-
95
  ## Ministral 3 Family
96
 
97
  | Model Name | Type | Precision | Link |
98
  |--------------------------------|--------------------|-----------|------------------------------------------------------------------------------------------|
99
  | Ministral 3 3B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Base-2512) |
100
- | **Ministral 3 3B Instruct 2512** | **Instruct post-trained** | **FP8** | [**Hugging Face**](https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512) |
101
  | Ministral 3 3B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Reasoning-2512) |
102
  | Ministral 3 8B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Base-2512) |
103
  | Ministral 3 8B Instruct 2512 | Instruct post-trained | FP8 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512) |
@@ -106,7 +95,7 @@ We recommend deploying with the following best practices:
106
  | Ministral 3 14B Instruct 2512 | Instruct post-trained | FP8 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512) |
107
  | Ministral 3 14B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Reasoning-2512) |
108
 
109
- Other formats available [here](https://huggingface.co/collections/mistralai/ministral-3-additional-checkpoints).
110
 
111
  ## Benchmark Results
112
 
@@ -168,7 +157,7 @@ We recommend using this model with [vLLM](https://github.com/vllm-project/vllm).
168
 
169
  #### Installation
170
 
171
- Make sure to install **vllm >= 1.12.0**:
172
 
173
  ```
174
  pip install vllm --upgrade
@@ -181,7 +170,7 @@ To check:
181
  python -c "import mistral_common; print(mistral_common.__version__)"
182
  ```
183
 
184
- You can also make use of a ready-to-go [docker image](https://github.com/vllm-project/vllm/blob/main/docker/Dockerfile) or on the [docker hub](https://hub.docker.com/layers/vllm/vllm-openai/latest).
185
 
186
  #### Serve
187
 
@@ -191,7 +180,6 @@ A simple launch command is:
191
 
192
  ```bash
193
  vllm serve mistralai/Ministral-3-3B-Instruct-2512 \
194
- --tokenizer_mode mistral --config_format mistral --load_format mistral \
195
  --enable-auto-tool-choice --tool-call-parser mistral
196
  ```
197
 
@@ -205,10 +193,10 @@ Additional flags:
205
 
206
  * You can set `--max-model-len` to preserve memory. By default it is set to `262144`, which is quite large and more than most scenarios need.
  * You can set `--max-num-batched-tokens` to balance throughput and latency: a higher value gives higher throughput at the cost of higher latency.
208
-
209
  #### Usage of the model
210
 
211
- Here we assume that the model `mistralai/Ministral-3-3B-Instruct-2512` is served and you can ping it to the domain `localhost` with the port `8000` which is the default for vLLM.
212
 
213
  <details>
214
  <summary>Vision Reasoning</summary>
@@ -264,6 +252,8 @@ messages = [
264
  },
265
  ]
266
 
 
 
267
 
268
  response = client.chat.completions.create(
269
  model=model,
@@ -476,7 +466,7 @@ print(assistant_message)
476
 
477
  You can also use Ministral 3 3B Instruct 2512 with `Transformers`!
478
 
479
- Transformers recently added support for FP8, so make sure to install from main:
480
 
481
  ```sh
482
  uv pip install git+https://github.com/huggingface/transformers
@@ -491,11 +481,10 @@ pip install mistral-common --upgrade
491
  Try it out by running the following snippet.
492
 
493
  > [!Tip]
494
- > On latest main as of 05/12/2025, by default
495
- > a FP8 triton kernel for fast accelerated matmuls
496
- > (`w8a8_block_fp8_matmul_triton`) will be used
497
- > without any degradation in accuracy. However, if you want to
498
- > run your model in BF16 see ([here](#transformers-bf16))
499
 
500
  <details>
501
  <summary>Python snippet</summary>
@@ -540,11 +529,9 @@ decoded_output = tokenizer.decode(output[len(tokenized["input_ids"][0]):])
540
  print(decoded_output)
541
  ```
542
 
543
- </details>
544
-
545
- #### Transformers BF16
546
 
547
- Transformers allows you to automatically convert the checkpoint to Bfloat16. To do so, simply load the model as follows:
548
 
549
  ```py
550
  from transformers import Mistral3ForConditionalGeneration, FineGrainedFP8Config
@@ -557,6 +544,8 @@ model = Mistral3ForConditionalGeneration.from_pretrained(
557
  )
558
  ```
559
 
 
 
560
  ## License
561
 
562
  This model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).
 
16
  inference: false
17
  base_model:
18
  - mistralai/Ministral-3-3B-Instruct-2512
 
 
 
19
  tags:
20
  - mistral-common
21
+ - mistral
22
  - unsloth
23
  ---
 
 
 
 
24
  <div>
25
+ <p style="margin-bottom: 0; margin-top: 0;">
26
+ <strong>See our <a href="https://huggingface.co/collections/unsloth/ministral-3">Ministral 3 collection</a> for all versions including GGUF, 4-bit & FP8 formats.</strong>
27
+ </p>
28
+ <p style="margin-bottom: 0;">
29
+ <em>Learn to run Ministral correctly - <a href="https://docs.unsloth.ai/new/ministral-3">Read our Guide</a>.</em>
30
+ </p>
31
  <p style="margin-top: 0;margin-bottom: 0;">
32
+ <em>See <a href="https://docs.unsloth.ai/basics/unsloth-dynamic-v2.0-gguf">Unsloth Dynamic 2.0 GGUFs</a> for our quantization benchmarks.</em>
33
  </p>
34
  <div style="display: flex; gap: 5px; align-items: center; ">
35
  <a href="https://github.com/unslothai/unsloth/">
 
38
  <a href="https://discord.gg/unsloth">
39
  <img src="https://github.com/unslothai/unsloth/raw/main/images/Discord%20button.png" width="173">
40
  </a>
41
+ <a href="https://docs.unsloth.ai/new/ministral-3">
42
  <img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="143">
43
  </a>
44
  </div>
45
+ <h1 style="margin-top: 0rem;">✨ Read our Ministral 3 Guide <a href="https://docs.unsloth.ai/new/ministral-3">here</a>!</h1>
46
  </div>
47
 
48
+ - Fine-tune Ministral 3 for free using our [Google Colab notebook](https://docs.unsloth.ai/new/ministral-3#fine-tuning)
49
+ - Or train Ministral 3 with reinforcement learning (GSPO) using our [free notebook](https://docs.unsloth.ai/new/ministral-3#reinforcement-learning-grpo).
50
+ - View the rest of our notebooks in our [docs here](https://docs.unsloth.ai/get-started/unsloth-notebooks).
51
+ ---
52
 
53
  # Ministral 3 3B Instruct 2512
54
  The smallest model in the Ministral 3 family, **Ministral 3 3B** is a powerful, efficient tiny language model with vision capabilities.
55
 
 
 
56
  The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. Ministral 3 3B can even be deployed locally, fitting in 8GB of VRAM in FP8, and less if further quantized.
57
 
 
 
58
  ## Key Features
59
  Ministral 3 3B consists of two main architectural components:
60
  - **3.4B Language Model**
 
81
 
82
  Bringing advanced AI capabilities to edge and distributed environments for embedded systems.
83
 
84
  ## Ministral 3 Family
85
 
86
  | Model Name | Type | Precision | Link |
87
  |--------------------------------|--------------------|-----------|------------------------------------------------------------------------------------------|
88
  | Ministral 3 3B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Base-2512) |
89
+ | Ministral 3 3B Instruct 2512 | Instruct post-trained | FP8 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512) |
90
  | Ministral 3 3B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Reasoning-2512) |
91
  | Ministral 3 8B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Base-2512) |
92
  | Ministral 3 8B Instruct 2512 | Instruct post-trained | FP8 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512) |
 
95
  | Ministral 3 14B Instruct 2512 | Instruct post-trained | FP8 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512) |
96
  | Ministral 3 14B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Reasoning-2512) |
97
 
98
+ Other formats available [here](https://huggingface.co/collections/mistralai/ministral-3-more).
99
 
100
  ## Benchmark Results
101
 
 
157
 
158
  #### Installation
159
 
160
+ Make sure to install [`vLLM >= 0.12.0`](https://github.com/vllm-project/vllm/releases/tag/v0.12.0):
161
 
162
  ```
163
  pip install vllm --upgrade
 
170
  python -c "import mistral_common; print(mistral_common.__version__)"
171
  ```
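
A quick way to confirm both versions from Python (a small sketch; it assumes both packages expose `__version__`, which current releases do):

```py
import mistral_common
import vllm

print("vllm:", vllm.__version__)
print("mistral_common:", mistral_common.__version__)
```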
172
 
173
+ You can also make use of a ready-to-go [docker image](https://github.com/vllm-project/vllm/blob/main/Dockerfile) or one from the [docker hub](https://hub.docker.com/layers/vllm/vllm-openai/latest/images/sha256-de9032a92ffea7b5c007dad80b38fd44aac11eddc31c435f8e52f3b7404bbf39).
174
 
175
  #### Serve
176
 
 
180
 
181
  ```bash
182
  vllm serve mistralai/Ministral-3-3B-Instruct-2512 \
 
183
  --enable-auto-tool-choice --tool-call-parser mistral
184
  ```
185
 
 
193
 
194
  * You can set `--max-model-len` to preserve memory. By default it is set to `262144`, which is quite large and more than most scenarios need.
  * You can set `--max-num-batched-tokens` to balance throughput and latency: a higher value gives higher throughput at the cost of higher latency (see the sketch after this list).
196
+
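
If you prefer offline inference over a running server, the same limits can be passed to vLLM's Python `LLM` class. This is only a sketch: the engine arguments are assumed to mirror the CLI flags above, and the values are illustrative, not recommendations.

```py
from vllm import LLM, SamplingParams

# Illustrative limits mirroring --max-model-len / --max-num-batched-tokens.
llm = LLM(
    model="mistralai/Ministral-3-3B-Instruct-2512",
    max_model_len=8192,
    max_num_batched_tokens=8192,
)

sampling = SamplingParams(temperature=0.1, max_tokens=128)
outputs = llm.generate(["Give me one sentence about Paris."], sampling)
print(outputs[0].outputs[0].text)
```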
197
  #### Usage of the model
198
 
199
+ Here we assume that the model `mistralai/Ministral-3-3B-Instruct-2512` is served and reachable at `localhost` on port `8000`, which is the default for vLLM.
200
 
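Before the richer examples below, a minimal text-only request against that endpoint looks like the following. This is a sketch using the OpenAI-compatible API that vLLM serves; the `EMPTY` API key is a placeholder.

```py
from openai import OpenAI

# vLLM exposes an OpenAI-compatible server; localhost:8000 is its default address.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

model = "mistralai/Ministral-3-3B-Instruct-2512"
response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Give me one sentence about Paris."}],
    temperature=0.1,  # a low temperature is recommended for production use
)
print(response.choices[0].message.content)
```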
201
  <details>
202
  <summary>Vision Reasoning</summary>
 
252
  },
253
  ]
254
 
255
+ print(messages)
256
+
257
 
258
  response = client.chat.completions.create(
259
  model=model,
 
466
 
467
  You can also use Ministral 3 3B Instruct 2512 with `Transformers`!
468
 
469
+ Transformers very recently added preliminary support for FP8, so please make sure to install from main:
470
 
471
  ```sh
472
  uv pip install git+https://github.com/huggingface/transformers
 
481
  Try it out by running the following snippet.
482
 
483
  > [!Tip]
484
+ > By default Transformers will load the checkpoint in FP8 and dequantize it to BF16 on the fly,
485
+ > which means the model currently does not make use of accelerated FP8-kernels.
486
+ > Compatibility with accelerated FP8-kernels is currently being worked on and will be available in a couple of weeks.
487
+ > Stay tuned!
 
488
 
489
  <details>
490
  <summary>Python snippet</summary>
 
529
  print(decoded_output)
530
  ```
531
 
532
+ **Note:**
 
 
533
 
534
+ Transformers allows you to automatically convert the checkpoint to Bfloat16. To do so, simply load the model as follows:
535
 
536
  ```py
537
  from transformers import Mistral3ForConditionalGeneration, FineGrainedFP8Config
 
544
  )
545
  ```
546
 
547
+ </details>
548
+
549
  ## License
550
 
551
  This model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).
chat_template.jinja CHANGED
@@ -1,3 +1,4 @@
 
1
  {#- Default system message if no system prompt is passed. #}
2
  {%- set default_system_message = 'You are Ministral-3-3B-Instruct-2512, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris.\nYou power an AI assistant called Le Chat.\nYour knowledge base was last updated on 2023-10-01.\nThe current date is {today}.\n\nWhen you\'re not sure about some information or when the user\'s request requires up-to-date or specific data, you must use the available tools to fetch the information. Do not hesitate to use tools whenever they can provide a more accurate or complete response. If no relevant tools are available, then clearly state that you don\'t have the information and avoid making up anything.\nIf the user\'s question is not clear, ambiguous, or does not provide enough context for you to accurately answer the question, you do not try to answer it right away and you rather ask the user to clarify their request (e.g. "What are some good restaurants around me?" => "Where are you?" or "When is the next flight to Tokyo" => "Where do you travel from?").\nYou are always very attentive to dates, in particular you try to resolve dates (e.g. "yesterday" is {yesterday}) and when asked about information at specific dates, you discard information that is at another date.\nYou follow these instructions in all languages, and always respond to the user in the language they use or request.\nNext sections describe the capabilities that you have.\n\n# WEB BROWSING INSTRUCTIONS\n\nYou cannot perform any web search or access internet to open URLs, links etc. If it seems like the user is expecting you to do so, you clarify the situation and ask the user to copy paste the text directly in the chat.\n\n# MULTI-MODAL INSTRUCTIONS\n\nYou have the ability to read images, but you cannot generate images. You also cannot transcribe audio files or videos.\nYou cannot read nor transcribe audio files or videos.\n\n# TOOL CALLING INSTRUCTIONS\n\nYou may have access to tools that you can use to fetch information or perform actions. You must use these tools in the following situations:\n\n1. When the request requires up-to-date information.\n2. When the request requires specific data that you do not have in your knowledge base.\n3. When the request involves actions that you cannot perform without tools.\n\nAlways prioritize using tools to provide the most accurate and helpful response. If tools are not available, inform the user that you cannot perform the requested action at the moment.' %}
3
 
@@ -79,13 +80,10 @@
79
 
80
  {#- Assistant messages supports text content or text and image chunks. #}
81
  {%- elif message['role'] == 'assistant' %}
82
- {%- if (message['content'] is none or message['content'] == '' or message['content']|length == 0) and (message['tool_calls'] is not defined or message['tool_calls'] is none or message['tool_calls']|length == 0) %}
83
- {{- raise_exception('Assistant message must have a string or a list of chunks in content or a list of tool calls.') }}
84
- {%- endif %}
85
 
86
  {%- if message['content'] is string %}
87
  {{- message['content'] }}
88
- {%- elif message['content'] | length > 0 %}
89
  {%- for block in message['content'] %}
90
  {%- if block['type'] == 'text' %}
91
  {{- block['text'] }}
@@ -116,6 +114,8 @@
116
 
117
  {#- Raise exception for unsupported roles. #}
118
  {%- else %}
119
- {{- raise_exception('Only user, assistant and tool roles are supported, got ' + message['role'] + '.') }}
120
  {%- endif %}
121
  {%- endfor %}
 
 
 
1
+ {#- Unsloth template fixes #}
2
  {#- Default system message if no system prompt is passed. #}
3
  {%- set default_system_message = 'You are Ministral-3-3B-Instruct-2512, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris.\nYou power an AI assistant called Le Chat.\nYour knowledge base was last updated on 2023-10-01.\nThe current date is {today}.\n\nWhen you\'re not sure about some information or when the user\'s request requires up-to-date or specific data, you must use the available tools to fetch the information. Do not hesitate to use tools whenever they can provide a more accurate or complete response. If no relevant tools are available, then clearly state that you don\'t have the information and avoid making up anything.\nIf the user\'s question is not clear, ambiguous, or does not provide enough context for you to accurately answer the question, you do not try to answer it right away and you rather ask the user to clarify their request (e.g. "What are some good restaurants around me?" => "Where are you?" or "When is the next flight to Tokyo" => "Where do you travel from?").\nYou are always very attentive to dates, in particular you try to resolve dates (e.g. "yesterday" is {yesterday}) and when asked about information at specific dates, you discard information that is at another date.\nYou follow these instructions in all languages, and always respond to the user in the language they use or request.\nNext sections describe the capabilities that you have.\n\n# WEB BROWSING INSTRUCTIONS\n\nYou cannot perform any web search or access internet to open URLs, links etc. If it seems like the user is expecting you to do so, you clarify the situation and ask the user to copy paste the text directly in the chat.\n\n# MULTI-MODAL INSTRUCTIONS\n\nYou have the ability to read images, but you cannot generate images. You also cannot transcribe audio files or videos.\nYou cannot read nor transcribe audio files or videos.\n\n# TOOL CALLING INSTRUCTIONS\n\nYou may have access to tools that you can use to fetch information or perform actions. You must use these tools in the following situations:\n\n1. When the request requires up-to-date information.\n2. When the request requires specific data that you do not have in your knowledge base.\n3. When the request involves actions that you cannot perform without tools.\n\nAlways prioritize using tools to provide the most accurate and helpful response. If tools are not available, inform the user that you cannot perform the requested action at the moment.' %}
4
 
 
80
 
81
  {#- Assistant messages supports text content or text and image chunks. #}
82
  {%- elif message['role'] == 'assistant' %}
 
 
 
83
 
84
  {%- if message['content'] is string %}
85
  {{- message['content'] }}
86
+ {%- elif message['content'] is iterable and message['content'] | length > 0 %}
87
  {%- for block in message['content'] %}
88
  {%- if block['type'] == 'text' %}
89
  {{- block['text'] }}
 
114
 
115
  {#- Raise exception for unsupported roles. #}
116
  {%- else %}
117
+ {{- raise_exception('Only user, assistant and tool roles are supported, got ' + message['role']) }}
118
  {%- endif %}
119
  {%- endfor %}
120
+
121
+ {#- Copyright 2025-present Unsloth. Apache 2.0 License. #}
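
To exercise the fixed template outside of a server, one option is to render it with `transformers`' `apply_chat_template`. This is a minimal sketch: it assumes the tokenizer shipped with the model picks up `chat_template.jinja`, and the assistant message shows the list-of-chunks content shape that the template's `iterable` branch above handles.

```py
from transformers import AutoTokenizer

# Placeholder id: point this at the repository containing the fixed template.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Ministral-3-3B-Instruct-2512")

messages = [
    {"role": "user", "content": "Say hello."},
    # Assistant content may be a plain string or a list of text chunks.
    {"role": "assistant", "content": [{"type": "text", "text": "Hello!"}]},
    {"role": "user", "content": "Now say it in French."},
]

prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```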
config.json CHANGED
@@ -2,29 +2,18 @@
2
  "architectures": [
3
  "Mistral3ForConditionalGeneration"
4
  ],
 
5
  "torch_dtype": "bfloat16",
 
6
  "image_token_index": 10,
7
  "model_type": "mistral3",
8
  "multimodal_projector_bias": false,
9
  "pad_token_id": 11,
10
  "projector_hidden_act": "gelu",
11
- "quantization_config": {
12
- "activation_scheme": "static",
13
- "dequantize": false,
14
- "modules_to_not_convert": [
15
- "model.vision_tower",
16
- "model.multi_modal_projector",
17
- "lm_head",
18
- "model.vision_tower",
19
- "model.multi_modal_projector",
20
- "lm_head"
21
- ],
22
- "quant_method": "fp8",
23
- "weight_block_size": null
24
- },
25
  "spatial_merge_size": 2,
26
  "text_config": {
27
  "attention_dropout": 0.0,
 
28
  "head_dim": 128,
29
  "hidden_act": "silu",
30
  "hidden_size": 3072,
@@ -57,6 +46,7 @@
57
  "unsloth_fixed": true,
58
  "vision_config": {
59
  "attention_dropout": 0.0,
 
60
  "head_dim": 64,
61
  "hidden_act": "silu",
62
  "hidden_size": 1024,
 
2
  "architectures": [
3
  "Mistral3ForConditionalGeneration"
4
  ],
5
+ "bos_token_id": 1,
6
  "torch_dtype": "bfloat16",
7
+ "eos_token_id": 2,
8
  "image_token_index": 10,
9
  "model_type": "mistral3",
10
  "multimodal_projector_bias": false,
11
  "pad_token_id": 11,
12
  "projector_hidden_act": "gelu",
 
13
  "spatial_merge_size": 2,
14
  "text_config": {
15
  "attention_dropout": 0.0,
16
+ "torch_dtype": "bfloat16",
17
  "head_dim": 128,
18
  "hidden_act": "silu",
19
  "hidden_size": 3072,
 
46
  "unsloth_fixed": true,
47
  "vision_config": {
48
  "attention_dropout": 0.0,
49
+ "torch_dtype": "bfloat16",
50
  "head_dim": 64,
51
  "hidden_act": "silu",
52
  "hidden_size": 1024,
model-00001-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b3821ebc30884f66e3d26d339e161641f34b91ed916627011e7b08e5f1edd884
3
+ size 4967581832
model-00002-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:718d087fa591fd4356b7241f293c24219399d86f092d46cf36f765051498033a
3
+ size 2730659224
model.safetensors.index.json ADDED
@@ -0,0 +1,466 @@
1
+ {
2
+ "metadata": {
3
+ "total_parameters": 4251743232,
4
+ "total_size": 7698180096
5
+ },
6
+ "weight_map": {
7
+ "language_model.model.embed_tokens.weight": "model-00001-of-00002.safetensors",
8
+ "language_model.model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
9
+ "language_model.model.layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
10
+ "language_model.model.layers.0.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
11
+ "language_model.model.layers.0.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
12
+ "language_model.model.layers.0.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
13
+ "language_model.model.layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
14
+ "language_model.model.layers.0.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
15
+ "language_model.model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
16
+ "language_model.model.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
17
+ "language_model.model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
18
+ "language_model.model.layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
19
+ "language_model.model.layers.1.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
20
+ "language_model.model.layers.1.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
21
+ "language_model.model.layers.1.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
22
+ "language_model.model.layers.1.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
23
+ "language_model.model.layers.1.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
24
+ "language_model.model.layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
25
+ "language_model.model.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
26
+ "language_model.model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
27
+ "language_model.model.layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
28
+ "language_model.model.layers.10.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
29
+ "language_model.model.layers.10.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
30
+ "language_model.model.layers.10.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
31
+ "language_model.model.layers.10.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
32
+ "language_model.model.layers.10.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
33
+ "language_model.model.layers.10.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
34
+ "language_model.model.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
35
+ "language_model.model.layers.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
36
+ "language_model.model.layers.11.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
37
+ "language_model.model.layers.11.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
38
+ "language_model.model.layers.11.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
39
+ "language_model.model.layers.11.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
40
+ "language_model.model.layers.11.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
41
+ "language_model.model.layers.11.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
42
+ "language_model.model.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
43
+ "language_model.model.layers.11.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
44
+ "language_model.model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
45
+ "language_model.model.layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
46
+ "language_model.model.layers.12.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
47
+ "language_model.model.layers.12.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
48
+ "language_model.model.layers.12.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
49
+ "language_model.model.layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
50
+ "language_model.model.layers.12.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
51
+ "language_model.model.layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
52
+ "language_model.model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
53
+ "language_model.model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
54
+ "language_model.model.layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
55
+ "language_model.model.layers.13.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
56
+ "language_model.model.layers.13.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
57
+ "language_model.model.layers.13.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
58
+ "language_model.model.layers.13.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
59
+ "language_model.model.layers.13.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
60
+ "language_model.model.layers.13.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
61
+ "language_model.model.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
62
+ "language_model.model.layers.14.input_layernorm.weight": "model-00002-of-00002.safetensors",
63
+ "language_model.model.layers.14.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
64
+ "language_model.model.layers.14.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
65
+ "language_model.model.layers.14.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
66
+ "language_model.model.layers.14.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
67
+ "language_model.model.layers.14.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
68
+ "language_model.model.layers.14.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
69
+ "language_model.model.layers.14.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
70
+ "language_model.model.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
71
+ "language_model.model.layers.15.input_layernorm.weight": "model-00002-of-00002.safetensors",
72
+ "language_model.model.layers.15.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
73
+ "language_model.model.layers.15.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
74
+ "language_model.model.layers.15.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
75
+ "language_model.model.layers.15.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
76
+ "language_model.model.layers.15.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
77
+ "language_model.model.layers.15.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
78
+ "language_model.model.layers.15.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
79
+ "language_model.model.layers.15.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
80
+ "language_model.model.layers.16.input_layernorm.weight": "model-00002-of-00002.safetensors",
81
+ "language_model.model.layers.16.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
82
+ "language_model.model.layers.16.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
83
+ "language_model.model.layers.16.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
84
+ "language_model.model.layers.16.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
85
+ "language_model.model.layers.16.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
86
+ "language_model.model.layers.16.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
87
+ "language_model.model.layers.16.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
88
+ "language_model.model.layers.16.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
89
+ "language_model.model.layers.17.input_layernorm.weight": "model-00002-of-00002.safetensors",
90
+ "language_model.model.layers.17.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
91
+ "language_model.model.layers.17.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
92
+ "language_model.model.layers.17.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
93
+ "language_model.model.layers.17.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
94
+ "language_model.model.layers.17.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
95
+ "language_model.model.layers.17.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
96
+ "language_model.model.layers.17.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
97
+ "language_model.model.layers.17.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
98
+ "language_model.model.layers.18.input_layernorm.weight": "model-00002-of-00002.safetensors",
99
+ "language_model.model.layers.18.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
100
+ "language_model.model.layers.18.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
101
+ "language_model.model.layers.18.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
102
+ "language_model.model.layers.18.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
103
+ "language_model.model.layers.18.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
104
+ "language_model.model.layers.18.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
105
+ "language_model.model.layers.18.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
106
+ "language_model.model.layers.18.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
107
+ "language_model.model.layers.19.input_layernorm.weight": "model-00002-of-00002.safetensors",
108
+ "language_model.model.layers.19.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
109
+ "language_model.model.layers.19.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
110
+ "language_model.model.layers.19.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
111
+ "language_model.model.layers.19.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
112
+ "language_model.model.layers.19.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
113
+ "language_model.model.layers.19.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
114
+ "language_model.model.layers.19.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
115
+ "language_model.model.layers.19.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
116
+ "language_model.model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
117
+ "language_model.model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
118
+ "language_model.model.layers.2.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
119
+ "language_model.model.layers.2.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
120
+ "language_model.model.layers.2.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
121
+ "language_model.model.layers.2.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
122
+ "language_model.model.layers.2.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
123
+ "language_model.model.layers.2.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
124
+ "language_model.model.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
125
+ "language_model.model.layers.20.input_layernorm.weight": "model-00002-of-00002.safetensors",
126
+ "language_model.model.layers.20.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
127
+ "language_model.model.layers.20.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
128
+ "language_model.model.layers.20.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
129
+ "language_model.model.layers.20.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
130
+ "language_model.model.layers.20.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
131
+ "language_model.model.layers.20.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
132
+ "language_model.model.layers.20.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
133
+ "language_model.model.layers.20.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
134
+ "language_model.model.layers.21.input_layernorm.weight": "model-00002-of-00002.safetensors",
135
+ "language_model.model.layers.21.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
136
+ "language_model.model.layers.21.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
137
+ "language_model.model.layers.21.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
138
+ "language_model.model.layers.21.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
139
+ "language_model.model.layers.21.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
140
+ "language_model.model.layers.21.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
141
+ "language_model.model.layers.21.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
142
+ "language_model.model.layers.21.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
143
+ "language_model.model.layers.22.input_layernorm.weight": "model-00002-of-00002.safetensors",
144
+ "language_model.model.layers.22.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
145
+ "language_model.model.layers.22.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
146
+ "language_model.model.layers.22.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
147
+ "language_model.model.layers.22.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
148
+ "language_model.model.layers.22.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
149
+ "language_model.model.layers.22.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
150
+ "language_model.model.layers.22.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
151
+ "language_model.model.layers.22.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
152
+ "language_model.model.layers.23.input_layernorm.weight": "model-00002-of-00002.safetensors",
153
+ "language_model.model.layers.23.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
154
+ "language_model.model.layers.23.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
155
+ "language_model.model.layers.23.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
156
+ "language_model.model.layers.23.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
157
+ "language_model.model.layers.23.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
158
+ "language_model.model.layers.23.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
159
+ "language_model.model.layers.23.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
160
+ "language_model.model.layers.23.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
161
+ "language_model.model.layers.24.input_layernorm.weight": "model-00002-of-00002.safetensors",
162
+ "language_model.model.layers.24.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
163
+ "language_model.model.layers.24.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
164
+ "language_model.model.layers.24.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
165
+ "language_model.model.layers.24.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
166
+ "language_model.model.layers.24.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
167
+ "language_model.model.layers.24.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
168
+ "language_model.model.layers.24.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
169
+ "language_model.model.layers.24.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
170
+ "language_model.model.layers.25.input_layernorm.weight": "model-00002-of-00002.safetensors",
171
+ "language_model.model.layers.25.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
172
+ "language_model.model.layers.25.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
173
+ "language_model.model.layers.25.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
174
+ "language_model.model.layers.25.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
175
+ "language_model.model.layers.25.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
176
+ "language_model.model.layers.25.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
177
+ "language_model.model.layers.25.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
178
+ "language_model.model.layers.25.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
179
+ "language_model.model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
180
+ "language_model.model.layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
181
+ "language_model.model.layers.3.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
182
+ "language_model.model.layers.3.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
183
+ "language_model.model.layers.3.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
184
+ "language_model.model.layers.3.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
185
+ "language_model.model.layers.3.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
186
+ "language_model.model.layers.3.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
187
+ "language_model.model.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
188
+ "language_model.model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
189
+ "language_model.model.layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
190
+ "language_model.model.layers.4.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
191
+ "language_model.model.layers.4.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
192
+ "language_model.model.layers.4.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
193
+ "language_model.model.layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
194
+ "language_model.model.layers.4.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
195
+ "language_model.model.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
196
+ "language_model.model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
197
+ "language_model.model.layers.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
198
+ "language_model.model.layers.5.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
199
+ "language_model.model.layers.5.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
200
+ "language_model.model.layers.5.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
201
+ "language_model.model.layers.5.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
202
+ "language_model.model.layers.5.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
203
+ "language_model.model.layers.5.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
204
+ "language_model.model.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
205
+ "language_model.model.layers.5.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
206
+ "language_model.model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
207
+ "language_model.model.layers.6.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
208
+ "language_model.model.layers.6.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
209
+ "language_model.model.layers.6.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
210
+ "language_model.model.layers.6.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
211
+ "language_model.model.layers.6.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
212
+ "language_model.model.layers.6.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
213
+ "language_model.model.layers.6.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
214
+ "language_model.model.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
215
+ "language_model.model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
216
+ "language_model.model.layers.7.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
217
+ "language_model.model.layers.7.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
218
+ "language_model.model.layers.7.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
219
+ "language_model.model.layers.7.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
220
+ "language_model.model.layers.7.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
221
+ "language_model.model.layers.7.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
222
+ "language_model.model.layers.7.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
223
+ "language_model.model.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
224
+ "language_model.model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
225
+ "language_model.model.layers.8.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
226
+ "language_model.model.layers.8.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
227
+ "language_model.model.layers.8.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
228
+ "language_model.model.layers.8.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
229
+ "language_model.model.layers.8.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
230
+ "language_model.model.layers.8.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
231
+ "language_model.model.layers.8.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
232
+ "language_model.model.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
233
+ "language_model.model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
234
+ "language_model.model.layers.9.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
235
+ "language_model.model.layers.9.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
236
+ "language_model.model.layers.9.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
237
+ "language_model.model.layers.9.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
238
+ "language_model.model.layers.9.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
239
+ "language_model.model.layers.9.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
240
+ "language_model.model.layers.9.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
241
+ "language_model.model.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
242
+ "language_model.model.norm.weight": "model-00002-of-00002.safetensors",
243
+ "multi_modal_projector.linear_1.weight": "model-00001-of-00002.safetensors",
244
+ "multi_modal_projector.linear_2.weight": "model-00001-of-00002.safetensors",
245
+ "multi_modal_projector.norm.weight": "model-00001-of-00002.safetensors",
246
+ "multi_modal_projector.patch_merger.merging_layer.weight": "model-00001-of-00002.safetensors",
247
+ "vision_tower.ln_pre.weight": "model-00001-of-00002.safetensors",
248
+ "vision_tower.patch_conv.weight": "model-00001-of-00002.safetensors",
249
+ "vision_tower.transformer.layers.0.attention.k_proj.weight": "model-00001-of-00002.safetensors",
250
+ "vision_tower.transformer.layers.0.attention.o_proj.weight": "model-00001-of-00002.safetensors",
251
+ "vision_tower.transformer.layers.0.attention.q_proj.weight": "model-00001-of-00002.safetensors",
252
+ "vision_tower.transformer.layers.0.attention.v_proj.weight": "model-00001-of-00002.safetensors",
253
+ "vision_tower.transformer.layers.0.attention_norm.weight": "model-00001-of-00002.safetensors",
254
+ "vision_tower.transformer.layers.0.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
255
+ "vision_tower.transformer.layers.0.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
256
+ "vision_tower.transformer.layers.0.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
257
+ "vision_tower.transformer.layers.0.ffn_norm.weight": "model-00001-of-00002.safetensors",
258
+ "vision_tower.transformer.layers.1.attention.k_proj.weight": "model-00001-of-00002.safetensors",
259
+ "vision_tower.transformer.layers.1.attention.o_proj.weight": "model-00001-of-00002.safetensors",
260
+ "vision_tower.transformer.layers.1.attention.q_proj.weight": "model-00001-of-00002.safetensors",
261
+ "vision_tower.transformer.layers.1.attention.v_proj.weight": "model-00001-of-00002.safetensors",
262
+ "vision_tower.transformer.layers.1.attention_norm.weight": "model-00001-of-00002.safetensors",
263
+ "vision_tower.transformer.layers.1.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
264
+ "vision_tower.transformer.layers.1.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
265
+ "vision_tower.transformer.layers.1.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
266
+ "vision_tower.transformer.layers.1.ffn_norm.weight": "model-00001-of-00002.safetensors",
267
+ "vision_tower.transformer.layers.10.attention.k_proj.weight": "model-00001-of-00002.safetensors",
268
+ "vision_tower.transformer.layers.10.attention.o_proj.weight": "model-00001-of-00002.safetensors",
269
+ "vision_tower.transformer.layers.10.attention.q_proj.weight": "model-00001-of-00002.safetensors",
270
+ "vision_tower.transformer.layers.10.attention.v_proj.weight": "model-00001-of-00002.safetensors",
271
+ "vision_tower.transformer.layers.10.attention_norm.weight": "model-00001-of-00002.safetensors",
272
+ "vision_tower.transformer.layers.10.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
273
+ "vision_tower.transformer.layers.10.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
274
+ "vision_tower.transformer.layers.10.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
275
+ "vision_tower.transformer.layers.10.ffn_norm.weight": "model-00001-of-00002.safetensors",
276
+ "vision_tower.transformer.layers.11.attention.k_proj.weight": "model-00001-of-00002.safetensors",
277
+ "vision_tower.transformer.layers.11.attention.o_proj.weight": "model-00001-of-00002.safetensors",
278
+ "vision_tower.transformer.layers.11.attention.q_proj.weight": "model-00001-of-00002.safetensors",
279
+ "vision_tower.transformer.layers.11.attention.v_proj.weight": "model-00001-of-00002.safetensors",
280
+ "vision_tower.transformer.layers.11.attention_norm.weight": "model-00001-of-00002.safetensors",
281
+ "vision_tower.transformer.layers.11.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
282
+ "vision_tower.transformer.layers.11.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
283
+ "vision_tower.transformer.layers.11.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
284
+ "vision_tower.transformer.layers.11.ffn_norm.weight": "model-00001-of-00002.safetensors",
285
+ "vision_tower.transformer.layers.12.attention.k_proj.weight": "model-00001-of-00002.safetensors",
286
+ "vision_tower.transformer.layers.12.attention.o_proj.weight": "model-00001-of-00002.safetensors",
287
+ "vision_tower.transformer.layers.12.attention.q_proj.weight": "model-00001-of-00002.safetensors",
288
+ "vision_tower.transformer.layers.12.attention.v_proj.weight": "model-00001-of-00002.safetensors",
289
+ "vision_tower.transformer.layers.12.attention_norm.weight": "model-00001-of-00002.safetensors",
290
+ "vision_tower.transformer.layers.12.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
291
+ "vision_tower.transformer.layers.12.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
292
+ "vision_tower.transformer.layers.12.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
293
+ "vision_tower.transformer.layers.12.ffn_norm.weight": "model-00001-of-00002.safetensors",
294
+ "vision_tower.transformer.layers.13.attention.k_proj.weight": "model-00001-of-00002.safetensors",
295
+ "vision_tower.transformer.layers.13.attention.o_proj.weight": "model-00001-of-00002.safetensors",
296
+ "vision_tower.transformer.layers.13.attention.q_proj.weight": "model-00001-of-00002.safetensors",
297
+ "vision_tower.transformer.layers.13.attention.v_proj.weight": "model-00001-of-00002.safetensors",
298
+ "vision_tower.transformer.layers.13.attention_norm.weight": "model-00001-of-00002.safetensors",
299
+ "vision_tower.transformer.layers.13.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
300
+ "vision_tower.transformer.layers.13.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
301
+ "vision_tower.transformer.layers.13.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
302
+ "vision_tower.transformer.layers.13.ffn_norm.weight": "model-00001-of-00002.safetensors",
303
+ "vision_tower.transformer.layers.14.attention.k_proj.weight": "model-00001-of-00002.safetensors",
304
+ "vision_tower.transformer.layers.14.attention.o_proj.weight": "model-00001-of-00002.safetensors",
305
+ "vision_tower.transformer.layers.14.attention.q_proj.weight": "model-00001-of-00002.safetensors",
306
+ "vision_tower.transformer.layers.14.attention.v_proj.weight": "model-00001-of-00002.safetensors",
307
+ "vision_tower.transformer.layers.14.attention_norm.weight": "model-00001-of-00002.safetensors",
308
+ "vision_tower.transformer.layers.14.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
309
+ "vision_tower.transformer.layers.14.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
310
+ "vision_tower.transformer.layers.14.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
311
+ "vision_tower.transformer.layers.14.ffn_norm.weight": "model-00001-of-00002.safetensors",
312
+ "vision_tower.transformer.layers.15.attention.k_proj.weight": "model-00001-of-00002.safetensors",
313
+ "vision_tower.transformer.layers.15.attention.o_proj.weight": "model-00001-of-00002.safetensors",
314
+ "vision_tower.transformer.layers.15.attention.q_proj.weight": "model-00001-of-00002.safetensors",
315
+ "vision_tower.transformer.layers.15.attention.v_proj.weight": "model-00001-of-00002.safetensors",
316
+ "vision_tower.transformer.layers.15.attention_norm.weight": "model-00001-of-00002.safetensors",
317
+ "vision_tower.transformer.layers.15.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
318
+ "vision_tower.transformer.layers.15.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
319
+ "vision_tower.transformer.layers.15.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
320
+ "vision_tower.transformer.layers.15.ffn_norm.weight": "model-00001-of-00002.safetensors",
321
+ "vision_tower.transformer.layers.16.attention.k_proj.weight": "model-00001-of-00002.safetensors",
322
+ "vision_tower.transformer.layers.16.attention.o_proj.weight": "model-00001-of-00002.safetensors",
323
+ "vision_tower.transformer.layers.16.attention.q_proj.weight": "model-00001-of-00002.safetensors",
324
+ "vision_tower.transformer.layers.16.attention.v_proj.weight": "model-00001-of-00002.safetensors",
325
+ "vision_tower.transformer.layers.16.attention_norm.weight": "model-00001-of-00002.safetensors",
326
+ "vision_tower.transformer.layers.16.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
327
+ "vision_tower.transformer.layers.16.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
328
+ "vision_tower.transformer.layers.16.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
329
+ "vision_tower.transformer.layers.16.ffn_norm.weight": "model-00001-of-00002.safetensors",
330
+ "vision_tower.transformer.layers.17.attention.k_proj.weight": "model-00001-of-00002.safetensors",
331
+ "vision_tower.transformer.layers.17.attention.o_proj.weight": "model-00001-of-00002.safetensors",
332
+ "vision_tower.transformer.layers.17.attention.q_proj.weight": "model-00001-of-00002.safetensors",
333
+ "vision_tower.transformer.layers.17.attention.v_proj.weight": "model-00001-of-00002.safetensors",
334
+ "vision_tower.transformer.layers.17.attention_norm.weight": "model-00001-of-00002.safetensors",
335
+ "vision_tower.transformer.layers.17.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
336
+ "vision_tower.transformer.layers.17.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
337
+ "vision_tower.transformer.layers.17.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
338
+ "vision_tower.transformer.layers.17.ffn_norm.weight": "model-00001-of-00002.safetensors",
339
+ "vision_tower.transformer.layers.18.attention.k_proj.weight": "model-00001-of-00002.safetensors",
340
+ "vision_tower.transformer.layers.18.attention.o_proj.weight": "model-00001-of-00002.safetensors",
341
+ "vision_tower.transformer.layers.18.attention.q_proj.weight": "model-00001-of-00002.safetensors",
342
+ "vision_tower.transformer.layers.18.attention.v_proj.weight": "model-00001-of-00002.safetensors",
343
+ "vision_tower.transformer.layers.18.attention_norm.weight": "model-00001-of-00002.safetensors",
344
+ "vision_tower.transformer.layers.18.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
345
+ "vision_tower.transformer.layers.18.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
346
+ "vision_tower.transformer.layers.18.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
347
+ "vision_tower.transformer.layers.18.ffn_norm.weight": "model-00001-of-00002.safetensors",
348
+ "vision_tower.transformer.layers.19.attention.k_proj.weight": "model-00001-of-00002.safetensors",
349
+ "vision_tower.transformer.layers.19.attention.o_proj.weight": "model-00001-of-00002.safetensors",
350
+ "vision_tower.transformer.layers.19.attention.q_proj.weight": "model-00001-of-00002.safetensors",
351
+ "vision_tower.transformer.layers.19.attention.v_proj.weight": "model-00001-of-00002.safetensors",
352
+ "vision_tower.transformer.layers.19.attention_norm.weight": "model-00001-of-00002.safetensors",
353
+ "vision_tower.transformer.layers.19.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
354
+ "vision_tower.transformer.layers.19.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
355
+ "vision_tower.transformer.layers.19.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
356
+ "vision_tower.transformer.layers.19.ffn_norm.weight": "model-00001-of-00002.safetensors",
357
+ "vision_tower.transformer.layers.2.attention.k_proj.weight": "model-00001-of-00002.safetensors",
358
+ "vision_tower.transformer.layers.2.attention.o_proj.weight": "model-00001-of-00002.safetensors",
359
+ "vision_tower.transformer.layers.2.attention.q_proj.weight": "model-00001-of-00002.safetensors",
360
+ "vision_tower.transformer.layers.2.attention.v_proj.weight": "model-00001-of-00002.safetensors",
361
+ "vision_tower.transformer.layers.2.attention_norm.weight": "model-00001-of-00002.safetensors",
362
+ "vision_tower.transformer.layers.2.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
363
+ "vision_tower.transformer.layers.2.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
364
+ "vision_tower.transformer.layers.2.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
365
+ "vision_tower.transformer.layers.2.ffn_norm.weight": "model-00001-of-00002.safetensors",
366
+ "vision_tower.transformer.layers.20.attention.k_proj.weight": "model-00001-of-00002.safetensors",
367
+ "vision_tower.transformer.layers.20.attention.o_proj.weight": "model-00001-of-00002.safetensors",
368
+ "vision_tower.transformer.layers.20.attention.q_proj.weight": "model-00001-of-00002.safetensors",
369
+ "vision_tower.transformer.layers.20.attention.v_proj.weight": "model-00001-of-00002.safetensors",
370
+ "vision_tower.transformer.layers.20.attention_norm.weight": "model-00001-of-00002.safetensors",
371
+ "vision_tower.transformer.layers.20.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
372
+ "vision_tower.transformer.layers.20.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
373
+ "vision_tower.transformer.layers.20.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
374
+ "vision_tower.transformer.layers.20.ffn_norm.weight": "model-00001-of-00002.safetensors",
375
+ "vision_tower.transformer.layers.21.attention.k_proj.weight": "model-00001-of-00002.safetensors",
376
+ "vision_tower.transformer.layers.21.attention.o_proj.weight": "model-00001-of-00002.safetensors",
377
+ "vision_tower.transformer.layers.21.attention.q_proj.weight": "model-00001-of-00002.safetensors",
378
+ "vision_tower.transformer.layers.21.attention.v_proj.weight": "model-00001-of-00002.safetensors",
379
+ "vision_tower.transformer.layers.21.attention_norm.weight": "model-00001-of-00002.safetensors",
380
+ "vision_tower.transformer.layers.21.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
381
+ "vision_tower.transformer.layers.21.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
382
+ "vision_tower.transformer.layers.21.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
383
+ "vision_tower.transformer.layers.21.ffn_norm.weight": "model-00001-of-00002.safetensors",
384
+ "vision_tower.transformer.layers.22.attention.k_proj.weight": "model-00001-of-00002.safetensors",
385
+ "vision_tower.transformer.layers.22.attention.o_proj.weight": "model-00001-of-00002.safetensors",
386
+ "vision_tower.transformer.layers.22.attention.q_proj.weight": "model-00001-of-00002.safetensors",
387
+ "vision_tower.transformer.layers.22.attention.v_proj.weight": "model-00001-of-00002.safetensors",
388
+ "vision_tower.transformer.layers.22.attention_norm.weight": "model-00001-of-00002.safetensors",
389
+ "vision_tower.transformer.layers.22.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
390
+ "vision_tower.transformer.layers.22.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
391
+ "vision_tower.transformer.layers.22.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
392
+ "vision_tower.transformer.layers.22.ffn_norm.weight": "model-00001-of-00002.safetensors",
393
+ "vision_tower.transformer.layers.23.attention.k_proj.weight": "model-00001-of-00002.safetensors",
394
+ "vision_tower.transformer.layers.23.attention.o_proj.weight": "model-00001-of-00002.safetensors",
395
+ "vision_tower.transformer.layers.23.attention.q_proj.weight": "model-00001-of-00002.safetensors",
396
+ "vision_tower.transformer.layers.23.attention.v_proj.weight": "model-00001-of-00002.safetensors",
397
+ "vision_tower.transformer.layers.23.attention_norm.weight": "model-00001-of-00002.safetensors",
398
+ "vision_tower.transformer.layers.23.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
399
+ "vision_tower.transformer.layers.23.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
400
+ "vision_tower.transformer.layers.23.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
401
+ "vision_tower.transformer.layers.23.ffn_norm.weight": "model-00001-of-00002.safetensors",
402
+ "vision_tower.transformer.layers.3.attention.k_proj.weight": "model-00001-of-00002.safetensors",
403
+ "vision_tower.transformer.layers.3.attention.o_proj.weight": "model-00001-of-00002.safetensors",
404
+ "vision_tower.transformer.layers.3.attention.q_proj.weight": "model-00001-of-00002.safetensors",
405
+ "vision_tower.transformer.layers.3.attention.v_proj.weight": "model-00001-of-00002.safetensors",
406
+ "vision_tower.transformer.layers.3.attention_norm.weight": "model-00001-of-00002.safetensors",
407
+ "vision_tower.transformer.layers.3.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
408
+ "vision_tower.transformer.layers.3.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
409
+ "vision_tower.transformer.layers.3.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
410
+ "vision_tower.transformer.layers.3.ffn_norm.weight": "model-00001-of-00002.safetensors",
411
+ "vision_tower.transformer.layers.4.attention.k_proj.weight": "model-00001-of-00002.safetensors",
412
+ "vision_tower.transformer.layers.4.attention.o_proj.weight": "model-00001-of-00002.safetensors",
413
+ "vision_tower.transformer.layers.4.attention.q_proj.weight": "model-00001-of-00002.safetensors",
414
+ "vision_tower.transformer.layers.4.attention.v_proj.weight": "model-00001-of-00002.safetensors",
415
+ "vision_tower.transformer.layers.4.attention_norm.weight": "model-00001-of-00002.safetensors",
416
+ "vision_tower.transformer.layers.4.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
417
+ "vision_tower.transformer.layers.4.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
418
+ "vision_tower.transformer.layers.4.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
419
+ "vision_tower.transformer.layers.4.ffn_norm.weight": "model-00001-of-00002.safetensors",
420
+ "vision_tower.transformer.layers.5.attention.k_proj.weight": "model-00001-of-00002.safetensors",
421
+ "vision_tower.transformer.layers.5.attention.o_proj.weight": "model-00001-of-00002.safetensors",
422
+ "vision_tower.transformer.layers.5.attention.q_proj.weight": "model-00001-of-00002.safetensors",
423
+ "vision_tower.transformer.layers.5.attention.v_proj.weight": "model-00001-of-00002.safetensors",
424
+ "vision_tower.transformer.layers.5.attention_norm.weight": "model-00001-of-00002.safetensors",
425
+ "vision_tower.transformer.layers.5.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
426
+ "vision_tower.transformer.layers.5.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
427
+ "vision_tower.transformer.layers.5.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
428
+ "vision_tower.transformer.layers.5.ffn_norm.weight": "model-00001-of-00002.safetensors",
429
+ "vision_tower.transformer.layers.6.attention.k_proj.weight": "model-00001-of-00002.safetensors",
430
+ "vision_tower.transformer.layers.6.attention.o_proj.weight": "model-00001-of-00002.safetensors",
431
+ "vision_tower.transformer.layers.6.attention.q_proj.weight": "model-00001-of-00002.safetensors",
432
+ "vision_tower.transformer.layers.6.attention.v_proj.weight": "model-00001-of-00002.safetensors",
433
+ "vision_tower.transformer.layers.6.attention_norm.weight": "model-00001-of-00002.safetensors",
434
+ "vision_tower.transformer.layers.6.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
435
+ "vision_tower.transformer.layers.6.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
436
+ "vision_tower.transformer.layers.6.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
437
+ "vision_tower.transformer.layers.6.ffn_norm.weight": "model-00001-of-00002.safetensors",
438
+ "vision_tower.transformer.layers.7.attention.k_proj.weight": "model-00001-of-00002.safetensors",
439
+ "vision_tower.transformer.layers.7.attention.o_proj.weight": "model-00001-of-00002.safetensors",
440
+ "vision_tower.transformer.layers.7.attention.q_proj.weight": "model-00001-of-00002.safetensors",
441
+ "vision_tower.transformer.layers.7.attention.v_proj.weight": "model-00001-of-00002.safetensors",
442
+ "vision_tower.transformer.layers.7.attention_norm.weight": "model-00001-of-00002.safetensors",
443
+ "vision_tower.transformer.layers.7.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
444
+ "vision_tower.transformer.layers.7.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
445
+ "vision_tower.transformer.layers.7.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
446
+ "vision_tower.transformer.layers.7.ffn_norm.weight": "model-00001-of-00002.safetensors",
447
+ "vision_tower.transformer.layers.8.attention.k_proj.weight": "model-00001-of-00002.safetensors",
448
+ "vision_tower.transformer.layers.8.attention.o_proj.weight": "model-00001-of-00002.safetensors",
449
+ "vision_tower.transformer.layers.8.attention.q_proj.weight": "model-00001-of-00002.safetensors",
450
+ "vision_tower.transformer.layers.8.attention.v_proj.weight": "model-00001-of-00002.safetensors",
451
+ "vision_tower.transformer.layers.8.attention_norm.weight": "model-00001-of-00002.safetensors",
452
+ "vision_tower.transformer.layers.8.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
453
+ "vision_tower.transformer.layers.8.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
454
+ "vision_tower.transformer.layers.8.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
455
+ "vision_tower.transformer.layers.8.ffn_norm.weight": "model-00001-of-00002.safetensors",
456
+ "vision_tower.transformer.layers.9.attention.k_proj.weight": "model-00001-of-00002.safetensors",
457
+ "vision_tower.transformer.layers.9.attention.o_proj.weight": "model-00001-of-00002.safetensors",
458
+ "vision_tower.transformer.layers.9.attention.q_proj.weight": "model-00001-of-00002.safetensors",
459
+ "vision_tower.transformer.layers.9.attention.v_proj.weight": "model-00001-of-00002.safetensors",
460
+ "vision_tower.transformer.layers.9.attention_norm.weight": "model-00001-of-00002.safetensors",
461
+ "vision_tower.transformer.layers.9.feed_forward.down_proj.weight": "model-00001-of-00002.safetensors",
462
+ "vision_tower.transformer.layers.9.feed_forward.gate_proj.weight": "model-00001-of-00002.safetensors",
463
+ "vision_tower.transformer.layers.9.feed_forward.up_proj.weight": "model-00001-of-00002.safetensors",
464
+ "vision_tower.transformer.layers.9.ffn_norm.weight": "model-00001-of-00002.safetensors"
465
+ }
466
+ }
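The entries above close out the sharded checkpoint's `weight_map`, which maps every vision-tower parameter to the shard file that stores it. A minimal sketch of how that index can be used to locate and load a single tensor — the tensor key and shard name come from the diff above, while the index filename, local paths, and framework choice are assumptions:

```python
# Sketch: resolve a tensor to its shard via the safetensors index.
# Assumes the index and shard files are already downloaded into checkpoint_dir.
import json
import os

from safetensors import safe_open  # pip install safetensors

checkpoint_dir = "./Ministral-3-3B-Instruct-2512"  # hypothetical local path

with open(os.path.join(checkpoint_dir, "model.safetensors.index.json")) as f:
    index = json.load(f)

key = "vision_tower.transformer.layers.13.attention.o_proj.weight"
shard = index["weight_map"][key]  # e.g. "model-00001-of-00002.safetensors"

with safe_open(os.path.join(checkpoint_dir, shard), framework="pt") as f:
    tensor = f.get_tensor(key)
    print(key, tuple(tensor.shape), tensor.dtype)
```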
params.json CHANGED
@@ -19,10 +19,6 @@
19
  "qk_nope_head_dim": null,
20
  "kv_lora_rank": null,
21
  "v_head_dim": null,
22
- "quantization": {
23
- "qformat_weight": "fp8_e4m3",
24
- "qscheme_act": "TENSOR"
25
- },
26
  "yarn": {
27
  "original_max_position_embeddings": 16384,
28
  "factor": 16,
 
tokenizer.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:286acad9b0e27fce778ac429763536accf618ccb6ed72963b6f94685e531c5c7
3
- size 17077402
2
+ oid sha256:577575622324b2e099e2648be26bdeb5e5815ffe66d7004e9e3ddbf421db6bf1
3
+ size 17078110
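Both sides of the `tokenizer.json` diff are Git LFS pointer files: the `oid sha256:` digest and `size` identify the real blob, so this commit swaps in a tokenizer file that is 708 bytes larger. A minimal sketch, assuming the actual `tokenizer.json` has already been fetched locally, that verifies it against the new pointer values copied from the diff:

```python
# Sketch: verify a downloaded tokenizer.json against the LFS pointer above.
# Expected digest and size are taken from the diff; the local path is an assumption.
import hashlib
import os

path = "tokenizer.json"
expected_sha256 = "577575622324b2e099e2648be26bdeb5e5815ffe66d7004e9e3ddbf421db6bf1"
expected_size = 17078110

with open(path, "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

assert os.path.getsize(path) == expected_size, "size mismatch"
assert digest == expected_sha256, "sha256 mismatch"
print("tokenizer.json matches the LFS pointer in this commit.")
```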
tokenizer_config.json CHANGED
The diff for this file is too large to render. See raw diff
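Since the `tokenizer_config.json` diff is too large for the web view, one option is to pull both revisions and diff them locally. A hedged sketch using `huggingface_hub` plus the standard library — the repository id and revision hashes below are placeholders, not values taken from this page:

```python
# Sketch: fetch two revisions of tokenizer_config.json and diff them locally.
# repo_id and the revision hashes are placeholders to substitute.
import difflib

from huggingface_hub import hf_hub_download  # pip install huggingface_hub

repo_id = "<namespace>/<model-repo>"
old_rev, new_rev = "<parent-commit-sha>", "<this-commit-sha>"

old_path = hf_hub_download(repo_id, "tokenizer_config.json", revision=old_rev)
new_path = hf_hub_download(repo_id, "tokenizer_config.json", revision=new_rev)

with open(old_path) as f_old, open(new_path) as f_new:
    diff = difflib.unified_diff(
        f_old.readlines(),
        f_new.readlines(),
        fromfile=f"tokenizer_config.json@{old_rev}",
        tofile=f"tokenizer_config.json@{new_rev}",
    )
print("".join(diff))
```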