CycleCore-Technologies committed · Commit 168bc68 · verified · 1 parent: 2a755c3

Upload Maaza-MLM-135M-JSON-v1 - v1.0.0 production release

README.md ADDED
@@ -0,0 +1,302 @@
# CycleCore Maaza MLM-135M-JSON v1.0.0

Micro Language Model (135M parameters) specialized for JSON extraction on edge devices.

## Model Details

- **Developer**: CycleCore Technologies
- **Model Name**: CycleCore Maaza MLM-135M-JSON
- **Version**: v1.0.0
- **Base Model**: SmolLM2-135M (HuggingFaceTB)
- **Training Method**: LoRA fine-tuning (r=16, alpha=32)
- **Task**: Structured JSON extraction
- **License**: Apache 2.0
- **Parameters**: 135M total, 4.88M trainable (3.5%)
- **Model Size**: ~270MB (FP16), ~70MB (Q4 quantized); see the merge sketch after this list
- **Context Length**: 2048 tokens

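The repository ships only the ~19MB LoRA adapter; the FP16 and Q4 sizes above refer to the full merged model. Below is a minimal sketch of producing a standalone merged checkpoint with PEFT's `merge_and_unload()`; the output directory name is an illustrative assumption, not shipped tooling.

```python
# Sketch: fold the LoRA adapter into the base model and save a standalone
# FP16 checkpoint (~270MB). Paths are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM2-135M", torch_dtype=torch.float16
)
merged = PeftModel.from_pretrained(base, "CycleCore/Maaza-MLM-135M-JSON-v1")
merged = merged.merge_and_unload()  # returns a plain transformers model

merged.save_pretrained("maaza-mlm-135m-json-merged")
AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M").save_pretrained(
    "maaza-mlm-135m-json-merged"
)
```

A Q4 build (~70MB) would typically be produced from this merged checkpoint with an external quantization tool; that step is not shown here.
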
## Intended Use

### Primary Use Cases
- IoT sensor data extraction and structuring
- API response parsing and validation
- Form field extraction from documents
- Database record structuring from natural language
- Log file parsing and structuring

### Target Hardware
- **Edge Devices**: Raspberry Pi 5, embedded systems
- **Laptop CPU**: x86/ARM, 16GB RAM, CPU-only
- **Browser**: WebGPU (via ONNX Runtime); see the export sketch after this list
- **Server**: Optional GPU acceleration

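For the browser/ONNX Runtime target, one possible export path runs a merged checkpoint (as in the earlier sketch) through Hugging Face Optimum. This is a sketch under that assumption; directory names are illustrative, and wiring up WebGPU in `onnxruntime-web` is not shown.

```python
# Sketch: convert the merged checkpoint to ONNX for ONNX Runtime
# (including onnxruntime-web). Requires: pip install optimum[onnxruntime]
from optimum.onnxruntime import ORTModelForCausalLM

ort_model = ORTModelForCausalLM.from_pretrained(
    "maaza-mlm-135m-json-merged",  # merged model from the previous sketch
    export=True,                   # export from the PyTorch weights on the fly
)
ort_model.save_pretrained("maaza-mlm-135m-json-onnx")
```
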
### Out of Scope
- Open-ended conversation or creative writing
- Complex reasoning or multi-hop logic
- Math problem solving
- General-purpose chat applications

## Benchmark Performance

### EdgeJSON v3 Benchmark

Evaluated on 158 test cases across 24 schema types:

| Metric | Score |
|--------|-------|
| **JSONExact** | 24.7% |
| **Field F1** | 0.520 |
| **Schema Compliance** | 41.1% |
| **Throughput (CPU)** | 18.5 tokens/sec |
| **Training Time** | 48.7 seconds |

### By Complexity Level

| Complexity | Fields | Nesting | JSONExact | Field F1 |
|------------|--------|---------|-----------|----------|
| Simple | 2-4 | Flat | 44.7% | 0.698 |
| Medium | 4-8 | 1-2 levels | 13.5% | 0.456 |
| Complex | 8+ | 2+ levels | 0.0% | 0.234 |

### Perfect Schemas (100% JSONExact)

- `product_info` (2 fields, simple)
- `sensor_reading` (4 fields, simple)

### Training Improvement

- **Base SmolLM2-135M**: 1.9% JSONExact
- **Fine-tuned (this model)**: 24.7% JSONExact
- **Training Multiplier**: 13.0× improvement

## Training Data

### Dataset: EdgeJSON v3
- **Total Examples**: 787 (100% validated)
- **Train Split**: 629 examples (80%)
- **Test Split**: 158 examples (20%)
- **Validation Rate**: 100% (all examples pass schema validation)
- **Schema Count**: 24 unique schemas
- **Complexity Distribution (test split)**: 38 simple, 74 medium, 46 complex

### Data Generation
- **Teacher Model**: Qwen2.5-7B-Instruct
- **Method**: Synthetic generation with validation
- **Quality Control**: 100% schema compliance, manual review sampling

### Prompt Template

Prompts follow this fixed template; a formatting-and-validation sketch follows it.

```
Extract the structured JSON data from the following text.

Input: {prompt}

Output:
```

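A minimal sketch of how an example could be rendered with the template above and gated on schema validity, in the spirit of the "100% schema compliance" quality control; the `jsonschema` package, the example record, and the schema are assumptions for illustration.

```python
# Sketch: format one example with the prompt template and validate its target
# JSON against a schema before keeping it. Example data is illustrative.
import json
from jsonschema import validate, ValidationError  # pip install jsonschema

TEMPLATE = (
    "Extract the structured JSON data from the following text.\n\n"
    "Input: {prompt}\n\n"
    "Output:"
)

example = {
    "input": "Temperature reading of 22.5 C from sensor A7 at 09:14.",
    "target": {"sensor_id": "A7", "value": 22.5, "unit": "C", "time": "09:14"},
}
schema = {
    "type": "object",
    "required": ["sensor_id", "value", "unit", "time"],
    "properties": {
        "sensor_id": {"type": "string"},
        "value": {"type": "number"},
        "unit": {"type": "string"},
        "time": {"type": "string"},
    },
}

try:
    validate(example["target"], schema)  # drop anything that fails the schema
    text = TEMPLATE.format(prompt=example["input"]) + " " + json.dumps(example["target"])
    print(text)
except ValidationError as err:
    print(f"Dropped example: {err.message}")
```
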
## Training Procedure

### Hardware
- **GPU**: NVIDIA RTX 4080 SUPER (16GB)
- **Training Time**: 48.7 seconds
- **Effective Batch Size**: 32 (4 per device × 8 gradient accumulation steps)

### Hyperparameters
- **Method**: LoRA (Low-Rank Adaptation); see the configuration sketch after this list
- **LoRA Rank (r)**: 16
- **LoRA Alpha**: 32
- **LoRA Dropout**: 0.1
- **Target Modules**: q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
- **Learning Rate**: 2e-4
- **Optimizer**: AdamW (β1=0.9, β2=0.999, ε=1e-8)
- **Weight Decay**: 0.01
- **LR Scheduler**: Cosine with 10% warmup
- **Epochs**: 3
- **Precision**: BF16 mixed precision
- **Max Grad Norm**: 1.0

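The hyperparameters above map roughly onto the following PEFT and `transformers` configuration. This is a sketch of an equivalent setup, not the exact training script used for this release; the output directory is illustrative.

```python
# Sketch: LoRA and Trainer settings matching the hyperparameter list above.
from peft import LoraConfig, TaskType
from transformers import TrainingArguments

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    bias="none",
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

training_args = TrainingArguments(
    output_dir="maaza-mlm-135m-json",   # illustrative
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,      # effective batch size 32
    learning_rate=2e-4,
    weight_decay=0.01,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    max_grad_norm=1.0,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```

Attaching the adapter before training would then be roughly `model = get_peft_model(base_model, lora_config)`.
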
### Training Loss
- **Final Training Loss**: 1.449

## Evaluation Methodology

### Metrics

**JSONExact Score**:
- Binary exact match (0 or 1 per example)
- Compares predicted JSON to ground truth
- Requires perfect field matching

**Field F1**:
- Per-field precision and recall
- Averaged across all fields
- Partial credit for correct fields (see the sketch below)

**Schema Compliance**:
- Validates against the JSON Schema specification
- Checks required fields, types, structure

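A minimal sketch of how JSONExact and a per-example Field F1 could be computed for flat objects; the released benchmark harness may handle nesting and normalization differently.

```python
# Sketch: JSONExact and Field F1 for flat JSON objects.
def json_exact(pred: dict, gold: dict) -> int:
    """1 only when every field matches exactly, else 0."""
    return int(pred == gold)

def field_f1(pred: dict, gold: dict) -> float:
    """Partial credit: a field counts when both key and value match."""
    correct = sum(1 for k, v in gold.items() if k in pred and pred[k] == v)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(json_exact({"a": 1}, {"a": 1}))                # 1
print(field_f1({"a": 1, "b": 3}, {"a": 1, "b": 2}))  # 0.5
```
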
### Inference Settings
- **Temperature**: 0.0 (deterministic)
- **Max Tokens**: 512
- **Format**: JSON mode enforced
- **Platform**: CUDA (GPU) or CPU

## Limitations and Bias

### Known Limitations

**Capacity Ceiling**: This model hits a capacity ceiling on complex schemas (8+ fields, 2+ nesting levels), achieving 0% exact-match accuracy there. For complex structured extraction, consider the larger Maaza SLM-360M model.

**Simple Schema Specialization**: Best suited for simple schemas (2-4 fields, flat structure), where it achieves 44.7% exact-match accuracy.

**Synthetic Data**: Trained exclusively on synthetically generated data from Qwen2.5-7B, which may not capture all real-world edge cases.

**Domain Specificity**: Optimized for structured data extraction, not general-purpose language understanding.

### Potential Biases
- Inherits biases from the teacher model (Qwen2.5-7B)
- Synthetic data may not reflect real-world data distributions
- Performance varies significantly by schema complexity

### Ethical Considerations
- **Privacy**: On-device deployment avoids cloud API calls, keeping data local
- **Energy**: Ultra-fast training (48.7s) and efficient inference reduce the carbon footprint
- **Transparency**: 100% open training methodology, reproducible results

## How to Use

### Installation

```bash
pip install transformers peft torch accelerate
```

### Loading the Model

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM2-135M",
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load the LoRA adapter
model = PeftModel.from_pretrained(
    base_model,
    "CycleCore/Maaza-MLM-135M-JSON-v1"
)

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M")
```

### Inference Example

```python
prompt = """Extract the structured JSON data from the following text.

Input: John Doe works at Acme Corp. His email is [email protected] and phone is 555-1234.

Output:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=False  # greedy decoding (equivalent to temperature 0)
)

result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
```

### Expected Output

```json
{
  "name": "John Doe",
  "company": "Acme Corp",
  "email": "[email protected]",
  "phone": "555-1234"
}
```

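Because `generate` returns the prompt tokens followed by the completion, the JSON has to be sliced out of the decoded text before use. The following is a minimal post-processing sketch; the brace-matching heuristic is an assumption, not part of the released tooling.

```python
# Sketch: pull the first JSON object after "Output:" out of the generated text.
import json

def extract_json(generated: str) -> dict | None:
    """Return the first balanced {...} block after 'Output:' as a dict, or None."""
    tail = generated.split("Output:", 1)[-1]
    start = tail.find("{")
    if start == -1:
        return None
    depth = 0
    for i, ch in enumerate(tail[start:], start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                try:
                    return json.loads(tail[start:i + 1])
                except json.JSONDecodeError:
                    return None
    return None

record = extract_json(result)  # `result` from the inference example above
print(record)
```
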
## Model Comparison

For guidance on choosing between MLM-135M and SLM-360M, see our [Model Comparison Guide](https://github.com/CycleCore/SLMBench/blob/main/docs/MODEL_COMPARISON.md).

**Quick Decision**:
- **Use MLM-135M** if: ultra-low latency is required, schemas are simple (2-4 fields), and deployment size must stay under 500MB
- **Use SLM-360M** if: higher accuracy is needed on medium/complex schemas and a ~1GB deployment size is acceptable

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{cyclecore2025mlm,
  title={CycleCore Maaza MLM-135M-JSON: Micro Language Model for Edge JSON Extraction},
  author={CycleCore Technologies},
  year={2025},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/CycleCore/Maaza-MLM-135M-JSON-v1}},
}
```

**Academic Paper** (forthcoming):

```bibtex
@article{cyclecore2025slmbench,
  title={Micro Language Models (MLMs) and SLM-Bench: A Benchmark Suite for Structured Tasks on Resource-Constrained Devices},
  author={CycleCore Technologies},
  journal={arXiv preprint},
  year={2025},
  note={Paper in preparation}
}
```

## Links

- **Model Repository**: https://huggingface.co/CycleCore/Maaza-MLM-135M-JSON-v1
- **Base Model**: https://huggingface.co/HuggingFaceTB/SmolLM2-135M
- **SLMBench Benchmark**: https://github.com/CycleCore/SLMBench
- **Documentation**: https://github.com/CycleCore/SLMBench/tree/main/docs
- **Paper**: Coming soon (arXiv)
- **Website**: slmbench.com (coming soon)

## Version History

### v1.0.0 (2025-11-20)
- Initial release
- Trained on the EdgeJSON v3 dataset (100% validated)
- 24.7% JSONExact, 0.520 Field F1
- LoRA fine-tuning (r=16, alpha=32)
- 48.7-second training time
- Apache 2.0 license

## Contact

For questions, issues, or collaboration:
- **GitHub Issues**: https://github.com/CycleCore/SLMBench/issues
- **Email**: [email protected] (coming soon)

## License

Apache License 2.0

Copyright 2025 CycleCore Technologies

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
adapter_config.json ADDED
@@ -0,0 +1,46 @@
{
  "alora_invocation_tokens": null,
  "alpha_pattern": {},
  "arrow_config": null,
  "auto_mapping": null,
  "base_model_name_or_path": "HuggingFaceTB/SmolLM2-135M",
  "bias": "none",
  "corda_config": null,
  "ensure_weight_tying": false,
  "eva_config": null,
  "exclude_modules": null,
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 32,
  "lora_bias": false,
  "lora_dropout": 0.1,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "peft_version": "0.18.0",
  "qalora_group_size": 16,
  "r": 16,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "k_proj",
    "up_proj",
    "down_proj",
    "o_proj",
    "q_proj",
    "gate_proj",
    "v_proj"
  ],
  "target_parameters": null,
  "task_type": "CAUSAL_LM",
  "trainable_token_indices": null,
  "use_dora": false,
  "use_qalora": false,
  "use_rslora": false
}
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1e53b30c708a136ac086f3ccf6026424d9cfb183367e0933bdad77806b65b14d
size 19593064
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
special_tokens_map.json ADDED
@@ -0,0 +1,43 @@
{
  "additional_special_tokens": [
    "<|endoftext|>",
    "<|im_start|>",
    "<|im_end|>",
    "<repo_name>",
    "<reponame>",
    "<file_sep>",
    "<filename>",
    "<gh_stars>",
    "<issue_start>",
    "<issue_comment>",
    "<issue_closed>",
    "<jupyter_start>",
    "<jupyter_text>",
    "<jupyter_code>",
    "<jupyter_output>",
    "<jupyter_script>",
    "<empty_output>"
  ],
  "bos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "<|endoftext|>",
  "unk_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,169 @@
{
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "0": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "<repo_name>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "4": {
      "content": "<reponame>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "5": {
      "content": "<file_sep>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "6": {
      "content": "<filename>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "7": {
      "content": "<gh_stars>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "8": {
      "content": "<issue_start>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "9": {
      "content": "<issue_comment>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "10": {
      "content": "<issue_closed>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "11": {
      "content": "<jupyter_start>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "12": {
      "content": "<jupyter_text>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "13": {
      "content": "<jupyter_code>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "14": {
      "content": "<jupyter_output>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "15": {
      "content": "<jupyter_script>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "16": {
      "content": "<empty_output>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "additional_special_tokens": [
    "<|endoftext|>",
    "<|im_start|>",
    "<|im_end|>",
    "<repo_name>",
    "<reponame>",
    "<file_sep>",
    "<filename>",
    "<gh_stars>",
    "<issue_start>",
    "<issue_comment>",
    "<issue_closed>",
    "<jupyter_start>",
    "<jupyter_text>",
    "<jupyter_code>",
    "<jupyter_output>",
    "<jupyter_script>",
    "<empty_output>"
  ],
  "bos_token": "<|endoftext|>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|endoftext|>",
  "extra_special_tokens": {},
  "model_max_length": 8192,
  "pad_token": "<|endoftext|>",
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>",
  "vocab_size": 49152
}
training_metadata.json ADDED
@@ -0,0 +1,29 @@
{
  "model_name": "CycleCore-Maaza-SLM-135M-JSON",
  "base_model": "HuggingFaceTB/SmolLM2-135M",
  "training_date": "2025-11-20 10:50:49",
  "num_epochs": 3,
  "learning_rate": 0.0002,
  "batch_size": 32,
  "train_examples": 629,
  "validation_examples": 0,
  "test_examples": 158,
  "lora_config": {
    "enabled": true,
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.1,
    "target_modules": [
      "q_proj",
      "v_proj",
      "k_proj",
      "o_proj",
      "gate_proj",
      "up_proj",
      "down_proj"
    ],
    "bias": "none",
    "task_type": "CAUSAL_LM"
  },
  "validation_run": false
}
vocab.json ADDED
The diff for this file is too large to render. See raw diff