Slyracoon23 committed on
Commit 5059632 · verified · 1 Parent(s): 29de360

Upload LoRA adapter weights for DynastAI GRPO

Files changed (3):
  1. README.md +139 -48
  2. adapter_config.json +5 -5
  3. adapter_model.safetensors +1 -1
README.md CHANGED
@@ -3,109 +3,200 @@ base_model: unsloth/Qwen3-1.7B
  library_name: peft
  ---
 
- # Model Card for dynastai-grpo-lora
 
- This is a LoRA adapter for Qwen3-1.7B, fine-tuned using Group Relative Policy Optimization (GRPO) to act as a royal advisor in the DynastAI kingdom simulation game. The model learns to select options that keep the four kingdom metrics (Church, People, Military, Treasury) balanced close to 50.
 
  ## Model Details
 
  ### Model Description
 
- - **Developed by:** Slyracoon23
- - **Model type:** LoRA adapter for Qwen3-1.7B
- - **Language(s) (NLP):** English
- - **License:** [Qwen3-1.7B License](https://huggingface.co/Qwen/Qwen3-1.7B)
- - **Finetuned from model:** unsloth/Qwen3-1.7B
 
- This adapter helps the base model make balanced decisions in a simulated kingdom-management scenario, using prompts that describe the current state of the kingdom and the available actions.
 
- ### Model Sources
 
- - **Repository:** https://huggingface.co/Slyracoon23/dynastai-grpo-lora
 
  ## Uses
 
  ### Direct Use
 
- - Plug this LoRA adapter into the Qwen3-1.7B base model to act as a decision-making advisor for the DynastAI kingdom simulation game.
 
- ### Downstream Use
 
- - Can serve as a component in larger simulation or game-AI systems that require balanced decision-making.
 
  ### Out-of-Scope Use
 
- - Not intended for real-world policy or governance decisions.
- - Not suitable for use outside the DynastAI simulation or similar games.
 
  ## Bias, Risks, and Limitations
 
- - The model is trained on synthetic game data and may not generalize to real-world scenarios.
- - Decisions are optimized for game balance, not for ethical or real-world outcomes.
 
  ### Recommendations
 
- Users should be aware that this model is intended for entertainment and research purposes only.
 
  ## How to Get Started with the Model
 
- ```python
- from peft import PeftModel
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- # Load the base model and tokenizer, then attach the LoRA adapter
- base_model = AutoModelForCausalLM.from_pretrained("unsloth/Qwen3-1.7B")
- tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen3-1.7B")
- peft_model = PeftModel.from_pretrained(base_model, "Slyracoon23/dynastai-grpo-lora")
- ```
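The exact prompt template used in training is not documented in this card. Purely as an illustration, a kingdom state and its options could be formatted like this before being tokenized and passed to `peft_model.generate` — every name and phrase below is an assumption, not the real template:

```python
def build_prompt(metrics, options):
    """Format a hypothetical DynastAI advisor prompt.

    The real training template is not documented in this card; this is
    only a sketch of the kind of input the adapter was trained on.
    """
    state = ", ".join(f"{name}={value}" for name, value in metrics.items())
    lettered = "\n".join(f"{chr(65 + i)}. {text}" for i, text in enumerate(options))
    return (
        f"Kingdom state: {state}\n"
        f"Options:\n{lettered}\n"
        "Choose the option that keeps every metric closest to 50."
    )

prompt = build_prompt(
    {"Church": 55, "People": 48, "Military": 52, "Treasury": 45},
    ["Raise taxes for the cathedral", "Hold a festival for the people"],
)
# The resulting string would then be tokenized and fed to peft_model.generate(...)
```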
 
  ## Training Details
 
  ### Training Data
 
- - Synthetic prompts and solutions generated from the DynastAI kingdom simulation game, focused on balancing four metrics: Church, People, Military, Treasury.
 
  ### Training Procedure
 
- - Fine-tuned using GRPO (Group Relative Policy Optimization) with custom reward functions that encourage brevity and correctness.
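The custom reward functions themselves are not published in this repository. As a minimal sketch of what a balance-plus-brevity reward could look like — function name, weights, and thresholds are all illustrative assumptions, not the actual training code:

```python
def balance_reward(metrics, response, max_words=64):
    """Hypothetical GRPO reward (illustrative only).

    Scores 1.0 when Church/People/Military/Treasury all sit exactly at 50,
    falling toward 0.0 at the extremes, plus a small bonus for brevity.
    """
    mean_deviation = sum(abs(v - 50) for v in metrics.values()) / len(metrics)
    balance = 1.0 - mean_deviation / 50.0          # 1.0 balanced, 0.0 at the extremes
    brevity_bonus = 0.1 if len(response.split()) <= max_words else 0.0
    return balance + brevity_bonus

# A perfectly balanced kingdom with a terse answer scores 1.0 + 0.1
score = balance_reward({"Church": 50, "People": 50, "Military": 50, "Treasury": 50}, "Option A")
```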
 
  #### Training Hyperparameters
 
- - **Batch size:** 4
- - **Epochs:** 3
- - **Learning rate:** 2e-4
- - **Optimizer:** AdamW (8-bit)
- - **Precision:** 16-bit (LoRA)
 
  ## Evaluation
 
- - Evaluated on a held-out set of synthetic prompts for accuracy in selecting the optimal choice.
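The card does not report the resulting accuracy number. The metric itself is simple — the fraction of held-out prompts where the model's selected choice matches the optimal one — e.g.:

```python
def choice_accuracy(predicted, optimal):
    """Fraction of evaluation prompts where the model picked the optimal choice."""
    if len(predicted) != len(optimal):
        raise ValueError("prediction/label length mismatch")
    return sum(p == o for p, o in zip(predicted, optimal)) / len(optimal)

# 3 of these 4 choices match the optimal labels
acc = choice_accuracy(["A", "B", "A", "C"], ["A", "B", "C", "C"])
```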
 
  ## Environmental Impact
 
  Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
 
- - **Hardware Type:** 2 × NVIDIA L4 GPUs (23 GB VRAM each), Intel(R) Xeon(R) CPU @ 2.20GHz (24 vCPUs)
- - **Storage:** 500 GB NVMe SSD
- - **Operating System:** Ubuntu 24.04.2 LTS
- - **Cloud Provider:** Google Cloud Platform (GCP)
- - **Compute Region:** northamerica-northeast2 (from hostname: nvidia-l4-double-gpu-northamerica-northeast2-b)
- - **Hours used:** 5.8
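From the figures above, a back-of-the-envelope energy estimate can be made before plugging numbers into the ML Impact calculator — assuming the NVIDIA L4's 72 W board power as an upper bound and ignoring CPU/host overhead (both assumptions, not measured values):

```python
# Rough upper-bound GPU energy for this run (72 W TDP per L4 is an assumption)
gpus = 2
tdp_kw = 0.072          # assumed NVIDIA L4 board power, in kilowatts
hours = 5.8             # hours used, from the card above
energy_kwh = gpus * tdp_kw * hours   # roughly 0.84 kWh
```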
 
- ## Technical Specifications
 
- - **Base Model:** Qwen3-1.7B
- - **Adapter Type:** LoRA
- - **Framework:** PEFT 0.15.2
 
- ## Citation
 
- If you use this model, please cite the base model and this repository.
 
- ## Model Card Authors
 
- - Slyracoon23
 
  ## Model Card Contact
 
- - [Your contact info or Hugging Face profile]
 
  library_name: peft
  ---
 
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
 
  ## Model Details
 
  ### Model Description
 
+ <!-- Provide a longer summary of what this model is. -->
+
+
 
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
 
+ ### Model Sources [optional]
 
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
 
  ## Uses
 
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
  ### Direct Use
 
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
 
+ ### Downstream Use [optional]
 
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
 
  ### Out-of-Scope Use
 
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
 
  ## Bias, Risks, and Limitations
 
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
 
  ### Recommendations
 
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
 
  ## How to Get Started with the Model
 
+ Use the code below to get started with the model.
 
+ [More Information Needed]
 
  ## Training Details
 
  ### Training Data
 
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
 
  ### Training Procedure
 
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
 
  #### Training Hyperparameters
 
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
 
  ## Evaluation
 
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
 
  ## Environmental Impact
 
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
  Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
 
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
 
+ ### Compute Infrastructure
 
+ [More Information Needed]
 
+ #### Hardware
 
+ [More Information Needed]
 
+ #### Software
 
+ [More Information Needed]
 
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
 
  ## Model Card Contact
 
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.15.2
adapter_config.json CHANGED
@@ -24,13 +24,13 @@
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
- "q_proj",
- "k_proj",
  "gate_proj",
- "o_proj",
- "down_proj",
  "up_proj",
- "v_proj"
  ],
  "task_type": "CAUSAL_LM",
  "trainable_token_indices": null,
 
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
  "gate_proj",
+ "k_proj",
  "up_proj",
+ "down_proj",
+ "o_proj",
+ "v_proj",
+ "q_proj"
  ],
  "task_type": "CAUSAL_LM",
  "trainable_token_indices": null,
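Comparing the two sides of the hunk shows this change only reorders the `target_modules` list: the set of adapted Qwen3 projection layers is identical before and after, and PEFT matches these entries against module names as a set, so the ordering should have no effect on the adapter. A quick check:

```python
# target_modules before and after this commit, as listed in the diff above
old = ["q_proj", "k_proj", "gate_proj", "o_proj", "down_proj", "up_proj", "v_proj"]
new = ["gate_proj", "k_proj", "up_proj", "down_proj", "o_proj", "v_proj", "q_proj"]

# Same seven layers: the four attention projections plus the three MLP projections
attention = {"q_proj", "k_proj", "v_proj", "o_proj"}
mlp = {"gate_proj", "up_proj", "down_proj"}
assert set(old) == set(new) == attention | mlp
```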
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:dcd658abe5258db738c7217a299b1ff1f6aec8050776d1523bc6188de228c598
+ oid sha256:2578cac168a04c524ae1da1e7f422c51cf8baedec007ed15f4e8c581da95c78b
  size 34917504