---
license: apache-2.0
base_model: Qwen/Qwen2.5-3B-Instruct
tags:
- peft
- lora
- business-strategy
- reinforcement-learning
- grpo
library_name: transformers
---

# Business Strategy Model (GRPO Fine-tuned)

This model is a fine-tuned version of [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) using GRPO (Group Relative Policy Optimization) for business strategy generation.

## Training Details

- **Base Model**: Qwen/Qwen2.5-3B-Instruct (3B parameters)
- **Fine-tuning Method**: LoRA adapters trained with GRPO (see the sketch after this list)
- **Dataset**: OrgStrategy-Reasoning-1k-v2
- **Use Case**: Strategic business planning and decision-making
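
The exact reward function, dataset location, and hyperparameters for this run are not published, so the following is only a minimal sketch of how a comparable LoRA + GRPO run could be set up with TRL's `GRPOTrainer`. The dataset id, reward function, and hyperparameter values shown are illustrative assumptions.

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# Assumed hub id: the card names the dataset "OrgStrategy-Reasoning-1k-v2"
# but does not state where it is hosted.
dataset = load_dataset("OrgStrategy-Reasoning-1k-v2", split="train")

def placeholder_reward(completions, **kwargs):
    # Stand-in reward that favors longer completions, capped at 1.0.
    # The reward actually used for this model is not documented on this card.
    return [min(len(c) / 1000.0, 1.0) for c in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-3B-Instruct",
    reward_funcs=placeholder_reward,
    args=GRPOConfig(output_dir="business-strategy-grpo", num_generations=8),
    train_dataset=dataset,  # expects a "prompt" column
    peft_config=LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32),
)
trainer.train()
```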

## Usage

### With PEFT
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, then attach the LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "Wildstash/business-strategy-grpo")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")

# Generate a strategy recommendation
prompt = "A tech startup wants to compete against established market leaders. Recommend a strategy."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
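
Because the base model is instruction-tuned, wrapping the request in the model's chat template usually produces better-structured answers than a raw prompt. A minimal sketch, reusing `model` and `tokenizer` from above:

```python
# Format the request with the chat template before generating.
messages = [
    {"role": "user", "content": "A tech startup wants to compete against established market leaders. Recommend a strategy."},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant-turn marker
    return_tensors="pt",
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```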

## Deployment

This model can be deployed on:

- Hugging Face Inference Endpoints (recommended)
- AWS SageMaker
- Local inference with GPU
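
If the deployment target expects a single standalone checkpoint rather than a base-plus-adapter pair, the LoRA adapter can first be merged into the base weights. A minimal sketch using `peft`'s `merge_and_unload`; the output directory name is illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and attach the adapter.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-3B-Instruct", torch_dtype="auto")
model = PeftModel.from_pretrained(base, "Wildstash/business-strategy-grpo")

# Fold the LoRA weights into the base model and save a standalone checkpoint.
merged = model.merge_and_unload()
merged.save_pretrained("business-strategy-grpo-merged")
AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct").save_pretrained("business-strategy-grpo-merged")
```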

## License

Apache 2.0