Business Strategy Model (GRPO Fine-tuned)
This model is a fine-tuned version of Qwen/Qwen2.5-3B-Instruct using GRPO (Group Relative Policy Optimization) for business strategy generation.
Training Details
- Base Model: Qwen/Qwen2.5-3B-Instruct (3B parameters)
- Fine-tuning Method: LoRA adapters with GRPO
- Dataset: OrgStrategy-Reasoning-1k-v2
- Use Case: Strategic business planning and decision-making
Usage
With PEFT:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
base_model = AutoModelForCausalLM.from_pretrained(
"Qwen/Qwen2.5-3B-Instruct",
torch_dtype="auto",
device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "Wildstash/business-strategy-grpo")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")
# Generate strategy
prompt = "A tech startup wants to compete against established market leaders. Recommend a strategy."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Deployment
This model can be deployed on:
- Hugging Face Inference Endpoints (recommended)
- AWS SageMaker
- Local inference with GPU
License
Apache 2.0