Business Strategy Model (GRPO Fine-tuned)

This model is a LoRA fine-tune of Qwen/Qwen2.5-3B-Instruct, trained with GRPO (Group Relative Policy Optimization) to generate business strategy recommendations.

Training Details

  • Base Model: Qwen/Qwen2.5-3B-Instruct (3B parameters)
  • Fine-tuning Method: LoRA adapters with GRPO
  • Dataset: OrgStrategy-Reasoning-1k-v2
  • Use Case: Strategic business planning and decision-making
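GRPO dispenses with a learned value model by scoring each sampled completion relative to the other completions drawn for the same prompt. A minimal sketch of that group-relative advantage computation (the function name is illustrative, not part of any library):

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its group's mean and std,
    so completions are ranked relative to their siblings."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: rewards for 4 candidate strategy responses to one prompt
rewards = [0.9, 0.4, 0.6, 0.1]
advantages = group_relative_advantages(rewards)
```

Completions scored above the group mean get positive advantages (their tokens are reinforced), those below get negative ones, and the advantages sum to zero within each group.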

Usage

With PEFT:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    torch_dtype="auto",
    device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "Wildstash/business-strategy-grpo")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")

# Qwen2.5-Instruct expects chat-formatted input, so wrap the prompt in a message
messages = [
    {"role": "user", "content": "A tech startup wants to compete against established market leaders. Recommend a strategy."}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the echoed prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Deployment

This model can be deployed on:

  • Hugging Face Inference Endpoints (recommended)
  • AWS SageMaker
  • Local inference with GPU

License

Apache 2.0
