# Cloud Checkpoint Repository for GRPO Fine-Tuning on DeepSeek-R1-0528-Qwen3-8B
This repository serves as a cloud-to-cloud checkpoint hub for distributed fine-tuning and continued training of the DeepSeek-R1-0528-Qwen3-8B model using Group Relative Policy Optimization (GRPO).

The checkpoints here are designed to be:

- **Resumable** – continue training seamlessly across different cloud providers (e.g., Vast.ai → RunPod → Kaggle)
- **Mergeable** – combine multiple training shards or checkpoints for final model assembly
- **Cloud-native** – optimized for upload/download efficiency and reusability in multi-cloud workflows
## Project Overview
This project fine-tunes DeepSeek-R1-0528-Qwen3-8B on the Indonesian Legal QA Dataset to enhance structured legal reasoning in Bahasa Indonesia.
The end goal is a final merged LLM specialized for Indonesian legal analysis, powered by GRPO.
## Cloud-to-Cloud Checkpointing Workflow
This repository is built to support inter-cloud checkpoint transfers, enabling you to pause and resume training between different compute environments.
### Training Summary
| Parameter | Value |
|---|---|
| Base Model | DeepSeek-R1-0528-Qwen3-8B |
| Preview Model | Azzindani/Deepseek_ID_Legal_Preview |
| Fine-Tuning Method | Group Relative Policy Optimization (GRPO) + Knowledge Distillation |
| Pipeline | Cloud-to-Cloud Distributed Training |
| Dataset | Indonesian Legal Q&A |
| Compute | NVIDIA RTX 6000 Ada |
| Cloud Provider | vast.ai |
| Last Training Step | 2000 |
| Generations per Step | 16 |
| Distilled Dataset | DeepSeek_0528_8B_Legal_Distill |
## About GRPO
Group Relative Policy Optimization (GRPO) improves model alignment by sampling a group of candidate responses per prompt (here drawn from structured sample groups such as legal case types) and scoring each response relative to its group, instead of against a separately learned value baseline as in standard PPO-based RLHF. This pushes the model toward contextual reasoning improvements rather than simple accuracy gains.
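The group-relative scoring step at the core of GRPO can be sketched as below. This is a minimal illustration of the advantage normalization described in the GRPO paper, not this repository's actual training code; the reward values are made up for the example.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against its own group: A_i = (r_i - mean) / std.

    In GRPO, `rewards` are the scores of a group of responses sampled for
    the same prompt; each response's advantage is measured relative to its
    group rather than to a learned value baseline.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four sampled answers to one legal question, scored by a reward model.
advs = group_relative_advantages([0.2, 0.5, 0.9, 0.4])
```

Responses scoring above the group mean get positive advantages and are reinforced; responses below the mean are pushed down, without any critic network.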
### Benefits
- Better step-by-step reasoning
- More logical flow in legal Q&A
- Higher consistency in applying Indonesian law references
## Structured Reasoning Capability
During GRPO training, the model learns to reason like a legal analyst:
- Identify legal issues
- Recall relevant regulations
- Apply reasoning to context
- Formulate clear conclusions
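The four-step pattern above can also be encouraged at inference time with a system prompt. The wording below is purely illustrative (the exact prompt used during GRPO training is not published in this card):

```python
# Illustrative system prompt encoding the four reasoning steps; this is an
# assumed example, not the prompt used to train the model.
LEGAL_REASONING_PROMPT = """You are an Indonesian legal analyst. For every question:
1. Identify the legal issue(s) raised.
2. Recall the relevant Indonesian regulations and articles.
3. Apply the regulations to the facts of the question.
4. Formulate a clear, well-grounded conclusion.
Answer in Bahasa Indonesia."""

def build_messages(question: str) -> list[dict]:
    """Wrap a user question in a chat-template-style message list."""
    return [
        {"role": "system", "content": LEGAL_REASONING_PROMPT},
        {"role": "user", "content": question},
    ]
```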
## Acknowledgements
- DeepSeek team for the base model
- The GRPO paper (Shao et al., 2024, DeepSeekMath) for the optimization method
- unsloth.ai for efficient fine-tuning
- vast.ai for affordable compute
## License
This project is licensed under the Apache 2.0 License.
## Notes for Collaborators
If you are continuing training or merging checkpoints:
- Always pull the latest checkpoint from this repo before training.
- Push your progress under a new tag (e.g., `checkpoint-2000`) for traceability.
- When merging, verify consistency using the `model.index.json` structure to avoid parameter conflicts.
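The merge-time consistency check in the last bullet can be sketched as follows. It assumes each checkpoint directory contains an index file with a `weight_map` dictionary mapping parameter names to shard files, as in Hugging Face sharded-checkpoint indexes; the function and file names are illustrative.

```python
import json
from pathlib import Path

def index_keys(ckpt_dir: str, index_name: str = "model.index.json") -> set[str]:
    """Load a checkpoint's index file and return its parameter names.

    Assumes the index holds a `weight_map` dict of {param_name: shard_file}.
    """
    index = json.loads((Path(ckpt_dir) / index_name).read_text())
    return set(index["weight_map"])

def check_mergeable(dir_a: str, dir_b: str) -> bool:
    """Two checkpoints can merge cleanly only if they expose exactly
    the same parameter names."""
    a, b = index_keys(dir_a), index_keys(dir_b)
    if a != b:
        # Show a few of the conflicting parameter names for debugging.
        print("Parameter mismatch:", sorted(a ^ b)[:10])
    return a == b
```

Running `check_mergeable` before a merge catches shards produced from mismatched model definitions before any weights are loaded.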
"Train anywhere, resume everywhere – building the Indonesian Legal LLM through cloud collaboration."