📧 GitHub
🔗 LinkedIn

โ˜๏ธ Cloud Checkpoint Repository for GRPO Fine-Tuning on DeepSeek-R1-0528-Qwen3-8B

This repository serves as a cloud-to-cloud checkpoint store for distributed fine-tuning and continued training of the DeepSeek-R1-0528-Qwen3-8B model using Group Relative Policy Optimization (GRPO).

The checkpoints here are designed to be:

  • 🔄 Resumable: continue training seamlessly across different cloud providers (e.g., Vast.ai → RunPod → Kaggle)
  • ⚙️ Mergeable: combine multiple training shards or checkpoints for final model assembly
  • ☁️ Cloud-native: optimized for upload/download efficiency and reusability in multi-cloud workflows

🧠 Project Overview

This project fine-tunes DeepSeek-R1-0528-Qwen3-8B on the Indonesian Legal QA Dataset to enhance structured legal reasoning in Bahasa Indonesia 🇮🇩.
The end goal is a final merged LLM specialized for Indonesian legal analysis, powered by GRPO.


โ˜๏ธ Cloud-to-Cloud Checkpointing Workflow

This repository is built to support inter-cloud checkpoint transfers, enabling you to pause and resume training between different compute environments.
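In practice, resuming on a new machine means syncing this repo's files to the new instance and continuing from the highest-numbered checkpoint. A minimal sketch of that selection step, assuming the common `checkpoint-<step>` directory naming used by most Hugging Face-style trainers (an assumption; adapt to your layout):

```python
# Minimal sketch: after syncing this repo to a new cloud instance, resume
# from the highest-numbered checkpoint directory. Assumes the common
# "checkpoint-<step>" naming convention used by most HF-style trainers.
def latest_checkpoint(names):
    """Return the checkpoint-<step> name with the highest step, or None."""
    steps = []
    for name in names:
        prefix, _, step = name.partition("-")
        if prefix == "checkpoint" and step.isdigit():
            steps.append((int(step), name))
    return max(steps)[1] if steps else None

# e.g., a directory listing pulled from this repo:
print(latest_checkpoint(["checkpoint-500", "checkpoint-2000", "logs"]))
# → checkpoint-2000
```

The sync itself can be done with any transfer tool; `huggingface_hub`'s `snapshot_download` and `upload_folder` are one convenient pair for pulling from and pushing back to this repo.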

๐Ÿ‹๏ธ Training Summary

| Parameter | Description |
|---|---|
| Base Model | DeepSeek-R1-0528-Qwen3-8B |
| Preview Model | Azzindani/Deepseek_ID_Legal_Preview |
| Fine-tuning Method | Group Relative Policy Optimization (GRPO) + Knowledge Distillation |
| Pipeline | Cloud-to-Cloud Distributed Training |
| Dataset | Indonesian Legal Q&A |
| Compute | NVIDIA RTX 6000 Ada |
| Cloud Provider | Vast.ai |
| Last Training Step | 2000 |
| Generations per Step | 16 |
| Distilled Dataset | DeepSeek_0528_8B_Legal_Distill |

🧩 About GRPO

Group Relative Policy Optimization (GRPO) improves model alignment by optimizing within structured sample groups (e.g., legal case types). Compared to standard RLHF, GRPO encourages contextual reasoning improvements rather than simple accuracy gains.
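As a rough illustration based on the published GRPO formulation (an assumption, not necessarily this repo's exact training code): each prompt's sampled completions form a group, and every completion's reward is normalized against that group's own statistics rather than a learned value baseline.

```python
# Sketch of GRPO's group-relative advantage (per the published GRPO
# formulation; an assumption, not this repo's exact implementation).
# Rewards from one prompt's sampled completions are normalized against
# that group's own mean and standard deviation.
from statistics import mean, pstdev

def group_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sample group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# One hypothetical group of 4 completions (this repo samples 16 per step):
print(group_advantages([1.0, 0.0, 0.5, 0.5]))
```

Completions that beat their own group's average get positive advantages, which is what lets GRPO reward relatively better reasoning within a context instead of absolute scores.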

✨ Benefits

  • Better step-by-step reasoning
  • More logical flow in legal Q&A
  • Higher consistency in applying Indonesian law references

🧠 Structured Reasoning Capability

During GRPO training, the model learns to reason like a legal analyst:

  1. Identify legal issues
  2. Recall relevant regulations
  3. Apply reasoning to context
  4. Formulate clear conclusions
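A rubric like this is often encoded as a rule-based reward during GRPO training. A hypothetical sketch only: the Indonesian section markers and the scoring scheme below are illustrative assumptions, not this repo's actual reward function.

```python
# Hypothetical rule-based reward scoring whether an answer follows the
# four-step rubric above, in order. The Indonesian section markers
# (legal issue, legal basis, analysis, conclusion) are illustrative only.
STEPS = ["isu hukum", "dasar hukum", "analisis", "kesimpulan"]

def structure_reward(answer: str) -> float:
    """Fraction of rubric sections found, in order, within the answer."""
    text, pos, score = answer.lower(), 0, 0
    for marker in STEPS:
        found = text.find(marker, pos)
        if found == -1:
            break
        score, pos = score + 1, found + len(marker)
    return score / len(STEPS)

print(structure_reward("Isu hukum: ... Dasar hukum: ... Analisis: ... Kesimpulan: ..."))
# → 1.0
```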

๐Ÿค Acknowledgements


🧾 License

This project is licensed under the Apache 2.0 License.


🪄 Notes for Collaborators

If you are continuing training or merging checkpoints:

  • Always pull the latest checkpoint from this repo before training.
  • Push your progress under a new tag (e.g., checkpoint-2000) for traceability.
  • When merging, verify consistency using the model.index.json structure to avoid parameter conflicts.
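For the last point, a hedged sketch of such a consistency check, assuming the checkpoints ship weight-map index files in the usual `{"weight_map": {param_name: shard_file}}` shape (an assumption; match it to this repo's actual index files):

```python
# Hedged sketch: before merging two checkpoints, flag any parameter that
# their index files map to different shard files. Assumes the common
# {"weight_map": {param_name: shard_filename}} index structure.
def find_conflicts(index_a: dict, index_b: dict) -> list:
    """Return parameter names mapped to different shards in both indexes."""
    wa = index_a.get("weight_map", {})
    wb = index_b.get("weight_map", {})
    return sorted(k for k in wa.keys() & wb.keys() if wa[k] != wb[k])

a = {"weight_map": {"model.layers.0.mlp.weight": "shard-00001.safetensors"}}
b = {"weight_map": {"model.layers.0.mlp.weight": "shard-00002.safetensors"}}
print(find_conflicts(a, b))
# → ['model.layers.0.mlp.weight']
```

An empty result means the two indexes agree on every shared parameter, so a merge will not silently overwrite weights.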

💡 "Train anywhere, resume everywhere: building the Indonesian Legal LLM through cloud collaboration."
