|
|
--- |
|
|
license: gemma |
|
|
tags: |
|
|
- unsloth |
|
|
datasets: |
|
|
- Phonepadith/laos-long-content |
|
|
language: |
|
|
- lo |
|
|
metrics: |
|
|
- bleu |
|
|
base_model: |
|
|
- google/gemma-3-12b-it |
|
|
new_version: Phonepadith/aidc-llm-laos-10k-gemma-3-12b-it |
|
|
pipeline_tag: text-generation |
|
|
library_name: fastai |
|
|
--- |
|
|
|
|
|
--- |
|
|
# 🧠 Lao Summarization Model ສົນທະນາ - ສະຫລຸບເນື້ອຫາສຳລັບພາສາລາວ - Fine-tuned Gemma 3 12B IT (10,000 Pairs, Laos Content Input-Output) |
|
|
|
|
|
This is a **Lao language summarization model** fine-tuned on the [`Phonepadith/laos_word_dataset`](https://huggingface.co/datasets/Phonepadith/laos_word_dataset), using the base model [`google/gemma-3-12b-it`](https://huggingface.co/google/gemma-3-12b-it). The model is designed to generate concise summaries from Lao language text. |
|
|
**Scope**: |
|
|
- 📚 ສະຫລຸບຂ່າວ |
|
|
- 📚 ສະຫລຸບເອກະສານພາກລັດ |
|
|
- 📚 ສະຫລຸບກອງປະຊຸມ |
|
|
--- |
|
|
|
|
|
# 🧠 Lao AIDC-10K Fine-tuned Gemma-3-12B-IT-V2 |
|
|
**Model ID**: `Phonepadith/aidc-llm-laos-10k-gemma-3-12b-it-v2` |
|
|
**Base Model**: [`google/gemma-3b-it`](https://huggingface.co/google/gemma-3b-it) |
|
|
**Fine-tuned By**: [Phonepadith Phoummavong](https://huggingface.co/Phonepadith) |
|
|
|
|
|
--- |
|
|
|
|
|
## 📌 Model Description |
|
|
|
|
|
This model is a fine-tuned version of **Gemma-3-12B-IT-**, specifically adapted to understand and generate responses in **Lao language** 🇱🇦. It was trained using a curated dataset of over **5,000 high-quality Lao input-output pairs**, primarily focused on **AIDC (Artificial Intelligence and Digital Content)** topics. |
|
|
|
|
|
**Key Features:** |
|
|
- 🗣️ Fine-tuned for Lao language generation |
|
|
- 📚 Suitable for summarization, question answering, general chat |
|
|
- 🧠 Based on Google's powerful Gemma 3-12B Instruct model |
|
|
|
|
|
--- |
|
|
|
|
|
## 🧾 Training Details |
|
|
|
|
|
| Detail | Value | |
|
|
|----------------------|-----------------------------------| |
|
|
| Base Model | Gemma 3-12B Instruct | |
|
|
| Fine-tuning Method | LoRA with PEFT (Unsloth) | |
|
|
| Dataset | 10,000 Laos supervised samples | |
|
|
| Sequence Length | 2048 | |
|
|
| Batch Size | 2 (with gradient accumulation) | |
|
|
| Optimizer | AdamW | |
|
|
| Epochs | 3–5 (early stopping enabled) | |
|
|
| Format | GGUF (F16, Q8_0, Q4_0 available) | |
|
|
|
|
|
--- |
|
|
|
|
|
## 📥 How to Use (LM Studio) |
|
|
|
|
|
1. **Install LM Studio**: [https://lmstudio.ai](https://lmstudio.ai) |
|
|
2. **Import the Model**: |
|
|
- Via Hugging Face: Search for `Phonepadith/aidc-llm-laos-10k-gemma-3-12b-it` |
|
|
- Or drag the `.gguf` file into LM Studio |
|
|
3. **Set System Prompt**: |
|
|
|
|
|
|
|
|
|
|
|
## 📌 Model Details |
|
|
|
|
|
- **Base Model**: [`google/gemma-3-12b-it`](https://huggingface.co/google/gemma-3-12b-it) |
|
|
- **Fine-tuned by**: [Phonepadith](https://huggingface.co/Phonepadith) |
|
|
- **Language**: Lao (`lo`) |
|
|
- **Task**: Text Generation |
|
|
- **Library**: `adapter-transformers` |
|
|
- **License**: Apache 2.0 |
|
|
|
|
|
--- |
|
|
|
|
|
## 📊 Metrics |
|
|
|
|
|
- **Evaluation Metric**: BLEU score |
|
|
BLEU is used to evaluate the quality of generated summaries against reference summaries in the dataset. |
|
|
|
|
|
--- |
|
|
|
|
|
## 🛠️ How to Use |
|
|
|
|
|
You can load and use the model with Hugging Face Transformers and `adapter-transformers`: |
|
|
|
|
|
```python |
|
|
|
|
|
from transformers import AutoTokenizer, AutoModelForCausalLM |
|
|
|
|
|
model_id = "Phonepadith/aidc-llm-laos-10k-gemma-3-12b-it-v2" # change to your actual model name |
|
|
tokenizer = AutoTokenizer.from_pretrained(model_id) |
|
|
model = AutoModelForCausalLM.from_pretrained(model_id) |
|
|
|
|
|
input_text = "ປັດຈຸບັນ ກອງທັບປະຊາຊົນລາວ ມີການປະກອບວັດຖຸເຕັກນິກທັນສະໄໝສົມຄວນ, ສາມາດຕອບສະໜອງ ໃຫ້ແກ່ວຽກງານປ້ອງກັນຊາດ ໃນໄລຍະໃໝ່ ໄດ້ໂດຍພື້ນຖານ; ໄດ້ປະກອບສ່ວນຢ່າງຕັ້ງໜ້າເຂົ້າໃນການປ້ອງກັນ, ຄວບຄຸມໄພພິບັດ ແລະ ຊ່ວຍເຫລືອປະຊາຊົນ ຜູ້ປະສົບໄພພິບັດທຳມະຊາດຕ່າງໆທີ່ເກີດຂຶ້ນໃນຂອບເຂດທົ່ວປະເທດ. ພ້ອມນັ້ນ, ກໍໄດ້ເປັນເຈົ້າການປະກອບສ່ວນປັບປຸງກໍ່ສ້າງພື້ນ ຖານການເມືອງ, ກໍ່ສ້າງທ່າສະໜາມສົງຄາມປະຊາຊົນ 3 ຂັ້ນ ຕິດພັນກັບວຽກງານ 3 ສ້າງ ຢູ່ທ້ອງຖິ່ນຕາມ 4 ເນື້ອໃນ 4 ຄາດໝາຍ ແລະ ສືບທອດມູນເຊື້ອຄວາມສາມັກຄີ ກັບກອງທັບປະເທດເພື່ອນມິດ ສາກົນ, ປະຕິບັດນະໂຍບາຍເພີ່ມມິດຫລຸດຜ່ອນສັດຕູ, ຮັບປະກັນສະຖຽນລະພາບ ຂອງລະບອບການ ເມືອງ, ຮັກສາຄວາມສະຫງົບປອດໄພຕາມຊາຍແດນ" |
|
|
inputs = tokenizer(input_text, return_tensors="pt") |
|
|
summary_ids = model.generate(**inputs, max_new_tokens=100) |
|
|
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True) |
|
|
|
|
|
print(summary) |