---
library_name: transformers
tags:
  - sequence-classification
  - text-classification
  - nli
  - xlm-roberta
  - vietnamese
  - kaggle
---

# XLM-RoBERTa-base fine-tuned for Vietnamese NLI

A Vietnamese Natural Language Inference (NLI) model that predicts the relation between a **premise** and a **hypothesis** as one of:
- `c` (contradiction)
- `n` (neutral)
- `e` (entailment)

This model fine-tunes **xlm-roberta-base** on a stratified ~80/10/10 train/validation/test split, with a training setup sized to fit a single GPU (Kaggle T4/P100).

---

## Model Details

- **Developed by:** Lê Lý (MoMo Talent 2025)
- **Model type:** XLM-RoBERTa encoder for sequence classification (3 labels)
- **Languages:** Vietnamese (vi)
- **License:** Inherits from upstream **xlm-roberta-base** (set the model page license accordingly)
- **Finetuned from:** `xlm-roberta-base`

### Model Sources
- **Base model:** XLM-RoBERTa (Conneau et al., 2020)
- **Training setup:** Documented under Training Details below (Kaggle-ready); the script itself is not reproduced in this card

---

## Uses

### Direct Use
- Vietnamese NLI inference for research, demos, or as a component in larger systems (e.g., retrieval/ranking, dialog consistency checks).

### Downstream Use
- Further fine-tuning on domain-specific Vietnamese NLI or related tasks (stance detection, contradiction detection in QA/assistants).

### Out-of-Scope Use
- Non-Vietnamese text without adaptation.
- Safety-critical decisions without human oversight.
- Open-domain factual verification (this is NLI, not a fact-checker).

---

## Bias, Risks, and Limitations

- Trained on a Vietnamese NLI dataset; distribution shift (domain, register, slang, figurative language) may degrade performance.
- NLI labels can be sensitive to annotation style/instructions; avoid over-interpreting borderline cases.

**Recommendations:** Evaluate on your target domain; monitor confusion between `n` and `e`/`c`; consider calibrating or thresholding the output probabilities if the model is used in a pipeline.
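
One way to apply thresholding is to abstain whenever the top softmax probability falls below a cutoff. The sketch below assumes the label order `c`, `n`, `e` from this card; the helper name `predict_with_threshold` and the cutoff `0.8` are illustrative assumptions, not part of the released model, and the cutoff would need tuning on held-out data.

```python
import math
from typing import List, Optional, Tuple

# Label order taken from this card: {0: "c", 1: "n", 2: "e"}.
ID2LABEL = {0: "c", 1: "n", 2: "e"}


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a single row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_threshold(
    logits: List[float], threshold: float = 0.8
) -> Tuple[Optional[str], float]:
    """Return (label, confidence); label is None when confidence < threshold.

    `threshold=0.8` is a placeholder value, not a tuned setting.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best]
    label = ID2LABEL[best] if confidence >= threshold else None
    return label, confidence


# Made-up logits, e.g. one row of `mdl(**enc).logits.tolist()`.
confident = predict_with_threshold([4.0, 0.5, -1.0])
uncertain = predict_with_threshold([0.6, 0.5, 0.4])
print(confident, uncertain)
```

Examples that return `None` can be routed to a fallback (human review, or treated as `n`), depending on the pipeline.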

---

## How to Get Started

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "YOUR_USERNAME/xlmr-vinli-finetune"  # replace with your repo id
tok = AutoTokenizer.from_pretrained(model_id)
mdl = AutoModelForSequenceClassification.from_pretrained(model_id)

id2label = mdl.config.id2label  # {0:'c',1:'n',2:'e'}
text = {"premise": "Trời đang mưa rất to.", "hypothesis": "Bên ngoài khô ráo và không có mưa."}

enc = tok(text["premise"], text["hypothesis"], return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    logits = mdl(**enc).logits
pred = logits.softmax(-1).argmax(-1).item()
print("Prediction:", id2label[pred])
```

## Training Details

### Data
- **Path (Kaggle):** `/kaggle/input/nli-vietnam/full_data_true.json`
- **Labels:** `{"c":0, "n":1, "e":2}`
- **Split:** Stratified ~80/10/10 (train/val/test)

*Ensure JSON has fields: `id`, `premise`, `hypothesis`, `label` (labels in `{c,n,e}`).*
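
The field check and the stratified split can be sketched with the standard library alone. This is not the author's splitting code, which is not shown in this card: the helper names, the seed `42`, and the toy records are assumptions for illustration. `sklearn.model_selection.train_test_split` with its `stratify` argument is a common alternative.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

LABEL2ID = {"c": 0, "n": 1, "e": 2}  # mapping stated in this card
REQUIRED_FIELDS = ("id", "premise", "hypothesis", "label")


def validate(records: List[dict]) -> None:
    """Raise ValueError on a missing field or an unknown label."""
    for i, rec in enumerate(records):
        missing = [f for f in REQUIRED_FIELDS if f not in rec]
        if missing:
            raise ValueError(f"record {i} is missing fields: {missing}")
        if rec["label"] not in LABEL2ID:
            raise ValueError(f"record {i} has unknown label: {rec['label']!r}")


def stratified_split(
    records: List[dict], seed: int = 42
) -> Tuple[List[dict], List[dict], List[dict]]:
    """Split each label group ~80/10/10 so class ratios are preserved."""
    rng = random.Random(seed)
    by_label: Dict[str, List[dict]] = defaultdict(list)
    for rec in records:
        by_label[rec["label"]].append(rec)

    train, val, test = [], [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_val = round(0.1 * len(group))
        n_test = round(0.1 * len(group))
        val.extend(group[:n_val])
        test.extend(group[n_val:n_val + n_test])
        train.extend(group[n_val + n_test:])
    return train, val, test


# Toy data standing in for the real JSON file (contents are made up).
toy = [
    {"id": i, "premise": f"p{i}", "hypothesis": f"h{i}", "label": "cne"[i % 3]}
    for i in range(300)
]
validate(toy)
train, val, test = stratified_split(toy)
print(len(train), len(val), len(test))  # prints: 240 30 30
```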

### Procedure

**Preprocessing**
- **Tokenizer:** `XLMRobertaTokenizerFast` (max_length=256, truncation)

**Hyperparameters**
- **Epochs:** 4
- **Optim:** AdamW (via HF Trainer)
- **LR:** 2e-5
- **Weight decay:** 0.01
- **Warmup ratio:** 0.06
- **Scheduler:** linear
- **Batch:** `per_device_train_batch_size=8`, `per_device_eval_batch_size=32`
- **Grad accumulation:** 2 (effective train batch size 8 × 2 = 16 on a single GPU)
- **Precision:** `bf16` if available (Ampere+), else `fp16`
- **Label smoothing:** 0.05
- **Early stopping:** patience 2
- **Gradient checkpointing:** enabled
- `save_safetensors=True`, `load_best_model_at_end=True` (best checkpoint selected by `f1_macro`)
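
The settings above can be collected as keyword arguments in the style of `transformers.TrainingArguments`. This is a sketch, not the original training script. The keys are parameter names from recent `transformers` releases, while the helper names, the per-epoch `eval_strategy`/`save_strategy`, and `greater_is_better=True` are assumptions not stated in this card.

```python
from typing import Dict


def precision_flags(bf16_supported: bool) -> Dict[str, bool]:
    """Prefer bf16 when the GPU supports it, otherwise fall back to fp16."""
    return {"bf16": bf16_supported, "fp16": not bf16_supported}


def build_training_kwargs(bf16_supported: bool) -> Dict[str, object]:
    """Collect the listed hyperparameters as TrainingArguments-style kwargs."""
    kwargs: Dict[str, object] = {
        "num_train_epochs": 4,
        "learning_rate": 2e-5,
        "weight_decay": 0.01,
        "warmup_ratio": 0.06,
        "lr_scheduler_type": "linear",
        "per_device_train_batch_size": 8,
        "per_device_eval_batch_size": 32,
        "gradient_accumulation_steps": 2,
        "label_smoothing_factor": 0.05,
        "gradient_checkpointing": True,
        "save_safetensors": True,
        "load_best_model_at_end": True,
        "metric_for_best_model": "f1_macro",
        # Assumption: higher macro F1 is better, evaluated and saved per epoch.
        "greater_is_better": True,
        "eval_strategy": "epoch",
        "save_strategy": "epoch",
    }
    kwargs.update(precision_flags(bf16_supported))
    return kwargs


def effective_batch_size(kwargs: Dict[str, object], num_devices: int = 1) -> int:
    """Per-device batch size x accumulation steps x number of devices."""
    return (
        int(kwargs["per_device_train_batch_size"])
        * int(kwargs["gradient_accumulation_steps"])
        * num_devices
    )


# Pre-Ampere cards such as T4/P100 take the fp16 branch.
kwargs = build_training_kwargs(bf16_supported=False)
print(effective_batch_size(kwargs))  # prints: 16
```

Passing these to `TrainingArguments(output_dir=..., **kwargs)` and adding `EarlyStoppingCallback(early_stopping_patience=2)` to the `Trainer` callbacks would mirror the listed settings; `output_dir` is left to the user.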

### Compute
- **Hardware:** Single NVIDIA T4/P100 16GB (Kaggle)
- `dataloader_num_workers=2`, `pin_memory=True`

### Speeds, Sizes, Times
- **Checkpoint size:** same as a standard `xlm-roberta-base` encoder plus a 3-label classification head
- *Wall-clock time was not recorded and depends on the GPU; a run fits within the time limit of a single Kaggle session.*

---

## Evaluation

### Metrics & Factors
- **Metrics:** Accuracy, Macro F1
- **Factors:** Per-label performance (c, n, e)

### Results (Test)
```yaml
Accuracy: 0.9901
Macro F1: 0.9878
Support: 1113 samples (c=429, n=108, e=576)
```

**Classification Report:**
```
              precision    recall  f1-score   support

           c     0.9930    0.9883    0.9907       429
           n     0.9815    0.9815    0.9815       108
           e     0.9896    0.9931    0.9913       576

weighted avg     0.9901    0.9901    0.9901      1113
```

**Confusion Matrix:**
```
[[424   0   5],
 [  1 106   1],
 [  2   2 572]]
```
*Note: Reproduced numbers may vary slightly due to randomness and hardware.*
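
The headline metrics can be re-derived from the confusion matrix as a consistency check. The sketch below assumes rows are true labels and columns are predicted labels in the order `c`, `n`, `e`; the row sums (429, 108, 576) match the reported supports, which is consistent with that reading.

```python
from typing import List

LABELS = ["c", "n", "e"]
# Rows: true labels, columns: predicted labels (assumed orientation).
CONFUSION = [
    [424, 0, 5],
    [1, 106, 1],
    [2, 2, 572],
]


def accuracy(cm: List[List[int]]) -> float:
    """Fraction of examples on the diagonal."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_f1(cm: List[List[int]]) -> List[float]:
    """F1 per class, computed as 2*TP / (2*TP + FP + FN)."""
    scores = []
    for k in range(len(cm)):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                  # rest of the true-label row
        fp = sum(row[k] for row in cm) - tp   # rest of the predicted column
        scores.append(2 * tp / (2 * tp + fp + fn))
    return scores


acc = accuracy(CONFUSION)
f1s = per_class_f1(CONFUSION)
macro_f1 = sum(f1s) / len(f1s)
print(f"accuracy={acc:.4f} macro_f1={macro_f1:.4f}")
# prints: accuracy=0.9901 macro_f1=0.9878
```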

---

## Environmental Impact
- **Hardware:** Single T4/P100 16GB (Kaggle)
- **Cloud Provider/Region:** Kaggle (unspecified)
- **Hours used:** Not logged
- **Carbon Emitted:** Not estimated
  - *You can estimate with the [MLCO2 Impact calculator](https://mlco2.github.io/impact#compute).*

---

## Technical Specifications

### Architecture & Objective
- **Backbone:** XLM-RoBERTa Base
- **Head:** Linear classification (3 labels)
- **Objective:** Cross-entropy with label smoothing (0.05); optional class weighting (off by default)
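
For a single example, label smoothing replaces the one-hot target with a mix of the one-hot vector and a uniform distribution over the 3 labels. The sketch below shows one common formulation in plain Python, with made-up logits. It is meant as a worked illustration of the objective, not as the exact code path used during training.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def smoothed_cross_entropy(
    logits: List[float], target: int, epsilon: float = 0.05
) -> float:
    """Cross-entropy against (1 - eps) * one_hot + eps * uniform."""
    log_probs = log_softmax(logits)
    nll = -log_probs[target]                    # standard cross-entropy term
    uniform = -sum(log_probs) / len(log_probs)  # mean negative log-prob over labels
    return (1.0 - epsilon) * nll + epsilon * uniform


# Made-up logits in label order (c, n, e); the true label is index 0 ("c").
example_logits = [2.0, 0.0, -1.0]
plain = smoothed_cross_entropy(example_logits, target=0, epsilon=0.0)
smoothed = smoothed_cross_entropy(example_logits, target=0, epsilon=0.05)
print(f"plain={plain:.4f} smoothed={smoothed:.4f}")
```

With `epsilon=0.0` this reduces to ordinary cross-entropy; with `epsilon=0.05` a confident correct prediction is penalized slightly, which discourages overly peaked probabilities.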

### Software
- `transformers==4.43.3`
- `datasets==2.21.0`
- `accelerate==0.33.0`
- `evaluate==0.4.2`
- `scikit-learn==1.5.1`
- `torch` (CUDA)

---

## Citation

### XLM-RoBERTa
```bibtex
@inproceedings{conneau2020unsupervised,
  title={Unsupervised Cross-lingual Representation Learning at Scale},
  author={Conneau, Alexis and Khandelwal, Kartikay and Goyal, Naman and Chaudhary, Vishrav and Wenzek, Guillaume and Guzm{\'a}n, Francisco and Grave, Edouard and Ott, Myle and Zettlemoyer, Luke and Stoyanov, Veselin},
  booktitle={ACL},
  year={2020}
}
```

## Contact
**Author:** Lê Lý