---
license: mit
language:
- pt
tags:
- masked-language-modeling
- legal-domain
- bert
- portuguese
datasets:
- seu_dataset_legal # replace with the actual dataset name, if available
metrics:
- loss
pipeline_tag: fill-mask
base_model: bert-base-uncased
library_name: transformers
---

# BERT Model Adapted for the Legal Domain

This repository contains a BERT model adapted to the Brazilian legal domain, fine-tuned with a Masked Language Modeling (MLM) objective. The model was trained on a specialized legal dataset to provide a robust foundation for downstream applications in the legal field, such as text classification, information extraction, and related tasks.

---

## Model Details

* **Base Model**: bert-base-uncased
* **Training Task**: Masked Language Modeling (MLM)
* **Domain**: Legal (Brazilian Portuguese)
* **Objective**: Adaptive fine-tuning for better generalization on legal texts
* **Architecture**: BertForMaskedLM

---

## Model Usage

This model can be used directly with the Hugging Face `fill-mask` pipeline. Example applications include filling gaps in legal texts to check coherence and consistency, or to explore domain-specific terminology. For direct access to the model's prediction scores without the pipeline, see the appendix at the end of this card.

### Inference Example

```python
from transformers import pipeline

# Load the model from the Hugging Face Hub
fill_mask = pipeline("fill-mask", model="fabricioalmeida/bert-with-mlm-legal")

# Example sentence (Portuguese: "The contract was signed between the parties on [MASK].")
input_text = "O contrato foi firmado entre as partes no dia [MASK]."

# Perform inference
results = fill_mask(input_text)

for result in results:
    print(f"Option: {result['token_str']}, Score: {result['score']:.4f}")
```

## Training History

The model was trained through adaptive fine-tuning on a legal dataset, with the following loss curve:

| Step  | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 2000  | 0.992100      | 0.824256        |
| 4000  | 0.812500      | 0.710587        |
| 6000  | 0.740800      | 0.656129        |
| 8000  | 0.699100      | 0.621186        |
| 10000 | 0.668100      | 0.594372        |
| 12000 | 0.641700      | 0.577950        |
| 14000 | 0.624800      | 0.569022        |
| 16000 | 0.603600      | 0.559712        |
| 18000 | 0.598100      | 0.544894        |
| 20000 | 0.588800      | 0.538299        |
| 22000 | 0.578800      | 0.525268        |
| 24000 | 0.573700      | 0.528776        |

## Repository Structure

- `config.json`: Model configuration.
- `pytorch_model.bin`: Trained model weights.
- `tokenizer_config.json`: Tokenizer configuration.
- `vocab.txt`: Vocabulary used during training.

---

## How to Cite

If you use this model in your research or application, please cite it as follows:

```
@misc{bert-juridico,
  author       = {CARMO, A. F.},
  title        = {LegalBERT-Anotado: aplicando fine-tuning orientado a tokens no domínio jurídico brasileiro},
  year         = {2024},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/fabricioalmeida/bert-with-mlm-legal}}
}
```

---

## Contact

For questions or suggestions, please contact [fabrycio30@gmail.com](mailto:fabrycio30@gmail.com).
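
---

## Appendix: Inference Without the Pipeline

If you need raw prediction scores rather than the pipeline's formatted output (for example, to rank more candidates or embed the model in a larger system), you can load the tokenizer and model directly. The sketch below is a minimal illustration, assuming PyTorch and the same Hub repository named above; it is not part of the original training code.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "fabricioalmeida/bert-with-mlm-legal"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)
model.eval()

text = "O contrato foi firmado entre as partes no dia [MASK]."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# Locate the [MASK] token and convert its logits to probabilities
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0].item()
probs = torch.softmax(logits[0, mask_pos], dim=-1)

# Print the five most likely fillers for the masked position
top = torch.topk(probs, k=5)
for token_id, score in zip(top.indices, top.values):
    print(f"Option: {tokenizer.decode([token_id.item()])}, Score: {score:.4f}")
```

This reproduces what the `fill-mask` pipeline does internally and makes it easy to inspect the probability of arbitrary candidate tokens at the masked position.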