# BERT Model Adapted for the Legal Domain
This repository contains a BERT model adapted to the Brazilian legal domain, fine-tuned with Masked Language Modeling (MLM). The model was trained on a specialized legal dataset to provide a robust foundation for downstream applications in the legal field, such as text classification and information extraction.
## Model Details
- Base Model: bert-base-uncased
- Training Task: Masked Language Modeling (MLM)
- Domain: Legal (Brazilian Portuguese)
- Objective: Domain-adaptive fine-tuning for better generalization on legal texts
- Architecture: BertForMaskedLM
## Model Usage
This model can be used directly with the Hugging Face fill-mask pipeline. Typical applications include filling gaps in legal texts to check coherence and consistency, or to explore domain-specific terminology.
### Inference Example
```python
from transformers import pipeline

# Load the model from the Hugging Face Hub
fill_mask = pipeline("fill-mask", model="fabricioalmeida/bert-with-mlm-legal")

# Example phrase: "The contract was signed between the parties on [MASK]."
input_text = "O contrato foi firmado entre as partes no dia [MASK]."

# Perform inference; the pipeline returns the top candidates for [MASK]
results = fill_mask(input_text)
for result in results:
    print(f"Option: {result['token_str']}, Score: {result['score']}")
```
## Training History
The model was trained through domain-adaptive fine-tuning on a legal dataset. Training and validation losses over the course of training:
| Step | Training Loss | Validation Loss |
|---|---|---|
| 2000 | 0.992100 | 0.824256 |
| 4000 | 0.812500 | 0.710587 |
| 6000 | 0.740800 | 0.656129 |
| 8000 | 0.699100 | 0.621186 |
| 10000 | 0.668100 | 0.594372 |
| 12000 | 0.641700 | 0.577950 |
| 14000 | 0.624800 | 0.569022 |
| 16000 | 0.603600 | 0.559712 |
| 18000 | 0.598100 | 0.544894 |
| 20000 | 0.588800 | 0.538299 |
| 22000 | 0.578800 | 0.525268 |
| 24000 | 0.573700 | 0.528776 |
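
The training script itself is not included in the repository. For reference, a minimal sketch of an equivalent MLM fine-tuning setup with the Hugging Face `Trainer` is shown below; the toy corpus, masking probability, and hyperparameters are illustrative assumptions, not the values used to produce the losses above (which were logged every 2,000 steps).

```python
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    BertForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Toy corpus standing in for the legal dataset, which is not public.
dataset = Dataset.from_dict(
    {"text": [
        "O contrato foi firmado entre as partes.",
        "A sentença foi proferida pelo juiz competente.",
    ]}
).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

# The collator dynamically masks tokens in each batch; 15% is the
# standard BERT masking rate and is assumed here.
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-with-mlm-legal", num_train_epochs=1),
    train_dataset=dataset,
    data_collator=data_collator,
)
trainer.train()
```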
## Repository Structure
- `config.json`: Model configuration.
- `pytorch_model.bin`: Trained model weights.
- `tokenizer_config.json`: Tokenizer configuration.
- `vocab.txt`: Vocabulary used for training.
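These are the files that `from_pretrained` reads, so loading from a local copy works the same way as loading from the Hub. A small sketch, assuming the repository has been cloned to `./bert-with-mlm-legal`:

```python
from transformers import AutoTokenizer, BertForMaskedLM

# Assumes a local clone, e.g.:
#   git clone https://huggingface.co/fabricioalmeida/bert-with-mlm-legal
local_path = "./bert-with-mlm-legal"

tokenizer = AutoTokenizer.from_pretrained(local_path)  # tokenizer_config.json, vocab.txt
model = BertForMaskedLM.from_pretrained(local_path)    # config.json, pytorch_model.bin
```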
## How to Cite
If you use this model in your research or application, please cite it as follows:
```bibtex
@misc{bert-juridico,
  author       = {CARMO, A. F.},
  title        = {LegalBERT-Anotado: aplicando Fine-tuning orientado à tokens Domínio Jurídico Brasileiro},
  year         = {2024},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/fabricioalmeida/bert-with-mlm-legal}}
}
```
## Contact
For questions or suggestions, please contact [email protected].