
BERT Model Adapted for the Legal Domain

This repository contains a BERT model adapted to the Brazilian legal domain via Masked Language Modeling (MLM) fine-tuning. The model was trained on a specialized legal corpus to provide a robust foundation for downstream applications in the legal field, such as text classification and information extraction.


Model Details

  • Base Model: bert-base-uncased
  • Training Task: Masked Language Modeling (MLM)
  • Domain: Brazilian legal texts
  • Objective: Adaptive fine-tuning for better generalization on legal texts
  • Architecture: BertForMaskedLM
  • Parameters: ~110M (F32 weights)

Model Usage

This model can be used directly with the Hugging Face fill-mask pipeline. Example applications include filling gaps in legal texts to check coherence and consistency, or to explore domain-specific terminology.

Inference Example

from transformers import pipeline

# Load the model from the Hugging Face Hub
fill_mask = pipeline("fill-mask", model="fabricioalmeida/bert-with-mlm-legal")

# Example phrase
input_text = "O contrato foi firmado entre as partes no dia [MASK]."

# Perform inference
results = fill_mask(input_text)
for result in results:
    print(f"Option: {result['token_str']}, Score: {result['score']}")

Training History

The model was trained through adaptive fine-tuning on a legal dataset with the following performance metrics:

| Step  | Training Loss | Validation Loss |
|------:|--------------:|----------------:|
| 2000  | 0.992100      | 0.824256        |
| 4000  | 0.812500      | 0.710587        |
| 6000  | 0.740800      | 0.656129        |
| 8000  | 0.699100      | 0.621186        |
| 10000 | 0.668100      | 0.594372        |
| 12000 | 0.641700      | 0.577950        |
| 14000 | 0.624800      | 0.569022        |
| 16000 | 0.603600      | 0.559712        |
| 18000 | 0.598100      | 0.544894        |
| 20000 | 0.588800      | 0.538299        |
| 22000 | 0.578800      | 0.525268        |
| 24000 | 0.573700      | 0.528776        |
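Since the MLM validation loss is a mean cross-entropy over masked tokens, it can be read as a perplexity via exp(loss). A minimal sketch using the first and last logged validation losses from the table above:

```python
import math

# MLM validation loss is the mean cross-entropy over masked tokens,
# so perplexity is simply exp(loss).
def perplexity(loss: float) -> float:
    return math.exp(loss)

# First and last validation losses from the training history
print(perplexity(0.824256))  # step 2000:  ~2.28
print(perplexity(0.528776))  # step 24000: ~1.70
```

Over training, validation perplexity drops from roughly 2.28 to 1.70, i.e. the model becomes substantially less uncertain when predicting masked tokens in legal text.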

Repository Structure

  • config.json: Model configurations.
  • pytorch_model.bin: Trained model weights.
  • tokenizer_config.json: Tokenizer configurations.
  • vocab.txt: Vocabulary used for training.

How to Cite

If you use this model in your research or application, please cite it as follows:

@misc{bert-juridico,
  author = {CARMO, A. F.},
  title = {LegalBERT-Anotado: aplicando fine-tuning orientado a tokens no Dom\'inio Jur\'idico Brasileiro},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/fabricioalmeida/bert-with-mlm-legal}}
}

Contact

For questions or suggestions, please contact [email protected].
