---
library_name: peft
base_model: meta-llama/Llama-2-7b-chat-hf
tags:
- legal
- legal-text
- passive-to-active
- voice-transformation
- legal-nlp
- text-simplification
- legal-documents
- sentence-transformation
- lora
- qlora
- peft
- llama-2
- natural-language-processing
- legal-language
license: apache-2.0
language:
- en
pipeline_tag: text-generation
---
|
|
|
|
|
# legal-passive-to-active-llama-7b |
|
|
|
|
|
A specialized LoRA fine-tuned model for transforming legal text from passive voice to active voice, built on Llama-2-7b-Chat. This model simplifies complex legal language while maintaining semantic accuracy and legal precision. |
|
|
|
|
|
## Model Description |
|
|
|
|
|
This model is a LoRA (Low-Rank Adaptation) fine-tune of Llama-2-7b-chat-hf, optimized for passive-to-active voice transformation in legal documents. It was trained on a curated dataset of 319 legal sentences drawn from authoritative sources, including UN documents, the GDPR, the Fair Work Act, and insurance regulations, so that it learns legal syntax, passive constructions, and voice-transformation patterns.
|
|
|
|
|
### Key Features |
|
|
|
|
|
- **Legal Text Simplification**: Converts passive voice to active voice in legal documents |
|
|
- **Domain-Specific**: Fine-tuned on authentic legal text from multiple jurisdictions |
|
|
- **Efficient Training**: Uses QLoRA for memory-efficient fine-tuning |
|
|
- **Semantic Preservation**: Maintains legal meaning while simplifying sentence structure |
|
|
- **Accessibility**: Makes legal documents more readable and accessible |
|
|
|
|
|
## Model Details |
|
|
|
|
|
- **Developed by**: Rafi Al Attrach |
|
|
- **Model type**: LoRA fine-tuned Llama-2 |
|
|
- **Language(s)**: English |
|
|
- **License**: Apache 2.0 |
|
|
- **Finetuned from**: [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) |
|
|
- **Training method**: QLoRA (4-bit quantization + LoRA) |
|
|
- **Research Focus**: Legal text simplification and accessibility (2024) |
|
|
|
|
|
### Technical Specifications |
|
|
|
|
|
- **Base Model**: Llama-2-7b-chat-hf
|
|
- **LoRA Rank**: 64 |
|
|
- **Training Samples**: 319 legal sentences |
|
|
- **Data Sources**: UN legal documents, GDPR, Fair Work Act, Insurance regulations |
|
|
- **Evaluation**: BERTScore metrics and human evaluation |
|
|
- **Performance**: ~6% improvement over base model in human evaluation |
|
|
|
|
|
## Uses |
|
|
|
|
|
### Direct Use |
|
|
|
|
|
This model is designed for: |
|
|
- **Legal document simplification**: Converting passive legal text to active voice |
|
|
- **Accessibility improvement**: Making legal documents more readable |
|
|
- **Legal writing assistance**: Helping legal professionals write clearer documents |
|
|
- **Educational purposes**: Teaching legal language transformation |
|
|
- **Document processing**: Batch processing of legal texts |
|
|
|
|
|
### Example Use Cases |
|
|
|
|
|
```python
# Transform a legal passive sentence to active voice
passive_sentence = "The contract shall be executed by both parties within 30 days."
# Model output: "Both parties shall execute the contract within 30 days."
```
|
|
|
|
|
```python
# Simplify GDPR text
passive_sentence = "Personal data may be processed by the controller for legitimate interests."
# Model output: "The controller may process personal data for legitimate interests."
```
|
|
|
|
|
## How to Get Started |
|
|
|
|
|
### Installation |
|
|
|
|
|
```bash
pip install transformers torch peft accelerate bitsandbytes
```
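
The base Llama-2 weights are gated on the Hugging Face Hub, so you may need to accept Meta's license on the [base model page](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) and authenticate before downloading:

```bash
huggingface-cli login
```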
|
|
|
|
|
### Loading the Model |
|
|
|
|
|
#### GPU Usage (Recommended) |
|
|
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load base model with 4-bit quantization
base_model = "meta-llama/Llama-2-7b-chat-hf"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)

# Load LoRA adapter
model = PeftModel.from_pretrained(model, "rafiaa/legal-passive-to-active-llama-7b")
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Set pad token for generation
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
```
|
|
|
|
|
#### CPU Usage (Alternative) |
|
|
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load base model (CPU compatible)
base_model = "meta-llama/Llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.float32,
    device_map="cpu",
)

# Load LoRA adapter
model = PeftModel.from_pretrained(model, "rafiaa/legal-passive-to-active-llama-7b")
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Set pad token for generation
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
```
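
If you prefer to deploy without the PEFT wrapper, the adapter can be merged into the base weights. A minimal sketch (merging requires unquantized weights, so it assumes the CPU loading path above; the output directory name is illustrative):

```python
# Merge the LoRA weights into the base model and drop the PEFT wrapper
merged_model = model.merge_and_unload()

# Save the merged model and tokenizer to a local directory (name is illustrative)
merged_model.save_pretrained("legal-passive-to-active-merged")
tokenizer.save_pretrained("legal-passive-to-active-merged")
```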
|
|
|
|
|
### Usage Example |
|
|
|
|
|
```python
def transform_passive_to_active(passive_sentence, max_new_tokens=128):
    # Create instruction prompt
    instruction = """You are a legal text transformation expert. Your task is to convert passive voice sentences to active voice while maintaining the exact legal meaning and terminology.

Input: Transform the following legal sentence from passive to active voice.

Legal Sentence: """

    prompt = instruction + passive_sentence
    # Move inputs onto the model's device (GPU when loaded with device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )

    # Decode only the newly generated tokens, skipping the echoed prompt
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

# Example usage
passive = "The agreement shall be signed by the authorized representatives."
active = transform_passive_to_active(passive)
print(active)
```
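
For the batch document-processing use case listed under Direct Use, a minimal sketch built on the helper above (the sentences are illustrative):

```python
passive_sentences = [
    "The notice shall be provided by the employer.",
    "The claim must be lodged by the policyholder within 28 days.",
]

# Transform each sentence in turn and print the active-voice rewrite
for sentence in passive_sentences:
    print(transform_passive_to_active(sentence))
```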
|
|
|
|
|
## Training Details |
|
|
|
|
|
### Training Data |
|
|
|
|
|
- **Dataset Size**: 319 legal sentences |
|
|
- **Source Documents**: |
|
|
- United Nations legal documents |
|
|
- General Data Protection Regulation (GDPR) |
|
|
- Fair Work Act (Australia) |
|
|
- Insurance Council of Australia regulations |
|
|
- **Data Split**: 85% training, 15% testing, with 15% of the training portion held out for validation (see the sketch after this list)
|
|
- **Domain**: Legal text across multiple jurisdictions |
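
A minimal sketch of the nested split described above, assuming `sentences` holds the 319 annotated examples (the seed and the use of scikit-learn are assumptions; the card does not specify the tooling):

```python
from sklearn.model_selection import train_test_split

# 85% train / 15% test, then 15% of the training portion held out for validation
train_val, test = train_test_split(sentences, test_size=0.15, random_state=42)
train, val = train_test_split(train_val, test_size=0.15, random_state=42)
```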
|
|
|
|
|
### Training Procedure |
|
|
|
|
|
- **Method**: QLoRA (4-bit quantization + LoRA) |
|
|
- **LoRA Configuration**: Rank 64, Alpha 16 (see the configuration sketch after this list)
|
|
- **Library**: unsloth (2.2x faster, 43% less VRAM) |
|
|
- **Hardware**: Tesla T4 GPU (Google Colab) |
|
|
- **Loss Curves**: Validation loss trended downward during training, indicating good generalization
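
A minimal `peft` reconstruction of the reported adapter configuration (the target modules and dropout are assumptions; the card only specifies rank and alpha):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,           # reported LoRA rank
    lora_alpha=16,  # reported LoRA alpha
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    lora_dropout=0.05,  # assumption
    bias="none",
    task_type="CAUSAL_LM",
)
```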
|
|
|
|
|
### Evaluation Metrics |
|
|
|
|
|
- **BERTScore**: Semantic similarity evaluation (see the sketch after this list)
|
|
- **Human Evaluation**: Binary correctness assessment by legal evaluators |
|
|
- **Performance Improvement**: ~6% increase over base Llama-2 model |
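
A minimal sketch of the BERTScore computation with the `bert-score` package (the sentence pair is illustrative, not drawn from the actual test set):

```python
from bert_score import score  # pip install bert-score

candidates = ["Both parties shall execute the contract within 30 days."]  # model outputs
references = ["Both parties shall execute the contract within 30 days."]  # gold rewrites
P, R, F1 = score(candidates, references, lang="en")
print(f"BERTScore F1: {F1.mean().item():.4f}")
```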
|
|
|
|
|
## Performance |
|
|
|
|
|
The model was evaluated using both automatic metrics (BERTScore precision, recall, and F1) and human evaluation:
|
|
|
|
|
- **BERTScore F1**: High semantic similarity preservation |
|
|
- **Human Evaluation**: ~6% improvement over base model |
|
|
- **Strengths**: Good transformation of standard passive constructions |
|
|
- **Challenges**: Complex sentences with nuanced word placement (e.g., "only") |
|
|
|
|
|
## Limitations and Bias |
|
|
|
|
|
### Known Limitations |
|
|
|
|
|
- **Word Position Sensitivity**: Struggles with sentences where word position significantly alters meaning |
|
|
- **Dataset Size**: Limited to 319 training samples |
|
|
- **Non-Determinism**: Outputs may vary between runs when sampling is enabled (as in the generation example above)
|
|
- **Domain Coverage**: Primarily trained on English common law and EU legal documents |
|
|
- **'By' Constructions**: Occasionally struggles with sentences containing 'by', which marks the agent in passive constructions
|
|
|
|
|
### Recommendations |
|
|
|
|
|
- Validate transformed sentences for legal accuracy before use |
|
|
- Use human review for critical legal documents |
|
|
- Consider context and jurisdiction when applying transformations |
|
|
- Test with domain-specific legal texts for best results |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use this model in your research, please cite: |
|
|
|
|
|
```bibtex
@misc{legal-passive-active-llama2,
  title={legal-passive-to-active-llama-7b: A LoRA Fine-tuned Model for Legal Voice Transformation},
  author={Rafi Al Attrach},
  year={2024},
  url={https://huggingface.co/rafiaa/legal-passive-to-active-llama-7b}
}
```
|
|
|
|
|
## Related Models |
|
|
|
|
|
- **Base Model**: [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) |
|
|
- **Enhanced Version**: [rafiaa/legal-passive-to-active-mistral-7b](https://huggingface.co/rafiaa/legal-passive-to-active-mistral-7b) (recommended; better performance)
|
|
|
|
|
## Model Card Contact |
|
|
|
|
|
- **Author**: Rafi Al Attrach |
|
|
- **Model Repository**: [HuggingFace Model](https://huggingface.co/rafiaa/legal-passive-to-active-llama-7b) |
|
|
- **Issues**: Please report issues through the HuggingFace model page |
|
|
|
|
|
## Acknowledgments |
|
|
|
|
|
- **Research Project**: Legal text simplification and accessibility research (2024) |
|
|
- **Training Data**: Public legal documents and regulations |
|
|
- **Base Model**: Meta's Llama-2-7b-Chat-hf |
|
|
|
|
|
--- |
|
|
|
|
|
*This model is part of a research project on legal text simplification and accessibility, focusing on passive-to-active voice transformation in legal documents.* |
|
|
|