---
library_name: peft
base_model: meta-llama/Llama-2-7b-chat-hf
tags:
- legal
- legal-text
- passive-to-active
- voice-transformation
- legal-nlp
- text-simplification
- legal-documents
- sentence-transformation
- lora
- qlora
- peft
- llama-2
- natural-language-processing
- legal-language
license: apache-2.0
language:
- en
pipeline_tag: text-generation
---
# legal-passive-to-active-llama-7b
A specialized LoRA fine-tuned model for transforming legal text from passive voice to active voice, built on Llama-2-7b-Chat. This model simplifies complex legal language while maintaining semantic accuracy and legal precision.
## Model Description
This model is a LoRA (Low-Rank Adaptation) fine-tune of Llama-2-7b-Chat-hf, optimized for passive-to-active voice transformation in legal documents. It was trained on a curated dataset of 319 legal sentences drawn from UN documents, the GDPR, the Fair Work Act, and insurance regulations, so that it learns legal syntax, passive constructions, and voice transformation patterns.
### Key Features
- **Legal Text Simplification**: Converts passive voice to active voice in legal documents
- **Domain-Specific**: Fine-tuned on authentic legal text from multiple jurisdictions
- **Efficient Training**: Uses QLoRA for memory-efficient fine-tuning
- **Semantic Preservation**: Maintains legal meaning while simplifying sentence structure
- **Accessibility**: Makes legal documents more readable and accessible
## Model Details
- **Developed by**: Rafi Al Attrach
- **Model type**: LoRA fine-tuned Llama-2
- **Language(s)**: English
- **License**: Apache 2.0
- **Finetuned from**: [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
- **Training method**: QLoRA (4-bit quantization + LoRA)
- **Research Focus**: Legal text simplification and accessibility (2024)
### Technical Specifications
- **Base Model**: Llama-2-7b-Chat-hf
- **LoRA Rank**: 64
- **Training Samples**: 319 legal sentences
- **Data Sources**: UN legal documents, GDPR, Fair Work Act, Insurance regulations
- **Evaluation**: BERTScore metrics and human evaluation
- **Performance**: ~6% improvement over base model in human evaluation
## Uses
### Direct Use
This model is designed for:
- **Legal document simplification**: Converting passive legal text to active voice
- **Accessibility improvement**: Making legal documents more readable
- **Legal writing assistance**: Helping legal professionals write clearer documents
- **Educational purposes**: Teaching legal language transformation
- **Document processing**: Batch processing of legal texts
### Example Use Cases
```python
# Transform a legal passive sentence to active voice
passive_sentence = "The contract shall be executed by both parties within 30 days."
# Model output: "Both parties shall execute the contract within 30 days."
```
```python
# Simplify GDPR text
passive_sentence = "Personal data may be processed by the controller for legitimate interests."
# Model output: "The controller may process personal data for legitimate interests."
```
## How to Get Started
### Installation
```bash
pip install transformers torch peft accelerate bitsandbytes
```
### Loading the Model
#### GPU Usage (Recommended)
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load base model with 4-bit quantization
base_model = "meta-llama/Llama-2-7b-chat-hf"
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=quant_config,
    device_map="auto",
)

# Load LoRA adapter
model = PeftModel.from_pretrained(model, "rafiaa/legal-passive-to-active-llama-7b")
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Set pad token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
```
#### CPU Usage (Alternative)
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
# Load base model (CPU compatible)
base_model = "meta-llama/Llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.float32,
    device_map="cpu"
)
# Load LoRA adapter
model = PeftModel.from_pretrained(model, "rafiaa/legal-passive-to-active-llama-7b")
tokenizer = AutoTokenizer.from_pretrained(base_model)
# Set pad token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
```
### Usage Example
```python
def transform_passive_to_active(passive_sentence, max_length=512):
    # Create instruction prompt
    instruction = """You are a legal text transformation expert. Your task is to convert passive voice sentences to active voice while maintaining the exact legal meaning and terminology.
Input: Transform the following legal sentence from passive to active voice.
Legal Sentence: """
    prompt = instruction + passive_sentence
    # Move the input tensors to the same device as the model
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_length=max_length,  # counts prompt tokens plus generated tokens
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Decode only the newly generated tokens so the prompt is not echoed back
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
# Example usage
passive = "The agreement shall be signed by the authorized representatives."
active = transform_passive_to_active(passive)
print(active)
```
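For the batch-processing use case listed under Direct Use, a thin wrapper can apply a transformation function to many sentences. This is a minimal sketch rather than part of the released code: `transform_batch` and its `transform_fn` parameter are hypothetical names, and the transformer is passed in as an argument so the helper can be tried without loading the model.

```python
from typing import Callable, Iterable, List, Tuple


def transform_batch(
    sentences: Iterable[str],
    transform_fn: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Apply transform_fn to each non-empty sentence.

    Returns (original, transformed) pairs so a reviewer can compare them.
    """
    results = []
    for sentence in sentences:
        cleaned = sentence.strip()
        if not cleaned:
            continue  # skip blank lines, e.g. from a text file
        results.append((cleaned, transform_fn(cleaned)))
    return results
```

With the model loaded, `transform_batch(sentences, transform_passive_to_active)` would process the sentences one at a time. Keeping the original next to each output makes the human review recommended later in this card easier.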
## Training Details
### Training Data
- **Dataset Size**: 319 legal sentences
- **Source Documents**:
  - United Nations legal documents
  - General Data Protection Regulation (GDPR)
  - Fair Work Act (Australia)
  - Insurance Council of Australia regulations
- **Data Split**: 85% training, 15% testing; 15% of the training portion was held out for validation
- **Domain**: Legal text across multiple jurisdictions
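The two-stage split can be turned into approximate sentence counts. The sketch below assumes sizes are rounded to the nearest sentence; the rounding actually used in the experiments is not documented, so the exact counts may differ by one or two. `split_counts` is a hypothetical helper.

```python
def split_counts(total: int, test_frac: float = 0.15, val_frac: float = 0.15):
    """Return (train, validation, test) sizes for a two-stage split.

    Assumption: sizes are rounded to the nearest sentence.
    """
    n_test = round(total * test_frac)
    n_train_full = total - n_test
    n_val = round(n_train_full * val_frac)  # validation is carved out of training
    n_train = n_train_full - n_val
    return n_train, n_val, n_test


n_train, n_val, n_test = split_counts(319)
print(n_train, n_val, n_test)  # 230 41 48 under the rounding assumption
```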
### Training Procedure
- **Method**: QLoRA (4-bit quantization + LoRA)
- **LoRA Configuration**: Rank 64, Alpha 16
- **Library**: Unsloth (reported 2.2x faster training with 43% less VRAM)
- **Hardware**: Tesla T4 GPU (Google Colab)
- **Validation Loss**: Trended downward during training, suggesting the model generalizes rather than overfits
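To make the rank and alpha values concrete: standard LoRA adds a low-rank update scaled by alpha / rank to each adapted weight matrix. The sketch below computes that scaling factor and the number of trainable parameters for a single 4096 x 4096 projection (4096 is the hidden size of Llama-2-7b). This card does not state which layers were adapted, so the single-matrix figure is illustrative only, and the helper names are hypothetical.

```python
def lora_scaling(rank: int, alpha: int) -> float:
    """Scaling applied to the low-rank update B @ A in standard LoRA."""
    return alpha / rank


def lora_params_per_matrix(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * d_in + d_out * rank


RANK, ALPHA = 64, 16   # values from this model card
HIDDEN_SIZE = 4096     # Llama-2-7b hidden size

scaling = lora_scaling(RANK, ALPHA)
added = lora_params_per_matrix(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
print(f"scaling = {scaling}")           # 0.25
print(f"added parameters = {added:,}")  # 524,288 vs 16,777,216 frozen weights
```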
### Evaluation Metrics
- **BERTScore**: Semantic similarity evaluation
- **Human Evaluation**: Binary correctness assessment by legal evaluators
- **Performance Improvement**: ~6% increase over base Llama-2 model
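Because the human evaluation is a binary correctness judgment, the comparison with the base model reduces to two accuracy figures. The sketch below shows one way to compute it. The function names and the labels in the usage lines are hypothetical placeholders, not the study's data. It reports both the absolute difference and the relative change, since the card does not say which of the two the ~6% figure refers to.

```python
from typing import Sequence


def accuracy(labels: Sequence[int]) -> float:
    """Fraction of outputs judged correct (1 = correct, 0 = incorrect)."""
    if not labels:
        raise ValueError("labels must not be empty")
    return sum(labels) / len(labels)


def improvement(base_labels: Sequence[int], tuned_labels: Sequence[int]):
    """Return (absolute, relative) improvement of tuned over base accuracy."""
    base, tuned = accuracy(base_labels), accuracy(tuned_labels)
    absolute = tuned - base
    relative = absolute / base if base > 0 else float("inf")
    return absolute, relative


# Hypothetical judgments for five test sentences (placeholder values only)
base_judgments = [1, 0, 1, 0, 1]
tuned_judgments = [1, 1, 1, 0, 1]
abs_gain, rel_gain = improvement(base_judgments, tuned_judgments)
print(f"absolute: {abs_gain:+.1%}, relative: {rel_gain:+.1%}")
```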
## Performance
The model was evaluated using both automatic metrics (BERTScore precision, recall, and F1) and human evaluation:
- **BERTScore F1**: High semantic similarity preservation
- **Human Evaluation**: ~6% improvement over base model
- **Strengths**: Good transformation of standard passive constructions
- **Challenges**: Complex sentences where the position of a word such as "only" changes the meaning
## Limitations and Bias
### Known Limitations
- **Word Position Sensitivity**: Struggles with sentences where word position significantly alters meaning
- **Dataset Size**: Limited to 319 training samples
- **Non-Determinism**: LLM outputs may vary between runs
- **Domain Coverage**: Trained on English-language legal text from UN, EU, and Australian sources; other jurisdictions and languages are not covered
- **'By' Constructions**: Occasionally mishandles sentences in which 'by' marks the agent (the subject of the active sentence)
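On the non-determinism point: the usage example samples with `temperature=0.7`, so repeated runs can differ. One option is to switch to greedy decoding when reproducibility matters. A minimal sketch follows; `generation_kwargs` is a hypothetical helper, while `do_sample`, `temperature`, and `max_new_tokens` are standard `generate` arguments in transformers.

```python
def generation_kwargs(deterministic: bool = True, max_new_tokens: int = 256) -> dict:
    """Build keyword arguments for model.generate().

    deterministic=True selects greedy decoding, which generally returns the
    same output for the same prompt on the same hardware and software stack.
    """
    kwargs = {"max_new_tokens": max_new_tokens}
    if deterministic:
        kwargs["do_sample"] = False
    else:
        # Same sampling settings as the usage example in this card
        kwargs.update(do_sample=True, temperature=0.7)
    return kwargs


print(generation_kwargs())
```

These would be passed as `model.generate(**inputs, **generation_kwargs())`. Whether greedy decoding gives better or worse transformations than sampling for this adapter is not reported in this card, so it is worth comparing both on a few sentences.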
### Recommendations
- Validate transformed sentences for legal accuracy before use
- Use human review for critical legal documents
- Consider context and jurisdiction when applying transformations
- Test with domain-specific legal texts for best results
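As a first automated screen before human review, one could check that legally significant tokens survive the transformation. The sketch below is a heuristic under stated assumptions: the word lists and the `preservation_issues` helper are hypothetical choices, not part of the model or its evaluation, and the check cannot detect a word such as "only" that is kept but moved.

```python
import re

MODALS = {"shall", "must", "may", "should", "will"}
SENSITIVE = {"only", "not", "no"}


def preservation_issues(original: str, transformed: str) -> list:
    """Flag modal verbs, negations, and numbers missing from the output.

    This is a heuristic screen, not a proof of legal equivalence.
    """
    def tokens(text):
        return set(re.findall(r"[a-z]+|\d+", text.lower()))

    src, out = tokens(original), tokens(transformed)
    issues = []
    for word in sorted((MODALS | SENSITIVE) & src):
        if word not in out:
            issues.append(f"missing word: {word}")
    for number in sorted(t for t in src if t.isdigit()):
        if number not in out:
            issues.append(f"missing number: {number}")
    return issues


passive = "The contract shall be executed by both parties within 30 days."
active = "Both parties shall execute the contract within 30 days."
print(preservation_issues(passive, active))  # [] when nothing is dropped
```

An empty list only means that none of the tracked tokens were dropped; a flagged sentence should go to a human reviewer rather than be rejected automatically.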
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{legal-passive-active-llama2,
title={legal-passive-to-active-llama2-7b: A LoRA Fine-tuned Model for Legal Voice Transformation},
author={Rafi Al Attrach},
year={2024},
url={https://huggingface.co/rafiaa/legal-passive-to-active-llama-7b}
}
```
## Related Models
- **Base Model**: [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
- **Enhanced Version**: [rafiaa/legal-passive-to-active-mistral-7b](https://huggingface.co/rafiaa/legal-passive-to-active-mistral-7b) (Recommended - better performance)
## Model Card Contact
- **Author**: Rafi Al Attrach
- **Model Repository**: [HuggingFace Model](https://huggingface.co/rafiaa/legal-passive-to-active-llama-7b)
- **Issues**: Please report issues through the HuggingFace model page
## Acknowledgments
- **Research Project**: Legal text simplification and accessibility research (2024)
- **Training Data**: Public legal documents and regulations
- **Base Model**: Meta's Llama-2-7b-Chat-hf
---
*This model is part of a research project on legal text simplification and accessibility, focusing on passive-to-active voice transformation in legal documents.*