---
library_name: peft
base_model: meta-llama/Llama-2-7b-chat-hf
tags:
- legal
- legal-text
- passive-to-active
- voice-transformation
- legal-nlp
- text-simplification
- legal-documents
- sentence-transformation
- lora
- qlora
- peft
- llama-2
- natural-language-processing
- legal-language
license: apache-2.0
language:
- en
pipeline_tag: text-generation
---

# legal-passive-to-active-llama-7b

A specialized LoRA fine-tuned model for transforming legal text from passive voice to active voice, built on Llama-2-7b-Chat. This model simplifies complex legal language while maintaining semantic accuracy and legal precision.

## Model Description

This model is a LoRA (Low-Rank Adaptation) fine-tuned version of Llama-2-7b-chat-hf, specifically optimized for passive-to-active voice transformation in legal documents. It was trained on a curated dataset of 319 legal sentences from authoritative sources, including UN documents, the GDPR, the Fair Work Act, and insurance regulations, so that it learns legal syntax, passive constructions, and voice transformation patterns.

### Key Features

- **Legal Text Simplification**: Converts passive voice to active voice in legal documents
- **Domain-Specific**: Fine-tuned on authentic legal text from multiple jurisdictions
- **Efficient Training**: Uses QLoRA for memory-efficient fine-tuning
- **Semantic Preservation**: Maintains legal meaning while simplifying sentence structure
- **Accessibility**: Makes legal documents more readable and accessible

## Model Details

- **Developed by**: Rafi Al Attrach
- **Model type**: LoRA fine-tuned Llama-2
- **Language(s)**: English
- **License**: Apache 2.0
- **Finetuned from**: [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
- **Training method**: QLoRA (4-bit quantization + LoRA)
- **Research Focus**: Legal text simplification and accessibility (2024)

### Technical Specifications

- **Base Model**: Llama-2-7b-chat-hf
- **LoRA Rank**: 64
- **Training Samples**: 319 legal sentences
- **Data Sources**: UN legal documents, GDPR, Fair Work Act, insurance regulations
- **Evaluation**: BERTScore metrics and human evaluation
- **Performance**: ~6% improvement over the base model in human evaluation

## Uses

### Direct Use

This model is designed for:

- **Legal document simplification**: Converting passive legal text to active voice
- **Accessibility improvement**: Making legal documents more readable
- **Legal writing assistance**: Helping legal professionals write clearer documents
- **Educational purposes**: Teaching legal language transformation
- **Document processing**: Batch processing of legal texts

### Example Use Cases

```python
# Transform a legal passive sentence to active voice
passive_sentence = "The contract shall be executed by both parties within 30 days."
# Model output: "Both parties shall execute the contract within 30 days."
```

```python
# Simplify GDPR text
passive_sentence = "Personal data may be processed by the controller for legitimate interests."
# Model output: "The controller may process personal data for legitimate interests."
```
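When batch-processing documents (the last use case above), it can help to pre-filter for sentences that actually look passive before invoking the model. The following is a minimal heuristic sketch, purely illustrative and not part of the model or its training pipeline; a production filter would use a proper syntactic parser:

```python
import re

# Rough passive-voice heuristic: a form of "be" followed by a word that
# looks like a past participle. Purely illustrative; it will miss
# irregular participles and flag some false positives.
PASSIVE_RE = re.compile(
    r"\b(?:is|are|was|were|be|been|being)\s+\w+(?:ed|en)\b",
    re.IGNORECASE,
)

def looks_passive(sentence: str) -> bool:
    """Return True if the sentence matches a simple passive pattern."""
    return bool(PASSIVE_RE.search(sentence))

print(looks_passive("The contract shall be executed by both parties."))  # True
print(looks_passive("Both parties shall execute the contract."))         # False
```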
## How to Get Started

### Installation

```bash
pip install transformers torch peft accelerate bitsandbytes
```

### Loading the Model

#### GPU Usage (Recommended)

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load the base model with 4-bit quantization
# (newer transformers versions prefer passing a BitsAndBytesConfig
# via quantization_config instead of load_in_4bit)
base_model = "meta-llama/Llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    load_in_4bit=True,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load the LoRA adapter
model = PeftModel.from_pretrained(model, "rafiaa/legal-passive-to-active-llama-7b")
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Set the pad token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
```

#### CPU Usage (Alternative)

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load the base model in full precision (CPU compatible)
base_model = "meta-llama/Llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.float32,
    device_map="cpu"
)

# Load the LoRA adapter
model = PeftModel.from_pretrained(model, "rafiaa/legal-passive-to-active-llama-7b")
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Set the pad token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
```

### Usage Example

```python
def transform_passive_to_active(passive_sentence, max_length=512):
    # Build the instruction prompt
    instruction = """You are a legal text transformation expert. Your task is to convert passive voice sentences to active voice while maintaining the exact legal meaning and terminology.

Input: Transform the following legal sentence from passive to active voice.

Legal Sentence: """
    prompt = instruction + passive_sentence

    # Move inputs to the same device as the model (important on GPU)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_length=max_length,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )

    # Decode only the newly generated tokens, skipping the echoed prompt
    generated = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(generated, skip_special_tokens=True)

# Example usage
passive = "The agreement shall be signed by the authorized representatives."
active = transform_passive_to_active(passive)
print(active)
```
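For the batch-processing use case mentioned earlier, the helper above can simply be mapped over a list of sentences. A minimal sketch; the sentence list is illustrative, and combining it with the passive-voice pre-filter shown earlier avoids needlessly transforming sentences that are already active:

```python
# Illustrative batch run over a handful of sentences; for whole
# documents you would first split the text into sentences.
legal_sentences = [
    "The contract shall be executed by both parties within 30 days.",
    "Personal data may be processed by the controller for legitimate interests.",
    "The claim was denied by the insurer.",
]

for sentence in legal_sentences:
    active = transform_passive_to_active(sentence)
    print(f"PASSIVE: {sentence}")
    print(f"ACTIVE:  {active}\n")
```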
## Training Details

### Training Data

- **Dataset Size**: 319 legal sentences
- **Source Documents**:
  - United Nations legal documents
  - General Data Protection Regulation (GDPR)
  - Fair Work Act (Australia)
  - Insurance Council of Australia regulations
- **Data Split**: 85% training, 15% testing (with 15% of the training set held out for validation)
- **Domain**: Legal text across multiple jurisdictions

### Training Procedure

- **Method**: QLoRA (4-bit quantization + LoRA)
- **LoRA Configuration**: Rank 64, alpha 16
- **Library**: unsloth (2.2x faster, 43% less VRAM)
- **Hardware**: Tesla T4 GPU (Google Colab)
- **Training Dynamics**: Steadily decreasing validation loss, indicating good generalization

### Evaluation Metrics

- **BERTScore**: Semantic similarity evaluation (precision, recall, F1)
- **Human Evaluation**: Binary correctness assessment by legal evaluators
- **Performance Improvement**: ~6% increase over the base Llama-2 model

## Performance

The model was evaluated using both automatic metrics (BERTScore precision, recall, and F1) and human evaluation:

- **BERTScore F1**: High semantic similarity preservation
- **Human Evaluation**: ~6% improvement over the base model
- **Strengths**: Reliable transformation of standard passive constructions
- **Challenges**: Complex sentences with nuanced word placement (e.g., "only")

## Limitations and Bias

### Known Limitations

- **Word Position Sensitivity**: Struggles with sentences where word position significantly alters meaning
- **Dataset Size**: Limited to 319 training samples
- **Non-Determinism**: LLM outputs may vary between runs
- **Domain Coverage**: Primarily trained on English common-law and EU legal documents
- **'By' Constructions**: Occasionally struggles with sentences containing "by", which marks the agent that becomes the active subject

### Recommendations

- Validate transformed sentences for legal accuracy before use
- Use human review for critical legal documents
- Consider context and jurisdiction when applying transformations
- Test with domain-specific legal texts for best results

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{legal-passive-active-llama2,
  title={legal-passive-to-active-llama-7b: A LoRA Fine-tuned Model for Legal Voice Transformation},
  author={Rafi Al Attrach},
  year={2024},
  url={https://huggingface.co/rafiaa/legal-passive-to-active-llama-7b}
}
```

## Related Models

- **Base Model**: [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
- **Enhanced Version**: [rafiaa/legal-passive-to-active-mistral-7b](https://huggingface.co/rafiaa/legal-passive-to-active-mistral-7b) (recommended: better performance)

## Model Card Contact

- **Author**: Rafi Al Attrach
- **Model Repository**: [HuggingFace Model](https://huggingface.co/rafiaa/legal-passive-to-active-llama-7b)
- **Issues**: Please report issues through the HuggingFace model page

## Acknowledgments

- **Research Project**: Legal text simplification and accessibility research (2024)
- **Training Data**: Public legal documents and regulations
- **Base Model**: Meta's Llama-2-7b-chat-hf

---

*This model is part of a research project on legal text simplification and accessibility, focusing on passive-to-active voice transformation in legal documents.*