Legal / Policy Text Summarizer NLP
A Transformer-based NLP model that condenses legal, governmental, and policy documents into three plain-language summary formats:
- 3-line summary
- 1-paragraph summary
- Bullet points (3–7 bullets)
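The three formats above can be checked mechanically. The following is a minimal sketch, not part of the project: `check_summary_format` is a hypothetical helper, and the accepted bullet markers are an assumption. The mode names match the output styles listed under Features.

```python
import re


def check_summary_format(text: str, mode: str) -> bool:
    """Return True if `text` has the shape expected for `mode`.

    Hypothetical helper: the project may validate output differently or not at all.
    """
    # Keep only non-empty lines, stripped of surrounding whitespace.
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if mode == "3line":
        return len(lines) == 3
    if mode == "paragraph":
        return len(lines) == 1
    if mode == "bullets":
        # Assumption: bullets start with "-", "*", or "•" followed by text.
        all_bullets = all(re.match(r"^[-*•]\s+\S", ln) for ln in lines)
        return all_bullets and 3 <= len(lines) <= 7
    raise ValueError(f"unknown mode: {mode!r}")
```

A check like this could be used in tests to flag generations that drift from the requested format.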
This project includes the full ML pipeline: preprocessing, PDF extraction, dataset creation, training, evaluation, inference, FastAPI deployment, Gradio UI, tests, and a HuggingFace model card.
Features
- Summarizes long policies, laws, and government documents
- Output styles: 3line, paragraph, bullets
- Full training/evaluation pipeline
- Works with PDFs
- Built on google/flan-t5-base
- Apache 2.0 licensed
- HuggingFace-ready metadata
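This README does not say how long documents are fed to the model. One common approach is to split the text into overlapping chunks, summarize each chunk, and then combine the partial summaries. The sketch below shows only the splitting step. `chunk_words` and its default sizes are hypothetical, and a real pipeline would more likely count tokenizer tokens than words.

```python
def chunk_words(text: str, max_words: int = 350, overlap: int = 50) -> list[str]:
    """Split `text` into chunks of at most `max_words` words.

    Consecutive chunks share `overlap` words so that sentences cut at a
    boundary still appear whole in one of the chunks.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("require max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current chunk reaches the end of the text.
        if start + max_words >= len(words):
            break
    return chunks
```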
Project Structure
legal-policy-summarizer-nlp/
├── data/
├── src/
├── tests/
├── app/
├── notebooks/
├── huggingface/
├── model/
├── README.md
├── LICENSE
└── requirements.txt
Installation
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Dataset Preprocessing
python -m src.dataset_preprocessing --input data/raw/dataset.csv --output data/processed/dataset_clean.jsonl
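The command above turns a raw CSV into a cleaned JSONL file. The actual logic in `src.dataset_preprocessing` is not shown here, so the following is only a rough sketch of what such a step might do. The column names `text` and `summary`, and the helper names, are assumptions.

```python
import csv
import json
import re


def clean_text(value: str) -> str:
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return re.sub(r"\s+", " ", value).strip()


def csv_to_jsonl(src, dst, text_col: str = "text", summary_col: str = "summary") -> int:
    """Read CSV rows from `src` and write one JSON object per line to `dst`.

    `src` and `dst` are open text file objects. Rows with an empty text or
    summary are skipped. Returns the number of rows written.
    Column names are assumed, not taken from the project.
    """
    written = 0
    for row in csv.DictReader(src):
        text = clean_text(row.get(text_col) or "")
        summary = clean_text(row.get(summary_col) or "")
        if not text or not summary:
            continue
        record = {"text": text, "summary": summary}
        dst.write(json.dumps(record, ensure_ascii=False) + "\n")
        written += 1
    return written
```

When reading from disk, open the CSV with `newline=""` as the `csv` module documentation recommends, so that quoted fields containing line breaks are parsed correctly.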
Training
python -m src.train
Evaluation
python -m src.evaluate
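This README does not state which metrics `src.evaluate` reports. Summarization models are often scored with ROUGE, so as an illustration only, here is a simplified ROUGE-1 F1 (unigram overlap) in plain Python. It skips stemming and other details of the reference implementations, and `rouge1_f1` is a hypothetical name.

```python
import re
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    cand = Counter(re.findall(r"\w+", candidate.lower()))
    ref = Counter(re.findall(r"\w+", reference.lower()))
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```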
Inference
from src.inference import summarize
print(summarize("policy text...", mode="paragraph"))
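How `summarize` maps `mode` to model input is not documented here. Since flan-t5 models follow natural-language instructions, one plausible design is a per-mode instruction prefix. The sketch below assumes that design. The prompt wording, `build_prompt`, `summarize_sketch`, and the generation settings are all hypothetical, and the second function needs `transformers` and `torch` installed.

```python
# Hypothetical instruction prefixes, one per output style.
PROMPTS = {
    "3line": "Summarize the following policy text in exactly 3 short lines:",
    "paragraph": "Summarize the following policy text in one plain-language paragraph:",
    "bullets": "Summarize the following policy text as 3 to 7 bullet points:",
}


def build_prompt(text: str, mode: str = "paragraph") -> str:
    """Prefix the document text with the instruction for `mode`."""
    if mode not in PROMPTS:
        raise ValueError(f"mode must be one of {sorted(PROMPTS)}")
    return f"{PROMPTS[mode]}\n\n{text.strip()}"


def summarize_sketch(text: str, mode: str = "paragraph",
                     model_name: str = "google/flan-t5-base") -> str:
    """Generate a summary with a seq2seq checkpoint (illustrative settings)."""
    # Imported lazily so build_prompt can be used without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    inputs = tokenizer(build_prompt(text, mode), return_tensors="pt",
                       truncation=True, max_length=512)
    output_ids = model.generate(**inputs, max_new_tokens=200, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

To try the fine-tuned weights instead of the base checkpoint, pass the path or repository name of the trained model as `model_name`.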
API (FastAPI)
uvicorn app.api:app --reload --port 8000
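Once the server is running, a client can call it over HTTP. The route and payload shape of `app.api` are not documented in this README, so the sketch below assumes a hypothetical `POST /summarize` endpoint that accepts JSON with `text` and `mode` fields. The interactive docs that FastAPI serves at `/docs` by default should show the real routes.

```python
import json
import urllib.request


def build_request(text: str, mode: str = "paragraph",
                  base_url: str = "http://127.0.0.1:8000") -> urllib.request.Request:
    """Build a JSON POST request for the assumed /summarize endpoint."""
    payload = json.dumps({"text": text, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/summarize",  # hypothetical route, check /docs
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_api(text: str, mode: str = "paragraph"):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(text, mode), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```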
Gradio UI
python app/ui.py