Llama 3.2 3B - SQL Query Generator (LoRA Fine-tuned)
This model is a fine-tuned version of meta-llama/Llama-3.2-3B for text-to-SQL generation using LoRA (Low-Rank Adaptation) on the Spider dataset.
Model Description
- Base Model: Llama 3.2 3B (3 billion parameters)
- Fine-tuning Method: LoRA (Parameter-Efficient Fine-Tuning)
- Quantization: 4-bit NF4 with double quantization
- Dataset: Spider (7,000 training examples across 200+ databases)
- Training Duration: 3 epochs, ~47 minutes on AWS g5.2xlarge (NVIDIA A10G)
- Final Training Loss: 0.37 (85% reduction from initial 2.5)
- Deployment: Production-ready merged model with LoRA weights integrated
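The published checkpoint already has the adapters folded into the base weights. For context, a minimal sketch of how such a merge is typically produced with `peft`'s `merge_and_unload`; the adapter path below is a placeholder, since only the merged model is published.

```python
# Merge sketch only: the adapter path is hypothetical (not part of this repository).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B", torch_dtype="auto")
model = PeftModel.from_pretrained(base, "path/to/sql-lora-adapter")  # placeholder adapter path
merged = model.merge_and_unload()  # fold the LoRA deltas into the base weights

merged.save_pretrained("llama-3.2-sql-merged")
AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B").save_pretrained("llama-3.2-sql-merged")
```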
Intended Use
This model converts natural language questions into SQL queries for various database schemas. It's designed for:
- Automated SQL query generation from natural language
- Data analysis assistants and chatbots
- Natural language database interfaces
- Educational tools for learning SQL
- Business intelligence applications
Training Details
Training Configuration
- Learning Rate: 2e-4 with cosine scheduler
- Batch Size: 4 per device
- Gradient Accumulation Steps: 4 (effective batch size: 16)
- Training Epochs: 3
- Max Sequence Length: 2048 tokens
- Optimizer: AdamW with 8-bit quantization
- LoRA Configuration (see the configuration sketch after this list):
  - Rank (r): 16
  - Alpha: 32
  - Dropout: 0.05
  - Target Modules: q_proj, k_proj, v_proj, o_proj
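The exact training script is not bundled with this card. As a minimal sketch, the hyperparameters above map onto `peft` and `transformers` roughly as follows; the output path, `bias` setting, precision flag, and the specific 8-bit AdamW variant are assumptions, not values taken from the original run.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA settings from the list above
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",                  # assumed
    task_type="CAUSAL_LM",
)

# Trainer settings from the list above
# (the 2048-token max sequence length is applied at tokenization time, not shown here)
training_args = TrainingArguments(
    output_dir="llama32-sql-lora",     # hypothetical output path
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,     # effective batch size 16
    num_train_epochs=3,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    optim="paged_adamw_8bit",          # 8-bit AdamW; exact variant assumed
    logging_steps=10,
    bf16=True,                         # assumed mixed-precision setting
)
```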
Training Results
| Metric | Value |
|---|---|
| Initial Loss | 2.50 |
| Final Loss | 0.37 |
| Loss Reduction | 85% |
| Trainable Parameters | 9.17M (0.51% of 1.8B total) |
| Training Time | 47 minutes |
| GPU | NVIDIA A10G (24GB VRAM) |
Training Infrastructure
- Cloud Provider: AWS EC2 g5.2xlarge
- GPU: NVIDIA A10G (24GB VRAM)
- Framework: PyTorch 2.0+
- Training Library: HuggingFace Transformers + PEFT
- Quantization: BitsAndBytes (4-bit NF4)
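For reference, the 4-bit NF4 double-quantized loading described above is typically expressed with `transformers`' `BitsAndBytesConfig` (requires the `bitsandbytes` package); the compute dtype below is an assumption.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit base weights
    bnb_4bit_quant_type="nf4",              # NF4 quantization
    bnb_4bit_use_double_quant=True,         # double quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute dtype assumed
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    quantization_config=bnb_config,
    device_map="auto",
)
base_model = prepare_model_for_kbit_training(base_model)  # prepare quantized weights for LoRA training
```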
Usage
Installation
```bash
pip install transformers torch accelerate
```
Basic Inference
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "Abhisek987/llama-3.2-sql-merged"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Prepare prompt
database = "employees"
question = "What are the names of all employees who earn more than 50000?"

prompt = f"""### Instruction:
You are a SQL expert. Generate a SQL query to answer the given question for the specified database.
### Input:
Database: {database}
Question: {question}
### Response:
"""

# Generate SQL
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.1,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)

result = tokenizer.decode(outputs[0], skip_special_tokens=True)
sql_query = result.split("### Response:")[-1].strip()
print(sql_query)
```
Output:

```sql
SELECT name FROM employees WHERE salary > 50000;
```
Batch Processing
```python
def generate_sql_batch(questions_with_db):
    """Generate SQL for multiple (database, question) pairs."""
    results = []
    for database, question in questions_with_db:
        prompt = f"""### Instruction:
You are a SQL expert. Generate a SQL query.
### Input:
Database: {database}
Question: {question}
### Response:
"""
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        outputs = model.generate(
            **inputs,
            max_new_tokens=256,
            temperature=0.1,
            do_sample=True,  # required for temperature to take effect
            pad_token_id=tokenizer.eos_token_id,
        )
        result = tokenizer.decode(outputs[0], skip_special_tokens=True)
        sql = result.split("### Response:")[-1].strip()
        results.append(sql)
    return results

# Example usage
queries = [
    ("employees", "Show all employees"),
    ("sales", "Top 5 products by revenue"),
    ("customers", "Count by country"),
]
sql_queries = generate_sql_batch(queries)
```
Example Queries
| Database | Question | Generated SQL |
|---|---|---|
| employees | "Show all employees with salary above 60000" | SELECT name FROM employees WHERE salary > 60000; |
| sales | "Show me the top 5 products by total sales" | SELECT product_id, sum(sales) FROM sales GROUP BY product_id ORDER BY sum(sales) DESC LIMIT 5; |
| customers | "How many customers are from each country?" | SELECT count(*), country FROM customers GROUP BY country; |
| orders | "Find all orders placed in the last 30 days" | SELECT order_id FROM orders WHERE date_order_placed BETWEEN DATE('now') - INTERVAL 30 DAY AND DATE('now') - INTERVAL 1 DAY; |
| concert_singer | "What are the names of all singers ordered by net worth?" | SELECT Name FROM singer ORDER BY Net_Worth; |
Live Demo
Try the model interactively: 🔗 Interactive Demo on HuggingFace Spaces
The demo includes:
- Web-based interface
- Pre-loaded example queries
- REST API access for integration
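The Space's exact API is not documented in this card. As one possible pattern, a HuggingFace Space can be called programmatically with the `gradio_client` package; the Space ID, endpoint name, and argument order below are placeholders, so check the Space's "Use via API" panel for the real values.

```python
# Hypothetical client call: Space ID, api_name, and argument order are placeholders.
from gradio_client import Client

client = Client("Abhisek987/sql-generator")          # placeholder Space ID
sql = client.predict(
    "employees",                                     # database (assumed argument)
    "Show all employees with salary above 60000",    # question (assumed argument)
    api_name="/predict",                             # assumed endpoint name
)
print(sql)
```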
Performance Characteristics
Strengths
- ✅ Excellent performance on common SQL patterns (SELECT, WHERE, JOIN, GROUP BY)
- ✅ Handles aggregations and sorting correctly
- ✅ Good understanding of database schema context
- ✅ Generates syntactically valid SQL in most cases
Limitations
- ⚠️ Trained specifically on Spider dataset schemas
- ⚠️ May not generalize perfectly to significantly different database structures
- ⚠️ Requires proper database schema context for best results
- ⚠️ Complex nested queries may need refinement
- ⚠️ 4-bit quantization may occasionally affect numerical precision
Best Practices
- Provide clear database names
- Use specific, unambiguous questions
- Include relevant schema information when possible
- Review generated queries before execution
- Test on sample data first (see the validation sketch after this list)
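In that spirit, here is a small, self-contained way to sanity-check a generated query against a throwaway SQLite copy of the schema before it touches real data; the schema and helper are illustrative and not part of the model.

```python
import sqlite3

def validate_query(sql: str, schema_sql: str) -> bool:
    """Check a generated query against an in-memory SQLite database built from a
    sample schema, so syntax errors and missing columns surface before execution.
    Illustrative helper, not part of the model."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_sql)             # build the sample schema
        conn.execute(f"EXPLAIN QUERY PLAN {sql}")  # parse and plan without needing real rows
        return True
    except sqlite3.Error as exc:
        print(f"Rejected query: {exc}")
        return False
    finally:
        conn.close()

# Example with a hypothetical schema
sample_schema = "CREATE TABLE employees (id INTEGER, name TEXT, salary REAL);"
validate_query("SELECT name FROM employees WHERE salary > 50000;", sample_schema)
```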
Technical Architecture
Model Components
- Base LLM: Llama 3.2 3B (decoder-only transformer)
- Vocabulary Size: 128,256 tokens
- Context Window: 128K tokens (fine-tuned with a 2,048-token maximum sequence length)
- Quantization: 4-bit NF4 (model weights shrink from ~12 GB in FP32 to ~1.5 GB)
- LoRA Layers: Applied to attention projection layers
Inference Performance
- GPU (A10G): ~1-2 seconds per query
- CPU (16 cores): ~10-30 seconds per query
- Memory Requirements:
  - GPU: 4-6 GB VRAM
  - CPU: 8-12 GB RAM
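To land at the lower end of that VRAM range, the merged model can also be loaded in 4-bit at inference time; this is one option rather than a requirement, and it needs the `bitsandbytes` package.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Optional 4-bit inference loading to reduce VRAM; the FP16 loading shown under Usage also works.
model = AutoModelForCausalLM.from_pretrained(
    "Abhisek987/llama-3.2-sql-merged",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4"),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Abhisek987/llama-3.2-sql-merged")
```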
Training Dataset
Spider Dataset Statistics:
- Training Examples: 7,000 text-to-SQL pairs
- Databases: 200+ different schemas
- Domains: Academic, business, social, etc.
- Complexity: Simple to complex multi-table joins
- Question Types: Aggregation, filtering, sorting, grouping
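For orientation, a sketch of loading Spider from the HuggingFace Hub and shaping it into the instruction format shown under Usage; the exact Hub identifier and preprocessing used for this model are not documented here, so treat them as assumptions.

```python
from datasets import load_dataset

# Hub identifier assumed; depending on your datasets version you may need a mirrored/parquet copy.
spider = load_dataset("spider", split="train")

def to_prompt(example):
    """Shape a Spider example into the instruction format used in the Usage examples."""
    prompt = (
        "### Instruction:\n"
        "You are a SQL expert. Generate a SQL query to answer the given question "
        "for the specified database.\n"
        "### Input:\n"
        f"Database: {example['db_id']}\n"
        f"Question: {example['question']}\n"
        "### Response:\n"
    )
    return {"text": prompt + example["query"]}

train_data = spider.map(to_prompt)
print(train_data[0]["text"])
```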
Evaluation
The model was evaluated on the Spider development set, with a focus on:
- Exact match accuracy
- Execution accuracy
- Syntactic correctness
- Semantic equivalence
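The evaluation scripts are not included in this card. As a rough illustration of the first two checks, one might compare normalized SQL strings and compare execution results on a SQLite copy of each database; the sketch below is a simplification of Spider's official metrics, with all helper names assumed.

```python
import sqlite3

def exact_match(pred: str, gold: str) -> bool:
    """Whitespace- and case-insensitive string comparison
    (a simplification of Spider's component-level exact match)."""
    norm = lambda s: " ".join(s.lower().replace(";", " ").split())
    return norm(pred) == norm(gold)

def execution_match(pred: str, gold: str, db_path: str) -> bool:
    """Compare result sets of predicted and gold SQL on a SQLite database file.
    Order-insensitive and duplicates collapse -- a simplification."""
    conn = sqlite3.connect(db_path)
    try:
        pred_rows = set(conn.execute(pred).fetchall())
        gold_rows = set(conn.execute(gold).fetchall())
        return pred_rows == gold_rows
    except sqlite3.Error:
        return False
    finally:
        conn.close()
```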
Training loss progression demonstrated strong convergence:
- Epoch 1: 0.65
- Epoch 2: 0.50
- Epoch 3: 0.37
Citation
If you use this model in your work, please cite:
```bibtex
@misc{llama32-sql-merged,
  author       = {Abhisek Behera},
  title        = {Llama 3.2 3B SQL Query Generator - LoRA Fine-tuned},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/Abhisek987/llama-3.2-sql-merged}},
  note         = {Fine-tuned on the Spider dataset using LoRA}
}
```
Original Base Model:
```bibtex
@article{llama3,
  title  = {Llama 3 Model Card},
  author = {Meta AI},
  year   = {2024},
  url    = {https://github.com/meta-llama/llama3}
}
```
Spider Dataset:
```bibtex
@inproceedings{yu2018spider,
  title     = {Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task},
  author    = {Yu, Tao and Zhang, Rui and Yang, Kai and others},
  booktitle = {EMNLP},
  year      = {2018}
}
```
Acknowledgments
- Meta AI for the Llama 3.2 base model
- Spider Dataset creators for the training data
- HuggingFace for the Transformers and PEFT libraries
- AWS for cloud infrastructure
License
This model inherits the Llama 3.2 Community License from the base model. Please review the license terms before commercial use.
Contact & Contributions
- Author: Abhisek Behera
- HuggingFace: @Abhisek987
- Demo: SQL Generator Space
For issues, improvements, or questions, please use the Community tab.
Model Card Authors
- Abhisek Behera (@Abhisek987)