Llama 3.2 3B - SQL Query Generator (LoRA Fine-tuned)
This model is a fine-tuned version of meta-llama/Llama-3.2-3B for text-to-SQL generation using LoRA (Low-Rank Adaptation) on the Spider dataset.
Model Description
- Base Model: Llama 3.2 3B (3 billion parameters)
- Fine-tuning Method: LoRA (Parameter-Efficient Fine-Tuning)
- Quantization: 4-bit NF4 with double quantization
- Dataset: Spider (7,000 training examples across 200+ databases)
- Training Duration: 3 epochs, ~47 minutes on AWS g5.2xlarge (NVIDIA A10G)
- Final Training Loss: 0.37 (85% reduction from initial 2.5)
- Deployment: Production-ready merged model with LoRA weights integrated
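The published checkpoint already has the adapters folded into the base weights. For context, a minimal sketch of how such a merge is typically produced with `peft`'s `merge_and_unload`; the adapter path below is a placeholder, since only the merged model is published.

```python
# Merge sketch only: the adapter path is hypothetical (not part of this repository).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B", torch_dtype="auto")
model = PeftModel.from_pretrained(base, "path/to/sql-lora-adapter")  # placeholder adapter path
merged = model.merge_and_unload()  # fold the LoRA deltas into the base weights

merged.save_pretrained("llama-3.2-sql-merged")
AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B").save_pretrained("llama-3.2-sql-merged")
```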
Intended Use
This model converts natural language questions into SQL queries for various database schemas. It's designed for:
- Automated SQL query generation from natural language
- Data analysis assistants and chatbots
- Natural language database interfaces
- Educational tools for learning SQL
- Business intelligence applications
Training Details
Training Configuration
- Learning Rate: 2e-4 with cosine scheduler
- Batch Size: 4 per device
- Gradient Accumulation Steps: 4 (effective batch size: 16)
- Training Epochs: 3
- Max Sequence Length: 2048 tokens
- Optimizer: AdamW with 8-bit quantization
- LoRA Configuration (see the configuration sketch after this list):
  - Rank (r): 16
  - Alpha: 32
  - Dropout: 0.05
  - Target Modules: q_proj, k_proj, v_proj, o_proj
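The exact training script is not bundled with this card. As a minimal sketch, the hyperparameters above map onto `peft` and `transformers` roughly as follows; the output path, `bias` setting, precision flag, and the specific 8-bit AdamW variant are assumptions, not values taken from the original run.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA settings from the list above
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",                  # assumed
    task_type="CAUSAL_LM",
)

# Trainer settings from the list above
# (the 2048-token max sequence length is applied at tokenization time, not shown here)
training_args = TrainingArguments(
    output_dir="llama32-sql-lora",     # hypothetical output path
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,     # effective batch size 16
    num_train_epochs=3,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    optim="paged_adamw_8bit",          # 8-bit AdamW; exact variant assumed
    logging_steps=10,
    bf16=True,                         # assumed mixed-precision setting
)
```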
Training Results
| Metric | Value |
|---|---|
| Initial Loss | 2.50 |
| Final Loss | 0.37 |
| Loss Reduction | 85% |
| Trainable Parameters | 9.17M (0.51% of 1.8B total) |
| Training Time | 47 minutes |
| GPU | NVIDIA A10G (24GB VRAM) |
Training Infrastructure
- Cloud Provider: AWS EC2 g5.2xlarge
- GPU: NVIDIA A10G (24GB VRAM)
- Framework: PyTorch 2.0+
- Training Library: HuggingFace Transformers + PEFT
- Quantization: BitsAndBytes (4-bit NF4)
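For reference, the 4-bit NF4 double-quantized loading described above is typically expressed with `transformers`' `BitsAndBytesConfig` (requires the `bitsandbytes` package); the compute dtype below is an assumption.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit base weights
    bnb_4bit_quant_type="nf4",              # NF4 quantization
    bnb_4bit_use_double_quant=True,         # double quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute dtype assumed
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    quantization_config=bnb_config,
    device_map="auto",
)
base_model = prepare_model_for_kbit_training(base_model)  # prepare quantized weights for LoRA training
```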
Usage
Installation
```bash
pip install transformers torch accelerate
```
Basic Inference
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "Abhisek987/llama-3.2-sql-merged"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Prepare prompt
database = "employees"
question = "What are the names of all employees who earn more than 50000?"

prompt = f"""### Instruction:
You are a SQL expert. Generate a SQL query to answer the given question for the specified database.
### Input:
Database: {database}
Question: {question}
### Response:
"""

# Generate SQL
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.1,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)

result = tokenizer.decode(outputs[0], skip_special_tokens=True)
sql_query = result.split("### Response:")[-1].strip()
print(sql_query)
```
Output:

```sql
SELECT name FROM employees WHERE salary > 50000;
```
Batch Processing
```python
def generate_sql_batch(questions_with_db):
    """Generate SQL for multiple (database, question) pairs."""
    results = []
    for database, question in questions_with_db:
        prompt = f"""### Instruction:
You are a SQL expert. Generate a SQL query.
### Input:
Database: {database}
Question: {question}
### Response:
"""
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        outputs = model.generate(
            **inputs,
            max_new_tokens=256,
            temperature=0.1,
            do_sample=True,  # required for temperature to take effect
            pad_token_id=tokenizer.eos_token_id,
        )
        result = tokenizer.decode(outputs[0], skip_special_tokens=True)
        sql = result.split("### Response:")[-1].strip()
        results.append(sql)
    return results

# Example usage
queries = [
    ("employees", "Show all employees"),
    ("sales", "Top 5 products by revenue"),
    ("customers", "Count by country"),
]
sql_queries = generate_sql_batch(queries)
```
Example Queries
| Database | Question | Generated SQL |
|---|---|---|
| employees | "Show all employees with salary above 60000" | SELECT name FROM employees WHERE salary > 60000; |
| sales | "Show me the top 5 products by total sales" | SELECT product_id, sum(sales) FROM sales GROUP BY product_id ORDER BY sum(sales) DESC LIMIT 5; |
| customers | "How many customers are from each country?" | SELECT count(*), country FROM customers GROUP BY country; |
| orders | "Find all orders placed in the last 30 days" | SELECT order_id FROM orders WHERE date_order_placed BETWEEN DATE('now') - INTERVAL 30 DAY AND DATE('now') - INTERVAL 1 DAY; |
| concert_singer | "What are the names of all singers ordered by net worth?" | SELECT Name FROM singer ORDER BY Net_Worth; |
Live Demo
Try the model interactively: 🔗 Interactive Demo on HuggingFace Spaces
The demo includes:
- Web-based interface
- Pre-loaded example queries
- REST API access for integration
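The Space's exact API is not documented in this card. As one possible pattern, a HuggingFace Space can be called programmatically with the `gradio_client` package; the Space ID, endpoint name, and argument order below are placeholders, so check the Space's "Use via API" panel for the real values.

```python
# Hypothetical client call: Space ID, api_name, and argument order are placeholders.
from gradio_client import Client

client = Client("Abhisek987/sql-generator")          # placeholder Space ID
sql = client.predict(
    "employees",                                     # database (assumed argument)
    "Show all employees with salary above 60000",    # question (assumed argument)
    api_name="/predict",                             # assumed endpoint name
)
print(sql)
```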
Performance Characteristics
Strengths
- ✅ Excellent performance on common SQL patterns (SELECT, WHERE, JOIN, GROUP BY)
- ✅ Handles aggregations and sorting correctly
- ✅ Good understanding of database schema context
- ✅ Generates syntactically valid SQL in most cases
Limitations
- ⚠️ Trained specifically on Spider dataset schemas
- ⚠️ May not generalize perfectly to significantly different database structures
- ⚠️ Requires proper database schema context for best results
- ⚠️ Complex nested queries may need refinement
- ⚠️ 4-bit quantization may occasionally affect numerical precision
Best Practices
- Provide clear database names
- Use specific, unambiguous questions
- Include relevant schema information when possible
- Review generated queries before execution
- Test on sample data first (see the validation sketch after this list)
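In that spirit, here is a small, self-contained way to sanity-check a generated query against a throwaway SQLite copy of the schema before it touches real data; the schema and helper are illustrative and not part of the model.

```python
import sqlite3

def validate_query(sql: str, schema_sql: str) -> bool:
    """Check a generated query against an in-memory SQLite database built from a
    sample schema, so syntax errors and missing columns surface before execution.
    Illustrative helper, not part of the model."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_sql)             # build the sample schema
        conn.execute(f"EXPLAIN QUERY PLAN {sql}")  # parse and plan without needing real rows
        return True
    except sqlite3.Error as exc:
        print(f"Rejected query: {exc}")
        return False
    finally:
        conn.close()

# Example with a hypothetical schema
sample_schema = "CREATE TABLE employees (id INTEGER, name TEXT, salary REAL);"
validate_query("SELECT name FROM employees WHERE salary > 50000;", sample_schema)
```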
Technical Architecture
Model Components
- Base LLM: Llama 3.2 3B (decoder-only transformer)
- Vocabulary Size: 128,256 tokens
- Context Window: 128K tokens (fine-tuned with a 2,048-token maximum sequence length)
- Quantization: 4-bit NF4 (model weights shrink from ~12 GB in FP32 to ~1.5 GB)
- LoRA Layers: Applied to attention projection layers
Inference Performance
- GPU (A10G): ~1-2 seconds per query
- CPU (16 cores): ~10-30 seconds per query
- Memory Requirements:
  - GPU: 4-6 GB VRAM
  - CPU: 8-12 GB RAM
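To land at the lower end of that VRAM range, the merged model can also be loaded in 4-bit at inference time; this is one option rather than a requirement, and it needs the `bitsandbytes` package.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Optional 4-bit inference loading to reduce VRAM; the FP16 loading shown under Usage also works.
model = AutoModelForCausalLM.from_pretrained(
    "Abhisek987/llama-3.2-sql-merged",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4"),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Abhisek987/llama-3.2-sql-merged")
```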
Training Dataset
Spider Dataset Statistics:
- Training Examples: 7,000 text-to-SQL pairs
- Databases: 200+ different schemas
- Domains: Academic, business, social, etc.
- Complexity: Simple to complex multi-table joins
- Question Types: Aggregation, filtering, sorting, grouping
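For orientation, a sketch of loading Spider from the HuggingFace Hub and shaping it into the instruction format shown under Usage; the exact Hub identifier and preprocessing used for this model are not documented here, so treat them as assumptions.

```python
from datasets import load_dataset

# Hub identifier assumed; depending on your datasets version you may need a mirrored/parquet copy.
spider = load_dataset("spider", split="train")

def to_prompt(example):
    """Shape a Spider example into the instruction format used in the Usage examples."""
    prompt = (
        "### Instruction:\n"
        "You are a SQL expert. Generate a SQL query to answer the given question "
        "for the specified database.\n"
        "### Input:\n"
        f"Database: {example['db_id']}\n"
        f"Question: {example['question']}\n"
        "### Response:\n"
    )
    return {"text": prompt + example["query"]}

train_data = spider.map(to_prompt)
print(train_data[0]["text"])
```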
Evaluation
The model was evaluated on the Spider development set, with a focus on:
- Exact match accuracy
- Execution accuracy
- Syntactic correctness
- Semantic equivalence
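The evaluation scripts are not included in this card. As a rough illustration of the first two checks, one might compare normalized SQL strings and compare execution results on a SQLite copy of each database; the sketch below is a simplification of Spider's official metrics, with all helper names assumed.

```python
import sqlite3

def exact_match(pred: str, gold: str) -> bool:
    """Whitespace- and case-insensitive string comparison
    (a simplification of Spider's component-level exact match)."""
    norm = lambda s: " ".join(s.lower().replace(";", " ").split())
    return norm(pred) == norm(gold)

def execution_match(pred: str, gold: str, db_path: str) -> bool:
    """Compare result sets of predicted and gold SQL on a SQLite database file.
    Order-insensitive and duplicates collapse -- a simplification."""
    conn = sqlite3.connect(db_path)
    try:
        pred_rows = set(conn.execute(pred).fetchall())
        gold_rows = set(conn.execute(gold).fetchall())
        return pred_rows == gold_rows
    except sqlite3.Error:
        return False
    finally:
        conn.close()
```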
Training loss progression demonstrated strong convergence:
- Epoch 1: 0.65
- Epoch 2: 0.50
- Epoch 3: 0.37
Citation
If you use this model in your work, please cite:
```bibtex
@misc{llama32-sql-merged,
  author       = {Abhisek Behera},
  title        = {Llama 3.2 3B SQL Query Generator - LoRA Fine-tuned},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/Abhisek987/llama-3.2-sql-merged}},
  note         = {Fine-tuned on the Spider dataset using LoRA}
}
```
Original Base Model:
```bibtex
@article{llama3,
  title  = {Llama 3 Model Card},
  author = {Meta AI},
  year   = {2024},
  url    = {https://github.com/meta-llama/llama3}
}
```
Spider Dataset:
```bibtex
@inproceedings{yu2018spider,
  title     = {Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task},
  author    = {Yu, Tao and Zhang, Rui and Yang, Kai and others},
  booktitle = {EMNLP},
  year      = {2018}
}
```
Acknowledgments
- Meta AI for the Llama 3.2 base model
- Spider Dataset creators for the training data
- HuggingFace for the Transformers and PEFT libraries
- AWS for cloud infrastructure
License
This model inherits the Llama 3.2 Community License from the base model. Please review the license terms before commercial use.
Contact & Contributions
- Author: Abhisek Behera
- HuggingFace: @Abhisek987
- Demo: SQL Generator Space
For issues, improvements, or questions, please use the Community tab.
Model Card Authors
- Abhisek Behera (@Abhisek987)