Czech Building Law LoRA Adapter for Mistral-7B

🏗️ A LoRA adapter fine-tuned on a Czech Building Law (Stavební zákon) Q&A dataset for question-answering tasks. It enables Mistral-7B-Instruct-v0.3 to answer questions about Czech building regulations, construction permits, and related legal matters.

⚠️ Note: This is an educational project with limitations. Mistral-7B has gaps in Czech language understanding. For production use, a Czech-native model like OpenEuroLLM would be more suitable.

Model Details

Model Description

This LoRA (Low-Rank Adaptation) adapter was fine-tuned on a dataset of 576 Czech Building Law question-answer pairs. It adapts Mistral-7B-Instruct-v0.3 to answer questions about Czech construction regulations, building permits, territorial planning, and related legal matters.

The model was trained as part of an AI Developer course project and is intended for educational and research purposes. While functional, it has limitations due to Mistral's non-native Czech language capabilities.

Model Sources

  • Repository: https://huggingface.co/rostislavpeska/mistral-czech-building-law-lora
  • Base model: mistralai/Mistral-7B-Instruct-v0.3
  • Training dataset: rostislavpeska/stavebni-zakon-dataset

Uses

Direct Use

This adapter is designed for:

  • Answering questions about Czech building law (Stavební zákon)
  • Providing information on construction permits and procedures
  • Explaining territorial planning regulations
  • Educational purposes for learning Czech legal terminology
  • Research on legal domain adaptation for LLMs

Ideal users:

  • Students learning about Czech building regulations
  • Developers creating chatbots for construction-related queries
  • Researchers studying legal NLP in Czech language

Downstream Use

This adapter can be integrated into:

  • Legal chatbots for construction companies
  • Educational platforms teaching Czech building law
  • Document analysis tools for building permits
  • Q&A systems for architectural firms

Note: For production systems, further fine-tuning or using a Czech-native base model (e.g., OpenEuroLLM) is recommended.
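For such integrations it is often convenient to merge the adapter into the base model so the downstream system serves a single standalone checkpoint. A minimal sketch using PEFT's merge_and_unload (the output directory name is illustrative; merging requires loading the base weights unquantized):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model in bf16 -- merging cannot be done on 4-bit weights
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.3",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "rostislavpeska/mistral-czech-building-law-lora")

# Fold the LoRA deltas into the base weights and drop the adapter wrappers
merged = model.merge_and_unload()

# Save a checkpoint that plain transformers can load without PEFT installed
merged.save_pretrained("./mistral-czech-building-law-merged")
AutoTokenizer.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.3"
).save_pretrained("./mistral-czech-building-law-merged")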

Out-of-Scope Use

❌ NOT suitable for:

  • Legal advice - This is NOT a replacement for professional legal counsel
  • Official legal documents - Responses may contain inaccuracies
  • Critical decision-making - Always verify with official sources and legal experts
  • Production systems without review - Requires human oversight
  • Non-Czech building law - Trained specifically on Czech regulations
  • Real-time legal changes - May not reflect the latest amendments

⚠️ Always consult licensed legal professionals for official guidance.

Bias, Risks, and Limitations

Technical Limitations

  • Base model gaps: Mistral-7B is not optimized for Czech language, leading to potential grammatical errors or unnatural phrasing
  • Dataset size: Only 576 training samples - limited coverage of all building law scenarios
  • Domain specificity: Trained only on Czech building law (Stavební zákon)
  • Temporal limitations: Training data may not reflect the most recent legal amendments
  • LoRA constraints: Adapter size limits the model's capacity to learn complex legal reasoning

Recommended Alternative

OpenEuroLLM would be a superior base model for this task due to native Czech language support. If you have access to OpenEuroLLM and would like to collaborate on improving this project, please reach out!

Bias Considerations

  • Responses reflect the training dataset's interpretation of building law
  • May contain biases present in the original Q&A dataset
  • Legal language complexity may not be fully captured

Safety Risks

  • Hallucination: Model may generate plausible but incorrect legal information
  • Oversimplification: Complex legal matters may be oversimplified
  • Misinterpretation: Users may misinterpret responses as official legal advice

Recommendations

Users (both direct and downstream) should be made aware of the model's risks, biases, and limitations. Outputs should be reviewed by qualified legal professionals before being relied upon (see Out-of-Scope Use above).

How to Get Started with the Model

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Load base model with 4-bit quantization (for GPU efficiency)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.3",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.3",
    trust_remote_code=True
)

# Load LoRA adapter
model = PeftModel.from_pretrained(
    base_model,
    "rostislavpeska/mistral-czech-building-law-lora",
    torch_dtype=torch.bfloat16,
)
model.eval()

# Ask a question
test_messages = [
    {"role": "user", "content": "Kdy potřebuji stavební povolení?"}
]

inputs = tokenizer.apply_chat_template(
    test_messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,  # also returns the attention mask
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, not the prompt
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(response)

Training Details

Training Data

Dataset: rostislavpeska/stavebni-zakon-dataset

  • Total samples: 576 Q&A pairs
  • Language: Czech
  • Domain: Czech Building Law (Stavební zákon)
  • Format: Conversational Q&A pairs in chat template format
  • Split: 80/20 train/test (460 training, 116 testing)
  • Token length: 64-1731 tokens per sample (avg: 271.9 tokens)

The dataset contains questions and answers about:

  • Building permits (stavební povolení)
  • Territorial planning (územní plánování)
  • Construction regulations (stavební předpisy)
  • Legal procedures and requirements
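The 80/20 split described above can be reproduced with the datasets library; a sketch (the original split seed is not documented, so the exact sample assignment below is an assumption):

from datasets import load_dataset

# Dataset referenced in this card (576 Q&A pairs)
dataset = load_dataset("rostislavpeska/stavebni-zakon-dataset", split="train")
print(dataset.features)  # inspect column names before formatting

# Reproduce the documented 80/20 train/test split (~460 / ~116 samples)
split = dataset.train_test_split(test_size=0.2, seed=42)
train_ds, test_ds = split["train"], split["test"]
print(len(train_ds), len(test_ds))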

Training Procedure

Preprocessing

Q&A pairs were rendered into the Mistral chat template format (see Format under Training Data) before tokenization; no further preprocessing is documented.

Training Hyperparameters

  • Training regime: bf16 mixed precision (bfloat16)
  • Quantization: 4-bit NF4 with double quantization (QLoRA)
  • Optimizer: paged_adamw_8bit
  • Learning rate: 2e-4
  • Learning rate scheduler: cosine
  • Batch size: 2 per device
  • Gradient accumulation steps: 4 (effective batch size: 8)
  • Number of epochs: 3
  • LoRA rank (r): 64
  • LoRA alpha: 32
  • LoRA dropout: 0.05
  • LoRA target modules: all-linear
  • Warmup steps: 50
  • Gradient checkpointing: Enabled
  • Max sequence length: 2048 tokens
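These values map onto a PEFT + TRL configuration roughly as follows. This is a reconstruction from the hyperparameters listed above, not the original training script, and argument names vary slightly across TRL versions:

import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# 4-bit NF4 quantization with double quantization (QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# LoRA applied to every linear layer
peft_config = LoraConfig(
    r=64,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",
    bias="none",
    task_type="CAUSAL_LM",
)

training_args = SFTConfig(
    output_dir="./mistral-czech-building-law-lora",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,     # effective batch size 8
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    optim="paged_adamw_8bit",
    bf16=True,
    gradient_checkpointing=True,
    max_seq_length=2048,               # renamed max_length in newer TRL releases
)

# trainer = SFTTrainer(model=base_model, args=training_args,
#                      train_dataset=train_ds, peft_config=peft_config)
# trainer.train()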

Speeds, Sizes, Times

  • Training time: ~12.4 minutes (0.21 hours)
  • Training date: November 2, 2025
  • Adapter size: ~336 MB (safetensors format)
  • Trainable parameters: 167,772,160 (2.26% of base model)
  • Hardware: NVIDIA GeForce RTX 4070 Ti SUPER (16GB VRAM)
  • Peak VRAM usage: ~12-14 GB
  • Training framework: PyTorch with Hugging Face Transformers, PEFT, and TRL
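The trainable-parameter count above can be verified by hand: a LoRA adapter of rank r on a linear layer of shape (d_in, d_out) adds r * (d_in + d_out) parameters. Applying this to Mistral-7B's seven linear projections per decoder layer (shapes taken from the public model config):

# Mistral-7B: hidden=4096, KV dim=1024 (grouped-query attention), MLP dim=14336
r, num_layers = 64, 32
linear_shapes = [
    (4096, 4096),    # q_proj
    (4096, 1024),    # k_proj
    (4096, 1024),    # v_proj
    (4096, 4096),    # o_proj
    (4096, 14336),   # gate_proj
    (4096, 14336),   # up_proj
    (14336, 4096),   # down_proj
]

per_layer = sum(r * (d_in + d_out) for d_in, d_out in linear_shapes)
total = per_layer * num_layers
print(total)  # 167772160 -- matches the count reported above

# PEFT reports the percentage relative to base + adapter parameters
base_params = 7_248_000_000  # approximate Mistral-7B-v0.3 parameter count
print(f"{100 * total / (base_params + total):.2f}%")  # ~2.26%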

Evaluation

Testing Data, Factors & Metrics

Testing Data

Test split: 116 samples (20% of total dataset)

From the same dataset: rostislavpeska/stavebni-zakon-dataset

Factors

Evaluation focuses on:

  • Domain accuracy: Correctness of legal information
  • Language quality: Czech grammar and fluency
  • Relevance: Appropriateness of responses to questions
  • Citation: Proper references to legal codes and regulations

Metrics

  • Evaluation loss: Primary metric during training
  • Qualitative assessment: Manual review of response quality
  • Domain expert review recommended: Legal professionals should validate outputs
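Since evaluation loss is the primary metric, here is a hedged sketch of computing it (and the implied perplexity) on the held-out split. It assumes model and tokenizer are loaded as in the getting-started example and that each test sample carries a "messages" field in chat format; the column name is an assumption:

import math
import torch

def eval_loss(model, tokenizer, samples, max_length=2048):
    # Mean per-sample loss; an approximation of token-weighted eval loss
    model.eval()
    losses = []
    for sample in samples:
        text = tokenizer.apply_chat_template(sample["messages"], tokenize=False)
        enc = tokenizer(
            text, return_tensors="pt", truncation=True, max_length=max_length
        ).to(model.device)
        with torch.no_grad():
            out = model(**enc, labels=enc["input_ids"])
        losses.append(out.loss.item())
    mean = sum(losses) / len(losses)
    return mean, math.exp(mean)  # perplexity = exp(mean loss)

# loss, ppl = eval_loss(model, tokenizer, test_ds)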

Results

The model successfully generates contextually relevant responses to Czech building law questions. However, as this is an educational project with a limited dataset and non-native base model, comprehensive quantitative evaluation has not been performed.

Observed strengths:

  • Maintains conversational context
  • References relevant legal codes
  • Provides structured responses

Observed limitations:

  • Occasional grammatical imperfections due to base model's Czech limitations
  • May oversimplify complex legal scenarios
  • Limited by training data coverage

Summary

An educational QLoRA adapter that produces on-topic, structured Czech answers about building law; results are qualitative only, and outputs require expert review before any practical use.

Model Examination

No formal interpretability analysis has been conducted. This is an educational project with limited scope.

Future work could include:

  • Attention weight visualization for legal reasoning
  • Error analysis on failure cases
  • Comparison with Czech-native base models (e.g., OpenEuroLLM)

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: NVIDIA GeForce RTX 4070 Ti SUPER (16GB)
  • Hours used: ~0.21 hours (12.4 minutes)
  • Cloud Provider: Local machine (not cloud)
  • Compute Region: Czech Republic / Central Europe
  • Carbon Emitted: Minimal due to short training time and local infrastructure
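A back-of-the-envelope check supports the "minimal" claim. The GPU power draw and grid carbon intensity below are assumptions, not measurements:

# RTX 4070 Ti SUPER has a 285 W TDP; assume full draw for the whole run
power_kw = 0.285
hours = 0.21
energy_kwh = power_kw * hours            # ~0.06 kWh

# Czech grid carbon intensity, roughly 0.4-0.5 kg CO2e/kWh (assumption)
intensity_kg_per_kwh = 0.45
emissions_kg = energy_kwh * intensity_kg_per_kwh
print(f"~{energy_kwh:.3f} kWh, ~{emissions_kg:.3f} kg CO2e")  # ~0.060 kWh, ~0.027 kg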

Technical Specifications

Model Architecture and Objective

Base Model: Mistral-7B-Instruct-v0.3

  • Architecture: Transformer decoder with grouped-query attention (sliding-window attention was dropped as of v0.2)
  • Parameters: ~7.2 billion
  • Context length: 32k tokens

Adaptation Method: QLoRA (Quantized Low-Rank Adaptation)

  • Trainable parameters: ~168 million (2.26% of base model)
  • Quantization: 4-bit NF4 with double quantization
  • LoRA rank: 64
  • Target modules: All linear layers

Objective: Supervised fine-tuning for Czech building law question-answering

Compute Infrastructure

Hardware

  • GPU: NVIDIA GeForce RTX 4070 Ti SUPER
  • VRAM: 16GB GDDR6X
  • RAM: 32GB
  • CPU: 24 Logical Processors
  • Storage: Local SSD
  • Location: Local workstation (not cloud)

Software

  • OS: Windows
  • Python: 3.12.10
  • PyTorch: 2.7.1+cu118
  • Transformers: ≥4.40.0
  • PEFT: ≥0.10.0
  • BitsAndBytes: ≥0.43.0
  • TRL: ≥0.8.0
  • Accelerate: ≥0.28.0
  • CUDA: 11.8
  • Training framework: Jupyter Notebook with SFTTrainer

Citation

If you use this model, please cite:

BibTeX:

@misc{peska2025czechbuildinglaw,
  author = {Peška, Rostislav},
  title = {Czech Building Law LoRA Adapter for Mistral-7B},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/rostislavpeska/mistral-czech-building-law-lora}},
  note = {Educational AI Developer Course Project}
}

APA:

Peška, R. (2025). Czech Building Law LoRA Adapter for Mistral-7B [LoRA adapter]. HuggingFace. https://huggingface.co/rostislavpeska/mistral-czech-building-law-lora

Glossary

  • LoRA (Low-Rank Adaptation): Efficient fine-tuning method that trains small adapter modules
  • QLoRA: LoRA with 4-bit quantization for reduced memory usage
  • Stavební zákon: Czech Building Law
  • Stavební povolení: Building permit
  • Územní plánování: Territorial/spatial planning
  • PEFT: Parameter-Efficient Fine-Tuning
  • BitsAndBytes: Library for efficient quantization
  • SFTTrainer: Supervised Fine-Tuning Trainer from TRL library

More Information

Project Context

This adapter was developed as part of an AI Developer course project to demonstrate:

  • Fine-tuning LLMs for specialized domains
  • Efficient training with limited resources (QLoRA)
  • Working with Czech language legal data
  • Practical deployment to HuggingFace Hub

Limitations & Future Work

Known issues:

  • Mistral-7B has gaps in Czech language understanding
  • Limited dataset size (576 samples)
  • May not reflect the latest legal amendments

Improvements wanted:

  • OpenEuroLLM base model for better Czech language support
  • Expanded dataset with more scenarios
  • Multi-turn conversation capabilities
  • Integration with official legal databases

Collaboration Welcome!

If you have access to OpenEuroLLM or expertise in Czech legal NLP, I'd love to collaborate on improving this project. This is a non-profit educational initiative, and contributions are welcome!

Model Card Authors

Mgr. Rostislav Peška

  • Email: [email protected]
  • Phone: +420 754 506 863
  • Role: Developer & Trainer (AI Developer Course Project)

Model Card Contact

For questions, collaborations, or feedback, use the contact details listed above.

Areas of interest:

  • Collaboration on OpenEuroLLM-based version
  • Czech legal NLP research
  • 3D model generation
  • Deployment and integration support

This is an educational project. Always consult licensed legal professionals for official building law guidance.
