Czech Building Law LoRA Adapter for Mistral-7B
🏗️ A LoRA adapter fine-tuned on a Czech Building Law (Stavební zákon) Q&A dataset for question-answering tasks. This adapter enables Mistral-7B-Instruct-v0.3 to answer questions about Czech building regulations, construction permits, and related legal matters.
⚠️ Note: This is an educational project with limitations. Mistral-7B has gaps in Czech language understanding. For production use, a Czech-native model like OpenEuroLLM would be more suitable.
Model Details
Model Description
This LoRA (Low-Rank Adaptation) adapter was fine-tuned on a dataset of 576 Czech Building Law question-answer pairs. It adapts Mistral-7B-Instruct-v0.3 to answer questions about Czech construction regulations, building permits, territorial planning, and related legal matters.
The model was trained as part of an AI Developer course project and is intended for educational and research purposes. While functional, it has limitations due to Mistral's non-native Czech language capabilities.
- Developed by: Mgr. Rostislav Peška
- Project type: Educational (AI Developer Course)
- Funding: Non-profit educational project
- Model type: LoRA Adapter for Causal Language Model
- Language: Czech (cs)
- License: Apache 2.0
- Base model: mistralai/Mistral-7B-Instruct-v0.3
- Dataset: rostislavpeska/stavebni-zakon-dataset
Model Sources
- Repository: OpenEuroLLMDataset (if available)
- Base Model: mistralai/Mistral-7B-Instruct-v0.3
- Training Dataset: rostislavpeska/stavebni-zakon-dataset
- Contact: [email protected]
Uses
Direct Use
This adapter is designed for:
- Answering questions about Czech building law (Stavební zákon)
- Providing information on construction permits and procedures
- Explaining territorial planning regulations
- Educational purposes for learning Czech legal terminology
- Research on legal domain adaptation for LLMs
Ideal users:
- Students learning about Czech building regulations
- Developers creating chatbots for construction-related queries
- Researchers studying legal NLP in Czech language
Downstream Use
This adapter can be integrated into:
- Legal chatbots for construction companies
- Educational platforms teaching Czech building law
- Document analysis tools for building permits
- Q&A systems for architectural firms
Note: For production systems, further fine-tuning or using a Czech-native base model (e.g., OpenEuroLLM) is recommended.
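For example, a downstream chatbot could wrap the model in a small helper like the sketch below. The function name is illustrative, and `model` and `tokenizer` are assumed to be loaded exactly as in the quickstart further down this card.

import torch

def answer_building_law_question(question: str, max_new_tokens: int = 256) -> str:
    """Generate an answer to a Czech building-law question (illustrative helper)."""
    messages = [{"role": "user", "content": question}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)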
Out-of-Scope Use
❌ NOT suitable for:
- Legal advice - This is NOT a replacement for professional legal counsel
- Official legal documents - Responses may contain inaccuracies
- Critical decision-making - Always verify with official sources and legal experts
- Production systems without review - Requires human oversight
- Non-Czech building law - Trained specifically on Czech regulations
- Real-time legal changes - May not reflect the latest amendments
⚠️ Always consult licensed legal professionals for official guidance.
Bias, Risks, and Limitations
Technical Limitations
- Base model gaps: Mistral-7B is not optimized for Czech language, leading to potential grammatical errors or unnatural phrasing
- Dataset size: Only 576 training samples - limited coverage of all building law scenarios
- Domain specificity: Trained only on Czech building law (Stavební zákon)
- Temporal limitations: Training data may not reflect the most recent legal amendments
- LoRA constraints: Adapter size limits the model's capacity to learn complex legal reasoning
Recommended Alternative
OpenEuroLLM would be a superior base model for this task due to native Czech language support. If you have access to OpenEuroLLM and would like to collaborate on improving this project, please reach out!
Bias Considerations
- Responses reflect the training dataset's interpretation of building law
- May contain biases present in the original Q&A dataset
- Legal language complexity may not be fully captured
Safety Risks
- Hallucination: Model may generate plausible but incorrect legal information
- Oversimplification: Complex legal matters may be oversimplified
- Misinterpretation: Users may misinterpret responses as official legal advice
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. Outputs should be verified against the current text of the Stavební zákon and reviewed by a qualified legal professional before being acted upon.
How to Get Started with the Model
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Load base model with 4-bit quantization (for GPU efficiency)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.3",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.3",
    trust_remote_code=True,
)

# Load the LoRA adapter on top of the quantized base model
model = PeftModel.from_pretrained(
    base_model,
    "rostislavpeska/mistral-czech-building-law-lora",
    torch_dtype=torch.bfloat16,
)
model.eval()

# Ask a question ("When do I need a building permit?")
test_messages = [
    {"role": "user", "content": "Kdy potřebuji stavební povolení?"}
]
inputs = tokenizer.apply_chat_template(
    test_messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
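For deployment as a standalone model, the adapter can also be merged into the base weights. A minimal sketch, assuming the base model is reloaded in bf16 rather than 4-bit (merging into quantized weights is not generally supported); the output path is illustrative:

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Reload the base model in full bf16 precision so the LoRA weights can be merged.
base_fp = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.3",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
merged = PeftModel.from_pretrained(
    base_fp, "rostislavpeska/mistral-czech-building-law-lora"
).merge_and_unload()
merged.save_pretrained("./mistral-czech-building-law-merged")  # illustrative path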
Training Details
Training Data
Dataset: rostislavpeska/stavebni-zakon-dataset
- Total samples: 576 Q&A pairs
- Language: Czech
- Domain: Czech Building Law (Stavební zákon)
- Format: Conversational Q&A pairs in chat template format
- Split: 80/20 train/test (460 training, 116 testing)
- Token length: 64-1731 tokens per sample (avg: 271.9 tokens)
The dataset contains questions and answers about:
- Building permits (stavební povolení)
- Territorial planning (územní plánování)
- Construction regulations (stavební předpisy)
- Legal procedures and requirements
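A sketch of loading the dataset and reproducing an 80/20 split with the datasets library. The split name, column layout, and random seed are assumptions, not the exact procedure used for the original split:

from datasets import load_dataset

dataset = load_dataset("rostislavpeska/stavebni-zakon-dataset", split="train")
splits = dataset.train_test_split(test_size=0.2, seed=42)  # seed is an assumption
train_ds, test_ds = splits["train"], splits["test"]
print(len(train_ds), len(test_ds))  # expected: 460, 116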
Training Procedure
Training Hyperparameters
- Training regime: bf16 mixed precision (bfloat16)
- Quantization: 4-bit NF4 with double quantization (QLoRA)
- Optimizer: paged_adamw_8bit
- Learning rate: 2e-4
- Learning rate scheduler: cosine
- Batch size: 2 per device
- Gradient accumulation steps: 4 (effective batch size: 8)
- Number of epochs: 3
- LoRA rank (r): 64
- LoRA alpha: 32
- LoRA dropout: 0.05
- LoRA target modules: all-linear
- Warmup steps: 50
- Gradient checkpointing: Enabled
- Max sequence length: 2048 tokens
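A sketch reconstructing this configuration with PEFT and TRL. The variable names (base_model, tokenizer, train_ds, test_ds) and output_dir are illustrative, and SFTTrainer argument names vary across TRL versions:

from peft import LoraConfig
from transformers import TrainingArguments
from trl import SFTTrainer

lora_config = LoraConfig(
    r=64,                         # LoRA rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",  # adapt every linear layer
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="./mistral-czech-building-law-lora",  # illustrative path
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,  # effective batch size 8
    num_train_epochs=3,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    bf16=True,
    optim="paged_adamw_8bit",
    gradient_checkpointing=True,
)

trainer = SFTTrainer(
    model=base_model,        # 4-bit quantized base model from the quickstart
    args=training_args,
    train_dataset=train_ds,  # see Training Data above
    eval_dataset=test_ds,
    peft_config=lora_config,
    tokenizer=tokenizer,     # argument names vary across TRL versions
    max_seq_length=2048,
)
trainer.train()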
Speeds, Sizes, Times
- Training time: ~12.4 minutes (0.21 hours)
- Training date: November 2, 2025
- Adapter size: ~336 MB (safetensors format)
- Trainable parameters: 167,772,160 (2.26% of base model)
- Hardware: NVIDIA GeForce RTX 4070 Ti SUPER (16GB VRAM)
- Peak VRAM usage: ~12-14 GB
- Training framework: PyTorch with Hugging Face Transformers, PEFT, and TRL
Evaluation
Testing Data, Factors & Metrics
Testing Data
Test split: 116 samples (20% of total dataset)
From the same dataset: rostislavpeska/stavebni-zakon-dataset
Factors
Evaluation focuses on:
- Domain accuracy: Correctness of legal information
- Language quality: Czech grammar and fluency
- Relevance: Appropriateness of responses to questions
- Citation: Proper references to legal codes and regulations
Metrics
- Evaluation loss: Primary metric during training
- Qualitative assessment: Manual review of response quality
- Domain expert review recommended: Legal professionals should validate outputs
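As a rough sketch of turning evaluation loss into a reportable number, assuming the trainer from the training sketch above with eval_dataset=test_ds:

import math

metrics = trainer.evaluate()  # computes loss on the 116-sample test split
eval_loss = metrics["eval_loss"]
print(f"eval_loss = {eval_loss:.4f}, perplexity = {math.exp(eval_loss):.2f}")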
Results
The model successfully generates contextually relevant responses to Czech building law questions. However, as this is an educational project with a limited dataset and non-native base model, comprehensive quantitative evaluation has not been performed.
Observed strengths:
- Maintains conversational context
- References relevant legal codes
- Provides structured responses
Observed limitations:
- Occasional grammatical imperfections due to base model's Czech limitations
- May oversimplify complex legal scenarios
- Limited by training data coverage
Model Examination
No formal interpretability analysis has been conducted. This is an educational project with limited scope.
Future work could include:
- Attention weight visualization for legal reasoning
- Error analysis on failure cases
- Comparison with Czech-native base models (e.g., OpenEuroLLM)
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: NVIDIA GeForce RTX 4070 Ti SUPER (16GB)
- Hours used: ~0.21 hours (12.4 minutes)
- Cloud Provider: Local machine (not cloud)
- Compute Region: Czech Republic / Central Europe
- Carbon Emitted: Minimal due to short training time and local infrastructure
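For a measured figure rather than an estimate, a tracker such as codecarbon could wrap the training call. This library was not used in the original run; the sketch below is an assumption:

from codecarbon import EmissionsTracker

tracker = EmissionsTracker(project_name="czech-building-law-lora")
tracker.start()
trainer.train()  # the SFTTrainer from the training sketch above
emissions_kg = tracker.stop()
print(f"Estimated emissions: {emissions_kg:.6f} kg CO2eq")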
Technical Specifications
Model Architecture and Objective
Base Model: Mistral-7B-Instruct-v0.3
- Architecture: Transformer decoder with Sliding Window Attention
- Parameters: ~7.2 billion
- Context length: 32k tokens
Adaptation Method: QLoRA (Quantized Low-Rank Adaptation)
- Trainable parameters: ~168 million (2.26% of base model)
- Quantization: 4-bit NF4 with double quantization
- LoRA rank: 64
- Target modules: All linear layers
Objective: Supervised fine-tuning for Czech building law question-answering
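The trainable-parameter figure can be checked with PEFT's built-in helper. A sketch assuming base_model and lora_config from the earlier snippets:

from peft import get_peft_model

peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()
# prints something like: trainable params: 167,772,160 || trainable%: ~2.26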
Compute Infrastructure
Hardware
- GPU: NVIDIA GeForce RTX 4070 Ti SUPER
- VRAM: 16GB GDDR6X
- RAM: 32GB
- CPU: 24 Logical Processors
- Storage: Local SSD
- Location: Local workstation (not cloud)
Software
- OS: Windows
- Python: 3.12.10
- PyTorch: 2.7.1+cu118
- Transformers: ≥4.40.0
- PEFT: ≥0.10.0
- BitsAndBytes: ≥0.43.0
- TRL: ≥0.8.0
- Accelerate: ≥0.28.0
- CUDA: 11.8
- Training framework: Jupyter Notebook with SFTTrainer
Citation
If you use this model, please cite:
BibTeX:
@misc{peska2025czechbuildinglaw,
author = {Peška, Rostislav},
title = {Czech Building Law LoRA Adapter for Mistral-7B},
year = {2025},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/rostislavpeska/mistral-czech-building-law-lora}},
note = {Educational AI Developer Course Project}
}
APA:
Peška, R. (2025). Czech Building Law LoRA Adapter for Mistral-7B [LoRA adapter]. HuggingFace. https://huggingface.co/rostislavpeska/mistral-czech-building-law-lora
Glossary
- LoRA (Low-Rank Adaptation): Efficient fine-tuning method that trains small adapter modules
- QLoRA: LoRA with 4-bit quantization for reduced memory usage
- Stavební zákon: Czech Building Law
- Stavební povolení: Building permit
- Územní plánování: Territorial/spatial planning
- PEFT: Parameter-Efficient Fine-Tuning
- BitsAndBytes: Library for efficient quantization
- SFTTrainer: Supervised Fine-Tuning Trainer from TRL library
More Information
Project Context
This adapter was developed as part of an AI Developer course project to demonstrate:
- Fine-tuning LLMs for specialized domains
- Efficient training with limited resources (QLoRA)
- Working with Czech language legal data
- Practical deployment to HuggingFace Hub
Limitations & Future Work
Known issues:
- Mistral-7B has gaps in Czech language understanding
- Limited dataset size (576 samples)
- May not reflect the latest legal amendments
Improvements wanted:
- OpenEuroLLM base model for better Czech language support
- Expanded dataset with more scenarios
- Multi-turn conversation capabilities
- Integration with official legal databases
Collaboration Welcome!
If you have access to OpenEuroLLM or expertise in Czech legal NLP, I'd love to collaborate on improving this project. This is a non-profit educational initiative, and contributions are welcome!
Model Card Authors
Mgr. Rostislav Peška
- Email: [email protected]
- Phone: +420 754 506 863
- Role: Developer & Trainer (AI Developer Course Project)
Model Card Contact
For questions, collaborations, or feedback:
- Name: Mgr. Rostislav Peška
- Email: [email protected]
- Phone: +420 754 506 863
Areas of interest:
- Collaboration on OpenEuroLLM-based version
- Czech legal NLP research
- 3D Model generation
- Deployment and integration support
This is an educational project. Always consult licensed legal professionals for official building law guidance.