---
library_name: gguf
base_model: Qwen/Qwen3-8B
quantized_by: Tohirju
model_name: Ameena_Qwen3-8B_e3_Quantised_gguf
model_author: Tohirju
model_type: qwen3
quantization_method: Q4_K_M
tags:
- quantized
- gguf
- qwen3
- 8b
- q4_k_m
license: apache-2.0
---

# Ameena Qwen3-8B e3 Quantized GGUF

This is a Q4_K_M quantized version of a fine-tuned Qwen3-8B model, packaged in GGUF format for efficient inference with llama.cpp-compatible runtimes.

## Model Details

- **Base Model**: Qwen/Qwen3-8B
- **Quantization**: Q4_K_M (4-bit K-quant, mixed precision)
- **Original Size**: ~15.26 GB (FP16)
- **Quantized Size**: ~4.68 GB
- **Compression Ratio**: ~3.3x
- **Format**: GGUF (llama.cpp model file format)

## Usage

### With llama-cpp-python

```python
from llama_cpp import Llama

# Load the model
llm = Llama(
    model_path="Ameena_Qwen3-8B_e3.gguf",
    n_gpu_layers=-1,  # Offload all layers to the GPU (set to 0 for CPU-only)
    n_ctx=4096,       # Context window
    verbose=False
)

# Generate text
response = llm(
    "Your prompt here",
    max_tokens=512,
    temperature=0.7,
    top_p=0.9
)
print(response["choices"][0]["text"])
```

### Downloading from the Hugging Face Hub

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the GGUF file
model_path = hf_hub_download(
    repo_id="Tohirju/Ameena_Qwen3-8B_e3_Quantised_gguf",
    filename="Ameena_Qwen3-8B_e3.gguf"
)

# Load the downloaded file as shown above
llm = Llama(model_path=model_path, n_gpu_layers=-1, n_ctx=4096)
```

A chat-style example that uses the model's chat template is sketched at the end of this card.

## Quantization Details

- **Method**: Q4_K_M, a mixed-precision 4-bit K-quant scheme
- **Quality**: Good balance between model size and output quality
- **Speed**: Suitable for fast inference on both CPU and GPU
- **Memory**: Significantly reduced RAM/VRAM requirements compared to FP16

## Performance

- **Loading**: Faster model loading thanks to the ~3.3x smaller file
- **Memory Usage**: ~69% lower memory footprint than the FP16 weights
- **Quality**: Minimal quality loss compared to the FP16 version

## Hardware Requirements

- **CPU**: Any modern CPU (AVX2-capable x86_64 recommended for CPU-only inference)
- **GPU**: CUDA-compatible GPU recommended (RTX 3060 or better)
- **RAM**: 8 GB minimum, 16 GB recommended
- **Storage**: ~5 GB for the model file

## License

This model is released under the Apache 2.0 license, the same license as the base Qwen3-8B model.
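
## Chat Usage Sketch (llama-cpp-python)

The usage examples above use plain text completion. For instruction-following use, the minimal sketch below runs the model through `create_chat_completion`, which applies the chat template stored in the GGUF metadata. This is an illustrative sketch rather than part of the original release: it assumes the quantized file retains the Qwen3 chat template, and the system/user messages are placeholders to replace with your own.

```python
from llama_cpp import Llama

# Load the quantized model; the chat template embedded in the GGUF metadata
# is picked up automatically by llama-cpp-python.
llm = Llama(
    model_path="Ameena_Qwen3-8B_e3.gguf",  # or the path returned by hf_hub_download
    n_gpu_layers=-1,   # full GPU offload; set to 0 for CPU-only
    n_ctx=4096,
    verbose=False,
)

# OpenAI-style chat request (placeholder messages)
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize what Q4_K_M quantization does."},
    ],
    max_tokens=512,
    temperature=0.7,
    top_p=0.9,
)

print(response["choices"][0]["message"]["content"])
```

Note that Qwen3 models may emit a `<think>...</think>` reasoning block before the final answer when the chat template enables thinking mode; strip it in post-processing if you only want the answer text.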