---
license: apache-2.0
language: en
library_name: transformers
pipeline_tag: text-generation
tags:
  - llama3
  - medical
  - rag
  - finetuned
datasets:
  - medquad
  - icliniq
  - NHS
  - WEBMD
  - NIH
model_creator: Bernard Kyei-Mensah
base_model: meta-llama/Meta-Llama-3-8B
inference: true
---

# 🩺 MediMaven Llama-3.1-8B (fp16, v1.1)

**A domain-adapted Llama-3 fine-tuned on ~150k high-quality Q&A pairs, merged into standalone fp16 weights for maximum downstream flexibility.**

---

# ✨ Key points
|  |  |
|---|---|
|**Base model**|Meta-Llama-3-8B|
|**Tuning method**|QLoRA (4-bit) → merged to fp16|
|**Training data**|Curated MedQuAD v2 plus scraped articles from Mayo Clinic, NIH, NHS, and WebMD|
|**Intended use**|Medical information retrieval, summarisation, chat|

> **Disclaimer:** Outputs are *informational* and do **not** constitute medical advice.

---

# 🔥 Quick start

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("dranreb1660/medimaven-llama3-8b-fp16")
model = AutoModelForCausalLM.from_pretrained(
    "dranreb1660/medimaven-llama3-8b-fp16",
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = "Explain first-line treatment for GERD in two sentences."
inputs = tok(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(output[0], skip_special_tokens=True))
```
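
Since v1.1 ships a new tokenizer template (see *Versioning* below), chat-style prompting may work better than a raw string. A minimal sketch, assuming the tokenizer actually carries a chat template (it will raise an error otherwise), reusing `tok` and `model` from above:

```python
# Assumes the v1.1 tokenizer includes a chat template; reuses tok/model from above.
messages = [{"role": "user",
             "content": "Explain first-line treatment for GERD in two sentences."}]
input_ids = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=64)
print(tok.decode(output[0], skip_special_tokens=True))
```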
---
# 📊 Evaluation
| Metric                      | Llama-3-8B (base) | **MediMaven** |
| --------------------------- | ----------------- | ------------- |
| Medical MC-QA (exact-match) | 78.4              | **89.7**      |
| F1 (MedQA-RAG) †            | 0.71              | **0.83**      |
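
The exact scoring script is not reproduced here; for reference, a SQuAD-style normalised exact match (a common convention for QA benchmarks) can be computed like this — the normalisation rules below are an assumption, not the card's official metric code:

```python
import re
import string

def normalize(text: str) -> str:
    """Lower-case, strip punctuation/articles/extra whitespace (SQuAD-style)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(predictions, references) -> float:
    """Percentage of predictions matching their reference after normalisation."""
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)

# Hypothetical example:
print(exact_match(["Omeprazole"], ["omeprazole."]))  # -> 100.0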


# 🛠️ How we trained
- Built the dataset from de-duplicated, source-attributed passages (MedQuAD, Mayo Clinic, iCliniq); see the [dataset card](https://huggingface.co/datasets/dranreb1660/medimaven-qa-data) for details.

- Applied QLoRA (base weights quantised 32-bit → 4-bit) on an NVIDIA T4: 3 epochs, LR 3e-5, cosine schedule.

- Merged the LoRA adapters back to fp16; ran AWQ quantisation (see the separate repo) for production inference. A sketch of the full recipe follows the notebook link below.

[Full training notebook](/training/notebooks/llama3_finetune.ipynb)
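
For orientation, a minimal sketch of the recipe above using `peft` + `bitsandbytes`. The LoRA rank, alpha, target modules, and adapter path are illustrative placeholders, not the exact values used (those are in the notebook):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, PeftModel, get_peft_model, prepare_model_for_kbit_training

# 1) Load the base model with 4-bit quantised weights (the "32-bit -> 4-bit" step).
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", quantization_config=bnb, device_map="auto"
)
base = prepare_model_for_kbit_training(base)

# 2) Attach LoRA adapters (rank/alpha/target_modules here are placeholders).
lora = LoraConfig(
    r=16, lora_alpha=32, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(base, lora)
# ... train with transformers.Trainer: 3 epochs, lr=3e-5, cosine schedule ...

# 3) Merge: reload the base in fp16 and fold the trained adapters into it.
fp16_base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", torch_dtype=torch.float16
)
merged = PeftModel.from_pretrained(fp16_base, "path/to/lora-adapter").merge_and_unload()
merged.save_pretrained("medimaven-llama3-8b-fp16")
```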

# 🚦 Limitations & bias
* Llama-3 license prohibits use in regulated "high-risk" settings.

* English-only; no guarantee of safe output in other languages.


# ⬆️ Versioning
* v1.1 = first public release (merged weights, new tokenizer template).
* For lighter deployment, see `medimaven-llama3-8b-awq` (loading sketch below).
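
A minimal loading sketch for that variant, assuming the AWQ repo lives under the same namespace and stores its quantisation config in the standard `transformers` format (requires the `autoawq` package):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Quantisation config is read from the repo; needs `pip install autoawq`.
tok = AutoTokenizer.from_pretrained("dranreb1660/medimaven-llama3-8b-awq")
model = AutoModelForCausalLM.from_pretrained(
    "dranreb1660/medimaven-llama3-8b-awq", device_map="auto"
)
```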


# 📜 Citation
```bibtex
@misc{medimaven2025llama3,
  title        = {MediMaven Llama-3.1-8B},
  author       = {Kyei-Mensah, Bernard},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/dranreb1660/medimaven-llama3-8b-fp16}}
}
```