---
license: apache-2.0
language: en
library_name: transformers
pipeline_tag: text-generation
tags:
- llama3
- medical
- rag
- finetuned
datasets:
- medquad
- icliniq
- NHS
- WEBMD
- NIH
model_creator: Bernard Kyei-Mensah
base_model: meta-llama/Meta-Llama-3-8B
inference: true
---
# 🩺 MediMaven Llama-3.1-8B (fp16, v1.1)
**A domain-adapted Llama-3 fine-tuned on ~150k high-quality medical Q&A pairs, with the LoRA adapters merged into standalone fp16 weights for maximum downstream flexibility.**
---
# ✨ Key points
| | |
|---|---|
|**Base model**|Meta-Llama-3-8B|
|**Tuning method**|QLoRA (4-bit NF4) → merge to fp16|
|**Training data**|Curated MedQuAD v2 plus articles scraped from Mayo Clinic, NIH, NHS, and WebMD|
|**Intended use**|Medical information retrieval, summarisation, chat|
> **Disclaimer:** Outputs are *informational* and do **not** constitute medical advice.
---
# 🔥 Quick start
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "dranreb1660/medimaven-llama3-8b-fp16"
tok = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,  # merged weights are shipped in fp16
    device_map="auto",
)

prompt = "Explain first-line treatment for GERD in two sentences."
inputs = tok(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(output_ids[0], skip_special_tokens=True))
```
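For chat-style use, the tokenizer ships a chat template (added in v1.1, see Versioning below). A minimal sketch, assuming the standard Llama-3 template is present and reusing `tok` and `model` from above:

```python
# Multi-turn usage via the tokenizer's chat template.
# Assumes the v1.1 tokenizer template mentioned under Versioning.
messages = [
    {"role": "system", "content": "You are a careful medical information assistant."},
    {"role": "user", "content": "What lifestyle changes help with GERD?"},
]
input_ids = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tok.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))
```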
---
# 📊 Evaluation
| Metric | Llama-3-8B (base) | **MediMaven** |
| --------------------------- | ----------------- | ------------- |
| Medical MC-QA (exact-match) | 78.4 | **89.7** |
| F1 (MedQA-RAG) † | 0.71 | **0.83** |
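For transparency, "exact-match" here means plain string equality after light normalisation. A minimal illustrative sketch of the scoring (the actual eval harness may normalise more aggressively):

```python
def exact_match(predictions, references):
    """Percentage of predictions equal to the reference after
    lowercasing and stripping surrounding whitespace."""
    hits = sum(p.strip().lower() == r.strip().lower()
               for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)

print(exact_match(["Omeprazole", "B"], ["omeprazole", "A"]))  # 50.0
```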
# 🛠️ How we trained
- Built the dataset from de-duplicated, source-attributed passages (MedQuAD, Mayo, iCliniq); see the [dataset card](https://huggingface.co/datasets/dranreb1660/medimaven-qa-data) for details.
- Applied QLoRA (fp32 base quantised to 4-bit) on an NVIDIA T4: 3 epochs, LR 3e-5, cosine schedule.
- Merged the LoRA adapters into fp16 weights, then ran AWQ (see separate repo) for production inference; a condensed sketch follows below.
[Full training notebook](/training/notebooks/llama3_finetune.ipynb)
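A condensed sketch of the QLoRA → fp16 pipeline above, using `peft` and `bitsandbytes`. The LoRA rank, target modules, and paths are illustrative assumptions; the epochs, learning rate, and schedule follow the settings listed above. See the notebook for the exact configuration:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, PeftModel, get_peft_model, prepare_model_for_kbit_training

base_id = "meta-llama/Meta-Llama-3-8B"

# 1) Load the base model quantised to 4-bit NF4 (the "Q" in QLoRA).
bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_cfg, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# 2) Attach LoRA adapters; rank and target modules here are illustrative.
lora_cfg = LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
# ... training loop (3 epochs, LR 3e-5, cosine schedule) omitted ...
model.save_pretrained("lora-adapters")

# 3) Reload the base in fp16 and merge the adapters, yielding the
#    standalone fp16 checkpoint published in this repo.
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
merged = PeftModel.from_pretrained(base, "lora-adapters").merge_and_unload()
merged.save_pretrained("medimaven-llama3-8b-fp16")
```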
# 🚦 Limitations & bias
* Llama-3 license prohibits use in regulated "high-risk" settings.
* English-only; no guarantee of safe output in other languages.
# ⬆️ Versioning
* v1.1 = first public release (merged weights, new tokenizer template).
* For lighter deployment, see `medimaven-llama3-8b-awq` (loading sketch below).
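Recent `transformers` releases can load AWQ checkpoints directly once `autoawq` is installed. A minimal sketch; the full repo id is an assumption based on the name above:

```python
# Loading the lighter AWQ variant (requires `pip install autoawq`).
# The repo id below is inferred from the name given above.
from transformers import AutoModelForCausalLM, AutoTokenizer

awq_id = "dranreb1660/medimaven-llama3-8b-awq"
tok = AutoTokenizer.from_pretrained(awq_id)
model = AutoModelForCausalLM.from_pretrained(awq_id, device_map="auto")
```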
# 📜 Citation
```bibtex
@misc{medimaven2025llama3,
title = {MediMaven Llama-3.1-8B},
author = {Kyei-Mensah, Bernard},
year = {2025},
howpublished = {\url{https://huggingface.co/dranreb1660/medimaven-llama3-8b-fp16}}
}
```