dranreb1660 committed on
Commit
18fb67b
·
1 Parent(s): aa8f1bc

Added model card readme

Files changed (1)
  1. README.md +90 -0
README.md ADDED
@@ -0,0 +1,90 @@
+ ---
+ license: apache-2.0
+ language: en
+ library_name: transformers
+ pipeline_tag: text-generation
+ tags:
+ - llama3
+ - medical
+ - rag
+ - finetuned
+ datasets:
+ - medquad
+ - icliniq
+ - NHS
+ - WEBMD
+ - NIH
+ model_creator: Bernard Kyei-Mensah
+ base_model: meta-llama/Meta-Llama-3-8B
+ inference: true
+ ---
+
+ # 🩺 MediMaven Llama-3 8B (fp16, v1.1)
+
+ **A domain-adapted Llama-3 fine-tuned on ~150k high-quality medical Q&A pairs, with the LoRA adapters merged into unquantized fp16 weights for maximum downstream flexibility.**
+
+ ---
+
+ # ✨ Key points
+ | | |
+ |---|---|
+ |**Base model**|Meta-Llama-3-8B|
+ |**Tuning method**|QLoRA (4-bit) → merged to fp16|
+ |**Training data**|Curated MedQuAD v2 plus scraped articles from Mayo Clinic, NIH, NHS, and WebMD|
+ |**Intended use**|Medical information retrieval, summarisation, chat|
+
+ > **Disclaimer:** Outputs are *informational* and do **not** constitute medical advice.
+
+ ---
+
+ # 🔥 Quick start
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ tok = AutoTokenizer.from_pretrained("dranreb1660/medimaven-llama3-8b-fp16")
+ model = AutoModelForCausalLM.from_pretrained(
+     "dranreb1660/medimaven-llama3-8b-fp16",
+     torch_dtype=torch.float16,   # fp16 weights, as shipped
+     device_map="auto",           # shard across available devices
+ )
+
+ prompt = "Explain first-line treatment for GERD in two sentences."
+ inputs = tok(prompt, return_tensors="pt").to(model.device)
+ output = model.generate(**inputs, max_new_tokens=64)
+ print(tok.decode(output[0], skip_special_tokens=True))
+ ```
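+
+ If the checkpoint ships a chat template (v1.1 notes a new tokenizer template), chat-style prompting can reuse `tok` and `model` from above; a minimal sketch, assuming the standard transformers chat-template API:
+
+ ```python
+ # Sketch: chat-style prompting via the tokenizer's chat template.
+ # Assumes the checkpoint ships a template (see the versioning notes below).
+ messages = [{"role": "user", "content": "What are common triggers for migraine?"}]
+ input_ids = tok.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+ out = model.generate(input_ids, max_new_tokens=64)
+ print(tok.decode(out[0], skip_special_tokens=True))
+ ```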
+ ---
+ # 📊 Evaluation
+ | Metric | Base Llama-3 8B | **MediMaven** |
+ | --------------------------- | --------------- | ------------- |
+ | Medical MC-QA (exact-match) | 78.4 | **89.7** |
+ | F1 (MedQA-RAG) † | 0.71 | **0.83** |
+
+
+ # 🛠️ How we trained
+ - Built the dataset from de-duplicated, source-attributed passages (MedQuAD, Mayo Clinic, iCliniq); [check the dataset card for more info](https://huggingface.co/datasets/dranreb1660/medimaven-qa-data).
+ - Applied QLoRA (32-bit → 4-bit) on an NVIDIA T4: 3 epochs, LR 3e-5, cosine schedule (a setup sketch follows this list).
+ - Merged the LoRA adapters into fp16, then ran AWQ (see the separate repo) for production inference; a merge sketch appears after the notebook link.
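+
+ A minimal sketch of the QLoRA setup under the stated hyperparameters; the LoRA rank, alpha, target modules, and batch settings below are illustrative assumptions, not values from this card:
+
+ ```python
+ # Hedged sketch: 4-bit QLoRA fine-tune with the card's stated hyperparameters
+ # (3 epochs, LR 3e-5, cosine schedule). Rank/alpha/target modules/batch
+ # settings are assumptions for illustration.
+ import torch
+ from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
+ from peft import LoraConfig, get_peft_model
+
+ bnb = BitsAndBytesConfig(
+     load_in_4bit=True,                      # base weights quantized to 4-bit
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.float16,
+ )
+ base = AutoModelForCausalLM.from_pretrained(
+     "meta-llama/Meta-Llama-3-8B", quantization_config=bnb, device_map="auto"
+ )
+ lora = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM",
+                   target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])
+ model = get_peft_model(base, lora)
+
+ args = TrainingArguments(
+     output_dir="qlora-out",
+     num_train_epochs=3,
+     learning_rate=3e-5,
+     lr_scheduler_type="cosine",
+     per_device_train_batch_size=1,          # T4-sized; assumption
+     gradient_accumulation_steps=16,         # assumption
+     fp16=True,
+ )
+ ```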
69
+
70
+ [Full training notebook](/training/notebooks/llama3_finetune.ipynb)
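+
+ And a minimal sketch of the merge step, assuming the adapters were saved to a local checkpoint (the `qlora-out` path is a placeholder):
+
+ ```python
+ # Hedged sketch: fold trained LoRA adapters back into fp16 base weights.
+ import torch
+ from transformers import AutoModelForCausalLM
+ from peft import PeftModel
+
+ base = AutoModelForCausalLM.from_pretrained(
+     "meta-llama/Meta-Llama-3-8B", torch_dtype=torch.float16
+ )
+ merged = PeftModel.from_pretrained(base, "qlora-out").merge_and_unload()
+ merged.save_pretrained("medimaven-llama3-8b-fp16")
+ ```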
+
+ # 🚦 Limitations & bias
+ * The Llama-3 license prohibits use in regulated "high-risk" settings.
+ * English-only; no guarantee of safe output in other languages.
+
+
+ # ⬆️ Versioning
+ * v1.1 = first public release (merged weights, new tokenizer template).
+ * For lighter deployment, see `medimaven-llama3-8b-awq` (a loading sketch follows this list).
+
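+ If `autoawq` is installed, recent transformers versions can load AWQ checkpoints directly; the namespaced repo id below is an assumption based on the note above:
+
+ ```python
+ # Hedged sketch: loading the AWQ variant. The repo id is assumed, not confirmed.
+ from transformers import AutoModelForCausalLM
+
+ awq_model = AutoModelForCausalLM.from_pretrained(
+     "dranreb1660/medimaven-llama3-8b-awq", device_map="auto"
+ )
+ ```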
+
+ # 📜 Citation
+ ```bibtex
+ @misc{medimaven2025llama3,
+   title = {MediMaven Llama-3 8B},
+   author = {Kyei-Mensah, Bernard},
+   year = {2025},
+   howpublished = {\url{https://huggingface.co/medimaven-ai/medimaven-llama3-8b-fp16}}
+ }
+ ```