albyos committed · Commit 78d12f6 · verified · Parent: 69d64b3

Update README.md

Files changed: README.md (+329 −93)
---
tags:
- base_model:adapter:meta-llama/Llama-3.2-3B-Instruct
- lora
- transformers
- medical
- ner
- named-entity-recognition
- healthcare
- biomedical
language:
- en
license: llama3.2
metrics:
- f1
- precision
- recall
---

# Llama-3.2-3B Medical NER LoRA

A medical Named Entity Recognition (NER) model fine-tuned from Llama-3.2-3B-Instruct with LoRA (Low-Rank Adaptation) for parameter-efficient training. The model specializes in extracting medical entities and relationships from biomedical text.

## Model Details

### Model Description

This model fine-tunes Llama-3.2-3B-Instruct for medical Named Entity Recognition across three specialized tasks:

1. **Chemical Extraction**: Identifies drug and chemical compound names
2. **Disease Extraction**: Identifies disease and medical condition names
3. **Relationship Extraction**: Identifies chemical-disease interactions (which chemicals influence which diseases)

The model was trained on a curated dataset derived from the ChemProt corpus with 2,994 high-quality medical text samples, achieving balanced performance across all three tasks.

- **Developed by:** Alberto Clemente (@albyos)
- **Model type:** Causal language model with LoRA adapters
- **Language(s):** English (medical/biomedical domain)
- **License:** Llama 3.2 Community License
- **Finetuned from model:** meta-llama/Llama-3.2-3B-Instruct

### Model Sources

- **Repository:** https://github.com/albertoclemente/medical-ner-fine-tuning
- **Training Notebook:** `notebooks/training/Medical_NER_Fine_Tuning_run_20251111.ipynb`
- **Evaluation Notebook:** `notebooks/evaluation/Medical_NER_Evaluation_BioMistral_7B_SLERP_AWQ_Quantized_20251115.ipynb`

## Uses

### Direct Use

This model is designed for extracting structured medical information from unstructured biomedical text, including:

- Research papers and clinical studies
- Medical literature reviews
- Drug interaction documentation
- Disease characterization documents

**Input format:**

```
The following article contains technical terms including diseases, drugs and chemicals.
Create a list only of the [chemicals/diseases/influences] mentioned.

[MEDICAL TEXT]

List of extracted [chemicals/diseases/influences]:
```

**Output format:**

- For chemicals/diseases: Bullet list of entities
- For relationships: Pipe-separated pairs (`chemical | disease`)
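
The output conventions above can be turned back into structured data with a few lines of Python. This is an illustrative sketch (the helper names are ours, not part of the repository), assuming the bullet-list and pipe formats described:

```python
def parse_entities(output: str):
    """Parse a bullet list ('- entity' per line) into a list of entity strings."""
    return [line.lstrip("- ").strip()
            for line in output.splitlines()
            if line.strip().startswith("-")]

def parse_relationships(output: str):
    """Parse '- chemical | disease' lines into (chemical, disease) tuples."""
    pairs = []
    for entity in parse_entities(output):
        if "|" in entity:
            chem, disease = entity.split("|", 1)
            pairs.append((chem.strip(), disease.strip()))
    return pairs

parse_entities("- aspirin\n- ibuprofen")          # → ["aspirin", "ibuprofen"]
parse_relationships("- metformin | type-2 diabetes")
# → [("metformin", "type-2 diabetes")]
```

Note that `lstrip("- ")` would over-strip an entity that itself begins with a dash; a production parser should match the leading `"- "` marker more carefully.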

### Downstream Use

This model can be integrated into:

- Medical literature mining pipelines
- Drug discovery workflows
- Clinical decision support systems
- Pharmacovigilance systems
- Biomedical knowledge graph construction

### Out-of-Scope Use

This model is **NOT** suitable for:

- Clinical diagnosis or treatment recommendations
- Patient-facing medical advice
- Real-time critical healthcare decisions
- Languages other than English
- Non-medical-domain NER tasks

**Important:** This model is for research and information-extraction purposes only. It should not be used as a substitute for professional medical judgment.

## Bias, Risks, and Limitations

### Known Limitations

1. **Domain specificity**: Trained on scientific/biomedical literature; may not perform well on clinical notes or patient-facing text
2. **Entity coverage**: Limited to chemicals, diseases, and their relationships; does not extract other medical entities (procedures, anatomy, etc.)
3. **Training data bias**: Reflects patterns in the ChemProt corpus; may not generalize to all medical subdomains
4. **Hallucination risk**: As with all LLMs, may occasionally generate plausible but incorrect entities
5. **Format sensitivity**: Performance depends on using the exact prompt format from training

### Recommendations

- **Always validate** extracted entities against authoritative medical databases (ChEBI, MeSH, UMLS)
- Use in **conjunction with human expert review** for high-stakes applications
- Monitor for **false positives** (hallucinated entities) and **false negatives** (missed entities)
- Implement **confidence thresholding** based on your use-case requirements
- Consider ensemble methods with other biomedical NER tools (e.g., BioMistral, PubMedBERT)

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load base model and tokenizer
base_model_id = "meta-llama/Llama-3.2-3B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Load LoRA adapter
adapter_model_id = "albyos/llama3-medical-ner-lora-{timestamp}"  # Replace with actual model ID
model = PeftModel.from_pretrained(model, adapter_model_id)

# Format prompt (example for chemical extraction)
prompt = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a medical NER expert specialized in extracting entities from biomedical texts.
Extract entities EXACTLY as they appear in the text.

CRITICAL RULES:
1. Return ONLY entities found verbatim in the article
2. Preserve exact formatting: hyphens, capitalization, special characters
3. Extract complete multi-word terms
4. For relationships: use format 'chemical NAME | disease NAME'

OUTPUT FORMAT:
- One entity per line with leading dash
- No explanations or additional text<|eot_id|><|start_header_id|>user<|end_header_id|>

The following article contains technical terms including diseases, drugs and chemicals.
Create a list only of the chemicals mentioned.

Aspirin and ibuprofen are commonly used to treat inflammation. Recent studies show
that metformin may reduce the risk of type-2 diabetes complications.

List of extracted chemicals:
<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""

# Generate (greedy decoding; temperature is ignored when do_sample=False)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=False,
    repetition_penalty=1.15,
)

# Decode only the newly generated tokens, i.e., the assistant's reply.
# (Decoding the full sequence with skip_special_tokens=True would strip the
# <|start_header_id|> markers, so splitting on them afterwards fails.)
gen_tokens = outputs[0][inputs["input_ids"].shape[1]:]
result = tokenizer.decode(gen_tokens, skip_special_tokens=True).strip()
print(result)
```

## Training Details

### Training Data

**Dataset**: Custom medical NER dataset derived from the ChemProt corpus

- **Total samples**: 2,994 (after cleaning and deduplication)
- **Source**: Biomedical literature abstracts
- **Tasks**: Chemical extraction, disease extraction, relationship extraction
- **Split**: 80% train (2,397), 10% validation (298), 10% test (299)
- **Quality**: 99.8% retention rate, 0 empty completions, stratified by task

**Data Characteristics** (from exploration analysis):

- **Unique chemicals**: 1,578 entities
- **Unique diseases**: 2,199 entities
- **Vocabulary size**: 13,710 unique words
- **Prompt length**: Median 1,357 characters (195 words), range 345-4,018 characters
- **Hyphenated entities**: ~459 (e.g., "type-2 diabetes", "5-fluorouracil")
- **Format conversion**: 2,050 relationships converted from sentence to pipe format
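
The 80/10/10 stratified split can be sketched as below. This is a reconstruction, not the notebook's actual code; the `task` field name and seed are illustrative assumptions:

```python
import random
from collections import defaultdict

def stratified_split(samples, key, ratios=(0.8, 0.1, 0.1), seed=42):
    """Split samples into train/val/test while preserving the per-task
    distribution (chemicals / diseases / relationships)."""
    rng = random.Random(seed)
    by_task = defaultdict(list)
    for s in samples:
        by_task[s[key]].append(s)  # group samples by task label
    train, val, test = [], [], []
    for task_samples in by_task.values():
        rng.shuffle(task_samples)
        n = len(task_samples)
        n_train = int(n * ratios[0])
        n_val = int(n * ratios[1])
        train += task_samples[:n_train]
        val += task_samples[n_train:n_train + n_val]
        test += task_samples[n_train + n_val:]  # remainder goes to test
    return train, val, test

# 30 toy samples, 10 per task → 24 / 3 / 3 split, 8 of each task in train
samples = [{"task": t, "id": i} for i, t in enumerate(
    ["chemical", "disease", "relationship"] * 10)]
train, val, test = stratified_split(samples, key="task")
```

The leakage check described above then reduces to asserting that the sample ids of the three splits are pairwise disjoint.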

### Training Procedure

#### Preprocessing

1. **Deduplication**: Removed duplicate prompts by normalized hash
2. **Format standardization**: Converted the relationship format from `"chemical X influences disease Y"` to `"X | Y"`
3. **Entity normalization**: Lowercasing, whitespace normalization, hyphen preservation
4. **Stratified splitting**: Ensures a 33.3% distribution per task across all splits
5. **Leakage prevention**: Hard assertions verify zero overlap between train/val/test
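
A minimal sketch of steps 1-3, assuming simple helper names of our own (the actual notebook implementation may differ):

```python
import hashlib
import re

def normalize_text(text: str) -> str:
    """Lowercase and collapse whitespace while preserving hyphens."""
    return re.sub(r"\s+", " ", text.strip().lower())

def prompt_hash(prompt: str) -> str:
    """Hash of the normalized prompt, used to drop duplicate samples."""
    return hashlib.sha256(normalize_text(prompt).encode()).hexdigest()

def to_pipe_format(sentence: str) -> str:
    """Convert 'chemical X influences disease Y' to 'X | Y'."""
    m = re.match(r"chemical\s+(.+?)\s+influences\s+disease\s+(.+)",
                 sentence.strip(), re.IGNORECASE)
    return f"{m.group(1)} | {m.group(2)}" if m else sentence

def deduplicate(samples):
    """Keep the first sample for each normalized-prompt hash."""
    seen, kept = set(), []
    for s in samples:
        h = prompt_hash(s["prompt"])
        if h not in seen:
            seen.add(h)
            kept.append(s)
    return kept
```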

#### Training Hyperparameters

**LoRA Configuration:**

- **LoRA rank (r)**: 16
- **LoRA alpha**: 32
- **LoRA dropout**: 0.05
- **Target modules**: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj

**Training Parameters:**

- **Training regime**: fp16 mixed precision
- **Quantization**: 4-bit NF4 (BitsAndBytes)
- **Epochs**: 5
- **Batch size**: 4 per device
- **Gradient accumulation**: 4 steps (effective batch size 16)
- **Learning rate**: 5e-5
- **LR scheduler**: Cosine with 3% warmup
- **Weight decay**: 0.01
- **Optimizer**: paged_adamw_8bit
- **Max sequence length**: 2,048 tokens
- **Gradient checkpointing**: Enabled

**Data-Driven Justification:**

All hyperparameters were validated against dataset characteristics:

- A batch size of 4-8 is appropriate for ~3,000 samples
- 5 epochs are sufficient for format learning without overfitting
- A conservative learning rate (5e-5) suits the 13,710-word vocabulary
- A max length of 2,048 tokens covers 99%+ of prompts (median 1,357 characters)
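
The configuration above can be expressed with PEFT and Transformers roughly as follows. This is a sketch reconstructed from the listed hyperparameters, not the notebook's actual code; argument names follow the public `peft`/`transformers` APIs:

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit NF4 quantization of the frozen base model (BitsAndBytes)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA adapters on all attention and MLP projection layers
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# Trainer settings matching the hyperparameters listed above
training_args = TrainingArguments(
    output_dir="llama3-medical-ner-lora",
    num_train_epochs=5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # effective batch size 16
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    weight_decay=0.01,
    optim="paged_adamw_8bit",
    fp16=True,
    gradient_checkpointing=True,
    save_steps=50,
    eval_steps=50,
)
```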

#### Speeds, Sizes, Times

- **Training time**: ~2-3 hours on an NVIDIA A100 GPU
- **Model size**: ~3.5 GB (quantized base model + LoRA adapters)
- **Trainable parameters**: ~1.5% of total model parameters
- **Checkpoint frequency**: Every 50 steps
- **Evaluation frequency**: Every 50 steps

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

- **Dataset**: Held-out test set from the cleaned splits (299 samples)
- **Split date**: November 13, 2025
- **Distribution**: 100 chemicals, 99 diseases, 100 relationships
- **Source**: ChemProt corpus (biomedical literature)

#### Factors

Evaluation is disaggregated by task type:

- **Chemical extraction**: Drug and chemical compound identification
- **Disease extraction**: Disease and medical condition identification
- **Relationship extraction**: Chemical-disease interaction pairs

#### Metrics

- **F1 Score** (primary): Harmonic mean of precision and recall
- **Precision**: Fraction of predicted entities that are correct
- **Recall**: Fraction of gold-standard entities that were found
- **Macro-average**: Equal weight to each task (chemicals, diseases, relationships)

**Evaluation methodology:**

- Enhanced filtering to reduce false positives
- Normalized entity matching (lowercase, whitespace)
- Hyphen preservation during normalization
- Task-specific parsing (bullet lists for entities, pipe format for relationships)
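
With normalized entity strings, these metrics reduce to set comparisons; a minimal sketch (helper names are ours):

```python
def prf1(predicted, gold):
    """Set-based precision, recall, and F1 over normalized entity strings."""
    pred, gold = set(predicted), set(gold)
    tp = len(pred & gold)                      # true positives
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def macro_f1(per_task_results):
    """Equal-weight average of F1 across the three tasks."""
    return sum(f1 for _, _, f1 in per_task_results) / len(per_task_results)

chem = prf1(["aspirin", "ibuprofen", "caffeine"], ["aspirin", "ibuprofen"])
# → precision 2/3, recall 1.0, F1 0.8
```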

### Results

**Llama-3.2-3B Baseline** (before considering BioMistral):

- **Overall F1**: 53.8% (macro-average across 3 tasks)
- **Precision**: ~52-55%
- **Recall**: ~54-56%

**Key Insights:**

- The model successfully learned the pipe format for relationships (0% F1 before fine-tuning)
- Balanced performance across all three tasks
- Format conversion (2,050 samples) was successfully integrated during training
- Clean data (99.8% retention) contributed to stable training

**Baseline Comparison:**

- Pre-training: 0% F1 on relationships (could not extract pairs)
- Post-training: ~50% F1 on relationships (a significant improvement)
- Chemical/disease extraction improved from generic to domain-specific recognition

### Planned Evaluation

**Next Step**: Baseline evaluation of **BioMistral-7B-SLERP-AWQ** (quantized, no fine-tuning)

- **Hypothesis**: Medical domain pre-training may outperform fine-tuned Llama-3.2-3B
- **Target**: 70-80% F1 (medical domain models typically show a 15-20 point advantage)
- **Decision criteria**:
  - If BioMistral ≥70% F1 → deploy the quantized model as-is
  - If BioMistral 60-70% F1 → fine-tune BioMistral (expected 75-85% F1)
  - If BioMistral <60% F1 → fine-tuning is mandatory

**Tracking**: [GitHub Issue #3](https://github.com/albertoclemente/medical-ner-fine-tuning/issues/3)

## Model Examination

### Error Analysis

Common error patterns observed:

1. **False positives**: Generic medical terms (e.g., "pain", "treatment") occasionally extracted
2. **False negatives**: Complex multi-word entities sometimes only partially extracted
3. **Boundary issues**: Entity boundaries unclear for nested or compound terms
4. **Format sensitivity**: Deviations from the training prompt format reduce performance

### Filtering Strategy

Enhanced filtering applied during evaluation:

- Blacklist of generic terms (drug, disease, chemical, etc.)
- Entity-type validation (disease markers should not appear in chemical extractions)
- Text grounding (only entities found in the source text are kept)
- Minimum length threshold (≥3 characters)
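
These filters amount to a short pipeline; the sketch below uses an illustrative blacklist (the real one is larger) and our own function name:

```python
# Illustrative subset of the generic-term blacklist
GENERIC_TERMS = {"drug", "drugs", "disease", "diseases",
                 "chemical", "chemicals", "treatment", "pain"}

def filter_entities(entities, source_text, min_len=3):
    """Apply the blacklist, text-grounding, and minimum-length filters."""
    text = source_text.lower()
    kept = []
    for e in entities:
        e_norm = e.strip().lower()
        if len(e_norm) < min_len:
            continue  # too short to be a reliable entity
        if e_norm in GENERIC_TERMS:
            continue  # generic medical term, likely a false positive
        if e_norm not in text:
            continue  # not grounded in the source text (possible hallucination)
        kept.append(e.strip())
    return kept

filter_entities(["aspirin", "drug", "xx", "metformin"],
                "Aspirin and metformin were administered.")
# → ["aspirin", "metformin"]
```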

## Environmental Impact

Carbon emissions were estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute).

- **Hardware Type**: NVIDIA A100 80GB GPU
- **Hours used**: ~2.5 hours
- **Cloud Provider**: RunPod / cloud GPU provider
- **Compute Region**: US (variable)
- **Carbon Emitted**: ~0.5 kg CO2eq (estimated)

**Note**: LoRA fine-tuning is significantly more efficient than full-model training, updating only ~1.5% of the parameters and using ~3 hours of compute vs. days or weeks for full training.

## Technical Specifications

### Model Architecture and Objective

**Base Architecture**: Llama-3.2-3B-Instruct (Meta AI)

- **Parameters**: 3 billion (base model)
- **Architecture**: Transformer decoder with grouped-query attention
- **Context length**: 128K tokens
- **Vocabulary**: 128,256 tokens (tiktoken-based BPE)

**LoRA Adaptation**:

- **Trainable parameters**: ~47 million (~1.5% of total)
- **LoRA rank**: 16 (low-rank decomposition dimension)
- **Adapter placement**: All attention and MLP projection layers
- **Training objective**: Next-token prediction (causal language modeling)

### Compute Infrastructure

#### Hardware

- **Training**: NVIDIA A100 80GB GPU
- **Memory**: 80 GB VRAM (4-bit quantization reduces usage to ~7 GB)
- **CPU**: High-memory instance (for data preprocessing)

#### Software

- **Framework**: Hugging Face Transformers 4.x
- **Training**: Hugging Face Trainer with PEFT (Parameter-Efficient Fine-Tuning)
- **Quantization**: BitsAndBytes (4-bit NF4 quantization)
- **Monitoring**: Weights & Biases
- **Python**: 3.10+
- **PyTorch**: 2.x with CUDA 12.x
- **Key libraries**:
  - `transformers` (model loading, training)
  - `peft` (LoRA implementation)
  - `bitsandbytes` (quantization)
  - `accelerate` (distributed training)
  - `datasets` (data loading)
  - `wandb` (experiment tracking)

## Citation

If you use this model in your research, please cite:

**BibTeX:**

```bibtex
@misc{clemente2025medical-ner-lora,
  author       = {Clemente, Alberto},
  title        = {Llama-3.2-3B Medical NER with LoRA},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/albyos/llama3-medical-ner-lora}},
}
```

**APA:**

Clemente, A. (2025). *Llama-3.2-3B Medical NER with LoRA* [Computer software]. Hugging Face. https://huggingface.co/albyos/llama3-medical-ner-lora

## Glossary

- **NER (Named Entity Recognition)**: The task of identifying and classifying named entities in text
- **LoRA (Low-Rank Adaptation)**: A parameter-efficient fine-tuning method that adds trainable low-rank matrices to model layers
- **ChemProt**: A chemical-protein interaction corpus from the biomedical literature
- **Stratified splitting**: Data splitting that preserves class distribution across splits
- **Quantization**: Reducing model precision (e.g., 32-bit → 4-bit) to save memory
- **Macro-average**: Averaging metrics across classes with equal weight (vs. micro-average)
- **Pipe format**: Relationship representation as `"entity1 | entity2"` (used for chemical-disease pairs)

## More Information

**Project Documentation**:

- [Quick Start Guide](docs/QUICK_START.md)
- [Fine-Tuning Plan](docs/FINE_TUNING_PLAN.md)
- [Three-Way Split Guide](docs/THREE_WAY_SPLIT_GUIDE.md)
- [Checkpoint Naming Strategy](docs/CHECKPOINT_NAMING.md)
- [Implementation Summary](docs/IMPLEMENTATION_SUMMARY.md)
- [Validation Strategy](docs/VALIDATION_STRATEGY.md)

**Related Work**:

- Base model: [Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)
- Alternative: [BioMistral-7B-SLERP](https://huggingface.co/BioMistral/BioMistral-7B-SLERP) (medical domain pre-trained)
- Dataset source: [ChemProt Corpus](https://biocreative.bioinformatics.udel.edu/news/corpora/chemprot-corpus-biocreative-vi/)

**GitHub Issues**:

- [Issue #2: Retrain with BioMistral-7B-SLERP](https://github.com/albertoclemente/medical-ner-fine-tuning/issues/2) (Closed)
- [Issue #3: Baseline Evaluation - BioMistral-7B-SLERP-AWQ](https://github.com/albertoclemente/medical-ner-fine-tuning/issues/3) (Open)

## Model Card Authors

- Alberto Clemente (@albyos)

## Model Card Contact

- **GitHub**: https://github.com/albertoclemente/medical-ner-fine-tuning
- **Issues**: https://github.com/albertoclemente/medical-ner-fine-tuning/issues

---

### Framework Versions

- **PEFT**: 0.17.1+
- **Transformers**: 4.40.0+
- **PyTorch**: 2.2.0+
- **BitsAndBytes**: 0.42.0+
- **Accelerate**: 0.27.0+
- **Datasets**: 2.18.0+
- **Tokenizers**: 0.19.0+

**Last Updated**: November 15, 2025