---
language:
- en
license: mit
library_name: transformers
tags:
- text-generation
- pytorch
- custom-architecture
- rope
- rmsnorm
- swiglu
- flash-attention
- 16k-context
pipeline_tag: text-generation
widget:
- text: "The future of artificial intelligence is"
  example_title: "AI Future"
- text: "Write a short story about"
  example_title: "Story Generation"
- text: "Explain quantum computing in simple terms:"
  example_title: "Technical Explanation"
datasets:
- tiiuae/falcon-refinedweb
metrics:
- perplexity
model-index:
- name: MAP-NEO Mini
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: RefinedWeb (100K subset)
      type: tiiuae/falcon-refinedweb
    metrics:
    - type: loss
      value: 3.907
      name: Final Training Loss (cross-entropy)
---

# MAP-NEO Mini

## Model Description

**MAP-NEO Mini** is a 253M parameter autoregressive language model built from scratch with modern architectural improvements. It demonstrates that high-quality language models can be trained efficiently on modest hardware while achieving competitive performance through careful data curation and architectural choices.

- **Developed by**: Antony Austin
- **Model type**: Autoregressive Language Model
- **Language(s)**: English
- **License**: MIT
- **Architecture**: Custom transformer with RoPE, RMSNorm, SwiGLU, and Flash Attention

## Key Features

- **Efficient Training**: Trained on an RTX 5070 Laptop GPU (8GB VRAM) in ~4 hours
- **Extended Context**: 16,384-token context window (8x the 2,048-token base context)
- **Memory Efficient**: Only 1.3GB VRAM for inference over a 1,800-token context
- **Fast Inference**: 150+ tokens/second on a consumer GPU
- **High-Quality Data**: Trained on a curated RefinedWeb subset

## Architecture Details

### Model Architecture
- **Parameters**: 253,085,696 (253M)
- **Layers**: 16 transformer blocks
- **Hidden Size**: 1,024
- **Attention Heads**: 16
- **Head Dimension**: 64
- **FFN Hidden Size**: 2,736 (2.67x hidden size)
- **Vocabulary Size**: 50,257 (GPT-2 tokenizer)
- **Max Sequence Length**: 16,384 tokens
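
As a sanity check, the reported parameter count can be reproduced from these numbers, assuming bias-free projections, three SwiGLU matrices per block, two RMSNorms per block plus one final RMSNorm, and tied input/output embeddings:

```python
# Parameter-count breakdown for MAP-NEO Mini (matches the reported total exactly
# under the assumptions stated above).
vocab, d, layers, ffn = 50_257, 1_024, 16, 2_736

embeddings = vocab * d                       # 51,463,168 (shared input/output matrix)
attention  = 4 * d * d                       # Q, K, V, O projections per layer
swiglu_ffn = 3 * d * ffn                     # gate, up, and down projections per layer
norms      = 2 * d                           # two RMSNorm weight vectors per layer
per_layer  = attention + swiglu_ffn + norms  # 12,601,344

total = embeddings + layers * per_layer + d  # + d for the final RMSNorm
print(f"{total:,}")                          # 253,085,696
```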

### Architectural Innovations
- **RMSNorm**: Root Mean Square Layer Normalization for training stability
- **RoPE**: Rotary Positional Embeddings for better positional understanding
- **SwiGLU**: Swish-Gated Linear Units for improved FFN performance
- **Flash Attention**: Memory-efficient attention computation
- **Weight Tying**: Input/output embeddings shared for parameter efficiency
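
A minimal PyTorch sketch of the RMSNorm and SwiGLU blocks listed above; class and attribute names are illustrative and may differ from the actual modules in `model_neo.py`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root Mean Square normalization: rescale by 1/RMS(x), no mean subtraction, no bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

class SwiGLU(nn.Module):
    """SwiGLU feed-forward: down( silu(gate(x)) * up(x) )."""
    def __init__(self, dim: int = 1024, hidden: int = 2736):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up   = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))
```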

## Training Data

### Dataset
- **Source**: `tiiuae/falcon-refinedweb` (curated subset)
- **Size**: 100,000 high-quality web documents
- **Tokens**: ~41 million tokens
- **Sequence Length**: 1,024 tokens per sequence
- **Sequences**: 40,965 packed sequences
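
A minimal sketch of how documents can be packed into the fixed 1,024-token training sequences listed above; the function name and EOS handling are illustrative, not the repository's exact pipeline:

```python
def pack_sequences(tokenized_docs, seq_len=1024, eos_id=50256):
    """Concatenate tokenized documents (lists of token IDs), separate them with an
    EOS token, and slice the stream into fixed-length training sequences.
    Leftover tokens that do not fill a final sequence are dropped."""
    buffer = []
    for tokens in tokenized_docs:
        buffer.extend(tokens + [eos_id])
        while len(buffer) >= seq_len:
            yield buffer[:seq_len]
            buffer = buffer[seq_len:]
```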

### Data Quality
- **Length filtering**: 200-10,000 characters per document
- **Language detection**: English only
- **Quality scoring**: High-quality web content retained
- **Deduplication**: Exact and near-duplicate removal
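
A sketch of how the length filter could be applied while streaming the RefinedWeb subset; the language-detection, quality-scoring, and deduplication stages are not reproduced here, and the `content` text field is an assumption about the dataset schema:

```python
from datasets import load_dataset

# Stream RefinedWeb and keep documents between 200 and 10,000 characters.
stream = load_dataset("tiiuae/falcon-refinedweb", split="train", streaming=True)
filtered = stream.filter(lambda ex: 200 <= len(ex["content"]) <= 10_000)

# Take the 100,000-document subset used for training.
subset = filtered.take(100_000)
```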

## Training Procedure

### Training Configuration
- **Hardware**: NVIDIA RTX 5070 Laptop GPU (8GB VRAM)
- **Precision**: bfloat16 mixed precision
- **Batch Size**: 1 per device
- **Gradient Accumulation**: 32 steps
- **Effective Batch Size**: 32
- **Learning Rate**: 3e-4
- **Scheduler**: Cosine with linear warmup
- **Warmup Steps**: 3,750
- **Total Steps**: 150,000
- **Training Time**: ~4 hours
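
The learning-rate schedule above (linear warmup over 3,750 steps to 3e-4, then cosine decay over the remaining steps) can be written as a small function; a sketch assuming decay to a minimum of zero:

```python
import math

def lr_at(step, peak_lr=3e-4, warmup_steps=3_750, total_steps=150_000):
    """Linear warmup followed by cosine decay (minimum LR assumed to be 0)."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```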

### Optimization Details
- **Optimizer**: AdamW (β₁=0.9, β₂=0.95, weight_decay=0.01)
- **Gradient Clipping**: 1.0
- **Gradient Checkpointing**: Enabled for memory efficiency
- **Loss Function**: Cross-entropy loss
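
A sketch of a training step under this configuration: bfloat16 autocast, gradient accumulation over 32 micro-batches (per-device batch size 1, effective batch size 32), and gradient clipping at 1.0. The `model(input_ids) -> logits` call and the `train_loader` batch format are assumptions, not the repository's actual interfaces:

```python
import torch
import torch.nn.functional as F

def train(model, train_loader, accum_steps=32, max_iters=150_000):
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4,
                                  betas=(0.9, 0.95), weight_decay=0.01)
    optimizer.zero_grad(set_to_none=True)
    for i, batch in enumerate(train_loader):
        if i >= max_iters:
            break
        input_ids = batch["input_ids"].cuda()
        with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
            logits = model(input_ids)
            # Next-token cross-entropy: predict token t+1 from tokens up to t.
            loss = F.cross_entropy(logits[:, :-1].reshape(-1, logits.size(-1)),
                                   input_ids[:, 1:].reshape(-1))
        (loss / accum_steps).backward()  # average gradients over the accumulation window
        if (i + 1) % accum_steps == 0:
            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
            optimizer.step()
            optimizer.zero_grad(set_to_none=True)
```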

### Context Extension
- **Base Context**: 2,048 tokens
- **Extended Context**: 16,384 tokens
- **Method**: Linear interpolation of positional embeddings (see the sketch after this list)
- **Validation**: Successfully tested up to 3,600 tokens
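
A sketch of the linear-interpolation idea as it applies to RoPE: position indices in the extended window are rescaled so they stay within the range seen during training. Whether the repository rescales positions exactly this way (versus, say, scaling the rotary frequencies) is an assumption:

```python
import torch

def interpolated_positions(seq_len, base_ctx=2048, extended_ctx=16384):
    """Rescale position indices so an extended-context sequence reuses the
    positional range the model was trained on (scale = 2048 / 16384 = 0.125)."""
    scale = base_ctx / extended_ctx
    return torch.arange(seq_len, dtype=torch.float32) * scale

# These scaled positions replace the raw integer positions when building the
# RoPE sin/cos tables, e.g. angles = positions[:, None] * inv_freq[None, :].
```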

## Performance

### Training Metrics
- **Final Loss**: 3.907
- **Training Speed**: ~10 iterations/second
- **Peak Memory**: ~8GB VRAM
- **Convergence**: Smooth loss curve, no overfitting
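
For reference, if the reported loss is the standard per-token cross-entropy in nats, it corresponds to a training perplexity of exp(3.907) ≈ 50.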

### Inference Performance
- **Speed**: 150+ tokens/second (RTX 5070)
- **Memory Usage**: 1.3GB VRAM for a 1,800-token context
- **Context Limit**: ~3,600 tokens in practice (despite the 16K architecture)
- **Temperature**: 0.7-0.9 recommended for creative tasks

## Usage

### Quick Start
```python
import torch
from transformers import AutoTokenizer
from model_neo import NeoMini, NeoMiniConfig

# Load the model and the extended-context checkpoint
config = NeoMiniConfig()
model = NeoMini(config)
checkpoint = torch.load("extended_context_model.pt", map_location="cpu")
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Load the GPT-2 tokenizer (50,257-token vocabulary)
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Generate text
prompt = "The future of AI is"
input_ids = tokenizer.encode(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(input_ids, max_length=100, temperature=0.8)
# generate returns a (batch, seq_len) tensor; decode the first sequence
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

### Interactive Chat
```bash
python interactive_chat.py
```

### Generation Parameters
- **Temperature**: 0.7-0.9 for creative tasks, 0.3-0.5 for factual
- **Top-k**: 40-50
- **Top-p**: 0.8-0.9
- **Repetition Penalty**: 1.1-1.3
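
A sketch of how these parameters shape the choice of a single next token; this is illustrative sampling logic, not the repository's generation code:

```python
import torch

def sample_next_token(logits, prev_tokens, temperature=0.8, top_k=40,
                      top_p=0.9, repetition_penalty=1.2):
    """Apply repetition penalty, temperature, top-k, and top-p to a 1D logits tensor."""
    logits = logits.clone()
    # Repetition penalty: make already-generated tokens less likely.
    for t in set(prev_tokens):
        logits[t] = logits[t] / repetition_penalty if logits[t] > 0 else logits[t] * repetition_penalty
    logits = logits / temperature
    # Top-k: discard everything below the k-th highest logit.
    kth_value = torch.topk(logits, top_k).values[-1]
    logits[logits < kth_value] = float("-inf")
    # Top-p (nucleus): keep the smallest set of tokens whose probabilities sum to top_p.
    sorted_logits, sorted_idx = torch.sort(logits, descending=True)
    cumulative = torch.softmax(sorted_logits, dim=-1).cumsum(dim=-1)
    cutoff = cumulative > top_p
    cutoff[1:] = cutoff[:-1].clone()  # shift so the token crossing the threshold is kept
    cutoff[0] = False                 # always keep the single most likely token
    logits[sorted_idx[cutoff]] = float("-inf")
    return torch.multinomial(torch.softmax(logits, dim=-1), num_samples=1).item()
```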

## Limitations

### Current Limitations
- **Base Model Only**: Not instruction-tuned (requires fine-tuning for chat)
- **Context Window**: Practical limit of ~3,600 tokens despite 16K architecture
- **Hardware Requirements**: Requires CUDA-capable GPU for optimal performance
- **Knowledge Cutoff**: Limited to web data patterns, no specific knowledge cutoff

### Known Issues
- Occasionally generates repetitive patterns (fixable with fine-tuning)
- May not follow instructions well (base model behavior)
- Sometimes produces formatting artifacts from web data

## Ethical Considerations

### Bias and Fairness
- Trained on web data which may contain societal biases
- No explicit bias mitigation applied during training
- Users should be aware of potential biased outputs

### Use Cases
**Intended Uses:**
- Research and experimentation
- Text generation and completion
- Creative writing assistance
- Educational purposes

**Out-of-Scope Uses:**
- Medical or legal advice
- High-stakes decision making
- Content that could cause harm

## Environmental Impact

### Carbon Footprint
- **Training Hardware**: Single RTX 5070 Laptop GPU (100W)
- **Training Time**: 4 hours
- **Estimated CO₂**: ~0.3 kg CO₂ equivalent
- **Efficiency**: 253M parameters per 0.3 kg CO₂
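
The energy behind this estimate is 100 W × 4 h = 0.4 kWh of GPU power, so ~0.3 kg CO₂ corresponds to an assumed grid intensity of roughly 0.75 kg CO₂/kWh (GPU only, excluding the rest of the system).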

## Model Card Authors

Antony Austin - Model development and training
August 30, 2025 - Model card creation

## Citation

```bibtex
@misc{mapneo_mini_2025,
  title={MAP-NEO Mini: An Efficient 253M Parameter Language Model},
  author={Antony Austin},
  year={2025},
  howpublished={\url{https://huggingface.co/Austin207/Map-NEO}},
  note={Trained on NVIDIA RTX 5070 Laptop GPU with RefinedWeb data}
}
```

## Technical Details

### Hardware Requirements
- **Minimum**: 4GB VRAM for inference
- **Recommended**: 8GB VRAM for extended context
- **Training**: 8GB+ VRAM with mixed precision
- **CPU**: Any modern CPU (inference possible but slow)

## Future Work

### Planned Improvements
- [ ] Conversational fine-tuning with UltraChat dataset
- [ ] Instruction following capabilities
- [ ] Multi-language support
- [ ] Quantized versions (4-bit, 8-bit)
- [ ] ONNX export for edge deployment

### Research Directions
- Context window optimization beyond 16K
- More efficient attention mechanisms
- Improved training data curation
- Specialized domain fine-tuning

## Acknowledgments

- **Falcon RefinedWeb**: High-quality training data
- **Hugging Face**: Transformers library and infrastructure
- **Community**: Open-source ML community for architectural insights

---

**Last Updated**: August 30, 2025
**Model Version**: 1.0.0
**Status**: Base model (pre-conversational fine-tuning)