Update README.md
- **Model Developer:** Aayan Mishra
- **Model Type:** Causal Language Model
- **Architecture:** Transformer with Rotary Position Embeddings (RoPE), SwiGLU activation, RMSNorm, and Attention QKV bias
- **Parameters:** 14.7 billion total (13.1 billion non-embedding)
- **Layers:** 28
- **Attention Heads:** 28 for query and 4 for key-value (Grouped Query Attention)
- **Vocabulary Size:** 151,646 tokens
- **Context Length:** Up to 131,072 tokens
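The head counts above imply a 7:1 grouping: each of the 4 key-value heads is shared by 7 of the 28 query heads. A minimal NumPy sketch of that sharing pattern (the `head_dim` and `seq_len` values are illustrative assumptions, not taken from this card):

```python
import numpy as np

# Grouped Query Attention sketch using the card's head counts:
# 28 query heads share 4 key-value heads -> 7 query heads per KV head.
# head_dim=128 and seq_len=8 are illustrative assumptions.
n_q_heads, n_kv_heads, head_dim, seq_len = 28, 4, 128, 8
group = n_q_heads // n_kv_heads  # 7 query heads per KV head

rng = np.random.default_rng(0)
q = rng.standard_normal((n_q_heads, seq_len, head_dim))
k = rng.standard_normal((n_kv_heads, seq_len, head_dim))
v = rng.standard_normal((n_kv_heads, seq_len, head_dim))

# Expand each KV head across its group of query heads, then attend as usual.
k_exp = np.repeat(k, group, axis=0)  # (28, seq_len, head_dim)
v_exp = np.repeat(v, group, axis=0)  # (28, seq_len, head_dim)

scores = q @ k_exp.transpose(0, 2, 1) / np.sqrt(head_dim)  # (28, seq, seq)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)              # row-wise softmax
out = weights @ v_exp                                       # (28, seq, head_dim)
print(out.shape)
```

Compared with full multi-head attention, this cuts the KV cache by a factor of 7, which is what makes the 131,072-token context length practical.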