---
license: mit
pipeline_tag: text-generation
library_name: transformers
---

# GPT-3 175B Architecture Demonstrator

This repository contains an architectural demonstrator of GPT-3 (175B parameters), based directly on the architecture described in the original GPT-3 paper (["Language Models are Few-Shot Learners"](https://arxiv.org/abs/2005.14165)). The model implementation details are available in [my GitHub repository](https://github.com/krll-corp/GPT3).
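
As a quick sketch, the checkpoint should load through the standard `transformers` API. The repository id below is a placeholder, not taken from this README; substitute the id of this model page. Note that instantiating a true 175B-parameter configuration needs hundreds of gigabytes of memory, so treat this as illustrative wiring.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder Hub id -- replace with this model's actual repository id.
repo_id = "your-username/gpt3-175b-demonstrator"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Sanity check: report the parameter count of the instantiated architecture.
print(f"Parameters: {model.num_parameters():,}")
```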

## Important Note

This repository includes an **untrained architectural demonstrator**, meaning it matches the GPT-3 architecture but **contains randomly initialized weights**. As a result, the model does not have meaningful predictive abilities out of the box. However, because of how token sampling behaves at extremely low temperature settings (e.g., `temperature=0.0001`), the model may appear to generate semi-coherent text by repeatedly selecting the most probable tokens, even though its outputs remain essentially random.
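
To see this effect for yourself, here is a minimal generation sketch; the placeholder repository id, prompt, and settings are illustrative only.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-username/gpt3-175b-demonstrator"  # placeholder id, as above
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("The quick brown fox", return_tensors="pt")

# Near-zero temperature makes sampling effectively greedy, so even random
# weights can produce repetitive, superficially fluent-looking token chains.
outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    temperature=0.0001,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```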

## Training

You can train this demonstrator from scratch or fine-tune it using the scripts and configuration files provided in my GitHub repository. These files are fully compatible with this architecture demonstrator and designed to simplify the training process.

Refer to the repository for detailed instructions and best practices.
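
For orientation only, the following is a minimal causal-LM fine-tuning sketch using the generic `transformers` Trainer. It is not the repository's own training script; the dataset, hyperparameters, and placeholder repository id are all assumptions, and training at the full 175B scale would additionally require model and data parallelism across many accelerators.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

repo_id = "your-username/gpt3-175b-demonstrator"  # placeholder id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# GPT-style tokenizers usually ship without a pad token; reuse EOS for padding.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Tiny public corpus, used purely to illustrate the wiring.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=1),
    train_dataset=tokenized,
    # mlm=False selects the causal (next-token) language-modeling objective.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```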

---

Contributions are welcome!