k050506koch committed · Commit d71bd4c · verified · 1 Parent(s): 10b97e0

Update README.md

Files changed (1): README.md (+23 −3)
README.md CHANGED
@@ -1,3 +1,23 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ pipeline_tag: text-generation
+ library_name: transformers
+ ---
+
+ # GPT-3 175B Architecture Demonstrator
+
+ This repository contains an architectural demonstrator of GPT-3 (175B parameters), built to match the architecture described in the original GPT-3 paper (["Language Models are Few-Shot Learners"](https://arxiv.org/abs/2005.14165)). Implementation details are available in [my GitHub repository](https://github.com/krll-corp/GPT3).
+
+ ## Important Note
+
+ This repository contains an **untrained architectural demonstrator**: it matches the GPT-3 architecture but **holds randomly initialized weights**, so the model has no meaningful predictive ability out of the box. Note, however, that at extremely low temperature settings (e.g., `temperature=0.0001`), sampling becomes effectively greedy and always picks the most probable token, so the model can appear to produce semi-coherent text even though its outputs remain essentially random.
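The low-temperature effect described above can be sketched without any model at all: dividing logits by a near-zero temperature before the softmax pushes essentially all probability mass onto the single highest-scoring token, collapsing sampling into greedy decoding even over random scores. A plain-Python illustration, independent of this repository's code:

```python
import math

def softmax_with_temperature(logits, temperature):
    # Scale logits by 1/temperature, then apply a numerically stable softmax
    # (subtracting the max before exponentiating avoids overflow).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [1.3, 1.1, 0.2, -0.5]  # arbitrary stand-ins for an untrained model's scores

print(softmax_with_temperature(logits, 1.0))     # roughly [0.43, 0.35, 0.14, 0.07]
print(softmax_with_temperature(logits, 0.0001))  # ~[1.0, 0.0, 0.0, 0.0]: effectively greedy
```

At `temperature=1.0` the distribution stays spread out; at `temperature=0.0001` the top token absorbs virtually all probability, which is why low-temperature output from random weights looks deterministic without being meaningful.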
+
+ ## Training
+
+ You can train this demonstrator from scratch or fine-tune it using the scripts and configuration files provided in my GitHub repository. They are fully compatible with this architecture demonstrator and are designed to simplify the training process.
+
+ Refer to the repository for detailed instructions and best practices.
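As a minimal sketch of what "untrained demonstrator" means here (assuming PyTorch and Hugging Face `transformers` are installed), a randomly initialized GPT-style model can be instantiated directly from a config. This uses `GPT2Config`/`GPT2LMHeadModel` as a stand-in architecture with tiny placeholder hyperparameters, not this repository's actual code or the 175B settings:

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny placeholder hyperparameters; the 175B model described in the GPT-3
# paper uses n_embd=12288, n_layer=96, n_head=96.
config = GPT2Config(
    vocab_size=50257,  # GPT-3 reuses GPT-2's BPE vocabulary
    n_positions=128,
    n_embd=64,
    n_layer=2,
    n_head=2,
)
model = GPT2LMHeadModel(config)  # weights are randomly initialized, not pretrained

input_ids = torch.randint(0, config.vocab_size, (1, 8))
with torch.no_grad():
    logits = model(input_ids).logits
print(logits.shape)  # torch.Size([1, 8, 50257])
```

A model built this way has the full forward pass of the architecture but produces random logits until it is trained with the scripts mentioned above.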
+
+ ---
+
+ Contributions are welcome!