k050506koch committed · Commit d71bd4c · verified · 1 Parent(s): 10b97e0

Update README.md

Files changed (1): README.md (+23 −3)
README.md CHANGED
@@ -1,3 +1,23 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ pipeline_tag: text-generation
+ library_name: transformers
+ ---
+
+ # GPT-3 175B Architecture Demonstrator
+
+ This repository contains an architectural demonstrator of GPT-3 (175B parameters), built to match the architecture described in the original GPT-3 paper (["Language Models are Few-Shot Learners"](https://arxiv.org/abs/2005.14165)). Implementation details are available in [my GitHub repository](https://github.com/krll-corp/GPT3).
+
+ ## Important Note
+
+ This repository contains an **untrained architectural demonstrator**: it matches the GPT-3 architecture but **holds randomly initialized weights**, so the model has no meaningful predictive ability out of the box. Note, however, that at extremely low temperature settings (e.g., `temperature=0.0001`), sampling becomes effectively greedy and always picks the most probable token, so the model can appear to produce semi-coherent text even though its outputs remain essentially random.
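The low-temperature effect described above can be sketched without any model at all: dividing logits by a near-zero temperature before the softmax pushes essentially all probability mass onto the single highest-scoring token, collapsing sampling into greedy decoding even over random scores. A plain-Python illustration, independent of this repository's code:

```python
import math

def softmax_with_temperature(logits, temperature):
    # Scale logits by 1/temperature, then apply a numerically stable softmax
    # (subtracting the max before exponentiating avoids overflow).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [1.3, 1.1, 0.2, -0.5]  # arbitrary stand-ins for an untrained model's scores

print(softmax_with_temperature(logits, 1.0))     # roughly [0.43, 0.35, 0.14, 0.07]
print(softmax_with_temperature(logits, 0.0001))  # ~[1.0, 0.0, 0.0, 0.0]: effectively greedy
```

At `temperature=1.0` the distribution stays spread out; at `temperature=0.0001` the top token absorbs virtually all probability, which is why low-temperature output from random weights looks deterministic without being meaningful.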
+
+ ## Training
+
+ You can train this demonstrator from scratch or fine-tune it using the scripts and configuration files provided in my GitHub repository. They are fully compatible with this architecture demonstrator and are designed to simplify the training process.
+
+ Refer to the repository for detailed instructions and best practices.
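As a minimal sketch of what "untrained demonstrator" means here (assuming PyTorch and Hugging Face `transformers` are installed), a randomly initialized GPT-style model can be instantiated directly from a config. This uses `GPT2Config`/`GPT2LMHeadModel` as a stand-in architecture with tiny placeholder hyperparameters, not this repository's actual code or the 175B settings:

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny placeholder hyperparameters; the 175B model described in the GPT-3
# paper uses n_embd=12288, n_layer=96, n_head=96.
config = GPT2Config(
    vocab_size=50257,  # GPT-3 reuses GPT-2's BPE vocabulary
    n_positions=128,
    n_embd=64,
    n_layer=2,
    n_head=2,
)
model = GPT2LMHeadModel(config)  # weights are randomly initialized, not pretrained

input_ids = torch.randint(0, config.vocab_size, (1, 8))
with torch.no_grad():
    logits = model(input_ids).logits
print(logits.shape)  # torch.Size([1, 8, 50257])
```

A model built this way has the full forward pass of the architecture but produces random logits until it is trained with the scripts mentioned above.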
+
+ ---
+
+ Contributions are welcome!