k050506koch/GPT3-dev-125m-0612

Text Generation


k050506koch committed on Dec 6, 2024

Commit 758903e · verified · 1 Parent(s): 812346b

Created README

Files changed (1)

README.md +37 -3

README.md CHANGED

@@ -1,3 +1,37 @@
----
-license: mit
----

+---
+license: mit
+datasets:
+- HuggingFaceFW/fineweb
+language:
+- en
+pipeline_tag: text-generation
+widget:
+- text: "He is a doctor. His main goal is"
+  example_title: " to help people."
+- text: "My name is Merve and my favorite"
+  example_title: "activity is reading."
+---
+# GPT3
+Welcome to the GPT3 repository! This project is an attempt to recreate the architecture and approach from the original OpenAI GPT-3 paper. The repository includes scripts for training, fine-tuning, and inference of a GPT-3-like model using PyTorch and the Hugging Face Transformers library.
+This repository hosts the weights of development checkpoints of my models. You can download a checkpoint folder, paste its path into inference.py, and chat with the model.
+# **You can find all code on [GitHub](https://github.com/krll-corp/GPT3)**
+# Note: This model has 125 million parameters and was trained on 3.6 billion tokens. It is heavily undertrained and is intended as a technology demonstrator.
+# Note 2: This checkpoint was released on 6 December 2024 (06/12/2024) and has been trained for longer (batch size 12, 4 gradient accumulation steps, 512-token sequences, 600,000 steps). It scores 27.65% on MMLU, only slightly above the 25% expected from random guessing.
+## Contributing
+Contributions are welcome! I'm a student interested in AI, so my code may contain mistakes or logical issues. Please open an issue or submit a pull request with any improvements or bug fixes; I'd be glad to receive them.
+## License
+This project is licensed under the MIT License. See the LICENSE file for details. Everyone can use and modify this code at their discretion.
+## Acknowledgements
+Thanks to OpenAI, Hugging Face, and PyTorch for making this project possible!
+- [OpenAI GPT-3 Paper](https://arxiv.org/abs/2005.14165)
+- [Hugging Face Transformers](https://github.com/huggingface/transformers)
+- [PyTorch](https://pytorch.org/)
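
Reviewer note (not part of the committed README): the training figures in Note 2 can be sanity-checked with simple arithmetic. The sketch below assumes that tokens seen equal steps × batch size × sequence length, optionally multiplied by the gradient accumulation factor. The README does not say whether "600,000 steps" counts micro-batches or optimizer updates, so both readings are computed. The function and constant names are illustrative and do not come from the repository.

```python
# Figures quoted in Note 2 of the README.
BATCH_SIZE = 12     # sequences per micro-batch
GRAD_ACCUM = 4      # micro-batches accumulated per optimizer update
SEQ_LEN = 512       # tokens per sequence
STEPS = 600_000     # "steps" as stated; the unit is not specified in the README


def tokens_seen(steps: int, batch_size: int, seq_len: int, grad_accum: int = 1) -> int:
    """Total tokens processed, assuming every sequence is filled to seq_len."""
    return steps * batch_size * seq_len * grad_accum


# Reading A (assumption): the 600,000 steps are micro-batches.
tokens_if_micro_steps = tokens_seen(STEPS, BATCH_SIZE, SEQ_LEN)

# Reading B (assumption): the 600,000 steps are optimizer updates,
# each made of GRAD_ACCUM micro-batches.
tokens_if_optimizer_steps = tokens_seen(STEPS, BATCH_SIZE, SEQ_LEN, GRAD_ACCUM)

# Tokens contributing to a single optimizer update.
tokens_per_update = BATCH_SIZE * GRAD_ACCUM * SEQ_LEN

# Margin of the reported MMLU score over the 25% random-guess baseline
# cited in the README, in percentage points.
mmlu_margin = round(27.65 - 25.0, 2)

if __name__ == "__main__":
    print(f"Reading A: {tokens_if_micro_steps:,} tokens")
    print(f"Reading B: {tokens_if_optimizer_steps:,} tokens")
    print(f"Tokens per optimizer update: {tokens_per_update:,}")
    print(f"MMLU margin over random guessing: {mmlu_margin} points")
```

Under reading A the total is about 3.7 billion tokens, close to the 3.6 billion quoted in the first note. Under reading B it is about 14.7 billion. Stating the unit of "steps" in the README would remove the ambiguity.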