---
license: mit
language:
- en
tags:
- text-generation-inference
pipeline_tag: text-generation
---
![GPTUsenet2](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F64b7618e2f5a966b972e9978%2FFNEKaeJ3of0W_HQ8x3amo.jpeg)

## GPT-Usenet
An 81-million parameter LLM using GPT-2 encodings.
Trained on 10 GB of USENET posts along with over 1 GB of miscellaneous BBS posts, digitized books, and text documents.
Supervised fine-tuning should be performed before use.

## Purpose of GPT-Usenet
Current LLM development is focused on making models larger and able to do more. However, this makes them a jack of all trades, master of none.
GPT-Usenet takes a different approach. Instead of trying to do everything perfectly, GPT-Usenet offers a digital stem cell, which can then be fine-tuned into a single, specialized role and run in parallel with copies of itself.

## Technical Information
| | |
|---------------------------------|----:|
|Layers |10|
|Heads |10|
|Embeddings |640|
|Context Window |1024 tokens|
|Tokenizer |GPT-2 BPE|

## Training Information
| | |
|---------------------------------|----:|
|Training Loss |2.3256|
|Validation Loss |2.3651|
|Device |Google Colab L4|
|Training Time |16 Hours|

## Example Syntax
| | |
|---------------------------------|----:|
|uucp:|The path of reasoning you want GPT-Usenet to use when thinking. Use lowercase words separated by exclamation points.|
|Internet:|The system calls relevant to this email.|
|Path:|The path of reasoning you want GPT-Usenet to use when writing. Use lowercase words separated by exclamation points.|
|From:|The username that sent this message.|
|Sender:|The group that username belongs to.|
|Newsgroups:|The broad subject field of the email.|
|Subject:|The prompt.|
|Message-ID:|The type of message this is.|
|Date:|Use this field to simulate urgency or moods.|
|Organization:|The system GPT-Usenet is running on (testing... deployment... simulation).|
|Lines:|How long the message is.|
|Write the SFT response here. First, prefix the first sentence with > to signify that it is a reasoning sentence.||
|--|The stop tokens.|

```
uucp:!field1!field2!
Internet:simulation
Path:!field1!field2!
From:user
Sender:usergroup
Newsgroups:motorskills.papercraft
Subject:Build a paper airplane
Message-ID:Command
Date:01 Jan 01 00:00:01 GMT
Organization:deployment
Lines: 1
>Provide detailed steps on building a paper airplane.
--
```
For fine-tuning, your data should be in the .mbox format.
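
As a rough illustration of the header layout and the .mbox requirement, the sketch below builds one message with the fields from the Example Syntax table and writes it with Python's standard `mailbox` module. The helper names (`build_body`, `build_message`, `render_text`, `write_mbox`) and the file name `sft.mbox` are hypothetical, and whether the original training pipeline expects exactly this mbox layout is an assumption.

```python
import mailbox
import os
import tempfile
from email.message import Message

# Header order copied from the Example Syntax table in this card.
HEADER_ORDER = [
    "uucp", "Internet", "Path", "From", "Sender", "Newsgroups",
    "Subject", "Message-ID", "Date", "Organization", "Lines",
]


def build_body(reasoning, response=""):
    """Body text: a '>'-prefixed reasoning sentence, an optional response, then '--'."""
    parts = [">" + reasoning]
    if response:
        parts.append(response)
    parts.append("--")  # stop tokens, per the table
    return "\n".join(parts) + "\n"


def build_message(fields, reasoning, response=""):
    """Hypothetical helper: turn a dict of header fields into an email Message."""
    msg = Message()
    for name in HEADER_ORDER:
        if name in fields:
            msg[name] = str(fields[name])
    msg.set_payload(build_body(reasoning, response))
    return msg


def render_text(msg):
    """Render the message as compact 'Name:value' lines followed by the body."""
    header = "".join(f"{name}:{value}\n" for name, value in msg.items())
    return header + msg.get_payload()


def write_mbox(path, messages):
    """Append messages to an mbox file (assumed fine-tuning input format)."""
    box = mailbox.mbox(path)
    try:
        for message in messages:
            box.add(message)
        box.flush()
    finally:
        box.close()


example = build_message(
    {
        "uucp": "!field1!field2!",
        "Internet": "simulation",
        "Path": "!field1!field2!",
        "From": "user",
        "Sender": "usergroup",
        "Newsgroups": "motorskills.papercraft",
        "Subject": "Build a paper airplane",
        "Message-ID": "Command",
        "Date": "01 Jan 01 00:00:01 GMT",
        "Organization": "deployment",
        "Lines": 1,
    },
    reasoning="Provide detailed steps on building a paper airplane.",
)

print(render_text(example))

# Write the example to a temporary mbox file and count the stored messages.
with tempfile.TemporaryDirectory() as tmp_dir:
    mbox_path = os.path.join(tmp_dir, "sft.mbox")
    write_mbox(mbox_path, [example])
    box = mailbox.mbox(mbox_path)
    stored = len(box)
    box.close()
print("messages written:", stored)
```

`render_text` produces the same compact layout as the example block and can serve as a prompt template, while `write_mbox` stores the same messages in a file that standard mail tools can read back.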