---
language:
- ru
license: apache-2.0
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
- russian
- text-generation
- jokes
- anecdotes
- transformer
- causal-lm
datasets:
- IgorVolochay/russian_jokes
---

# Russian Jokes Generator

This repository contains three versions of a Transformer-based language model fine-tuned on a dataset of Russian jokes (anecdotes). The models are designed to generate humorous and coherent Russian text.

Three branches are available: `main` (the `nano` model), `mini`, and `small`. The repository also hosts the byte-level BPE tokenizer pretrained on the same dataset. The most coherent and capable model is `small`.

## Model Details

### Model Architecture

All models are Transformers with Grouped-Query Attention (GQA) and the SwiGLU activation. The `nano` model uses ALiBi positional biases, while `mini` and `small` use RoPE (Rotary Positional Embeddings) and were additionally trained with Multi-Head Latent Attention (MLA).

There are three versions:

- **Nano**: 3 layers, 4 attention heads, hidden size 96.
- **Mini**: 6 layers, 6 attention heads, hidden size 384. Trained with RoPE and MLA.
- **Small**: 12 layers, 12 attention heads, hidden size 768. Trained with RoPE and MLA.
- **Tokenizer**: byte-level BPE tokenizer trained on the Russian jokes dataset.

### Training Details

1. **Epochs**: computed as full passes over the dataset and set via the `n_step` parameter when initializing the `Trainer`: 1 epoch for `nano`, 1 for `mini`, and 6 for `small`.
2. **Batch size**: 32 for `nano` and `mini`; 64 for `small`.
3. **Learning rate**: 5e-4 with cosine decay for `small`; 3e-4 for `nano` and `mini` (see the sketch after this list).
4. **Loss**: cross-entropy.
5. **Hardware**: a single NVIDIA A100 GPU on Google Colab.
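For illustration, here is a minimal sketch of a training step under the settings above (cosine-decayed learning rate, cross-entropy loss). The optimizer choice (AdamW) and the `model` and `train_loader` objects are assumptions for the sake of the example, not taken from the repository's actual training code.

```python
import torch
import torch.nn.functional as F

# Assumptions: `model` is one of the Transformer variants above and
# `train_loader` yields batches of token ids of shape (batch, seq_len).
total_steps = len(train_loader)  # one epoch, as for nano/mini

# AdamW is assumed; the card specifies only the LR and its cosine schedule.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)  # 3e-4 for nano/mini
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps)

for batch in train_loader:
    logits = model(batch[:, :-1])            # predict each next token
    loss = F.cross_entropy(                  # the loss used for all three models
        logits.reshape(-1, logits.size(-1)),
        batch[:, 1:].reshape(-1),
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```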
### Performance

| Model | Training loss (min) | Validation loss (min) |
|-------|--------------------:|----------------------:|
| Nano  | 3.784 | 3.932 |
| Mini  | 3.127 | 3.144 |
| Small | 2.933 | 3.025 |

#### Nano Plots

![image/png](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F67c40beb3a3d19149b5bdfbf%2FLSx0VS3BnNYt3Lokh7jsQ.png)

Epoch:

| Parameter | Min | Max | Cur |
|-----------|------:|------:|------:|
| epoch | 0.000 | 1.000 | 1.000 |

Loss:

| Parameter | Min | Max | Cur |
|------------|------:|------:|------:|
| training | 3.784 | 6.952 | 3.900 |
| validation | 3.932 | 4.902 | 3.932 |

#### Mini Plots

![image/png](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F67c40beb3a3d19149b5bdfbf%2Fg20QvPA9RpQR_rGc3RYXK.png)

Epoch:

| Parameter | Min | Max | Cur |
|-----------|------:|------:|------:|
| epoch | 0.000 | 1.000 | 1.000 |

Loss:

| Parameter | Min | Max | Cur |
|------------|------:|------:|------:|
| training | 3.127 | 7.000 | 3.278 |
| validation | 3.144 | 4.500 | 3.144 |

#### Small Plots

![image/png](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F67c40beb3a3d19149b5bdfbf%2FtylbErxZYhKUOgv-YfNjH.png)

Epoch:

| Parameter | Min | Max | Cur |
|-----------|------:|------:|------:|
| epoch | 0.000 | 6.000 | 6.000 |

Loss:

| Parameter | Min | Max | Cur |
|------------|------:|------:|------:|
| training | 2.236 | 7.078 | 2.328 |
| validation | 2.568 | 4.657 | 2.568 |

## Usage

Load a model and the tokenizer by choosing the matching branch: `main` holds the `nano` model, `mini` holds `mini`, and `small` holds `small`:

```python
import torch

# `TransformerForCausalLM` and `ByteLevelBPETokenizer` are the classes that
# accompany this repository (pushed via PyTorchModelHubMixin); they are not
# part of the `transformers` library.
model_small = TransformerForCausalLM.from_pretrained(
    "estnafinema0/russian-jokes-generator", revision="small"
)
tokenizer = ByteLevelBPETokenizer.from_pretrained("estnafinema0/russian-jokes-generator")
```

To generate text from an initial prompt:

```python
text = "Штирлиц пришел домой"  # "Stirlitz came home"
device = "cuda" if torch.cuda.is_available() else "cpu"
model_small = model_small.to(device)

input_ids = torch.tensor(tokenizer.encode(text), device=device)
model_output = model_small.generate(
    input_ids[None, :],
    max_new_tokens=200,
    eos_token_id=tokenizer.eos_token_id,
    do_sample=True,
    top_k=10,
)
print(tokenizer.decode(model_output[0].tolist()))
```

### Example Generations

Here are some example jokes generated by the `small` model (prompts are glossed in English; outputs are left in Russian as generated):

1. **Input**: "Пришел Петя в баню и говорит" ("Petya comes to the bathhouse and says")
   **Output**: "Пришел Петя в баню и говорит - Василий Иванович, вы знаете, кто я - Петя, или Петя? - Ахааха, и я - Ахаилая, я - Ахаил! - А какая Петя? - Я - Ахаилая! - Ну и я, когда я банкрот, банкротство, конечно..."
2. **Input**: "Вышел как-то на крыльцо" ("Once he stepped out onto the porch")
   **Output**: "Вышел как-то на крыльцо, а там плачет. Стукнулся: упал, выпал. Плачет – упал."
3. **Input**: "Священник задает ребёнку вопрос" ("A priest asks a child a question")
   **Output**: "Священник задает ребёнку вопрос ему на ухо:- Что, братан, опять несёл?- Братан, ты что, братан, охуел?"

## License

This model is licensed under the Apache License 2.0.

This model has been pushed to the Hub using the [PyTorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:

- Library: [More Information Needed]
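For reference, here is a minimal sketch of how a custom PyTorch model class picks up `from_pretrained`/`push_to_hub` through this mixin. The class name mirrors the one used in the Usage section, but the constructor arguments and layer layout are illustrative assumptions, not the repository's actual implementation.

```python
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin

class TransformerForCausalLM(nn.Module, PyTorchModelHubMixin):
    """Skeleton only; constructor arguments are hypothetical."""

    def __init__(self, vocab_size: int = 1024, hidden_size: int = 768, num_layers: int = 12):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        # The real Transformer blocks (GQA/MLA, SwiGLU, RoPE or ALiBi) go here.
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

    def forward(self, input_ids):
        # Placeholder forward pass: embedding -> LM head, blocks omitted.
        return self.lm_head(self.embed(input_ids))

# Inheriting the mixin provides the Hub helpers used above, e.g.:
# model = TransformerForCausalLM()
# model.push_to_hub("estnafinema0/russian-jokes-generator", branch="small")
# model = TransformerForCausalLM.from_pretrained("estnafinema0/russian-jokes-generator")
```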