---
language:
- ru
license: apache-2.0
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
- russian
- text-generation
- jokes
- anecdotes
- transformer
- causal-lm
datasets:
- IgorVolochay/russian_jokes
---

# Russian Jokes Generator

This repository contains three versions of a Transformer-based language model fine-tuned on a dataset of Russian jokes (anecdotes). The models are designed to generate humorous and coherent Russian text.

Three branches are available: `main` (the `nano` model), `mini`, and `small`. The repository also hosts the byte-level BPE tokenizer pretrained on the same dataset. The most coherent and capable model is `small`.

## Model Details

### Model Architecture

All models are Transformers with Grouped-Query Attention (GQA) and the SwiGLU activation. The `nano` model uses ALiBi positional biases, while `mini` and `small` use RoPE (Rotary Positional Embeddings) and were additionally trained with Multi-Head Latent Attention (MLA).

There are three versions:

- **Nano**: 3 layers, 4 attention heads, hidden size 96.
- **Mini**: 6 layers, 6 attention heads, hidden size 384. Trained with RoPE and MLA.
- **Small**: 12 layers, 12 attention heads, hidden size 768. Trained with RoPE and MLA.
- **Tokenizer**: byte-level BPE tokenizer trained on the Russian jokes dataset.

### Training Details

1. **Epochs**: computed as full passes over the dataset and set via the `n_step` parameter when initializing the `Trainer`: 1 epoch for `nano`, 1 for `mini`, and 6 for `small`.
2. **Batch size**: 32 for `nano` and `mini`; 64 for `small`.
3. **Learning rate**: 5e-4 with cosine decay for `small`; 3e-4 for `nano` and `mini` (see the sketch after this list).
4. **Loss**: cross-entropy.
5. **Hardware**: a single NVIDIA A100 GPU on Google Colab.
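For illustration, here is a minimal sketch of a training step under the settings above (cosine-decayed learning rate, cross-entropy loss). The optimizer choice (AdamW) and the `model` and `train_loader` objects are assumptions for the sake of the example, not taken from the repository's actual training code.

```python
import torch
import torch.nn.functional as F

# Assumptions: `model` is one of the Transformer variants above and
# `train_loader` yields batches of token ids of shape (batch, seq_len).
total_steps = len(train_loader)  # one epoch, as for nano/mini

# AdamW is assumed; the card specifies only the LR and its cosine schedule.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)  # 3e-4 for nano/mini
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps)

for batch in train_loader:
    logits = model(batch[:, :-1])            # predict each next token
    loss = F.cross_entropy(                  # the loss used for all three models
        logits.reshape(-1, logits.size(-1)),
        batch[:, 1:].reshape(-1),
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```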
### Performance

| Model | Training loss (min) | Validation loss (min) |
|-------|--------------------:|----------------------:|
| Nano  | 3.784 | 3.932 |
| Mini  | 3.127 | 3.144 |
| Small | 2.933 | 3.025 |

#### Nano Plots

![image/png](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F67c40beb3a3d19149b5bdfbf%2FLSx0VS3BnNYt3Lokh7jsQ.png)

Epoch:

| Parameter | Min | Max | Cur |
|-----------|------:|------:|------:|
| epoch | 0.000 | 1.000 | 1.000 |

Loss:

| Parameter | Min | Max | Cur |
|------------|------:|------:|------:|
| training | 3.784 | 6.952 | 3.900 |
| validation | 3.932 | 4.902 | 3.932 |

#### Mini Plots

![image/png](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F67c40beb3a3d19149b5bdfbf%2Fg20QvPA9RpQR_rGc3RYXK.png)

Epoch:

| Parameter | Min | Max | Cur |
|-----------|------:|------:|------:|
| epoch | 0.000 | 1.000 | 1.000 |

Loss:

| Parameter | Min | Max | Cur |
|------------|------:|------:|------:|
| training | 3.127 | 7.000 | 3.278 |
| validation | 3.144 | 4.500 | 3.144 |

#### Small Plots

![image/png](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F67c40beb3a3d19149b5bdfbf%2FtylbErxZYhKUOgv-YfNjH.png)

Epoch:

| Parameter | Min | Max | Cur |
|-----------|------:|------:|------:|
| epoch | 0.000 | 6.000 | 6.000 |

Loss:

| Parameter | Min | Max | Cur |
|------------|------:|------:|------:|
| training | 2.236 | 7.078 | 2.328 |
| validation | 2.568 | 4.657 | 2.568 |

## Usage

Load a model and the tokenizer by choosing the matching branch: `main` holds the `nano` model, `mini` holds `mini`, and `small` holds `small`:

```python
import torch

# `TransformerForCausalLM` and `ByteLevelBPETokenizer` are the classes that
# accompany this repository (pushed via PyTorchModelHubMixin); they are not
# part of the `transformers` library.
model_small = TransformerForCausalLM.from_pretrained(
    "estnafinema0/russian-jokes-generator", revision="small"
)
tokenizer = ByteLevelBPETokenizer.from_pretrained("estnafinema0/russian-jokes-generator")
```

To generate text from an initial prompt:

```python
text = "Штирлиц пришел домой"  # "Stirlitz came home"
device = "cuda" if torch.cuda.is_available() else "cpu"
model_small = model_small.to(device)

input_ids = torch.tensor(tokenizer.encode(text), device=device)
model_output = model_small.generate(
    input_ids[None, :],
    max_new_tokens=200,
    eos_token_id=tokenizer.eos_token_id,
    do_sample=True,
    top_k=10,
)
print(tokenizer.decode(model_output[0].tolist()))
```

### Example Generations

Here are some example jokes generated by the `small` model (prompts are glossed in English; outputs are left in Russian as generated):

1. **Input**: "Пришел Петя в баню и говорит" ("Petya comes to the bathhouse and says")
   **Output**: "Пришел Петя в баню и говорит - Василий Иванович, вы знаете, кто я - Петя, или Петя? - Ахааха, и я - Ахаилая, я - Ахаил! - А какая Петя? - Я - Ахаилая! - Ну и я, когда я банкрот, банкротство, конечно..."
2. **Input**: "Вышел как-то на крыльцо" ("Once he stepped out onto the porch")
   **Output**: "Вышел как-то на крыльцо, а там плачет. Стукнулся: упал, выпал. Плачет – упал."
3. **Input**: "Священник задает ребёнку вопрос" ("A priest asks a child a question")
   **Output**: "Священник задает ребёнку вопрос ему на ухо:- Что, братан, опять несёл?- Братан, ты что, братан, охуел?"

## License

This model is licensed under the Apache License 2.0.

This model has been pushed to the Hub using the [PyTorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:

- Library: [More Information Needed]
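For reference, here is a minimal sketch of how a custom PyTorch model class picks up `from_pretrained`/`push_to_hub` through this mixin. The class name mirrors the one used in the Usage section, but the constructor arguments and layer layout are illustrative assumptions, not the repository's actual implementation.

```python
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin

class TransformerForCausalLM(nn.Module, PyTorchModelHubMixin):
    """Skeleton only; constructor arguments are hypothetical."""

    def __init__(self, vocab_size: int = 1024, hidden_size: int = 768, num_layers: int = 12):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        # The real Transformer blocks (GQA/MLA, SwiGLU, RoPE or ALiBi) go here.
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

    def forward(self, input_ids):
        # Placeholder forward pass: embedding -> LM head, blocks omitted.
        return self.lm_head(self.embed(input_ids))

# Inheriting the mixin provides the Hub helpers used above, e.g.:
# model = TransformerForCausalLM()
# model.push_to_hub("estnafinema0/russian-jokes-generator", branch="small")
# model = TransformerForCausalLM.from_pretrained("estnafinema0/russian-jokes-generator")
```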