Upload EmojiLM - Byte-level Looped Transformer
- README.md +140 -0
- consolidated.pth +3 -0
- params.json +156 -0
- train_state_00000.json +68 -0
- train_state_00001.json +68 -0
- train_state_00002.json +68 -0
- train_state_00003.json +68 -0
README.md
ADDED
@@ -0,0 +1,140 @@
---
language:
- en
license: apache-2.0
tags:
- text-generation
- emoji
- byte-level
- looped-transformer
- text2emoji
datasets:
- KomeijiForce/Text2Emoji
---

# EmojiLM: Byte-Level Looped Transformer for Text-to-Emoji Translation

This model is a byte-level language model trained with a **looped transformer** architecture for translating text descriptions to emojis.

## Model Description

- **Model Type:** Causal language model with a looped transformer architecture
- **Task:** Text-to-emoji translation
- **Training Data:** KomeijiForce/Text2Emoji dataset (500k+ text-emoji pairs)
- **Tokenizer:** Byte-level, vocab size 258 (a tokenizer sketch follows this list)
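
Because the tokenizer is byte-level, every UTF-8 byte is its own token, and a 258-entry vocabulary leaves room for the 256 byte values plus BOS and EOS. A minimal sketch of such a tokenizer follows; the specific special-token ids are an assumption for illustration, not values taken from the training codebase.

```python
# Byte-level tokenization sketch: 256 byte values + BOS + EOS = vocab size 258.
# The ids 256/257 for BOS/EOS are hypothetical, not read from the codebase.
BOS_ID, EOS_ID = 256, 257

def encode(text: str, add_bos: bool = True, add_eos: bool = True) -> list[int]:
    ids = list(text.encode("utf-8"))  # one token per UTF-8 byte (0-255)
    return ([BOS_ID] if add_bos else []) + ids + ([EOS_ID] if add_eos else [])

def decode(ids: list[int]) -> str:
    return bytes(i for i in ids if i < 256).decode("utf-8", errors="replace")

print(encode("I love pizza"))  # 12 bytes plus BOS/EOS -> 14 tokens
print(decode(encode("🍕❤️")))   # multi-byte emoji round-trip through UTF-8
```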

### Architecture Details

**Looped Transformer Architecture:**
- **Base Layers:** 24
- **Number of Loops:** 8 (the layer stack is applied iteratively)
- **Shared Layers:** True (weights are reused across loops, making the model parameter-efficient)
- **Loop Residual:** True (residual connections across loops)

**Model Dimensions:**
- **Hidden Dimension:** 1024
- **Number of Attention Heads:** 16
- **KV Heads:** 16
- **Max Sequence Length:** 512
- **RoPE Theta:** 10000.0

### Training Configuration

The learning-rate schedule is sketched after this list.

- **Training Steps:** 5100
- **Batch Size:** 12
- **Sequence Length:** 512
- **Learning Rate:** 0.0003
- **Warmup Steps:** 1000
- **Optimizer:** AdamW (β1=0.9, β2=0.95)
- **LR Scheduler:** Cosine with a minimum ratio of 0.1
- **Gradient Clipping:** 1.0
- **Weight Decay:** 0.1
- **Precision:** BF16
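
The warmup-plus-cosine schedule can be written out as below. This is a plausible reading of the configuration rather than the trainer's exact code, but it reproduces the learning rate of roughly 3.04e-5 recorded at step 5000 in the `train_state_*.json` files.

```python
import math

def lr_at(step, base_lr=3e-4, warmup=1000, total_steps=5100, min_ratio=0.1):
    # Linear warmup to base_lr, then cosine decay down to min_ratio * base_lr.
    if step < warmup:
        return base_lr * step / warmup
    progress = (step - warmup) / (total_steps - warmup)
    min_lr = min_ratio * base_lr
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

print(lr_at(5000))  # ~3.04e-05, matching the last LR stored in the checkpoint
```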

## What is a Looped Transformer?

A looped transformer applies the same stack of transformer layers multiple times, refining its hidden states iteratively. This is particularly effective for translation-style tasks because it allows the model to:
- Refine predictions through multiple iterations
- Use parameters more efficiently (weights are shared across loops)
- Model complex input-output mappings with fewer total parameters

In this model, the 24-layer stack is applied 8 times with residual connections between loops, giving an effective depth of 192 layer applications.
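
In code, the looping-with-residual idea looks roughly like the sketch below. This is an illustrative PyTorch sketch only; it omits causal masking and RoPE, and the module and argument names are assumptions, not the BFlowNet implementation.

```python
import torch
import torch.nn as nn

class LoopedStack(nn.Module):
    """Apply one shared stack of layers n_loops times with a residual across loops."""

    def __init__(self, dim=1024, n_layers=24, n_heads=16, n_loops=8):
        super().__init__()
        self.n_loops = n_loops
        # A single stack of layers whose weights are reused on every loop.
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, n_heads, dim_feedforward=4 * dim,
                                       batch_first=True)
            for _ in range(n_layers)
        )

    def forward(self, h):
        for _ in range(self.n_loops):
            x = h
            for layer in self.layers:
                x = layer(x)
            h = h + x  # loop residual: add the refined states back onto the carry
        return h

# Tiny configuration just to show the shapes; the real model uses dim=1024, 24 layers, 8 loops.
demo = LoopedStack(dim=64, n_layers=2, n_heads=4, n_loops=3)
print(demo(torch.randn(2, 16, 64)).shape)  # torch.Size([2, 16, 64])
```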

## Intended Use

This model is designed to translate text descriptions into appropriate emojis.

**Example Usage:**
```
Input: "I love pizza"
Output: "🍕❤️"
```

## Training Data

The model was trained on the **KomeijiForce/Text2Emoji** dataset, which contains over 500,000 text-emoji pairs.

## Model Files

This repository contains the following files (a download snippet follows the list):
- `consolidated.pth`: PyTorch model weights
- `params.json`: Complete model and training configuration
- `train_state_*.json`: Per-rank training state from the checkpoint
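
If you want to fetch the files programmatically, `huggingface_hub` can download them individually; the repo id below is a placeholder that should be replaced with this repository's actual id.

```python
from huggingface_hub import hf_hub_download

repo_id = "YOUR-USERNAME/emojilm-looped-transformer"  # placeholder repo id
params_path = hf_hub_download(repo_id=repo_id, filename="params.json")
weights_path = hf_hub_download(repo_id=repo_id, filename="consolidated.pth")
print(params_path, weights_path)
```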

## Usage

Loading this model requires the original BFlowNet/loopedLM codebase, which defines the looped transformer architecture:

```python
import json

import torch

# Load the model and training configuration
with open('params.json', 'r') as f:
    params = json.load(f)

# Load the consolidated model weights onto CPU
checkpoint = torch.load('consolidated.pth', map_location='cpu')

# Initialize the model with the BFlowNet loopedLM architecture and load the
# weights (module and class names depend on that codebase):
# from apps.loopedLM import LoopedTransformer
# model = LoopedTransformer(**params['model'])
# model.load_state_dict(checkpoint)
```

### Generation Parameters

For best results, use the settings the model was evaluated with (a sampling sketch follows this list):
- **Max Tokens:** 128 (emoji outputs are typically short)
- **Temperature:** 0.7 (for diverse emoji selection)
- **Top-p:** 0.9
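
A minimal nucleus-sampling loop using these settings might look like the sketch below. Here `model` is the loaded looped transformer from the snippet above, `prompt_ids` comes from a byte-level encoder like the one sketched earlier, and the interface `model(ids) -> logits` plus the EOS id are assumptions about the codebase, not its documented API.

```python
import torch

@torch.no_grad()
def generate(model, prompt_ids, max_tokens=128, temperature=0.7, top_p=0.9, eos_id=257):
    ids = torch.tensor([prompt_ids])
    for _ in range(max_tokens):
        logits = model(ids)[0, -1] / temperature  # assumed: (1, seq, vocab) logits
        probs = torch.softmax(logits, dim=-1)
        # Top-p (nucleus) filtering: keep the smallest prefix of sorted tokens
        # whose cumulative probability reaches top_p, then renormalize.
        sorted_probs, sorted_idx = torch.sort(probs, descending=True)
        keep = torch.cumsum(sorted_probs, dim=-1) - sorted_probs < top_p
        keep[0] = True  # always keep the most likely token
        filtered = torch.zeros_like(probs)
        filtered[sorted_idx[keep]] = sorted_probs[keep]
        next_id = torch.multinomial(filtered / filtered.sum(), 1).item()
        if next_id == eos_id:  # stop at the (assumed) EOS id
            break
        ids = torch.cat([ids, torch.tensor([[next_id]])], dim=-1)
    return ids[0].tolist()
```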

## Limitations

- The byte-level tokenizer works well for emoji but is less compact than subword tokenization for general text
- The model is optimized for text-to-emoji translation and may not generalize well to other tasks
- Loading the weights requires the specific looped transformer implementation from the training codebase

## Citation

If you use this model, please cite:

```bibtex
@misc{emojilm-looped-transformer,
  title={EmojiLM: Byte-Level Looped Transformer for Text-to-Emoji Translation},
  author={Your Name},
  year={2025},
  howpublished={\url{https://huggingface.co/YOUR-USERNAME/emojilm-looped-transformer}}
}
```

## Training Framework

This model was trained using the BFlowNet framework with the looped transformer architecture described above.

Dataset: [KomeijiForce/Text2Emoji](https://huggingface.co/datasets/KomeijiForce/Text2Emoji)

## License

Apache 2.0
consolidated.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f0589ff488e532cbb87a48b18a9aa53a468a3c907ad839b62ad248a434263457
size 1853427530
params.json
ADDED
@@ -0,0 +1,156 @@
{
    "name": "looped_lm_text2emoji",
    "dump_dir": "/home/cd110/BFlowNet/apps/loopedLM/results/text2emoji",
    "seed": 42,
    "grad_acc_steps": 1,
    "gc_collect_freq": 1000,
    "probe_freq": null,
    "steps": 5100,
    "data": {
        "root_dir": "/home/cd110/BFlowNet/apps/loopedLM/text2emoji_prepared",
        "sources": {
            "text2emoji": 1.0
        },
        "batch_size": 12,
        "seq_len": 512,
        "n_views": 2,
        "seed": 42,
        "add_bos": true,
        "add_eos": true,
        "load_async": true,
        "prefetch_size": 128,
        "tokenizer": {
            "name": "bytes",
            "path": null
        }
    },
    "optim": {
        "lr": 0.0003,
        "weight_decay": 0.1,
        "epsilon": 1e-08,
        "beta1": 0.9,
        "beta2": 0.95,
        "clip": 1.0,
        "scheduler": "cosine",
        "warmup": 1000,
        "lr_min_ratio": 0.1,
        "cycle_length": 1.0,
        "cosine_theta": 1.0,
        "annealing_step": 1000,
        "decay_fraction": 0.1,
        "exp_factor": 0.5
    },
    "model": {
        "dim": 1024,
        "n_layers": 24,
        "head_dim": null,
        "n_heads": 16,
        "n_kv_heads": 16,
        "ffn_dim_multiplier": null,
        "multiple_of": 256,
        "norm_eps": 1e-05,
        "rope_theta": 10000.0,
        "init_base_std": null,
        "init_std_factor": "disabled",
        "max_seqlen": 512,
        "seed": 42,
        "vocab_size": 258,
        "weight_tying": false,
        "sliding_window": null,
        "n_loops": 8,
        "shared_layers": true,
        "loop_residual": true
    },
    "distributed": {
        "dp_shard": 1,
        "dp_replicate": 4,
        "tp_size": 1,
        "selective_activation_checkpointing": false,
        "compile": true,
        "fsdp_type": "full_shard",
        "model_dtype": "bf16",
        "float8_recipe": null,
        "float8_filter": "layers\\.[0-9]+\\.",
        "matmul_allow_tf32": true,
        "detect_anomaly": false,
        "compile_cache_size_limit": 8,
        "spawn_method": "forkserver"
    },
    "env": {
        "MKL_SERVICE_FORCE_INTEL": "GNU",
        "OMP_NUM_THREADS": "1",
        "MKL_NUM_THREADS": "1",
        "ENABLE_INTRA_NODE_COMM": "1",
        "TORCH_NCCL_AVOID_RECORD_STREAMS": "1",
        "NCCL_IB_TIMEOUT": "22",
        "NCCL_DEBUG": "INFO",
        "TORCH_NCCL_ASYNC_ERROR_HANDLING": "1"
    },
    "checkpoint": {
        "dump": {
            "every": 500,
            "keep": -1
        },
        "eval": {
            "every": 1500000,
            "keep": 3
        },
        "path": "/home/cd110/BFlowNet/apps/loopedLM/results/text2emoji/checkpoints",
        "init_ckpt_path": null,
        "continue_training_from_init": false
    },
    "profiling": {
        "run": false,
        "trace_folder": "profiling",
        "mem_warmup": 100,
        "mem_steps": 2,
        "profile_warmup": 102,
        "profile_steps": 2
    },
    "logging": {
        "freq": 50,
        "acc_freq": null,
        "wandb": {
            "job_type": null,
            "dir": null,
            "project": "looped_lm_text2emoji",
            "entity": null,
            "tags": null,
            "group": null,
            "name": "looped_lm_text2emoji",
            "notes": null,
            "config_exclude_keys": null,
            "config_include_keys": null,
            "anonymous": null,
            "mode": null,
            "allow_val_change": null,
            "resume": null,
            "force": null,
            "tensorboard": null,
            "sync_tensorboard": null,
            "monitor_gym": null,
            "save_code": null,
            "id": null,
            "fork_from": null,
            "resume_from": null
        }
    },
    "async_eval_gpus": null,
    "eval": {
        "generator": {
            "max_tokens": 128,
            "dtype": "bf16",
            "temperature": 0.7,
            "top_p": 0.9
        },
        "harness": {
            "tasks": [
                "hellaswag",
                "piqa"
            ]
        },
        "validation": {
            "max_steps": 200
        }
    }
}
train_state_00000.json
ADDED
@@ -0,0 +1,68 @@
{
    "step": 5000,
    "acc_step": 0,
    "data_loader_state": {
        "it_state": {
            "start_token": 21,
            "it_state": {
                "it_state": {
                    "root_dir": "/home/cd110/BFlowNet/apps/loopedLM/text2emoji_prepared",
                    "sources": {
                        "text2emoji": 1.0
                    },
                    "source_to_state": {
                        "text2emoji": {
                            "file_path": "/home/cd110/BFlowNet/apps/loopedLM/text2emoji_prepared/text2emoji/text2emoji.chunk.03.json",
                            "position": 15423808,
                            "block_size": 1,
                            "offset": 0,
                            "current_iter": 1
                        }
                    },
                    "rng_state": {
                        "bit_generator": "PCG64",
                        "state": {
                            "state": 205585754356582501350349259899939902810,
                            "inc": 11676600559890430755450356507027720041
                        },
                        "has_uint32": 0,
                        "uinteger": 0
                    }
                },
                "add_bos": true,
                "add_eos": true,
                "name": "bytes",
                "path": null
            },
            "output_seq_len": 512,
            "n_views": 2
        },
        "seq_idx": 8,
        "rng_state": {
            "bit_generator": "PCG64",
            "state": {
                "state": 324250618518055952288627465431916920177,
                "inc": 77357518920597472829800677777012462921
            },
            "has_uint32": 1,
            "uinteger": 85385168
        },
        "batch_size": 12,
        "prefetch_size": 128
    },
    "scheduler": {
        "base_lrs": [
            0.0003
        ],
        "last_epoch": 5000,
        "verbose": false,
        "_step_count": 5001,
        "_get_lr_called_within_step": false,
        "_last_lr": [
            3.039611684019504e-05
        ],
        "lr_lambdas": [
            {}
        ]
    }
}
train_state_00001.json
ADDED
@@ -0,0 +1,68 @@
{
    "step": 5000,
    "acc_step": 0,
    "data_loader_state": {
        "it_state": {
            "start_token": 74,
            "it_state": {
                "it_state": {
                    "root_dir": "/home/cd110/BFlowNet/apps/loopedLM/text2emoji_prepared",
                    "sources": {
                        "text2emoji": 1.0
                    },
                    "source_to_state": {
                        "text2emoji": {
                            "file_path": "/home/cd110/BFlowNet/apps/loopedLM/text2emoji_prepared/text2emoji/text2emoji.chunk.02.json",
                            "position": 15354030,
                            "block_size": 1,
                            "offset": 0,
                            "current_iter": 1
                        }
                    },
                    "rng_state": {
                        "bit_generator": "PCG64",
                        "state": {
                            "state": 46880898983298274906203770755158202503,
                            "inc": 239634081480473411747239400828488620799
                        },
                        "has_uint32": 0,
                        "uinteger": 0
                    }
                },
                "add_bos": true,
                "add_eos": true,
                "name": "bytes",
                "path": null
            },
            "output_seq_len": 512,
            "n_views": 2
        },
        "seq_idx": 8,
        "rng_state": {
            "bit_generator": "PCG64",
            "state": {
                "state": 319969186434683622935655786137931948242,
                "inc": 270234035871729269002159329014059236425
            },
            "has_uint32": 0,
            "uinteger": 2574191790
        },
        "batch_size": 12,
        "prefetch_size": 128
    },
    "scheduler": {
        "base_lrs": [
            0.0003
        ],
        "last_epoch": 5000,
        "verbose": false,
        "_step_count": 5001,
        "_get_lr_called_within_step": false,
        "_last_lr": [
            3.039611684019504e-05
        ],
        "lr_lambdas": [
            {}
        ]
    }
}
train_state_00002.json
ADDED
@@ -0,0 +1,68 @@
{
    "step": 5000,
    "acc_step": 0,
    "data_loader_state": {
        "it_state": {
            "start_token": 99,
            "it_state": {
                "it_state": {
                    "root_dir": "/home/cd110/BFlowNet/apps/loopedLM/text2emoji_prepared",
                    "sources": {
                        "text2emoji": 1.0
                    },
                    "source_to_state": {
                        "text2emoji": {
                            "file_path": "/home/cd110/BFlowNet/apps/loopedLM/text2emoji_prepared/text2emoji/text2emoji.chunk.00.json",
                            "position": 15348564,
                            "block_size": 1,
                            "offset": 0,
                            "current_iter": 1
                        }
                    },
                    "rng_state": {
                        "bit_generator": "PCG64",
                        "state": {
                            "state": 3069845261208554751547166343436132358,
                            "inc": 6027823433652931085739778990793808165
                        },
                        "has_uint32": 0,
                        "uinteger": 0
                    }
                },
                "add_bos": true,
                "add_eos": true,
                "name": "bytes",
                "path": null
            },
            "output_seq_len": 512,
            "n_views": 2
        },
        "seq_idx": 8,
        "rng_state": {
            "bit_generator": "PCG64",
            "state": {
                "state": 177769472111706612176032620089344751308,
                "inc": 188564971970541749319992297790591572713
            },
            "has_uint32": 1,
            "uinteger": 2736346968
        },
        "batch_size": 12,
        "prefetch_size": 128
    },
    "scheduler": {
        "base_lrs": [
            0.0003
        ],
        "last_epoch": 5000,
        "verbose": false,
        "_step_count": 5001,
        "_get_lr_called_within_step": false,
        "_last_lr": [
            3.039611684019504e-05
        ],
        "lr_lambdas": [
            {}
        ]
    }
}
train_state_00003.json
ADDED
@@ -0,0 +1,68 @@
{
    "step": 5000,
    "acc_step": 0,
    "data_loader_state": {
        "it_state": {
            "start_token": 100,
            "it_state": {
                "it_state": {
                    "root_dir": "/home/cd110/BFlowNet/apps/loopedLM/text2emoji_prepared",
                    "sources": {
                        "text2emoji": 1.0
                    },
                    "source_to_state": {
                        "text2emoji": {
                            "file_path": "/home/cd110/BFlowNet/apps/loopedLM/text2emoji_prepared/text2emoji/text2emoji.chunk.01.json",
                            "position": 15387317,
                            "block_size": 1,
                            "offset": 0,
                            "current_iter": 1
                        }
                    },
                    "rng_state": {
                        "bit_generator": "PCG64",
                        "state": {
                            "state": 81131458415525599535785437639948652948,
                            "inc": 92941856108932518968286621281627530405
                        },
                        "has_uint32": 0,
                        "uinteger": 0
                    }
                },
                "add_bos": true,
                "add_eos": true,
                "name": "bytes",
                "path": null
            },
            "output_seq_len": 512,
            "n_views": 2
        },
        "seq_idx": 8,
        "rng_state": {
            "bit_generator": "PCG64",
            "state": {
                "state": 286960010946238495423822789291240034500,
                "inc": 66050176413739185524746886687120723265
            },
            "has_uint32": 1,
            "uinteger": 1701660961
        },
        "batch_size": 12,
        "prefetch_size": 128
    },
    "scheduler": {
        "base_lrs": [
            0.0003
        ],
        "last_epoch": 5000,
        "verbose": false,
        "_step_count": 5001,
        "_get_lr_called_within_step": false,
        "_last_lr": [
            3.039611684019504e-05
        ],
        "lr_lambdas": [
            {}
        ]
    }
}