CharlesDDDD committed · Commit 65ead92 · verified · 1 Parent(s): e6ff052

Upload EmojiLM - Byte-level Looped Transformer

README.md ADDED
@@ -0,0 +1,140 @@
1
+ ---
2
+ language:
3
+ - en
4
+ license: apache-2.0
5
+ tags:
6
+ - text-generation
7
+ - emoji
8
+ - byte-level
9
+ - looped-transformer
10
+ - text2emoji
11
+ datasets:
12
+ - KomeijiForce/Text2Emoji
13
+ ---
14
+
15
+ # EmojiLM: Byte-Level Looped Transformer for Text-to-Emoji Translation
16
+
17
+ This model is a byte-level language model trained with a **looped transformer** architecture for translating text descriptions to emojis.
18
+
19
+ ## Model Description
20
+
21
+ - **Model Type:** Causal Language Model with Looped Transformer Architecture
22
+ - **Task:** Text-to-Emoji Translation
23
+ - **Training Data:** KomeijiForce/Text2Emoji dataset (500k+ text-emoji pairs)
24
+ - **Tokenizer:** Byte-level (vocab size: 258)
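
A vocabulary of 258 is consistent with the 256 possible byte values plus two special tokens. The sketch below illustrates that reading. The ids `BOS_ID = 256` and `EOS_ID = 257` and the helper names `encode` and `decode` are assumptions made for illustration; the commit only confirms `vocab_size: 258`, `add_bos: true`, and `add_eos: true`.

```python
from typing import List

# Assumed id layout: bytes 0..255, then two special tokens.
# The commit states vocab_size=258 but does not spell out these ids.
BOS_ID = 256
EOS_ID = 257
VOCAB_SIZE = 258


def encode(text: str, add_bos: bool = True, add_eos: bool = True) -> List[int]:
    """Map text to UTF-8 byte ids, optionally wrapped in BOS/EOS."""
    ids = list(text.encode("utf-8"))
    if add_bos:
        ids.insert(0, BOS_ID)
    if add_eos:
        ids.append(EOS_ID)
    return ids


def decode(ids: List[int]) -> str:
    """Drop special tokens and decode the remaining bytes as UTF-8."""
    raw = bytes(i for i in ids if i < 256)
    return raw.decode("utf-8", errors="replace")


pizza_ids = encode("🍕")
# One emoji is 4 UTF-8 bytes, so with BOS and EOS the sequence has 6 ids.
print(len(pizza_ids), decode(pizza_ids))
```

Under this scheme a single emoji costs several tokens, which is worth keeping in mind when choosing a generation length.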
25
+
26
+ ### Architecture Details
27
+
28
+ **Looped Transformer Architecture:**
29
+ - **Base Layers:** 24
30
+ - **Number of Loops:** 8 (layers are applied iteratively)
31
+ - **Shared Layers:** True (parameter efficient)
32
+ - **Loop Residual:** True (residual connections across loops)
33
+
34
+ **Model Dimensions:**
35
+ - **Hidden Dimension:** 1024
36
+ - **Number of Attention Heads:** 16
37
+ - **KV Heads:** 16
38
+ - **Max Sequence Length:** 512
39
+ - **RoPE Theta:** 10000.0
40
+
41
+ ### Training Configuration
42
+
43
+ - **Training Steps:** 5100 configured (the uploaded `train_state_*.json` files record step 5000)
44
+ - **Batch Size:** 12
45
+ - **Sequence Length:** 512
46
+ - **Learning Rate:** 0.0003
47
+ - **Warmup Steps:** 1000
48
+ - **Optimizer:** AdamW (β1=0.9, β2=0.95)
49
+ - **LR Scheduler:** Cosine with min ratio 0.1
50
+ - **Gradient Clipping:** 1.0
51
+ - **Weight Decay:** 0.1
52
+ - **Precision:** BF16
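
The schedule described above (linear warmup for 1000 steps, then cosine decay to 0.1 of the peak rate over 5100 steps) can be sketched as follows. This is a minimal reconstruction under the assumption of a single cosine cycle with a linear warmup; the function name `lr_at_step` is hypothetical and the training framework's actual scheduler may differ in details. At step 5000 this form gives roughly 3.04e-05, in line with the `_last_lr` value recorded in the `train_state_*.json` files.

```python
import math


def lr_at_step(
    step: int,
    base_lr: float = 3e-4,
    warmup: int = 1000,
    total_steps: int = 5100,
    min_ratio: float = 0.1,
) -> float:
    """Learning rate under linear warmup followed by one cosine decay cycle."""
    if step < warmup:
        # Linear ramp from 0 to base_lr.
        factor = step / warmup
    elif step <= total_steps:
        # Cosine decay from 1.0 down to min_ratio.
        progress = (step - warmup) / (total_steps - warmup)
        factor = min_ratio + 0.5 * (1.0 - min_ratio) * (1.0 + math.cos(math.pi * progress))
    else:
        factor = min_ratio
    return base_lr * factor


for step in (0, 500, 1000, 5000, 5100):
    print(step, lr_at_step(step))
```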
53
+
54
+ ## What is a Looped Transformer?
55
+
56
+ A looped transformer applies the same transformer layers multiple times in an iterative refinement process.
57
+ For translation-style tasks, the intent is to let the model:
58
+ - Refine predictions through multiple iterations
59
+ - Use parameters more efficiently (shared weights across loops)
60
+ - Model complex input-output mappings with fewer total parameters
61
+
62
+ In this model, 24 layers are applied 8 times with residual connections between loops.
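
The looping idea can be sketched in a few lines of plain Python. This is an illustration only: `run_looped_stack` is a hypothetical helper, the "layers" are toy functions on lists of floats, and the loop residual is assumed to add each loop's input back to its output. The actual BFlowNet implementation is not part of this commit and may combine loops differently.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def run_looped_stack(
    x: Vector,
    layers: Sequence[Layer],
    n_loops: int,
    loop_residual: bool = True,
) -> Vector:
    """Apply the same stack of layers n_loops times (shared weights)."""
    h = list(x)
    for _ in range(n_loops):
        loop_input = h
        for layer in layers:
            h = layer(h)
        if loop_residual:
            # Assumed form of the loop residual: add the loop's input back.
            h = [out + inp for out, inp in zip(h, loop_input)]
    return h


def effective_depth(n_layers: int, n_loops: int) -> int:
    """Number of layer applications per forward pass."""
    return n_layers * n_loops


# Toy stand-in for a transformer layer: halve every value.
halve: Layer = lambda v: [0.5 * t for t in v]

print(run_looped_stack([1.0], [halve], n_loops=2))
print(effective_depth(n_layers=24, n_loops=8))
```

With the configuration in this model card, each forward pass performs 24 × 8 = 192 layer applications while storing the parameters of only 24 layers.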
63
+
64
+ ## Intended Use
65
+
66
+ This model is designed to translate text descriptions into appropriate emojis.
67
+
68
+ **Example Usage:**
69
+ ```
70
+ Input: "I love pizza"
71
+ Output: "🍕❤️"
72
+ ```
73
+
74
+ ## Training Data
75
+
76
+ The model was trained on the **KomeijiForce/Text2Emoji** dataset, which contains over 500,000 text-emoji pairs.
77
+
78
+ ## Model Files
79
+
80
+ This repository contains:
81
+ - `consolidated.pth`: PyTorch model weights
82
+ - `params.json`: Complete model and training configuration
83
+ - `train_state_*.json`: Training state saved with the checkpoint (data loader position, RNG state, LR scheduler state)
84
+
85
+ ## Usage
86
+
87
+ To use this model, you'll need the original BFlowNet/loopedLM codebase to load the architecture:
88
+
89
+ ```python
90
+ import torch
91
+ import json
92
+
93
+ # Load model parameters
94
+ with open('params.json', 'r') as f:
95
+     params = json.load(f)
96
+
97
+ # Load model weights
98
+ checkpoint = torch.load('consolidated.pth', map_location='cpu')
99
+
100
+ # Initialize model with your BFlowNet loopedLM architecture
101
+ # from apps.loopedLM import LoopedTransformer
102
+ # model = LoopedTransformer(**params['model'])
103
+ # model.load_state_dict(checkpoint)
104
+ ```
105
+
106
+ ### Generation Parameters
107
+
108
+ For best results, use:
109
+ - **Max Tokens:** 128 (outputs are typically short)
110
+ - **Temperature:** 0.7 (for diverse emoji selection)
111
+ - **Top-p:** 0.9
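
For readers unfamiliar with these settings, the following sketch shows how temperature and top-p (nucleus) sampling are commonly combined when picking the next token. It is a generic illustration in plain Python, not the generator shipped with the training codebase; the function names are hypothetical.

```python
import math
import random
from typing import List, Optional


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_candidates(probs: List[float], top_p: float) -> List[int]:
    """Smallest set of most-likely token ids whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: List[int] = []
    cumulative = 0.0
    for idx in order:
        kept.append(idx)
        cumulative += probs[idx]
        if cumulative >= top_p:
            break
    return kept


def sample_next_token(
    logits: List[float],
    temperature: float = 0.7,
    top_p: float = 0.9,
    rng: Optional[random.Random] = None,
) -> int:
    """Sample one token id from the temperature-scaled, top-p-truncated distribution."""
    rng = rng or random.Random()
    probs = softmax_with_temperature(logits, temperature)
    kept = top_p_candidates(probs, top_p)
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


print(top_p_candidates([0.5, 0.3, 0.15, 0.05], top_p=0.9))
```

Because the model is byte-level, each sampled id is a single byte, so a multi-byte emoji is only valid once all of its bytes have been generated.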
112
+
113
+ ## Limitations
114
+
115
+ - The model uses a byte-level tokenizer, which works well for emojis but may be less efficient than subword tokenization for general text
116
+ - Performance is optimized for text-to-emoji translation and may not generalize well to other tasks
117
+ - The model requires the specific looped transformer architecture implementation to load and use
118
+
119
+ ## Citation
120
+
121
+ If you use this model, please cite:
122
+
123
+ ```bibtex
124
+ @misc{emojilm-looped-transformer,
125
+ title={EmojiLM: Byte-Level Looped Transformer for Text-to-Emoji Translation},
126
+ author={Your Name},
127
+ year={2025},
128
+ howpublished={\url{https://huggingface.co/YOUR-USERNAME/emojilm-looped-transformer}}
129
+ }
130
+ ```
131
+
132
+ ## Training Framework
133
+
134
+ This model was trained using the BFlowNet framework with a looped transformer architecture.
135
+
136
+ Dataset: [KomeijiForce/Text2Emoji](https://huggingface.co/datasets/KomeijiForce/Text2Emoji)
137
+
138
+ ## License
139
+
140
+ Apache 2.0
consolidated.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f0589ff488e532cbb87a48b18a9aa53a468a3c907ad839b62ad248a434263457
3
+ size 1853427530
params.json ADDED
@@ -0,0 +1,156 @@
1
+ {
2
+ "name": "looped_lm_text2emoji",
3
+ "dump_dir": "/home/cd110/BFlowNet/apps/loopedLM/results/text2emoji",
4
+ "seed": 42,
5
+ "grad_acc_steps": 1,
6
+ "gc_collect_freq": 1000,
7
+ "probe_freq": null,
8
+ "steps": 5100,
9
+ "data": {
10
+ "root_dir": "/home/cd110/BFlowNet/apps/loopedLM/text2emoji_prepared",
11
+ "sources": {
12
+ "text2emoji": 1.0
13
+ },
14
+ "batch_size": 12,
15
+ "seq_len": 512,
16
+ "n_views": 2,
17
+ "seed": 42,
18
+ "add_bos": true,
19
+ "add_eos": true,
20
+ "load_async": true,
21
+ "prefetch_size": 128,
22
+ "tokenizer": {
23
+ "name": "bytes",
24
+ "path": null
25
+ }
26
+ },
27
+ "optim": {
28
+ "lr": 0.0003,
29
+ "weight_decay": 0.1,
30
+ "epsilon": 1e-08,
31
+ "beta1": 0.9,
32
+ "beta2": 0.95,
33
+ "clip": 1.0,
34
+ "scheduler": "cosine",
35
+ "warmup": 1000,
36
+ "lr_min_ratio": 0.1,
37
+ "cycle_length": 1.0,
38
+ "cosine_theta": 1.0,
39
+ "annealing_step": 1000,
40
+ "decay_fraction": 0.1,
41
+ "exp_factor": 0.5
42
+ },
43
+ "model": {
44
+ "dim": 1024,
45
+ "n_layers": 24,
46
+ "head_dim": null,
47
+ "n_heads": 16,
48
+ "n_kv_heads": 16,
49
+ "ffn_dim_multiplier": null,
50
+ "multiple_of": 256,
51
+ "norm_eps": 1e-05,
52
+ "rope_theta": 10000.0,
53
+ "init_base_std": null,
54
+ "init_std_factor": "disabled",
55
+ "max_seqlen": 512,
56
+ "seed": 42,
57
+ "vocab_size": 258,
58
+ "weight_tying": false,
59
+ "sliding_window": null,
60
+ "n_loops": 8,
61
+ "shared_layers": true,
62
+ "loop_residual": true
63
+ },
64
+ "distributed": {
65
+ "dp_shard": 1,
66
+ "dp_replicate": 4,
67
+ "tp_size": 1,
68
+ "selective_activation_checkpointing": false,
69
+ "compile": true,
70
+ "fsdp_type": "full_shard",
71
+ "model_dtype": "bf16",
72
+ "float8_recipe": null,
73
+ "float8_filter": "layers\\.[0-9]+\\.",
74
+ "matmul_allow_tf32": true,
75
+ "detect_anomaly": false,
76
+ "compile_cache_size_limit": 8,
77
+ "spawn_method": "forkserver"
78
+ },
79
+ "env": {
80
+ "MKL_SERVICE_FORCE_INTEL": "GNU",
81
+ "OMP_NUM_THREADS": "1",
82
+ "MKL_NUM_THREADS": "1",
83
+ "ENABLE_INTRA_NODE_COMM": "1",
84
+ "TORCH_NCCL_AVOID_RECORD_STREAMS": "1",
85
+ "NCCL_IB_TIMEOUT": "22",
86
+ "NCCL_DEBUG": "INFO",
87
+ "TORCH_NCCL_ASYNC_ERROR_HANDLING": "1"
88
+ },
89
+ "checkpoint": {
90
+ "dump": {
91
+ "every": 500,
92
+ "keep": -1
93
+ },
94
+ "eval": {
95
+ "every": 1500000,
96
+ "keep": 3
97
+ },
98
+ "path": "/home/cd110/BFlowNet/apps/loopedLM/results/text2emoji/checkpoints",
99
+ "init_ckpt_path": null,
100
+ "continue_training_from_init": false
101
+ },
102
+ "profiling": {
103
+ "run": false,
104
+ "trace_folder": "profiling",
105
+ "mem_warmup": 100,
106
+ "mem_steps": 2,
107
+ "profile_warmup": 102,
108
+ "profile_steps": 2
109
+ },
110
+ "logging": {
111
+ "freq": 50,
112
+ "acc_freq": null,
113
+ "wandb": {
114
+ "job_type": null,
115
+ "dir": null,
116
+ "project": "looped_lm_text2emoji",
117
+ "entity": null,
118
+ "tags": null,
119
+ "group": null,
120
+ "name": "looped_lm_text2emoji",
121
+ "notes": null,
122
+ "config_exclude_keys": null,
123
+ "config_include_keys": null,
124
+ "anonymous": null,
125
+ "mode": null,
126
+ "allow_val_change": null,
127
+ "resume": null,
128
+ "force": null,
129
+ "tensorboard": null,
130
+ "sync_tensorboard": null,
131
+ "monitor_gym": null,
132
+ "save_code": null,
133
+ "id": null,
134
+ "fork_from": null,
135
+ "resume_from": null
136
+ }
137
+ },
138
+ "async_eval_gpus": null,
139
+ "eval": {
140
+ "generator": {
141
+ "max_tokens": 128,
142
+ "dtype": "bf16",
143
+ "temperature": 0.7,
144
+ "top_p": 0.9
145
+ },
146
+ "harness": {
147
+ "tasks": [
148
+ "hellaswag",
149
+ "piqa"
150
+ ]
151
+ },
152
+ "validation": {
153
+ "max_steps": 200
154
+ }
155
+ }
156
+ }
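
As a rough reading of the configuration above, the token budget of the run can be estimated from `batch_size`, `seq_len`, `steps`, and the data-parallel settings. The calculation below assumes that `batch_size` is counted per data-parallel rank and that the number of ranks is `dp_replicate * dp_shard`; neither is stated in this commit, although the four `train_state_0000N.json` files are consistent with four ranks.

```python
# Values copied from params.json in this commit.
batch_size = 12      # assumed to be per data-parallel rank
seq_len = 512
steps = 5100
grad_acc_steps = 1
dp_replicate = 4
dp_shard = 1

# Assumption: one data loader per rank, ranks = dp_replicate * dp_shard.
world_size = dp_replicate * dp_shard

tokens_per_step = batch_size * seq_len * world_size * grad_acc_steps
total_tokens = tokens_per_step * steps

print(f"tokens per optimizer step: {tokens_per_step:,}")
print(f"tokens over the full run:  {total_tokens:,}")
```

If `batch_size` is instead a global value, both figures shrink by a factor of four.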
train_state_00000.json ADDED
@@ -0,0 +1,68 @@
1
+ {
2
+ "step": 5000,
3
+ "acc_step": 0,
4
+ "data_loader_state": {
5
+ "it_state": {
6
+ "start_token": 21,
7
+ "it_state": {
8
+ "it_state": {
9
+ "root_dir": "/home/cd110/BFlowNet/apps/loopedLM/text2emoji_prepared",
10
+ "sources": {
11
+ "text2emoji": 1.0
12
+ },
13
+ "source_to_state": {
14
+ "text2emoji": {
15
+ "file_path": "/home/cd110/BFlowNet/apps/loopedLM/text2emoji_prepared/text2emoji/text2emoji.chunk.03.json",
16
+ "position": 15423808,
17
+ "block_size": 1,
18
+ "offset": 0,
19
+ "current_iter": 1
20
+ }
21
+ },
22
+ "rng_state": {
23
+ "bit_generator": "PCG64",
24
+ "state": {
25
+ "state": 205585754356582501350349259899939902810,
26
+ "inc": 11676600559890430755450356507027720041
27
+ },
28
+ "has_uint32": 0,
29
+ "uinteger": 0
30
+ }
31
+ },
32
+ "add_bos": true,
33
+ "add_eos": true,
34
+ "name": "bytes",
35
+ "path": null
36
+ },
37
+ "output_seq_len": 512,
38
+ "n_views": 2
39
+ },
40
+ "seq_idx": 8,
41
+ "rng_state": {
42
+ "bit_generator": "PCG64",
43
+ "state": {
44
+ "state": 324250618518055952288627465431916920177,
45
+ "inc": 77357518920597472829800677777012462921
46
+ },
47
+ "has_uint32": 1,
48
+ "uinteger": 85385168
49
+ },
50
+ "batch_size": 12,
51
+ "prefetch_size": 128
52
+ },
53
+ "scheduler": {
54
+ "base_lrs": [
55
+ 0.0003
56
+ ],
57
+ "last_epoch": 5000,
58
+ "verbose": false,
59
+ "_step_count": 5001,
60
+ "_get_lr_called_within_step": false,
61
+ "_last_lr": [
62
+ 3.039611684019504e-05
63
+ ],
64
+ "lr_lambdas": [
65
+ {}
66
+ ]
67
+ }
68
+ }
train_state_00001.json ADDED
@@ -0,0 +1,68 @@
1
+ {
2
+ "step": 5000,
3
+ "acc_step": 0,
4
+ "data_loader_state": {
5
+ "it_state": {
6
+ "start_token": 74,
7
+ "it_state": {
8
+ "it_state": {
9
+ "root_dir": "/home/cd110/BFlowNet/apps/loopedLM/text2emoji_prepared",
10
+ "sources": {
11
+ "text2emoji": 1.0
12
+ },
13
+ "source_to_state": {
14
+ "text2emoji": {
15
+ "file_path": "/home/cd110/BFlowNet/apps/loopedLM/text2emoji_prepared/text2emoji/text2emoji.chunk.02.json",
16
+ "position": 15354030,
17
+ "block_size": 1,
18
+ "offset": 0,
19
+ "current_iter": 1
20
+ }
21
+ },
22
+ "rng_state": {
23
+ "bit_generator": "PCG64",
24
+ "state": {
25
+ "state": 46880898983298274906203770755158202503,
26
+ "inc": 239634081480473411747239400828488620799
27
+ },
28
+ "has_uint32": 0,
29
+ "uinteger": 0
30
+ }
31
+ },
32
+ "add_bos": true,
33
+ "add_eos": true,
34
+ "name": "bytes",
35
+ "path": null
36
+ },
37
+ "output_seq_len": 512,
38
+ "n_views": 2
39
+ },
40
+ "seq_idx": 8,
41
+ "rng_state": {
42
+ "bit_generator": "PCG64",
43
+ "state": {
44
+ "state": 319969186434683622935655786137931948242,
45
+ "inc": 270234035871729269002159329014059236425
46
+ },
47
+ "has_uint32": 0,
48
+ "uinteger": 2574191790
49
+ },
50
+ "batch_size": 12,
51
+ "prefetch_size": 128
52
+ },
53
+ "scheduler": {
54
+ "base_lrs": [
55
+ 0.0003
56
+ ],
57
+ "last_epoch": 5000,
58
+ "verbose": false,
59
+ "_step_count": 5001,
60
+ "_get_lr_called_within_step": false,
61
+ "_last_lr": [
62
+ 3.039611684019504e-05
63
+ ],
64
+ "lr_lambdas": [
65
+ {}
66
+ ]
67
+ }
68
+ }
train_state_00002.json ADDED
@@ -0,0 +1,68 @@
1
+ {
2
+ "step": 5000,
3
+ "acc_step": 0,
4
+ "data_loader_state": {
5
+ "it_state": {
6
+ "start_token": 99,
7
+ "it_state": {
8
+ "it_state": {
9
+ "root_dir": "/home/cd110/BFlowNet/apps/loopedLM/text2emoji_prepared",
10
+ "sources": {
11
+ "text2emoji": 1.0
12
+ },
13
+ "source_to_state": {
14
+ "text2emoji": {
15
+ "file_path": "/home/cd110/BFlowNet/apps/loopedLM/text2emoji_prepared/text2emoji/text2emoji.chunk.00.json",
16
+ "position": 15348564,
17
+ "block_size": 1,
18
+ "offset": 0,
19
+ "current_iter": 1
20
+ }
21
+ },
22
+ "rng_state": {
23
+ "bit_generator": "PCG64",
24
+ "state": {
25
+ "state": 3069845261208554751547166343436132358,
26
+ "inc": 6027823433652931085739778990793808165
27
+ },
28
+ "has_uint32": 0,
29
+ "uinteger": 0
30
+ }
31
+ },
32
+ "add_bos": true,
33
+ "add_eos": true,
34
+ "name": "bytes",
35
+ "path": null
36
+ },
37
+ "output_seq_len": 512,
38
+ "n_views": 2
39
+ },
40
+ "seq_idx": 8,
41
+ "rng_state": {
42
+ "bit_generator": "PCG64",
43
+ "state": {
44
+ "state": 177769472111706612176032620089344751308,
45
+ "inc": 188564971970541749319992297790591572713
46
+ },
47
+ "has_uint32": 1,
48
+ "uinteger": 2736346968
49
+ },
50
+ "batch_size": 12,
51
+ "prefetch_size": 128
52
+ },
53
+ "scheduler": {
54
+ "base_lrs": [
55
+ 0.0003
56
+ ],
57
+ "last_epoch": 5000,
58
+ "verbose": false,
59
+ "_step_count": 5001,
60
+ "_get_lr_called_within_step": false,
61
+ "_last_lr": [
62
+ 3.039611684019504e-05
63
+ ],
64
+ "lr_lambdas": [
65
+ {}
66
+ ]
67
+ }
68
+ }
train_state_00003.json ADDED
@@ -0,0 +1,68 @@
1
+ {
2
+ "step": 5000,
3
+ "acc_step": 0,
4
+ "data_loader_state": {
5
+ "it_state": {
6
+ "start_token": 100,
7
+ "it_state": {
8
+ "it_state": {
9
+ "root_dir": "/home/cd110/BFlowNet/apps/loopedLM/text2emoji_prepared",
10
+ "sources": {
11
+ "text2emoji": 1.0
12
+ },
13
+ "source_to_state": {
14
+ "text2emoji": {
15
+ "file_path": "/home/cd110/BFlowNet/apps/loopedLM/text2emoji_prepared/text2emoji/text2emoji.chunk.01.json",
16
+ "position": 15387317,
17
+ "block_size": 1,
18
+ "offset": 0,
19
+ "current_iter": 1
20
+ }
21
+ },
22
+ "rng_state": {
23
+ "bit_generator": "PCG64",
24
+ "state": {
25
+ "state": 81131458415525599535785437639948652948,
26
+ "inc": 92941856108932518968286621281627530405
27
+ },
28
+ "has_uint32": 0,
29
+ "uinteger": 0
30
+ }
31
+ },
32
+ "add_bos": true,
33
+ "add_eos": true,
34
+ "name": "bytes",
35
+ "path": null
36
+ },
37
+ "output_seq_len": 512,
38
+ "n_views": 2
39
+ },
40
+ "seq_idx": 8,
41
+ "rng_state": {
42
+ "bit_generator": "PCG64",
43
+ "state": {
44
+ "state": 286960010946238495423822789291240034500,
45
+ "inc": 66050176413739185524746886687120723265
46
+ },
47
+ "has_uint32": 1,
48
+ "uinteger": 1701660961
49
+ },
50
+ "batch_size": 12,
51
+ "prefetch_size": 128
52
+ },
53
+ "scheduler": {
54
+ "base_lrs": [
55
+ 0.0003
56
+ ],
57
+ "last_epoch": 5000,
58
+ "verbose": false,
59
+ "_step_count": 5001,
60
+ "_get_lr_called_within_step": false,
61
+ "_last_lr": [
62
+ 3.039611684019504e-05
63
+ ],
64
+ "lr_lambdas": [
65
+ {}
66
+ ]
67
+ }
68
+ }