Upload EmojiLM - Byte-level Looped Transformer

Files changed (7)

README.md +140 -0
consolidated.pth +3 -0
params.json +156 -0
train_state_00000.json +68 -0
train_state_00001.json +68 -0
train_state_00002.json +68 -0
train_state_00003.json +68 -0

README.md ADDED

	@@ -0,0 +1,140 @@

+---
+language:
+- en
+license: apache-2.0
+tags:
+- text-generation
+- emoji
+- byte-level
+- looped-transformer
+- text2emoji
+datasets:
+- KomeijiForce/Text2Emoji
+---
+# EmojiLM: Byte-Level Looped Transformer for Text-to-Emoji Translation
+EmojiLM is a byte-level causal language model built on a **looped transformer** architecture. It translates text descriptions into emojis.
+## Model Description
+- **Model Type:** Causal Language Model with Looped Transformer Architecture
+- **Task:** Text-to-Emoji Translation
+- **Training Data:** KomeijiForce/Text2Emoji dataset (500k+ text-emoji pairs)
+- **Tokenizer:** Byte-level (vocab size: 258)
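To make the byte-level tokenizer concrete, here is a minimal sketch. A vocabulary of 258 is consistent with 256 byte values plus two special tokens, and `params.json` sets both `add_bos` and `add_eos`, but the specific ids used below (256 for BOS, 257 for EOS) and the function names are assumptions, not taken from the BFlowNet code.

```python
# Hypothetical byte-level tokenizer sketch. The special-token ids are assumed.
BOS_ID = 256      # assumed id for the beginning-of-sequence token
EOS_ID = 257      # assumed id for the end-of-sequence token
VOCAB_SIZE = 258  # 256 byte values + 2 special tokens, as in params.json


def encode(text, add_bos=True, add_eos=True):
    """Map a string to its UTF-8 bytes, optionally wrapped in BOS/EOS."""
    ids = list(text.encode("utf-8"))
    if add_bos:
        ids.insert(0, BOS_ID)
    if add_eos:
        ids.append(EOS_ID)
    return ids


def decode(ids):
    """Drop special tokens and decode the remaining bytes as UTF-8."""
    data = bytes(i for i in ids if i < 256)
    return data.decode("utf-8", errors="replace")


# Pizza slice (U+1F355) followed by a red heart (U+2764 U+FE0F).
emoji = "\U0001F355\u2764\uFE0F"

prompt_ids = encode("I love pizza")
emoji_ids = encode(emoji, add_bos=False)

print(len(prompt_ids))  # 14: 12 text bytes + BOS + EOS
print(len(emoji_ids))   # 11: 4 + 3 + 3 emoji bytes + EOS
print(decode(emoji_ids) == emoji)  # True
```

Each emoji costs several tokens under this scheme, which is why short emoji outputs can still need a few dozen generated bytes.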
+### Architecture Details
+**Looped Transformer Architecture:**
+- **Base Layers:** 24
+- **Number of Loops:** 8 (the full layer stack is applied 8 times per forward pass)
+- **Shared Layers:** True (every loop reuses the same layer weights)
+- **Loop Residual:** True (residual connections across loops)
+**Model Dimensions:**
+- **Hidden Dimension:** 1024
+- **Number of Attention Heads:** 16
+- **KV Heads:** 16
+- **Max Sequence Length:** 512
+- **RoPE Theta:** 10000.0
+### Training Configuration
+- **Training Steps:** 5100 configured (the `train_state_*.json` files in this upload were saved at step 5000)
+- **Batch Size:** 12
+- **Sequence Length:** 512
+- **Learning Rate:** 0.0003
+- **Warmup Steps:** 1000
+- **Optimizer:** AdamW (β1=0.9, β2=0.95)
+- **LR Scheduler:** Cosine with min ratio 0.1
+- **Gradient Clipping:** 1.0
+- **Weight Decay:** 0.1
+- **Precision:** BF16
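The learning-rate settings above can be illustrated with a small sketch of linear warmup followed by cosine decay. The exact scheduler code in BFlowNet is not shown in this upload, so the function below is an assumption based on the standard formulation; `cosine_lr` is a hypothetical name. Under these assumptions, the value at step 5000 comes out at roughly 3.04e-05, which agrees with the `_last_lr` recorded in the `train_state_*.json` files.

```python
import math


def cosine_lr(step, peak_lr=3e-4, warmup=1000, total_steps=5100, min_ratio=0.1):
    """Linear warmup to peak_lr, then cosine decay to min_ratio * peak_lr.

    Assumed form of the schedule; defaults are taken from params.json.
    """
    if step < warmup:
        return peak_lr * step / warmup
    if step > total_steps:
        return peak_lr * min_ratio
    progress = (step - warmup) / (total_steps - warmup)
    scale = min_ratio + 0.5 * (1.0 - min_ratio) * (1.0 + math.cos(math.pi * progress))
    return peak_lr * scale


for step in (0, 500, 1000, 5000, 5100):
    print(step, cosine_lr(step))
```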
+## What is a Looped Transformer?
+A looped transformer applies the same stack of transformer layers several times in sequence, so each pass refines the hidden state produced by the previous one.
+For a translation-style task such as text-to-emoji, the design aims to:
+- Refine predictions over multiple iterations
+- Use parameters more efficiently (weights are shared across loops)
+- Model complex input-output mappings with fewer total parameters
+In this model, the 24 layers are applied 8 times (192 layer applications per forward pass), with residual connections between loops.
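The looping described above can be sketched in plain Python. This is a toy illustration only: `ToyLayer` stands in for a real transformer block, and the way the loop residual is applied (adding each loop's input to its output) is an assumption, since the BFlowNet implementation is not part of this upload.

```python
class ToyLayer:
    """Stand-in for a transformer block: scales its input and counts calls."""

    def __init__(self, scale):
        self.scale = scale
        self.calls = 0

    def __call__(self, h):
        self.calls += 1
        return [self.scale * v for v in h]


def looped_forward(x, layers, n_loops, loop_residual=True):
    """Apply the same list of layers n_loops times (weights are shared)."""
    h = list(x)
    for _ in range(n_loops):
        loop_input = h
        for layer in layers:
            h = layer(h)
        if loop_residual:
            # Assumed form of the loop residual: add the loop's input back in.
            h = [a + b for a, b in zip(h, loop_input)]
    return h


layers = [ToyLayer(0.9) for _ in range(24)]  # n_layers = 24
out = looped_forward([1.0, -2.0], layers, n_loops=8)
total_calls = sum(layer.calls for layer in layers)

print(total_calls)  # 192 layer applications from 24 sets of weights
```

The point of the sketch is the cost profile: compute scales with `n_layers * n_loops`, while the number of stored weights scales only with `n_layers`.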
+## Intended Use
+This model is designed to translate text descriptions into appropriate emojis.
+**Example Usage:**
+```
+Input: "I love pizza"
+Output: "🍕❤️"
+```
+## Training Data
+The model was trained on the **KomeijiForce/Text2Emoji** dataset, which contains over 500,000 text-emoji pairs.
+## Model Files
+This repository contains:
+- `consolidated.pth`: PyTorch model weights
+- `params.json`: Complete model and training configuration
+- `train_state_*.json`: Four files holding the data loader position, RNG state, and LR scheduler state saved at step 5000
+## Usage
+To use this model, you'll need the original BFlowNet/loopedLM codebase to load the architecture:
+```python
+import torch
+import json
+# Load model parameters
+with open('params.json', 'r') as f:
+    params = json.load(f)
+# Load model weights
+checkpoint = torch.load('consolidated.pth', map_location='cpu')
+# Initialize model with your BFlowNet loopedLM architecture
+# from apps.loopedLM import LoopedTransformer
+# model = LoopedTransformer(**params['model'])
+# model.load_state_dict(checkpoint)
+```
+### Generation Parameters
+The settings below match the `eval.generator` block in `params.json` and are a reasonable starting point:
+- **Max Tokens:** 128 (outputs are typically short)
+- **Temperature:** 0.7 (for diverse emoji selection)
+- **Top-p:** 0.9
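For readers unfamiliar with these settings, the following is a minimal pure-Python sketch of temperature scaling followed by top-p (nucleus) sampling over next-byte logits. It is not the generator shipped with BFlowNet; `softmax`, `nucleus`, and `sample_byte` are hypothetical helper names, and real implementations differ in how they treat the token that crosses the `top_p` threshold.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus(probs, top_p):
    """Return indices of the most likely tokens whose mass first reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        if cumulative >= top_p:
            break
        kept.append(i)
        cumulative += probs[i]
    return kept


def sample_byte(logits, temperature=0.7, top_p=0.9, rng=random):
    """Sample one token id using temperature scaling and top-p filtering."""
    probs = softmax(logits, temperature)
    kept = nucleus(probs, top_p)
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


print(nucleus([0.5, 0.3, 0.15, 0.05], top_p=0.9))  # [0, 1, 2]
print(sample_byte([10.0, 0.0, 0.0]))               # 0
```

Lower temperatures sharpen the distribution before filtering, so fewer candidates survive the `top_p` cut.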
+## Limitations
+- The model uses a byte-level tokenizer, which works well for emojis but may be less efficient than subword tokenization for general text
+- Performance is optimized for text-to-emoji translation and may not generalize well to other tasks
+- The model requires the specific looped transformer architecture implementation to load and use
+## Citation
+If you use this model, please cite:
+```bibtex
+@misc{emojilm-looped-transformer,
+  title={EmojiLM: Byte-Level Looped Transformer for Text-to-Emoji Translation},
+  author={Your Name},
+  year={2025},
+  howpublished={\url{https://huggingface.co/YOUR-USERNAME/emojilm-looped-transformer}}
+}
+```
+## Training Framework
+This model was trained with the BFlowNet framework, using its looped transformer (`loopedLM`) architecture.
+Dataset: [KomeijiForce/Text2Emoji](https://huggingface.co/datasets/KomeijiForce/Text2Emoji)
+## License
+Apache 2.0

consolidated.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f0589ff488e532cbb87a48b18a9aa53a468a3c907ad839b62ad248a434263457
+size 1853427530

params.json ADDED

	@@ -0,0 +1,156 @@

+{
+  "name": "looped_lm_text2emoji",
+  "dump_dir": "/home/cd110/BFlowNet/apps/loopedLM/results/text2emoji",
+  "seed": 42,
+  "grad_acc_steps": 1,
+  "gc_collect_freq": 1000,
+  "probe_freq": null,
+  "steps": 5100,
+  "data": {
+    "root_dir": "/home/cd110/BFlowNet/apps/loopedLM/text2emoji_prepared",
+    "sources": {
+      "text2emoji": 1.0
+    },
+    "batch_size": 12,
+    "seq_len": 512,
+    "n_views": 2,
+    "seed": 42,
+    "add_bos": true,
+    "add_eos": true,
+    "load_async": true,
+    "prefetch_size": 128,
+    "tokenizer": {
+      "name": "bytes",
+      "path": null
+    }
+  },
+  "optim": {
+    "lr": 0.0003,
+    "weight_decay": 0.1,
+    "epsilon": 1e-08,
+    "beta1": 0.9,
+    "beta2": 0.95,
+    "clip": 1.0,
+    "scheduler": "cosine",
+    "warmup": 1000,
+    "lr_min_ratio": 0.1,
+    "cycle_length": 1.0,
+    "cosine_theta": 1.0,
+    "annealing_step": 1000,
+    "decay_fraction": 0.1,
+    "exp_factor": 0.5
+  },
+  "model": {
+    "dim": 1024,
+    "n_layers": 24,
+    "head_dim": null,
+    "n_heads": 16,
+    "n_kv_heads": 16,
+    "ffn_dim_multiplier": null,
+    "multiple_of": 256,
+    "norm_eps": 1e-05,
+    "rope_theta": 10000.0,
+    "init_base_std": null,
+    "init_std_factor": "disabled",
+    "max_seqlen": 512,
+    "seed": 42,
+    "vocab_size": 258,
+    "weight_tying": false,
+    "sliding_window": null,
+    "n_loops": 8,
+    "shared_layers": true,
+    "loop_residual": true
+  },
+  "distributed": {
+    "dp_shard": 1,
+    "dp_replicate": 4,
+    "tp_size": 1,
+    "selective_activation_checkpointing": false,
+    "compile": true,
+    "fsdp_type": "full_shard",
+    "model_dtype": "bf16",
+    "float8_recipe": null,
+    "float8_filter": "layers\\.[0-9]+\\.",
+    "matmul_allow_tf32": true,
+    "detect_anomaly": false,
+    "compile_cache_size_limit": 8,
+    "spawn_method": "forkserver"
+  },
+  "env": {
+    "MKL_SERVICE_FORCE_INTEL": "GNU",
+    "OMP_NUM_THREADS": "1",
+    "MKL_NUM_THREADS": "1",
+    "ENABLE_INTRA_NODE_COMM": "1",
+    "TORCH_NCCL_AVOID_RECORD_STREAMS": "1",
+    "NCCL_IB_TIMEOUT": "22",
+    "NCCL_DEBUG": "INFO",
+    "TORCH_NCCL_ASYNC_ERROR_HANDLING": "1"
+  },
+  "checkpoint": {
+    "dump": {
+      "every": 500,
+      "keep": -1
+    },
+    "eval": {
+      "every": 1500000,
+      "keep": 3
+    },
+    "path": "/home/cd110/BFlowNet/apps/loopedLM/results/text2emoji/checkpoints",
+    "init_ckpt_path": null,
+    "continue_training_from_init": false
+  },
+  "profiling": {
+    "run": false,
+    "trace_folder": "profiling",
+    "mem_warmup": 100,
+    "mem_steps": 2,
+    "profile_warmup": 102,
+    "profile_steps": 2
+  },
+  "logging": {
+    "freq": 50,
+    "acc_freq": null,
+    "wandb": {
+      "job_type": null,
+      "dir": null,
+      "project": "looped_lm_text2emoji",
+      "entity": null,
+      "tags": null,
+      "group": null,
+      "name": "looped_lm_text2emoji",
+      "notes": null,
+      "config_exclude_keys": null,
+      "config_include_keys": null,
+      "anonymous": null,
+      "mode": null,
+      "allow_val_change": null,
+      "resume": null,
+      "force": null,
+      "tensorboard": null,
+      "sync_tensorboard": null,
+      "monitor_gym": null,
+      "save_code": null,
+      "id": null,
+      "fork_from": null,
+      "resume_from": null
+    }
+  },
+  "async_eval_gpus": null,
+  "eval": {
+    "generator": {
+      "max_tokens": 128,
+      "dtype": "bf16",
+      "temperature": 0.7,
+      "top_p": 0.9
+    },
+    "harness": {
+      "tasks": [
+        "hellaswag",
+        "piqa"
+      ]
+    },
+    "validation": {
+      "max_steps": 200
+    }
+  }
+}
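A few quantities that the config implies but does not state can be derived with a short sketch. The dictionary below copies only the relevant keys from `params.json`. Two assumptions are made and are not confirmed by this upload: that a null `head_dim` falls back to `dim // n_heads`, and that `batch_size` is counted per data-parallel rank, so that the global batch is multiplied by `dp_replicate * dp_shard`.

```python
# Subset of params.json, copied by hand for illustration.
config = {
    "steps": 5100,
    "grad_acc_steps": 1,
    "data": {"batch_size": 12, "seq_len": 512},
    "model": {"dim": 1024, "n_layers": 24, "n_heads": 16,
              "head_dim": None, "n_loops": 8},
    "distributed": {"dp_replicate": 4, "dp_shard": 1},
}

model = config["model"]
data = config["data"]
dist = config["distributed"]

# Assumption: a null head_dim means dim is split evenly across the heads.
head_dim = model["head_dim"] or model["dim"] // model["n_heads"]

# Shared layers are reused in every loop, so depth in compute terms is larger.
layer_applications = model["n_layers"] * model["n_loops"]

# Assumption: batch_size is per data-parallel rank.
ranks = dist["dp_replicate"] * dist["dp_shard"]
tokens_per_step = (data["batch_size"] * data["seq_len"]
                   * ranks * config["grad_acc_steps"])
total_tokens = tokens_per_step * config["steps"]

print(head_dim)            # 64
print(layer_applications)  # 192
print(tokens_per_step)     # 24576
print(total_tokens)        # 125337600
```

If `batch_size` is instead a global figure, the token counts shrink by a factor of four.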

train_state_00000.json ADDED

	@@ -0,0 +1,68 @@

+{
+  "step": 5000,
+  "acc_step": 0,
+  "data_loader_state": {
+    "it_state": {
+      "start_token": 21,
+      "it_state": {
+        "it_state": {
+          "root_dir": "/home/cd110/BFlowNet/apps/loopedLM/text2emoji_prepared",
+          "sources": {
+            "text2emoji": 1.0
+          },
+          "source_to_state": {
+            "text2emoji": {
+              "file_path": "/home/cd110/BFlowNet/apps/loopedLM/text2emoji_prepared/text2emoji/text2emoji.chunk.03.json",
+              "position": 15423808,
+              "block_size": 1,
+              "offset": 0,
+              "current_iter": 1
+            }
+          },
+          "rng_state": {
+            "bit_generator": "PCG64",
+            "state": {
+              "state": 205585754356582501350349259899939902810,
+              "inc": 11676600559890430755450356507027720041
+            },
+            "has_uint32": 0,
+            "uinteger": 0
+          }
+        },
+        "add_bos": true,
+        "add_eos": true,
+        "name": "bytes",
+        "path": null
+      },
+      "output_seq_len": 512,
+      "n_views": 2
+    },
+    "seq_idx": 8,
+    "rng_state": {
+      "bit_generator": "PCG64",
+      "state": {
+        "state": 324250618518055952288627465431916920177,
+        "inc": 77357518920597472829800677777012462921
+      },
+      "has_uint32": 1,
+      "uinteger": 85385168
+    },
+    "batch_size": 12,
+    "prefetch_size": 128
+  },
+  "scheduler": {
+    "base_lrs": [
+      0.0003
+    ],
+    "last_epoch": 5000,
+    "verbose": false,
+    "_step_count": 5001,
+    "_get_lr_called_within_step": false,
+    "_last_lr": [
+      3.039611684019504e-05
+    ],
+    "lr_lambdas": [
+      {}
+    ]
+  }
+}

train_state_00001.json ADDED

	@@ -0,0 +1,68 @@

+{
+  "step": 5000,
+  "acc_step": 0,
+  "data_loader_state": {
+    "it_state": {
+      "start_token": 74,
+      "it_state": {
+        "it_state": {
+          "root_dir": "/home/cd110/BFlowNet/apps/loopedLM/text2emoji_prepared",
+          "sources": {
+            "text2emoji": 1.0
+          },
+          "source_to_state": {
+            "text2emoji": {
+              "file_path": "/home/cd110/BFlowNet/apps/loopedLM/text2emoji_prepared/text2emoji/text2emoji.chunk.02.json",
+              "position": 15354030,
+              "block_size": 1,
+              "offset": 0,
+              "current_iter": 1
+            }
+          },
+          "rng_state": {
+            "bit_generator": "PCG64",
+            "state": {
+              "state": 46880898983298274906203770755158202503,
+              "inc": 239634081480473411747239400828488620799
+            },
+            "has_uint32": 0,
+            "uinteger": 0
+          }
+        },
+        "add_bos": true,
+        "add_eos": true,
+        "name": "bytes",
+        "path": null
+      },
+      "output_seq_len": 512,
+      "n_views": 2
+    },
+    "seq_idx": 8,
+    "rng_state": {
+      "bit_generator": "PCG64",
+      "state": {
+        "state": 319969186434683622935655786137931948242,
+        "inc": 270234035871729269002159329014059236425
+      },
+      "has_uint32": 0,
+      "uinteger": 2574191790
+    },
+    "batch_size": 12,
+    "prefetch_size": 128
+  },
+  "scheduler": {
+    "base_lrs": [
+      0.0003
+    ],
+    "last_epoch": 5000,
+    "verbose": false,
+    "_step_count": 5001,
+    "_get_lr_called_within_step": false,
+    "_last_lr": [
+      3.039611684019504e-05
+    ],
+    "lr_lambdas": [
+      {}
+    ]
+  }
+}

train_state_00002.json ADDED

	@@ -0,0 +1,68 @@

+{
+  "step": 5000,
+  "acc_step": 0,
+  "data_loader_state": {
+    "it_state": {
+      "start_token": 99,
+      "it_state": {
+        "it_state": {
+          "root_dir": "/home/cd110/BFlowNet/apps/loopedLM/text2emoji_prepared",
+          "sources": {
+            "text2emoji": 1.0
+          },
+          "source_to_state": {
+            "text2emoji": {
+              "file_path": "/home/cd110/BFlowNet/apps/loopedLM/text2emoji_prepared/text2emoji/text2emoji.chunk.00.json",
+              "position": 15348564,
+              "block_size": 1,
+              "offset": 0,
+              "current_iter": 1
+            }
+          },
+          "rng_state": {
+            "bit_generator": "PCG64",
+            "state": {
+              "state": 3069845261208554751547166343436132358,
+              "inc": 6027823433652931085739778990793808165
+            },
+            "has_uint32": 0,
+            "uinteger": 0
+          }
+        },
+        "add_bos": true,
+        "add_eos": true,
+        "name": "bytes",
+        "path": null
+      },
+      "output_seq_len": 512,
+      "n_views": 2
+    },
+    "seq_idx": 8,
+    "rng_state": {
+      "bit_generator": "PCG64",
+      "state": {
+        "state": 177769472111706612176032620089344751308,
+        "inc": 188564971970541749319992297790591572713
+      },
+      "has_uint32": 1,
+      "uinteger": 2736346968
+    },
+    "batch_size": 12,
+    "prefetch_size": 128
+  },
+  "scheduler": {
+    "base_lrs": [
+      0.0003
+    ],
+    "last_epoch": 5000,
+    "verbose": false,
+    "_step_count": 5001,
+    "_get_lr_called_within_step": false,
+    "_last_lr": [
+      3.039611684019504e-05
+    ],
+    "lr_lambdas": [
+      {}
+    ]
+  }
+}

train_state_00003.json ADDED

	@@ -0,0 +1,68 @@

+{
+  "step": 5000,
+  "acc_step": 0,
+  "data_loader_state": {
+    "it_state": {
+      "start_token": 100,
+      "it_state": {
+        "it_state": {
+          "root_dir": "/home/cd110/BFlowNet/apps/loopedLM/text2emoji_prepared",
+          "sources": {
+            "text2emoji": 1.0
+          },
+          "source_to_state": {
+            "text2emoji": {
+              "file_path": "/home/cd110/BFlowNet/apps/loopedLM/text2emoji_prepared/text2emoji/text2emoji.chunk.01.json",
+              "position": 15387317,
+              "block_size": 1,
+              "offset": 0,
+              "current_iter": 1
+            }
+          },
+          "rng_state": {
+            "bit_generator": "PCG64",
+            "state": {
+              "state": 81131458415525599535785437639948652948,
+              "inc": 92941856108932518968286621281627530405
+            },
+            "has_uint32": 0,
+            "uinteger": 0
+          }
+        },
+        "add_bos": true,
+        "add_eos": true,
+        "name": "bytes",
+        "path": null
+      },
+      "output_seq_len": 512,
+      "n_views": 2
+    },
+    "seq_idx": 8,
+    "rng_state": {
+      "bit_generator": "PCG64",
+      "state": {
+        "state": 286960010946238495423822789291240034500,
+        "inc": 66050176413739185524746886687120723265
+      },
+      "has_uint32": 1,
+      "uinteger": 1701660961
+    },
+    "batch_size": 12,
+    "prefetch_size": 128
+  },
+  "scheduler": {
+    "base_lrs": [
+      0.0003
+    ],
+    "last_epoch": 5000,
+    "verbose": false,
+    "_step_count": 5001,
+    "_get_lr_called_within_step": false,
+    "_last_lr": [
+      3.039611684019504e-05
+    ],
+    "lr_lambdas": [
+      {}
+    ]
+  }
+}