Models for comparing how tokenizer properties affect pre-training compression, and how that compression relates to downstream performance.
- shikhar-srivastava/llama-130m-prenorm-train_c4-2B-tok_t5base
- shikhar-srivastava/llama-130m-prenorm-train_c4_2B-tok_llama2 (0.1B parameters)
- shikhar-srivastava/mono_gold_130m_pre_lr1e-4_eng_latn_bpe_unscaled_8192 (97.5M parameters)
- shikhar-srivastava/mono_gold_130m_pre_lr1e-4_eng_latn_unigram_unscaled_8192 (97.5M parameters)