---
library_name: transformers
pipeline_tag: text-generation
tags: [gpt-bert, babylm, remote-code]
license: other
---

# haznitrama/babybabellm-multi_gpu-gpt_bert-eng-main-causal

GPT-BERT style BabyBabelLM model for language **eng**.
This repository may include both *main* and *EMA* variants.

**Default variant exposed to generic loaders:** `main`

## Variants Available

- main

## Files

- `model.safetensors` (alias of the default variant)

## Configuration

```json
{
  "attention_probs_dropout_prob": 0.1,
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "intermediate_size": 2560,
  "max_position_embeddings": 512,
  "position_bucket_size": 32,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "vocab_size": 16384,
  "layer_norm_eps": 1e-05
}
```

Tokenizer file: `tokenizer_eng.json`

## Quick Usage

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = 'haznitrama/babybabellm-multi_gpu-gpt_bert-eng-main-causal'
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id, trust_remote_code=True)
out = model(**tok('Hello world', return_tensors='pt'))
```

A mask-filling sketch appears near the end of this card.

### Causal LM Wrapper

This repo includes a lightweight `GPTBertForCausalLM` wrapper. Generation example (a pipeline-based sketch appears near the end of this card):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

mid = 'haznitrama/babybabellm-multi_gpu-gpt_bert-eng-main-causal'
tok = AutoTokenizer.from_pretrained(mid)
model = AutoModelForCausalLM.from_pretrained(mid, trust_remote_code=True)
out = model.generate(**tok('Hello', return_tensors='pt'), max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))
```

## Notes

- Converted on 2025-09-27T15:21:53.977598+00:00
- Weights are the exact trained parameters; no new layers were initialized.
- Requires `trust_remote_code=True` because of the custom architecture.
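
## Mask Filling (sketch)

The masked-LM head loaded in Quick Usage can also be used for mask filling. The snippet below is a minimal sketch rather than part of the conversion scripts; it assumes the tokenizer defines a standard mask token (`tok.mask_token`) and that the remote-code model returns per-position vocabulary logits.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = 'haznitrama/babybabellm-multi_gpu-gpt_bert-eng-main-causal'
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id, trust_remote_code=True)

# Assumption: the tokenizer exposes a mask token; check tok.mask_token before relying on this.
text = f"The capital of France is {tok.mask_token}."
inputs = tok(text, return_tensors='pt')

with torch.no_grad():
    logits = model(**inputs).logits

# Take the highest-scoring vocabulary entry at the masked position.
mask_pos = (inputs['input_ids'] == tok.mask_token_id).nonzero(as_tuple=True)[1]
print(tok.decode(logits[0, mask_pos].argmax(dim=-1)))
```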
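
## Pipeline Generation (sketch)

Because the causal wrapper is exposed through `AutoModelForCausalLM`, the standard `text-generation` pipeline should also work. This is a minimal sketch, assuming the remote code maps the checkpoint correctly:

```python
from transformers import pipeline

generator = pipeline(
    'text-generation',
    model='haznitrama/babybabellm-multi_gpu-gpt_bert-eng-main-causal',
    trust_remote_code=True,
)
print(generator('Hello', max_new_tokens=20)[0]['generated_text'])
```

Sampling options such as `do_sample=True` and `temperature` can be passed to the call in the usual way.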