---
library_name: transformers
pipeline_tag: fill-mask
tags: [gpt-bert, babylm, remote-code]
license: other
---

# haznitrama/babybabellm-gpt_bert-nso-main

GPT-BERT style BabyBabelLM model for **nso** (Northern Sotho / Sepedi). This repository may include both *main* and *EMA* variants.

**Default variant exposed to generic loaders:** `main`

## Variants Available

- main

## Files

- `model.safetensors` (alias of the default variant)

## Configuration

```json
{
  "attention_probs_dropout_prob": 0.1,
  "hidden_dropout_prob": 0.1,
  "hidden_size": 384,
  "intermediate_size": 1280,
  "max_position_embeddings": 512,
  "position_bucket_size": 32,
  "num_attention_heads": 6,
  "num_hidden_layers": 12,
  "vocab_size": 8192,
  "layer_norm_eps": 1e-05
}
```

Tokenizer file: `tokenizer_nso_vs8192.json`

## Quick Usage

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = 'haznitrama/babybabellm-gpt_bert-nso-main'

# The custom GPT-BERT architecture requires trust_remote_code=True.
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id, trust_remote_code=True)

# Forward pass; returns masked-LM logits for each input position.
out = model(**tok('Hello world', return_tensors='pt'))
```

A mask-filling sketch is given after the Notes section below.

## Notes

- Converted on 2025-09-28T11:11:41.835362+00:00
- Weights are the exact trained parameters; no new layers were initialized.
- Requires `trust_remote_code=True` due to the custom architecture.
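## Example: Mask Filling

Since the model card declares the `fill-mask` task, the sketch below shows one way to query the model for masked-token predictions. It is a minimal sketch, not part of the original card: it assumes the tokenizer defines a mask token (`tok.mask_token` / `tok.mask_token_id`) and that the remote-code model returns standard masked-LM logits; the Sepedi prompt is purely illustrative.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = 'haznitrama/babybabellm-gpt_bert-nso-main'
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id, trust_remote_code=True)
model.eval()

# Use the tokenizer's own mask token rather than hard-coding '[MASK]' or '<mask>'.
# The prompt is an illustrative Sepedi greeting followed by a masked position.
text = f"Thobela {tok.mask_token}"
inputs = tok(text, return_tensors='pt')

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, seq_len, vocab_size)

# Locate the mask position(s) and take the top-5 candidate tokens for the first one.
mask_positions = (inputs['input_ids'] == tok.mask_token_id).nonzero(as_tuple=True)
mask_logits = logits[mask_positions]                      # shape: (num_masks, vocab_size)
top5_ids = torch.topk(mask_logits, k=5, dim=-1).indices[0].tolist()
print([tok.decode([i]).strip() for i in top5_ids])
```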