This is a storywriting and roleplay model trained on a significant amount of self-generated, long-context, multi-turn roleplay. I downloaded a bit under a thousand character cards from chub.ai and created a synthetic roleplay for each card, batching as many turns as I could into 4k-token chunks to maintain coherency over longer contexts. There was a lot of cleaning and validation between batches, so many examples were "lost," but the final output seems to be of very good quality. The longest conversation is about 20k tokens, and I plan to extend this further as well as broaden the dataset with more examples. The first 4k tokens of each conversation were generated with Command-R-Plus, with the remainder generated with byroneverson/Mistral-Small-Instruct-2409-abliterated.

Next, I downloaded the prompt backup from https://aetherroom.club/whats-new#backup-update and used those prompts as seeds for some storywriting data. I went over this data twice with Command-R-Plus: the first pass wrote a rough first draft of each output, and the second pass improved the prose and extended its length.

Also included was a subset of the following datasets:

- anthracite-org/stheno-filtered-v1.1
- anthracite-org/kalo_misc_part2
- anthracite-org/kalo_opus_misc_240827
- anthracite-org/kalo-opus-instruct-22k-no-refusal
- Chaser-cz/sonnet35-charcard-roleplay-sharegpt (a very small subset)
- jondurbin/airoboros-3.2

along with some other miscellaneous data, all viewable at openerotica/mixed-rp.

Every line of data was run through a large model to filter out low quality, repetition, and underage content. Rough sketches of the chunked roleplay generation, the two-pass storywriting step, and this filtering pass are included after the training config below.

[Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)

```yaml
base_model: mistralai/Mistral-Nemo-Base-2407
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: openerotica/mixed-rp
    type: sharegpt
    conversation: chatml

chat_template: chatml

adapter: qlora
lora_r: 128
lora_alpha: 256
lora_modules_to_save: [embed_tokens, lm_head]
lora_dropout: 0.05
lora_target_linear: true
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj

dataset_prepared_path:
val_set_size: 0.01
output_dir: /workspace/axolotl/mixed-rp-mistral-nemo

sequence_len: 20000
sample_packing: true
pad_to_sequence_len: true

wandb_project: mistral-2
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 2
micro_batch_size: 1
num_epochs: 1
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 1e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 100
evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 1
save_total_limit: 2
save_steps:
debug:
deepspeed:
weight_decay: 0.1

special_tokens:
  eos_token: "<|im_end|>"
  pad_token: ""
  bos_token: ""
  unk_token: ""
tokens:
  - "<|im_start|>"

# fsdp:
#   - full_shard
#   - auto_wrap
# fsdp_config:
#   fsdp_limit_all_gathers: true
#   fsdp_sync_module_states: true
#   fsdp_offload_params: true
#   fsdp_use_orig_params: false
#   fsdp_cpu_ram_efficient_loading: true
#   fsdp_transformer_layer_cls_to_wrap: MixtralSparseMoeBlock
#   fsdp_state_dict_type: FULL_STATE_DICT
#   fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
#   fsdp_sharding_strategy: FULL_SHARD
#   fsdp_forward_prefetch: false
#   fsdp_backward_prefetch: BACKWARD_PRE
```
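For reference, a config like the one above is normally launched with Axolotl's standard training entry point, e.g. `accelerate launch -m axolotl.cli.train mixed-rp.yaml` (the config filename here is just a placeholder).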
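On the data side, the chunked multi-turn generation described at the top looks roughly like the sketch below. This is a minimal sketch, not the actual script: the OpenAI-compatible endpoint, the helper names, and the role-flipping trick used to write the "user" side are all assumptions.

```python
# Minimal sketch of the chunked multi-turn roleplay generation, assuming an
# OpenAI-compatible endpoint (e.g. vLLM) serving the generator model.
# Card downloading, cleaning, and validation between chunks are omitted.
from openai import OpenAI
from transformers import AutoTokenizer

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Base-2407")

CHUNK_TOKENS = 4096    # extend each conversation in ~4k-token batches
MAX_TOKENS = 20000     # the longest conversations end up around 20k tokens


def n_tokens(history: list[dict]) -> int:
    """Approximate conversation length by tokenizing the concatenated turns."""
    text = "\n".join(m["content"] for m in history)
    return len(tokenizer(text)["input_ids"])


def flip_roles(history: list[dict]) -> list[dict]:
    """Swap user/assistant so the generator can also write the 'user' side."""
    swap = {"user": "assistant", "assistant": "user"}
    return [{"role": swap[m["role"]], "content": m["content"]} for m in history]


def extend_one_chunk(model: str, card_prompt: str, history: list[dict]) -> list[dict]:
    """Add alternating turns until roughly CHUNK_TOKENS of new text is added."""
    start = n_tokens(history)
    while (n_tokens(history) - start) < CHUNK_TOKENS and n_tokens(history) < MAX_TOKENS:
        next_role = "user" if history and history[-1]["role"] == "assistant" else "assistant"
        context = flip_roles(history) if next_role == "user" else history
        reply = client.chat.completions.create(
            model=model,
            messages=[{"role": "system", "content": card_prompt}] + context,
            max_tokens=600,
            temperature=1.0,
        ).choices[0].message.content
        history.append({"role": next_role, "content": reply})
    return history
```

In the actual run, the first chunk was generated with Command-R-Plus and later chunks with byroneverson/Mistral-Small-Instruct-2409-abliterated, with cleaning and validation between each chunk.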
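The two-pass storywriting step (a first draft from each seed prompt, then a second pass that improves and lengthens it) can be sketched the same way; the prompt wording below is an illustrative assumption, not the prompt actually used.

```python
# Sketch of the two-pass storywriting generation: pass 1 drafts a story from
# a seed prompt, pass 2 rewrites it to improve quality and extend the length.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")


def two_pass_story(model: str, seed_prompt: str) -> str:
    # Pass 1: rough first draft of the output.
    draft = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": seed_prompt}],
        max_tokens=2048,
    ).choices[0].message.content

    # Pass 2: improve the prose and extend the length of the original output.
    refine = (
        "Rewrite and continue the following story. Improve the prose and "
        "roughly double its length while keeping the same plot and tone.\n\n"
        + draft
    )
    return client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": refine}],
        max_tokens=4096,
    ).choices[0].message.content
```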
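Finally, the filtering pass amounts to running every example past a large judge model and dropping anything flagged. The judge prompt and the ShareGPT-style JSONL layout below are assumptions for illustration, not the exact criteria used.

```python
# Sketch of the final filtering pass: every example is shown to a large judge
# model and dropped if flagged for low quality, repetition, or underage content.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

JUDGE_PROMPT = (
    "You are a dataset filter. Answer with exactly one word, KEEP or DROP.\n"
    "DROP if the text is low quality, highly repetitive, or involves "
    "underage characters.\n\nText:\n{text}"
)


def keep_example(judge_model: str, example: dict) -> bool:
    """Ask the judge model whether a single ShareGPT-style example should stay."""
    text = "\n".join(turn["value"] for turn in example["conversations"])
    verdict = client.chat.completions.create(
        model=judge_model,
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(text=text)}],
        max_tokens=3,
        temperature=0.0,
    ).choices[0].message.content
    return verdict.strip().upper().startswith("KEEP")


def filter_jsonl(judge_model: str, path_in: str, path_out: str) -> None:
    """Write only the examples the judge model keeps."""
    with open(path_in) as f_in, open(path_out, "w") as f_out:
        for line in f_in:
            example = json.loads(line)
            if keep_example(judge_model, example):
                f_out.write(json.dumps(example, ensure_ascii=False) + "\n")
```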