---
library_name: transformers
license: apache-2.0
base_model:
- nbeerbower/mistral-nemo-kartoffel-12B
datasets:
- nbeerbower/Schule-DPO
- nbeerbower/Purpura-DPO
- nbeerbower/Arkhaios-DPO
- jondurbin/truthy-dpo-v0.1
- antiven0m/physical-reasoning-dpo
- Atsunori/HelpSteer2-DPO
- GeneralReasoning/GeneralThought-430K
- nvidia/OpenMathReasoning
- nvidia/OpenCodeReasoning
tags:
- orpo
- uncensored
- reasoning
- chain-of-thought
- qlora
- experimental
---

> 🧪 **Experimental Model**
>
> This is one of many experimental iterations I'm sharing publicly while I mess around with training parameters and ideas. It's not a "real" release - just me being transparent about my learning process. Feel free to look under the hood, but don't expect anything production-ready!

![image/png](https://huggingface.co/nbeerbower/Denker-mistral-nemo-12B/resolve/main/denker_cover.png?download=true)

# Denker-mistral-nemo-12B

**Denker** is a small, uncensored, reasoning-focused model finetuned using [ORPO and QLoRA](https://huggingface.co/blog/mlabonne/orpo-llama-3) on top of [mistral-nemo-kartoffel-12B](https://huggingface.co/nbeerbower/mistral-nemo-kartoffel-12B).

This run experiments with the Qwen-style chat template and `<think>...</think>`-style reasoning structure, without modifying the base vocabulary. All tuning was done via LoRA. (See the usage sketch at the end of this card.)

## Finetuning Details

- **Method:** ORPO
- **Epochs:** 0.25
- **Learning Rate:** 8e-6, cosine decay w/ 5% warmup
- **Batch Size:** 1 x 64 (64 effective)
- **Max Grad Norm:** 0.5
- **LoRA Rank:** 128
- **Hardware:** 1x NVIDIA RTX A6000

A hedged sketch of this configuration appears at the end of this card.

## Dataset Composition

### Thinking Disabled

* [nbeerbower/Schule-DPO](https://huggingface.co/datasets/nbeerbower/Schule-DPO)
* [nbeerbower/Purpura-DPO](https://huggingface.co/datasets/nbeerbower/Purpura-DPO)
* [nbeerbower/Arkhaios-DPO](https://huggingface.co/datasets/nbeerbower/Arkhaios-DPO)
* [jondurbin/truthy-dpo-v0.1](https://huggingface.co/datasets/jondurbin/truthy-dpo-v0.1)
* [antiven0m/physical-reasoning-dpo](https://huggingface.co/datasets/antiven0m/physical-reasoning-dpo)
* [Atsunori/HelpSteer2-DPO](https://huggingface.co/datasets/Atsunori/HelpSteer2-DPO)

### Chain of Thought

30,000 samples were drawn from each of the following datasets, with thinking enabled:

* [GeneralReasoning/GeneralThought-430K](https://huggingface.co/datasets/GeneralReasoning/GeneralThought-430K)
* [nvidia/OpenMathReasoning](https://huggingface.co/datasets/nvidia/OpenMathReasoning)
* [nvidia/OpenCodeReasoning](https://huggingface.co/datasets/nvidia/OpenCodeReasoning)

## Results

### Observations

The model will sometimes decide not to think, answering directly without emitting a `<think>` block.

### Evals

TBD
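
## Usage Sketch

A minimal example of running the model through its chat template with `transformers`. This section is not part of the original card: the prompt, generation settings, and dtype choices are illustrative assumptions.

```python
# Minimal usage sketch (assumptions: the tokenizer ships the Qwen-style chat
# template described above, and the model emits a <think>...</think> block as
# plain text before its final answer, since the base vocab was not modified).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nbeerbower/Denker-mistral-nemo-12B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
)

messages = [{"role": "user", "content": "How many primes are there below 100?"}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Leave generous room for the reasoning block plus the final answer.
output_ids = model.generate(input_ids, max_new_tokens=1024)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```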
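
## Finetuning Configuration Sketch

A hedged reconstruction of the ORPO + QLoRA run from the hyperparameters listed under "Finetuning Details", using TRL and PEFT. Only the values marked "from the card" come from this document; the LoRA alpha and dropout, target modules, 4-bit settings, precision, and dataset schema are assumptions, and the single dataset below stands in for the full mixture.

```python
# Hedged sketch of the ORPO + QLoRA setup; not the author's actual script.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import ORPOConfig, ORPOTrainer

base = "nbeerbower/mistral-nemo-kartoffel-12B"
tokenizer = AutoTokenizer.from_pretrained(base)

# QLoRA: freeze the base model in 4-bit NF4 and train LoRA adapters on top.
model = AutoModelForCausalLM.from_pretrained(
    base,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
)

peft_config = LoraConfig(
    r=128,             # LoRA rank (from the card)
    lora_alpha=128,    # assumption; alpha is not stated on the card
    lora_dropout=0.0,  # assumption
    task_type="CAUSAL_LM",
    # target_modules left to PEFT's defaults for this architecture (assumption)
)

args = ORPOConfig(
    output_dir="denker-orpo",
    num_train_epochs=0.25,           # from the card
    learning_rate=8e-6,              # from the card
    lr_scheduler_type="cosine",      # from the card
    warmup_ratio=0.05,               # from the card
    per_device_train_batch_size=1,   # from the card (1 x 64)
    gradient_accumulation_steps=64,  # from the card (64 effective)
    max_grad_norm=0.5,               # from the card
    bf16=True,                       # assumption
)

# ORPO expects preference pairs; a "prompt"/"chosen"/"rejected" schema is
# assumed here.
train_dataset = load_dataset("nbeerbower/Purpura-DPO", split="train")

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # `tokenizer=` in older TRL releases
    peft_config=peft_config,
)
trainer.train()
```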