Update README.md
README.md
CHANGED
|
@@ -24,21 +24,56 @@ As some people have told us our models are sloppy, Ikari decided to say fuck it
|
|
| 24 |
|
| 25 |
Our dataset has stayed the same since day one: we added data over time, cleaned it, and repeated. After not releasing a model for a while because we were never satisfied, we think it's time to come back!
|
| 26 |
| 27 |
|
| 28 |
## Credits:
|
| 29 |
- Undi
|
| 30 |
- IkariDev
|
| 31 |
|
| 32 |
-
## Training data used:
|
| 33 |
-
We will list all the datasets we used here; please be patient while we track them all down, kek.
|
| 34 |
|
| 35 |
-
| 36 |
|
| 37 |
-
|
| 38 |
|
| 39 |
-
|
| 40 |
-
|
| 41 |
-
| 42 |
|
| 43 |
## Others
|
| 44 |
|
|
|
|
| 24 |
|
| 25 |
Our dataset has stayed the same since day one: we added data over time, cleaned it, and repeated. After not releasing a model for a while because we were never satisfied, we think it's time to come back!
|
| 26 |
|
| 27 |
+
# Prompt template: Mistral
|
| 28 |
+
|
| 29 |
+
```
|
| 30 |
+
<s>[INST] {input} [/INST] {output}</s>
|
| 31 |
+
```
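
As a rough illustration of how the template above could be filled in from code, here is a minimal Python sketch. The helper name `format_prompt` and its parameters are hypothetical and not part of this repo; it only covers a single turn, and it assumes you build the string by hand rather than through a tokenizer chat template.

```python
from typing import Optional


def format_prompt(user_input: str, output: Optional[str] = None) -> str:
    """Fill the Mistral-style template: <s>[INST] {input} [/INST] {output}</s>.

    Hypothetical helper for illustration only.
    When `output` is None, the prompt stops after [/INST] so the model
    can generate the reply itself.
    """
    prompt = f"<s>[INST] {user_input} [/INST]"
    if output is not None:
        # A finished example (e.g. a training sample) also carries the
        # reply and the closing end-of-sequence token.
        prompt += f" {output}</s>"
    return prompt


if __name__ == "__main__":
    print(format_prompt("Write a haiku about the sea."))
    print(format_prompt("Say hi.", "Hi!"))
```

Some tokenizers prepend `<s>` on their own, so it may be worth checking that the beginning-of-sequence token does not end up doubled before sending the string to your backend.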
|
| 32 |
|
| 33 |
## Credits:
|
| 34 |
- Undi
|
| 35 |
- IkariDev
|
| 36 |
|
| 37 |
+
## Training data we used to make our dataset:
|
|
|
|
| 38 |
|
| 39 |
+
- [Epiculous/Gnosis](https://huggingface.co/Epiculous/Gnosis)
|
| 40 |
+
- [ChaoticNeutrals/Luminous_Opus](https://huggingface.co/datasets/ChaoticNeutrals/Luminous_Opus)
|
| 41 |
+
- [ChaoticNeutrals/Synthetic-Dark-RP](https://huggingface.co/datasets/ChaoticNeutrals/Synthetic-Dark-RP)
|
| 42 |
+
- [ChaoticNeutrals/Synthetic-RP](https://huggingface.co/datasets/ChaoticNeutrals/Synthetic-RP)
|
| 43 |
+
- [Gryphe/Sonnet3.5-SlimOrcaDedupCleaned](https://huggingface.co/datasets/Gryphe/Sonnet3.5-SlimOrcaDedupCleaned)
|
| 44 |
+
- [Gryphe/Opus-WritingPrompts](https://huggingface.co/datasets/Gryphe/Opus-WritingPrompts)
|
| 45 |
+
- [meseca/writing-opus-6k](https://huggingface.co/datasets/meseca/writing-opus-6k)
|
| 46 |
+
- [meseca/opus-instruct-9k](https://huggingface.co/datasets/meseca/opus-instruct-9k)
|
| 47 |
+
- [PJMixers/grimulkan_theory-of-mind-ShareGPT](https://huggingface.co/datasets/PJMixers/grimulkan_theory-of-mind-ShareGPT)
|
| 48 |
+
- [NobodyExistsOnTheInternet/ToxicQAFinal](https://huggingface.co/datasets/NobodyExistsOnTheInternet/ToxicQAFinal)
|
| 49 |
+
- [Undi95/toxic-dpo-v0.1-sharegpt](https://huggingface.co/datasets/Undi95/toxic-dpo-v0.1-sharegpt)
|
| 50 |
+
- [cgato/SlimOrcaDedupCleaned](https://huggingface.co/datasets/cgato/SlimOrcaDedupCleaned)
|
| 51 |
+
- [kalomaze/Opus_Instruct_25k](https://huggingface.co/datasets/kalomaze/Opus_Instruct_25k)
|
| 52 |
+
- [Doctor-Shotgun/no-robots-sharegpt](https://huggingface.co/datasets/Doctor-Shotgun/no-robots-sharegpt)
|
| 53 |
+
- [Norquinal/claude_multiround_chat_30k](https://huggingface.co/datasets/Norquinal/claude_multiround_chat_30k)
|
| 54 |
+
- [nothingiisreal/Claude-3-Opus-Instruct-15K](https://huggingface.co/datasets/nothingiisreal/Claude-3-Opus-Instruct-15K)
|
| 55 |
+
- All the Aesir datasets, cleaned and unslopped
|
| 56 |
+
- All the luminae datasets, cleaned and unslopped
|
| 57 |
+
- A small part of Airoboros, reduced
|
| 58 |
|
| 59 |
+
Sadly, we couldn't find the sources of the following datasets. DM us if you recognize your set!
|
| 60 |
|
| 61 |
+
- Opus_Instruct-v2-6.5K-Filtered-v2-sharegpt
|
| 62 |
+
- claude_sharegpt_trimmed
|
| 63 |
+
- CapybaraPure_Decontaminated-ShareGPT_reduced
|
| 64 |
+
|
| 65 |
+
## Dataset credits:
|
| 66 |
+
- Epiculous
|
| 67 |
+
- ChaoticNeutrals
|
| 68 |
+
- Gryphe
|
| 69 |
+
- meseca
|
| 70 |
+
- PJMixers
|
| 71 |
+
- NobodyExistsOnTheInternet
|
| 72 |
+
- cgato
|
| 73 |
+
- kalomaze
|
| 74 |
+
- Doctor-Shotgun
|
| 75 |
+
- Norquinal
|
| 76 |
+
- nothingiisreal
|
| 77 |
|
| 78 |
## Others
|
| 79 |
|