This is a storywriting and roleplay model trained with a significant amount of self-generated, long-context, multi-turn roleplay data.
I downloaded a bit under a thousand character cards from chub.ai and created a synthetic roleplay for each card. I batched as many turns as I could into 4k-token chunks in order to maintain coherency over longer contexts. There was a lot of cleaning and validation between each batch, so many examples were "lost," but the final output appears to be very good quality. The longest conversation is about 20k tokens, and I plan to extend this further as well as broaden the dataset with more examples. The first 4k tokens were generated with Command-R-Plus, with the remainder generated with byroneverson/Mistral-Small-Instruct-2409-abliterated.
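The chunked-generation loop above can be sketched roughly as follows. This is a minimal illustration, not the actual pipeline: the token counter, the `generate_turns` stand-in for the LLM call, and the `looks_valid` check are all placeholder assumptions.

```python
# Sketch of batching multi-turn roleplay generation into ~4k-token
# chunks, with a validation pass between batches. All functions here
# are illustrative stand-ins, not the real pipeline code.

def count_tokens(text: str) -> int:
    # Crude proxy: roughly one token per whitespace-separated word.
    return len(text.split())

def generate_turns(history, budget_tokens):
    """Stand-in for an LLM call that appends turns until the
    chunk's token budget is spent."""
    turns, used = [], 0
    i = len(history)
    while used < budget_tokens:
        turn = f"turn {i}: " + "word " * 50  # placeholder content
        turns.append(turn)
        used += count_tokens(turn)
        i += 1
    return turns

def looks_valid(turn: str) -> bool:
    # Stand-in for the cleaning/validation done between batches.
    return "turn" in turn

def build_conversation(card: str, chunk_budget=4000, max_tokens=20000):
    history = [f"system: {card}"]
    total = count_tokens(history[0])
    while total < max_tokens:
        batch = [t for t in generate_turns(history, chunk_budget)
                 if looks_valid(t)]
        if not batch:
            break  # conversations failing validation get dropped
        history.extend(batch)
        total += sum(count_tokens(t) for t in batch)
    return history

convo = build_conversation("An example character card")
```

In the real pipeline the first chunk would go to Command-R-Plus and later chunks to the Mistral-Small model, with the full history fed back in each time to keep the conversation coherent.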
Next, I downloaded the prompt backup from the site below and used the prompts as seeds for some storywriting data:
https://aetherroom.club/whats-new#backup-update
I went over it twice with Command-R-Plus: the first pass had the model write a first draft of the output, and the second pass improved on that draft and extended its length.
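That two-pass draft-then-refine flow looks roughly like the sketch below. The prompts and the `call_model` stub are assumptions for illustration; the actual prompts and API calls were not published.

```python
# Illustrative two-pass storywriting: draft with one call, then
# refine and extend with a second. call_model() is a stand-in for
# a Command-R-Plus completion request, not a real API client.

def call_model(prompt: str) -> str:
    # Stub: returns canned text so the sketch is runnable.
    if prompt.startswith("Write a story"):
        return "A short first draft."
    return "A longer, improved second draft of the same story."

def storywrite(seed: str) -> str:
    draft = call_model(f"Write a story based on this prompt:\n{seed}")
    final = call_model(
        "Improve the following story and extend its length:\n" + draft
    )
    return final

print(storywrite("a knight guards a lighthouse"))
```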
Also included were subsets of the following datasets:

- anthracite-org/stheno-filtered-v1.1
- anthracite-org/kalo_misc_part2
- anthracite-org/kalo_opus_misc_240827
- anthracite-org/kalo-opus-instruct-22k-no-refusal
- Chaser-cz/sonnet35-charcard-roleplay-sharegpt
- jondurbin/airoboros-3.2 (a very small subset)
- various other data, viewable at openerotica/mixed-rp
Every line of data was run through a large model to filter out low-quality, repetitive, and underage content.
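A per-line filtering pass like this can be sketched as below. The `judge_line` stub stands in for the large-model classifier; its toy repetition heuristic is an assumption for illustration only.

```python
# Sketch of the per-line quality filter. judge_line() is a stand-in
# for the large-model judgment actually used; here it only applies
# a toy repetition and length heuristic so the example is runnable.

def judge_line(line: str) -> bool:
    """Return True if the line passes the quality checks."""
    words = line.split()
    if not words:
        return False
    # Toy repetition check: reject lines dominated by a single word.
    if max(words.count(w) for w in set(words)) > len(words) // 2:
        return False
    return len(words) > 2

def filter_dataset(lines):
    return [line for line in lines if judge_line(line)]

kept = filter_dataset([
    "A perfectly reasonable roleplay turn with variety.",
    "bad bad bad bad bad",
    "too short",
])
```

In the real pass the heuristic would be replaced by a prompt asking the large model to flag low-quality, repetitive, or underage content, keeping only lines it approves.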