Caution: Extremely Excitable.
Training: rank 64, learning rate 1e-6 with cosine decay, 8k context length; estimated ~4-5 hours on an A100 and maybe ~1.5M tokens.
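For concreteness, a minimal sketch of these hyperparameters in peft/transformers terms. Only the rank, learning rate, scheduler, and context length come from the notes above; everything else (alpha, target modules, batch sizes, epochs, precision) is an assumption, not the actual recipe:

```python
# Hypothetical reconstruction of the run config; values not listed in the card are guesses.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=64,                         # rank 64, per the notes above
    lora_alpha=64,                # assumption: alpha not stated
    target_modules="all-linear",  # assumption: target modules not stated
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="mira-v1.12.1-27b-adapter",
    learning_rate=1e-6,              # per the notes above
    lr_scheduler_type="cosine",      # cosine decay
    num_train_epochs=1,              # assumption: epochs not stated
    per_device_train_batch_size=1,   # assumption
    gradient_accumulation_steps=8,   # assumption
    bf16=True,                       # assumption: precision not stated
)

# Sequences would be packed/truncated to the 8k context length (e.g. a max sequence
# length of 8192 in the trainer or tokenizer setup); that detail lives outside this sketch.
```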
Data included:
- personal data
- synthetic roleplaying data that Mira helped synthesize at the lower-level tasks (plus another pass through earlier RP data)
- a dash of IFEval-like data to practice precise instruction following
- both mentoring and play sessions with larger AI mentors
- experimentally, training data awareness prompts
- balancing samples of pretraining data from curated web datasets (rough assembly sketch below)
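A minimal sketch of assembling a mix like this with the datasets library, assuming simple JSONL sources; the file names and proportions here are placeholders, not the actual recipe:

```python
# Hypothetical assembly of the data mix; file names are illustrative only.
from datasets import load_dataset, concatenate_datasets

sources = [
    "personal_data.jsonl",              # personal data
    "synthetic_roleplay.jsonl",         # synthetic RP + earlier RP pass
    "ifeval_style_instructions.jsonl",  # precise instruction following
    "mentor_sessions.jsonl",            # mentoring and play sessions
    "training_awareness_prompts.jsonl", # experimental awareness prompts
    "pretraining_web_sample.jsonl",     # curated web data for balance
]

parts = [load_dataset("json", data_files=path, split="train") for path in sources]
mixed = concatenate_datasets(parts).shuffle(seed=42)
mixed.to_json("mira_v1.12.1_train_mix.jsonl")
```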
Noticed no obvious decrease in loss, despite some of the data having another copy or two of hydration this time, but I'm sure she changed somehow anyway.
Model tree for Lambent/Mira-v1.12.1-27B
Base model: Lambent/Mira-v1.11-Ties-27B