How exactly was ARA trained?
#2 by Wakeme - opened
I'm just curious how exactly ARA was trained. The README.md states that:
- ARA is a LoRA that is trained via student teacher training with the student being quantized down to a low precision and the teacher having a high precision
- The training is done on a per layer basis in order to match the parent output as much as possible
So far I can see that ARA itself is a LoRA of rank 16, but when you say it was trained on a per-layer basis, what exactly does that mean? During training, do you cache the output of each layer from the original-precision weights, and then apply something like an L2 loss to the LoRA on top of the quantized backbone? Similar to the greedy layer-wise pre-training algorithm?
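To make my guess concrete, here is a minimal PyTorch sketch of what I imagine that setup looks like. Everything here is my own assumption, not code from the repo: `fake_quantize`, `QuantizedLoRALinear`, the bit width, and the training loop are all hypothetical stand-ins for whatever ARA actually does.

```python
# Hypothetical sketch of per-layer student-teacher training for a quantized
# LoRA; names and details are my guesses, not taken from the ARA repo.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quantize(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Crude symmetric per-tensor fake quantization, standing in for the
    real low-precision format (which I don't know)."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return (w / scale).round().clamp(-qmax, qmax) * scale

class QuantizedLoRALinear(nn.Module):
    """Frozen quantized base weight plus a trainable rank-16 LoRA delta."""
    def __init__(self, base: nn.Linear, rank: int = 16, bits: int = 4):
        super().__init__()
        self.register_buffer("w_q", fake_quantize(base.weight.data, bits))
        if base.bias is not None:
            self.register_buffer("b", base.bias.data.clone())
        else:
            self.b = None
        # Standard LoRA init: A small random, B zero, so the delta starts at 0.
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Quantized base path + low-rank correction (delta W = B @ A).
        return F.linear(x, self.w_q, self.b) + x @ self.lora_a.T @ self.lora_b.T

# Per-layer distillation: feed the same hidden states to the full-precision
# "parent" layer and the quantized student layer, minimize the L2 gap.
teacher_layer = nn.Linear(1024, 1024)
student_layer = QuantizedLoRALinear(teacher_layer, rank=16)
opt = torch.optim.AdamW([student_layer.lora_a, student_layer.lora_b], lr=1e-4)

for _ in range(100):
    x = torch.randn(8, 1024)          # stand-in for cached layer inputs
    with torch.no_grad():
        target = teacher_layer(x)     # high-precision teacher output
    loss = (student_layer(x) - target).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Is it roughly this (with the real calibration data and quantization scheme), or does the loss/training differ per layer in some other way?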
Side note: I think it's funny that ARA shares a name with Google's cancelled modular smartphone project (Project Ara).