How exactly was ARA trained?
#2 by Wakeme - opened
I'm just curious how exactly ARA was trained. The README.md states that:
- ARA is a LoRA that is trained via student teacher training with the student being quantized down to a low precision and the teacher having a high precision
- The training is done on a per layer basis in order to match the parent output as much as possible
So far I can see that ARA itself is a LoRA of rank 16, but when you say it was trained on a per-layer basis, what exactly does that mean? During training, do you cache the output of each layer from the original-precision weights, and then apply something like an L2 loss to the LoRA on top of the quantized backbone? Similar to the greedy layer-wise pre-training algorithm?
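To make my guess concrete, here is a minimal PyTorch sketch of what I imagine that setup looks like. Everything here is my own assumption, not code from the repo: `fake_quantize`, `QuantizedLoRALinear`, the bit width, and the training loop are all hypothetical stand-ins for whatever ARA actually does.

```python
# Hypothetical sketch of per-layer student-teacher training for a quantized
# LoRA; names and details are my guesses, not taken from the ARA repo.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quantize(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Crude symmetric per-tensor fake quantization, standing in for the
    real low-precision format (which I don't know)."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return (w / scale).round().clamp(-qmax, qmax) * scale

class QuantizedLoRALinear(nn.Module):
    """Frozen quantized base weight plus a trainable rank-16 LoRA delta."""
    def __init__(self, base: nn.Linear, rank: int = 16, bits: int = 4):
        super().__init__()
        self.register_buffer("w_q", fake_quantize(base.weight.data, bits))
        if base.bias is not None:
            self.register_buffer("b", base.bias.data.clone())
        else:
            self.b = None
        # Standard LoRA init: A small random, B zero, so the delta starts at 0.
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Quantized base path + low-rank correction (delta W = B @ A).
        return F.linear(x, self.w_q, self.b) + x @ self.lora_a.T @ self.lora_b.T

# Per-layer distillation: feed the same hidden states to the full-precision
# "parent" layer and the quantized student layer, minimize the L2 gap.
teacher_layer = nn.Linear(1024, 1024)
student_layer = QuantizedLoRALinear(teacher_layer, rank=16)
opt = torch.optim.AdamW([student_layer.lora_a, student_layer.lora_b], lr=1e-4)

for _ in range(100):
    x = torch.randn(8, 1024)          # stand-in for cached layer inputs
    with torch.no_grad():
        target = teacher_layer(x)     # high-precision teacher output
    loss = (student_layer(x) - target).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Is it roughly this (with the real calibration data and quantization scheme), or does the loss/training differ per layer in some other way?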
Side note: I think it's funny that ARA shares a name with Google's cancelled modular smartphone project (Project Ara).