This model almost completely loses Chinese abilities

#14
by CHNtentes - opened

When using Chinese prompts, it just replies with loads of gibberish.

"It can still write semi-coherently without any additional training or distillation done on top of it from the original 30b MoE."

While the readme does say the model can still generate semi-coherent output, that claim seems to apply mostly to English tasks. Since the pruning was based on activation probabilities over a calibration set (which might not have included much Chinese data), it's likely that Chinese-specialized experts were among those least used and got removed. That would explain the garbled outputs.

Without reintroducing multilingual data during fine-tuning or distillation, pruning like this will heavily bias the model toward the languages and domains that were favored in the routing patterns used during the measurements. If you really do need Chinese support, retraining or fine-tuning on a multilingual corpus (or at least biasing the expert selection toward multilingual benchmarks) might help. So even though the model can technically still function, it's no surprise that capabilities in underrepresented domains like Chinese completely fall apart.
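The pruning idea described above can be sketched in a few lines. This is a hypothetical illustration, not the actual pruning script used for this model: it assumes experts are ranked by how often the router selects them over a calibration set, and the least-used ones are dropped. The function name and data are made up for the example.

```python
from collections import Counter

def prune_experts(routing_log, num_experts, keep):
    """Rank experts by routing frequency over a calibration run and
    keep only the `keep` most-used ones; the rest are pruned."""
    counts = Counter(routing_log)
    # Sort expert indices by how often the router picked them (descending).
    ranked = sorted(range(num_experts), key=lambda e: counts[e], reverse=True)
    kept = sorted(ranked[:keep])
    pruned = sorted(ranked[keep:])
    return kept, pruned

# Toy routing log from an English-heavy calibration set: experts 3 and 4
# (imagine they specialize in Chinese) are rarely selected.
log = [0] * 50 + [1] * 40 + [2] * 30 + [3] * 2 + [4] * 1
kept, pruned = prune_experts(log, num_experts=5, keep=3)
print(kept, pruned)  # [0, 1, 2] [3, 4]
```

With an English-skewed calibration set, the Chinese-specialized experts end up at the bottom of the ranking and are exactly the ones removed, which is the failure mode discussed here.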

In short: this isn’t a multilingual model anymore, at least not a usable one without further training.

"It can still write semi-coherently without any additional training or distillation done on top of it from the original 30b MoE."

While the readme does say the model can still generate semi-coherent output, that claim seems to apply mostly to English tasks. Since the pruning was based on activation probabilities over a calibration set (which might not have included much Chinese data), it's likely that Chinese-specialized experts were among those least used and got removed. That would explain the garbled outputs.

Without reintroducing multilingual data during fine-tuning or distillation, pruning like this will heavily bias the model toward the languages and domains that were favored in the routing patterns used during the measurements. If you really do need Chinese support, retraining or fine-tuning on a multilingual corpus (or at least biasing the expert selection toward multilingual benchmarks) might help. So even though the model can technically still function, it's no surprise that capabilities in underrepresented domains like Chinese completely fall apart.

In short: this isn’t a multilingual model anymore, at least not a usable one without further training.

I guess you are right. Although the pruned model will be faster and lighter, there's no free lunch.

So now we know which experts have been pruned :D
