This model almost completely loses Chinese abilities

#14
by CHNtentes - opened

When using Chinese prompts, it just replies with loads of gibberish.

"It can still write semi-coherently without any additional training or distillation done on top of it from the original 30b MoE."

While the readme does say the model can still generate semi-coherent output, that claim seems to apply mostly to English tasks. Since the pruning was based on activation probabilities over a calibration set (which might not have included much Chinese data), it's likely that Chinese-specialized experts were among those least used and got removed. That would explain the garbled outputs.

Without reintroducing multilingual data during fine-tuning or distillation, pruning like this will heavily bias the model toward the languages and domains that were favored in the routing patterns used during the measurements. If you really do need Chinese support, retraining or fine-tuning on a multilingual corpus (or at least biasing the expert selection toward multilingual benchmarks) might help. So even though the model can technically still function, it's no surprise that capabilities in underrepresented domains like Chinese completely fall apart.
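The pruning idea described above can be sketched in a few lines. This is a hypothetical illustration, not the actual pruning script used for this model: it assumes experts are ranked by how often the router selects them over a calibration set, and the least-used ones are dropped. The function name and data are made up for the example.

```python
from collections import Counter

def prune_experts(routing_log, num_experts, keep):
    """Rank experts by routing frequency over a calibration run and
    keep only the `keep` most-used ones; the rest are pruned."""
    counts = Counter(routing_log)
    # Sort expert indices by how often the router picked them (descending).
    ranked = sorted(range(num_experts), key=lambda e: counts[e], reverse=True)
    kept = sorted(ranked[:keep])
    pruned = sorted(ranked[keep:])
    return kept, pruned

# Toy routing log from an English-heavy calibration set: experts 3 and 4
# (imagine they specialize in Chinese) are rarely selected.
log = [0] * 50 + [1] * 40 + [2] * 30 + [3] * 2 + [4] * 1
kept, pruned = prune_experts(log, num_experts=5, keep=3)
print(kept, pruned)  # [0, 1, 2] [3, 4]
```

With an English-skewed calibration set, the Chinese-specialized experts end up at the bottom of the ranking and are exactly the ones removed, which is the failure mode discussed here.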

In short: this isn’t a multilingual model anymore, at least not a usable one without further training.

"It can still write semi-coherently without any additional training or distillation done on top of it from the original 30b MoE."

While the readme does say the model can still generate semi-coherent output, that claim seems to apply mostly to English tasks. Since the pruning was based on activation probabilities over a calibration set (which might not have included much Chinese data), it's likely that Chinese-specialized experts were among those least used and got removed. That would explain the garbled outputs.

Without reintroducing multilingual data during fine-tuning or distillation, pruning like this will heavily bias the model toward the languages and domains that were favored in the routing patterns used during the measurements. If you really do need Chinese support, retraining or fine-tuning on a multilingual corpus (or at least biasing the expert selection toward multilingual benchmarks) might help. So even though the model can technically still function, it's no surprise that capabilities in underrepresented domains like Chinese completely fall apart.

In short: this isn’t a multilingual model anymore, at least not a usable one without further training.

I guess you are right. Although the pruned model will be faster and lighter, there's no free lunch.

So now we know which experts have been pruned :D
