---
license: cc-by-nc-sa-4.0
---

### Description

This model separates reverb and delay effects from vocals. It can also partially separate harmony parts, although it cannot remove them completely. I added a random high cut after the reverb and delay effects in the dataset, so the model's handling of high frequencies is not overly aggressive.<br>
You can try listening to this model's performance [here](./examples)!

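
For reference, the snippet below sketches the kind of "random high cut" augmentation described above: a low-pass filter with a randomized cutoff applied to the wet (reverb/delay) signal. The cutoff range, filter order, and function name are illustrative assumptions; the actual processing is done by the dataset script linked under Datasets below.

```python
# Hypothetical sketch of a random high-cut step; the cutoff range and filter
# order are assumptions, not the values used to build this dataset.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def random_high_cut(wet: np.ndarray, sr: int, rng: np.random.Generator) -> np.ndarray:
    """Low-pass the wet (reverb/delay) signal at a random cutoff frequency."""
    cutoff_hz = float(rng.uniform(6000.0, 16000.0))  # assumed cutoff range (Hz)
    order = int(rng.integers(2, 6))                   # assumed filter order
    sos = butter(order, cutoff_hz, btype="low", fs=sr, output="sos")
    return sosfiltfilt(sos, wet, axis=-1)
```
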
### How to use the model?

Try it with [ZFTurbo's Music-Source-Separation-Training](https://github.com/ZFTurbo/Music-Source-Separation-Training).

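
Inference with that repository usually looks roughly like the command below. The flag names and the `mel_band_roformer` model type are assumptions based on the repository's documented interface and may differ between versions, so check its README before running.

```
python inference.py \
    --model_type mel_band_roformer \
    --config_path config_dereverb-echo_mel_band_roformer.yaml \
    --start_check_point dereverb-echo_mel_band_roformer_sdr_10.0169.ckpt \
    --input_folder path/to/input_wavs \
    --store_dir path/to/output
```
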
### Model

Configs: [config_dereverb-echo_mel_band_roformer.yaml](./config_dereverb-echo_mel_band_roformer.yaml)<br>
Model: [dereverb-echo_mel_band_roformer_sdr_10.0169.ckpt](./dereverb-echo_mel_band_roformer_sdr_10.0169.ckpt)<br>
Instruments: [dry, other]<br>
Finetuned from: `model_mel_band_roformer_ep_3005_sdr_11.4360.ckpt`<br>
Datasets:
- Training dataset: 270 songs from [opencpop](https://github.com/wenet-e2e/opencpop) and [GTSinger](https://github.com/GTSinger/GTSinger)
- Validation dataset: 30 songs from my own collection
- All random reverb and delay effects are generated by [this Python script](./scripts/create_reverb_delay.py) and organized into the MUSDB18 dataset format (see the layout sketch after this list).
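
For orientation, a MUSDB18-style layout for this task would look roughly like the tree below: one folder per song containing the mixture plus one WAV per instrument. The folder and file names are illustrative only; follow the training repository's dataset documentation for the exact expectations.

```
dataset/
├── song_0001/
│   ├── mixture.wav   # vocal with reverb/delay (and random high cut)
│   ├── dry.wav       # clean vocal target
│   └── other.wav     # effects residual target
└── song_0002/
    └── ...
```
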
Metrics: computed over the 30 validation songs.

```
Instr dry sdr: 13.1507 (Std: 4.1088)
Instr dry l1_freq: 53.7715 (Std: 13.3363)
Instr dry si_sdr: 12.7707 (Std: 4.6134)
Instr other sdr: 6.8830 (Std: 2.5547)
Instr other l1_freq: 52.7358 (Std: 11.8587)
Instr other si_sdr: 5.9448 (Std: 2.8721)
Metric avg sdr : 10.0169
Metric avg l1_freq : 53.2536
Metric avg si_sdr : 9.3577
```

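
To help read the `si_sdr` rows above, the snippet below is a minimal, illustrative computation of scale-invariant SDR between an estimated stem and its reference; the authoritative numbers come from the evaluation code in ZFTurbo's training repository, not from this sketch.

```python
# Illustrative SI-SDR computation; not the exact evaluation code used for the
# numbers reported above.
import numpy as np

def si_sdr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant SDR (dB) between an estimated and a reference signal."""
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the optimally scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    noise = estimate - target
    return float(10 * np.log10((np.sum(target**2) + eps) / (np.sum(noise**2) + eps)))
```
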
### Training log

Training logs: [train.log](./train.log)<br>
The following image is the TensorBoard visualization of the training log, generated with [this script](./scripts/start_tensorboard.py).


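
The linked helper is not reproduced here, but as a rough sketch, TensorBoard can be started programmatically along these lines (the log directory name is an assumption; the real logic lives in scripts/start_tensorboard.py):

```python
# Hypothetical sketch of a TensorBoard launcher; see scripts/start_tensorboard.py
# in this repository for the actual helper.
from tensorboard import program

def start_tensorboard(logdir: str = "logs") -> str:
    """Start a local TensorBoard server for the given log directory."""
    tb = program.TensorBoard()
    tb.configure(argv=[None, "--logdir", logdir])
    return tb.launch()  # returns the URL the dashboard is served on

if __name__ == "__main__":
    print(f"TensorBoard running at {start_tensorboard()}")
    input("Press Enter to stop...")  # keep the server process alive
```
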
### Thanks

- Mel-Band-Roformer [[Paper](https://arxiv.org/abs/2310.01809), [Repository](https://github.com/lucidrains/BS-RoFormer)]
- [ZFTurbo](https://github.com/ZFTurbo)'s training code [[Music-Source-Separation-Training](https://github.com/ZFTurbo/Music-Source-Separation-Training)]
- [CN17161](https://github.com/CN17161) provided GPUs.
- [Glucy-2](https://github.com/Glucy-2) provided technical assistance.