	---
	license: mit
	tags:
	- audio
	library_name: pytorch
	---

	# Vocos

	#### Note: This repo has no affiliation with the author of Vocos.

	Pretrained Vocos model with a 48 kHz sampling rate, as opposed to the 24 kHz of the official models.

	## Usage
	Make sure the Vocos library is installed:

	```bash
	pip install vocos
	```

	Then load the model as usual:

	```python
	from vocos import Vocos
	vocos = Vocos.from_pretrained("kittn/vocos-mel-48khz-alpha1")
	```
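
Because this checkpoint targets 48 kHz, input audio should be resampled to 48 kHz rather than the 24 kHz used in the official examples. The following is a minimal copy-synthesis sketch adapted from the official usage pattern. It assumes `torch` and `torchaudio` are installed alongside `vocos`; the helper name `copy_synthesis` and its `path` argument are illustrative and not part of the Vocos library.

```python
TARGET_SR = 48_000  # sampling rate stated in this model card


def copy_synthesis(path, repo_id="kittn/vocos-mel-48khz-alpha1"):
    """Load an audio file, resample it to 48 kHz, and resynthesize it with Vocos.

    `copy_synthesis` is a hypothetical helper written for this example.
    """
    # Imported inside the function so the heavy dependencies are only
    # required when the function is actually called.
    import torch
    import torchaudio
    from vocos import Vocos

    vocos = Vocos.from_pretrained(repo_id)

    y, sr = torchaudio.load(path)  # y has shape (channels, samples)
    if y.size(0) > 1:
        y = y.mean(dim=0, keepdim=True)  # mix down to mono
    y = torchaudio.functional.resample(y, orig_freq=sr, new_freq=TARGET_SR)

    with torch.no_grad():
        y_hat = vocos(y)  # waveform -> mel features -> waveform
    return y_hat, TARGET_SR
```

Calling `copy_synthesis("input.wav")` (with any audio file of your own in place of `input.wav`) returns the reconstructed waveform and its sampling rate, which can then be written to disk with `torchaudio.save`.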

	For more detailed examples, see [github.com/charactr-platform/vocos#usage](https://github.com/charactr-platform/vocos#usage)

	## Evals
	TODO

	## Training details
	TODO

	## What is Vocos?

	Here's a summary from the official repo [[link](https://github.com/charactr-platform/vocos)]:

	> Vocos is a fast neural vocoder designed to synthesize audio waveforms from acoustic features. Trained using a Generative Adversarial Network (GAN) objective, Vocos can generate waveforms in a single forward pass. Unlike other typical GAN-based vocoders, Vocos does not model audio samples in the time domain. Instead, it generates spectral coefficients, facilitating rapid audio reconstruction through inverse Fourier transform.

	For more details and other variants, check out the repo link above.

	## Model summary
	```text
	=================================================================
	Layer (type:depth-idx)                   Param #
	=================================================================
	Vocos                                    --
	├─MelSpectrogramFeatures: 1-1            --
	│    └─MelSpectrogram: 2-1               --
	│    │    └─Spectrogram: 3-1             --
	│    │    └─MelScale: 3-2                --
	├─VocosBackbone: 1-2                     --
	│    └─Conv1d: 2-2                       918,528
	│    └─LayerNorm: 2-3                    2,048
	│    └─ModuleList: 2-4                   --
	│    │    └─ConvNeXtBlock: 3-3           4,208,640
	│    │    └─ConvNeXtBlock: 3-4           4,208,640
	│    │    └─ConvNeXtBlock: 3-5           4,208,640
	│    │    └─ConvNeXtBlock: 3-6           4,208,640
	│    │    └─ConvNeXtBlock: 3-7           4,208,640
	│    │    └─ConvNeXtBlock: 3-8           4,208,640
	│    │    └─ConvNeXtBlock: 3-9           4,208,640
	│    │    └─ConvNeXtBlock: 3-10          4,208,640
	│    └─LayerNorm: 2-5                    2,048
	├─ISTFTHead: 1-3                         --
	│    └─Linear: 2-6                       2,101,250
	│    └─ISTFT: 2-7                        --
	=================================================================
	Total params: 36,692,994
	Trainable params: 36,692,994
	Non-trainable params: 0
	=================================================================
	```
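
The parameter counts above can be cross-checked with a little arithmetic. The sketch below assumes the layer shapes of the official Vocos implementation (a kernel-size-7 input convolution, ConvNeXt blocks made of a depthwise convolution, a LayerNorm, two linear layers and a per-channel scale, and an ISTFT head that outputs `n_fft + 2` values per frame). Under those assumptions, the counts are reproduced by 128 mel bins, a hidden size of 1024, an intermediate size of 2048, 8 blocks, and `n_fft = 2048`. These hyperparameters are inferred from the summary, not stated by the author; the repo's config file is the authoritative source.

```python
def conv1d_params(in_ch, out_ch, kernel, groups=1):
    """Weights plus bias of a 1-D convolution."""
    return (in_ch // groups) * out_ch * kernel + out_ch


def linear_params(in_features, out_features):
    """Weights plus bias of a fully connected layer."""
    return in_features * out_features + out_features


def layernorm_params(dim):
    """Scale and shift vectors of a LayerNorm."""
    return 2 * dim


def convnext_block_params(dim, intermediate_dim, kernel=7):
    """Parameter count of one ConvNeXt block (assumed layout, see lead-in)."""
    return (
        conv1d_params(dim, dim, kernel, groups=dim)  # depthwise convolution
        + layernorm_params(dim)
        + linear_params(dim, intermediate_dim)       # pointwise expansion
        + linear_params(intermediate_dim, dim)       # pointwise projection
        + dim                                        # per-channel layer scale
    )


# Hyperparameters inferred from the summary (assumptions, not author-stated).
N_MELS = 128
DIM = 1024
INTERMEDIATE_DIM = 2048
NUM_LAYERS = 8
N_FFT = 2048

embed = conv1d_params(N_MELS, DIM, kernel=7)
block = convnext_block_params(DIM, INTERMEDIATE_DIM)
head = linear_params(DIM, N_FFT + 2)

total = (
    embed
    + layernorm_params(DIM)   # LayerNorm after the input convolution
    + NUM_LAYERS * block
    + layernorm_params(DIM)   # final LayerNorm
    + head
)

print(embed, block, head, total)
```

The four printed values match the `Conv1d`, `ConvNeXtBlock`, `Linear`, and `Total params` rows of the summary.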