PitchFlower
Official pretrained checkpoint for the paper PitchFlower: A flow-based neural audio codec with pitch controllability.
Overview
PitchFlower achieves pitch controllability through a perturbation strategy. During training, pitch information is removed from the input by flattening the pitch contour and applying a random shift. The model is then trained on a reconstruction task in which the true pitch is provided explicitly as conditioning, so at inference time the output pitch can be controlled by changing that conditioning.
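The flatten-and-shift perturbation described above can be illustrated on an F0 contour. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `flatten_and_shift_f0`, the use of the geometric mean as the flat pitch level, the convention that unvoiced frames are marked with 0 Hz, and the ±12 semitone shift range are all hypothetical choices made for illustration. It only computes the perturbed target contour; resynthesizing audio with that contour is outside the scope of the sketch.

```python
import numpy as np


def flatten_and_shift_f0(f0_hz, max_shift_semitones=12.0, rng=None):
    """Return a flat, randomly shifted F0 contour (illustrative only).

    f0_hz: 1-D array of F0 values in Hz, with 0 marking unvoiced frames
           (an assumed convention, not taken from the paper).
    max_shift_semitones: assumed bound of the random shift.
    """
    rng = np.random.default_rng() if rng is None else rng
    f0_hz = np.asarray(f0_hz, dtype=np.float64)
    voiced = f0_hz > 0

    out = np.zeros_like(f0_hz)
    if not voiced.any():
        return out  # nothing to flatten in a fully unvoiced segment

    # Flatten: replace every voiced frame by the geometric mean of the contour.
    center_hz = np.exp(np.mean(np.log(f0_hz[voiced])))

    # Shift: draw one random offset in semitones and apply it to the flat level.
    shift = rng.uniform(-max_shift_semitones, max_shift_semitones)
    out[voiced] = center_hz * 2.0 ** (shift / 12.0)
    return out


if __name__ == "__main__":
    contour = [0.0, 100.0, 200.0, 0.0, 400.0]
    print(flatten_and_shift_f0(contour, rng=np.random.default_rng(0)))
```

Because every voiced frame ends up at the same randomly chosen level, the perturbed contour carries no information about the original melody or register, which is the property the perturbation relies on.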
We use an autoencoder with a residual vector quantization (RVQ) bottleneck and a flow-based decoder to produce high-quality audio. More details can be found in the paper.
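For readers unfamiliar with RVQ, the idea is that each quantizer stage encodes the residual left over by the previous stages. The sketch below illustrates this generic technique with NumPy; it is an assumption-laden example, not the PitchFlower bottleneck. The function name `rvq_encode`, the fixed toy codebooks, and the array shapes are hypothetical, and a real codec learns its codebooks during training.

```python
import numpy as np


def rvq_encode(x, codebooks):
    """Generic residual vector quantization (illustrative only).

    x: array of shape (T, D), one D-dimensional latent per frame.
    codebooks: list of arrays of shape (K, D), one per quantizer stage.
    Returns (indices, quantized) with shapes (n_stages, T) and (T, D).
    """
    residual = np.asarray(x, dtype=np.float64).copy()
    quantized = np.zeros_like(residual)
    indices = []

    for codebook in codebooks:
        # Squared Euclidean distance from every frame to every codeword.
        dist = ((residual[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
        idx = dist.argmin(axis=1)
        chosen = codebook[idx]

        indices.append(idx)
        quantized += chosen   # accumulate the reconstruction
        residual -= chosen    # the next stage sees what is still unexplained

    return np.stack(indices), quantized


if __name__ == "__main__":
    stage1 = np.array([[0.0, 0.0], [1.0, 1.0]])
    stage2 = np.array([[0.0, 0.0], [0.5, 0.5]])
    codes, recon = rvq_encode(np.array([[1.5, 1.5]]), [stage1, stage2])
    print(codes.ravel(), recon)
```

Limiting the number of stages and codebook entries caps how much information can pass through the bottleneck.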
Installation and Usage
Check out our GitHub repo to learn how to use PitchFlower: https://github.com/diegotg2000/PitchFlower
Acknowledgements
We'd like to acknowledge the repositories from which we drew inspiration and parts of the code:
- Vocos: https://github.com/gemelo-ai/vocos
- WavTokenizer: https://github.com/jishengpeng/WavTokenizer
- Encodec: https://github.com/facebookresearch/encodec
This work was carried out in the Analysis/Synthesis team of the STMS laboratory at IRCAM and was funded by the ANR project EVA.
Contact
For questions or collaboration opportunities, feel free to reach out: [email protected]
Citation
@misc{pitchflower,
title={PitchFlower: A flow-based neural audio codec with pitch controllability},
author={Diego Torres and Axel Roebel and Nicolas Obin},
year={2025},
eprint={2510.25566},
archivePrefix={arXiv},
url={https://arxiv.org/abs/2510.25566},
}
License
This project is licensed under the CC BY-NC-SA 4.0 license.