🌸 PitchFlower

Official pretrained checkpoint for the paper PitchFlower: A flow-based neural audio codec with pitch controllability.

🧠 Overview

PitchFlower achieves pitch controllability through a perturbation strategy. During training, pitch information is removed from the input by applying a random flattening and shifting operation. The model is trained on a reconstruction task in which the original pitch is provided explicitly, so the decoder learns to take pitch from that explicit input.
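As a rough illustration of what "flattening and shifting" can mean, here is a minimal sketch that operates on a frame-level F0 contour only. It is not the code used for PitchFlower: the function name `perturb_f0`, the `max_shift_semitones` parameter, the choice of the geometric mean as the flat pitch, and the convention that 0 marks unvoiced frames are all assumptions made for this example. Resynthesizing audio from the perturbed contour is out of scope here; see the paper and the GitHub repo for the actual procedure.

```python
import numpy as np

def perturb_f0(f0_hz, max_shift_semitones=12.0, rng=None):
    """Flatten a frame-level F0 contour and shift it by a random interval.

    f0_hz: 1-D array of F0 values in Hz, with 0 marking unvoiced frames
           (an assumed convention for this sketch).
    Returns a contour of the same shape; unvoiced frames stay at 0.
    """
    rng = np.random.default_rng() if rng is None else rng
    f0_hz = np.asarray(f0_hz, dtype=np.float64)
    voiced = f0_hz > 0
    out = np.zeros_like(f0_hz)
    if not voiced.any():
        return out
    # Flatten: replace every voiced frame with the geometric mean of the voiced F0.
    flat_hz = np.exp(np.mean(np.log(f0_hz[voiced])))
    # Shift: move the flat pitch by a random number of semitones.
    shift = rng.uniform(-max_shift_semitones, max_shift_semitones)
    out[voiced] = flat_hz * 2.0 ** (shift / 12.0)
    return out
```

After this perturbation the contour no longer carries the original melody, which is why the true pitch has to be supplied separately for reconstruction.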

PitchFlower architecture

We use an autoencoder with an RVQ bottleneck and a flow-based decoder to produce high-quality audio. More details can be found in the paper.
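For readers unfamiliar with residual vector quantization (RVQ), the following is a generic sketch of the idea, not PitchFlower's implementation: each codebook quantizes whatever the previous stages left unexplained. The function names `rvq_encode` and `rvq_decode` and the toy codebooks are hypothetical and chosen only for illustration.

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Quantize vector x with a list of codebooks, each of shape (K, D).

    Returns the chosen index per stage and the summed quantized vector.
    """
    residual = np.asarray(x, dtype=np.float64)
    quantized = np.zeros_like(residual)
    indices = []
    for cb in codebooks:
        # Pick the codeword closest (squared Euclidean distance) to the residual.
        idx = int(np.argmin(np.sum((cb - residual) ** 2, axis=1)))
        indices.append(idx)
        quantized = quantized + cb[idx]
        # The next stage only sees what this stage failed to capture.
        residual = residual - cb[idx]
    return indices, quantized

def rvq_decode(indices, codebooks):
    """Rebuild the quantized vector by summing the selected codewords."""
    return sum(cb[i] for cb, i in zip(codebooks, indices))

# Toy example with two stages of two 2-D codewords each.
codebooks = [
    np.array([[0.0, 0.0], [1.0, 1.0]]),
    np.array([[0.0, 0.0], [0.25, -0.25]]),
]
indices, quantized = rvq_encode([1.2, 0.8], codebooks)
```

Each additional stage refines the approximation, so the number of codebooks used trades bitrate against reconstruction accuracy.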

📦 Installation and Usage

Check out our GitHub repo to learn how to use PitchFlower: https://github.com/diegotg2000/PitchFlower

🙌 Acknowledgements

We'd like to acknowledge the repositories from which we drew inspiration and parts of the code:

Vocos: https://github.com/gemelo-ai/vocos
WavTokenizer: https://github.com/jishengpeng/WavTokenizer
Encodec: https://github.com/facebookresearch/encodec

This work was carried out in the Analysis/Synthesis team of the STMS laboratory at IRCAM. It was funded by the ANR project EVA.

📫 Contact

For questions or collaboration opportunities, feel free to reach out: [email protected]

🧩 Citation

@misc{pitchflower,
      title={PitchFlower: A flow-based neural audio codec with pitch controllability}, 
      author={Diego Torres and Axel Roebel and Nicolas Obin},
      year={2025},
      eprint={2510.25566},
      archivePrefix={arXiv},
      url={https://arxiv.org/abs/2510.25566}, 
}

📜 License

This project is licensed under the CC BY-NC-SA 4.0 license.
