🌸 PitchFlower


Official pretrained checkpoint for the paper "PitchFlower: A flow-based neural audio codec with pitch controllability".

🧠 Overview

PitchFlower achieves pitch controllability through a perturbation strategy. During training, pitch information is removed from the input by applying a random flattening and shifting operation. The model is then trained on a reconstruction task in which the original pitch information is provided explicitly, so it learns to take pitch from that conditioning signal rather than from the input.
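The flattening and shifting operation described above can be illustrated on a per-frame F0 contour. This is a minimal sketch, not the authors' implementation: the function name `flatten_and_shift_f0`, the `max_shift_semitones` parameter, the use of the geometric mean as the flat pitch, and the convention that `0.0` marks unvoiced frames are all assumptions made for illustration. The actual operation and its parameters are described in the paper.

```python
import math
import random


def flatten_and_shift_f0(f0_hz, max_shift_semitones=12.0, rng=None):
    """Replace every voiced F0 value with one constant, randomly shifted pitch.

    f0_hz: per-frame F0 values in Hz; 0.0 marks an unvoiced frame (assumed convention).
    Returns (perturbed_contour, shift_in_semitones).
    """
    rng = rng or random.Random()
    voiced = [f for f in f0_hz if f > 0.0]
    if not voiced:
        # Nothing to perturb in a fully unvoiced contour.
        return list(f0_hz), 0.0

    # Flatten: use the mean in the log domain (geometric mean) as the constant pitch.
    log_mean = sum(math.log2(f) for f in voiced) / len(voiced)

    # Shift: draw a random offset in semitones (hypothetical range, not from the paper).
    shift = rng.uniform(-max_shift_semitones, max_shift_semitones)
    flat_hz = 2.0 ** (log_mean + shift / 12.0)

    # Unvoiced frames stay unvoiced; voiced frames all get the same value.
    return [flat_hz if f > 0.0 else 0.0 for f in f0_hz], shift


contour = [0.0, 100.0, 400.0, 0.0]
perturbed, shift = flatten_and_shift_f0(contour, rng=random.Random(0))
```

After this perturbation the contour no longer carries the original melody, so a model that must reconstruct the original signal has to rely on the pitch information it is given explicitly.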

PitchFlower architecture

We use an autoencoder with an RVQ bottleneck and a flow-based decoder to produce high-quality audio. More details can be found in the paper.
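For readers unfamiliar with an RVQ (residual vector quantization) bottleneck, the following sketch shows the general technique on a toy vector. It is not PitchFlower's code: the helper names and the tiny hand-written codebooks are hypothetical, whereas in a real codec the codebooks are learned and applied to encoder latents.

```python
def nearest(codebook, vec):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def rvq_encode(vec, codebooks):
    """Quantize vec stage by stage; each stage encodes the previous stage's residual."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices


def rvq_decode(indices, codebooks):
    """Reconstruct the vector by summing the selected entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Toy codebooks (illustrative values only).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],    # coarse stage
    [[0.0, 0.0], [0.25, 0.0], [0.0, -0.25]],  # fine stage, refines the residual
]
codes = rvq_encode([1.2, 1.0], codebooks)
approx = rvq_decode(codes, codebooks)
```

Each additional stage quantizes what the previous stages left unexplained, so the reconstruction error shrinks as stages are added while the signal is still represented by a short list of discrete indices.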

📦 Installation and Usage

Check out our GitHub repo to learn how to use PitchFlower: https://github.com/diegotg2000/PitchFlower

🙌 Acknowledgements

We'd like to acknowledge the repositories from which we drew inspiration and parts of the code.

This work has been done in the Analysis/Synthesis team of the STMS laboratory at IRCAM. It has been funded by the ANR project EVA.

📫 Contact

For questions or collaboration opportunities, feel free to reach out: [email protected]

🧩 Citation

@misc{pitchflower,
      title={PitchFlower: A flow-based neural audio codec with pitch controllability}, 
      author={Diego Torres and Axel Roebel and Nicolas Obin},
      year={2025},
      eprint={2510.25566},
      archivePrefix={arXiv},
      url={https://arxiv.org/abs/2510.25566}, 
}

📜 License

This project is licensed under the CC BY-NC-SA 4.0 license.
