Audio Spaces
-
71
-
Seamless M4T
951
-
MusicGen
5.07k · Generate music from text descriptions and optional melodies
-
Audioldm Text To Audio Generation
811 · Generate audio from text descriptions
-
AudioLDM2 Text2Audio Text2Music Generation
308 · Generate audio and waveform video from text
-
AudioSep
222
-
Lp Music Caps
170 · Generate captions for music audio
-
Tortoise Tts
312 · Expressive Text-to-Speech
-
All In One
22
-
XTTS
2.77k · Generate speech from text using a reference voice
-
Coqui Bark Voice Cloning
189
-
VALL E X
365 · Generate audio from text using voice prompts
-
WavJourney
193
-
Music To Image
264
-
MMS
277 · Transform and identify speech with MMS
-
ElevenLabs TTS
613 · Generate voice from text using ElevenLabs
-
AudioGPT
289
-
Bark
2.37k · Generate realistic audio from text
-
SpeechT5 Speech Recognition Demo
36
-
CoquiTTS (Official)
173
-
Whisper
2.63k · Transcribe audio files or YouTube videos into text
-
Moe TTS
658 · Generate and convert voice using text and audio inputs
-
YourTTS
17
-
Talking Face Generation with Multilingual TTS
557 · Generate a talking face video from text in multiple languages
-
OpenAI TTS New
562
-
Mustango
167
-
OWSM Demo
55
-
StyleTTS 2
709 · Efficient, fast, and natural text to speech with StyleTTS 2!
-
HierSpeech++ (Zero-shot TTS)
399 · Generate high-quality speech from text using a prompt audio
-
Video2music
21 · Generate music for a video based on its content and key
-
Whisper Large V2
187
-
Musicgen Prompt Upsampling
64 · Generate music from text prompts
-
Seamless M4T v2
516 · Translate speech and text between languages
-
Seamless Streaming
319 · Translate text between languages
-
Matcha TTS
52 · Generate speech from text with speaker selection
-
MusicGen Streaming
275 · Generate music from text prompts
-
Resemble Enhance
424 · Enhance and denoise your audio files
-
Singing Voice Conversion
261 · Transform your voice into a singer's
-
NaturalSpeech2
52 · Generate speech with cloned timbre
-
Create Your Own TTS Dataset
21
-
Podcast Transcription
-
OpenVoice
1.11k · Generate voice from text using a reference audio
-
M2UGen Demo
94
-
Pheme
68
-
ESPnet2 TTS
6 · Convert text to speech in English, Chinese, or Japanese
-
Whisper-WebUI
37 · Generate subtitles and translate audio files
-
Image2SFX Comparison
174 · Generates audio environment from an image
-
WhisperSpeech
379
-
MetaVoice 1B
144 · A demo of MetaVoice 1B, a new TTS model by MetaVoice.
-
TTS Arena V2
902 · Vote on the latest TTS models!
-
Whisper Speech X DreamTalk
173 · Combine voice cloning and portrait lipsync animation
-
Canary 1b
197 · Transcribe and translate audio into text
-
SALMONN Audio Questioning
82 · Deeply interrogate audio file content
-
MeloTTS
467 · Fast, efficient, & multilingual text-to-speech
-
Audio Editing
312 · Edit audios with text prompts
-
ChatMusician
18
-
xVASynth TTS
73 · CPU powered, low RTF, emotional, multilingual TTS
-
NaturalSpeech3 FACodec
180 · Convert and reconstruct speech files
-
Hey Gemma
25
-
Ratchet + Whisper
70 · Convert audio to text
-
AutoSubs
3 · Automatically add on-screen subs to your videos
-
VoiceCraft
161
-
TangoFlux
322 · Text to Audio (Sound SFX) Generator
-
Parler-TTS
831 · High-fidelity Text-To-Speech
-
Sing an idea ➡️ Music
184 · Bring song ideas to life
-
Musicgen Songstarter Demo
75 · Generate music using descriptions and optional melody audio
-
Whisper JAX
145 · Transcribe or translate audio from microphone, file, or YouTube
-
AudioLCM
22 · Generate audio from text
-
Stable Audio Live Multiplayer
160 · Generate audio from text prompts
-
Stable Audio Open Zero
449 · Generate audio from text prompts
-
Make An Audio 3
14 · Generate audio from text prompts
-
Mars5 Space
60
-
Tango Music AF
5 · Text to Music Generator
-
Jam
16 · Generate a song from lyrics and style reference
-
BigVGAN
108 · Generate high-quality audio from input audio
-
SenseVoice
89 · Transcribe audio with emotions and events
-
PicoAudio
28 · Generate audio from text descriptions with timestamps
-
Audio Flamingo Demo
7
-
MusiConGen
29
-
Mms Zeroshot
20 · Transcribe audio in any language using text data
-
GPT SoVITS V2 Pro Plus
203 · Generate speech from text using reference audio
-
EzAudio
275 · Generate and edit audio from text prompts
-
OpenMusic
214 · Generate music from text descriptions
-
Midi Music Generator
549 · Generate MIDI music from prompts
-
Whisper Turbo
990 · Transcribe audio or YouTube videos into text
-
Realtime Whisper Turbo
338 · Realtime implementation of Whisper large turbo
-
Whisper Large V3 Turbo WebGPU
166 · ML-powered speech recognition directly in your browser
-
OpenAudio S1
662 · Generate speech from text
-
TTS Spaces Arena
448 · Blind vote on HF TTS models!
-
Diva Realtime Chat
19 · Generate text responses from audio input
-
F5-TTS
2.69k · F5-TTS & E2-TTS: Zero-Shot Voice Cloning (Unofficial Demo)
-
MaskGCT TTS Demo
260 · MaskGCT TTS Demo
-
MelodyFlow
137 · Generate music from text descriptions
-
Fish Agent
147 · An end-to-end (e2e) Voice Language Model by Fish Audio.
-
Nexa Omni Demo
64 · Generate text from audio input
-
Kokoro TTS
3.05k · Upgraded to v1.0!
-
Make Custom Voices With KokoroTTS
123 · Make Custom Voices With KokoroTTS
-
Llasa 3b Tts
311 · Zero Shot voice cloning with llasa 3b (Unofficial Demo)
-
Llasa 1b Multilingual TTS
12 · Generate speech from text with or without cloning a voice
-
Kokoro Text-to-Speech (WebGPU)
347 · High-quality speech synthesis powered by Kokoro TTS
-
Hibiki Simple
42 · High-Fidelity Simultaneous Speech-To-Speech Translation
-
Zonos
410 · Generate audio from text with customizable emotions and settings
-
Kokoro Web
77 · ML-powered speech synthesis directly in your browser
-
Di♪♪Rhythm
656 · Blazingly Fast and Embarrassingly Simple Song Generation
-
Audiobox Aesthetics
22 · Demo for audiobox-aesthetics
-
Spark TTS
229 · A text-to-speech model powered by SparkAudio and Mobvoi.
-
Sesame CSM
851 · Conversational speech generation
-
Orpheus TTS
238 · Try Orpheus TTS here
-
Canary 1B Flash
43 · Canary 1B Flash demo
-
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
216 · Generate speech from text using a reference audio
-
AudioMorphix
6 · Prepare environment and run Gradio app
-
MegaTTS3 Demo
93
-
AudioX
156 · Generate audio from text and video prompts
-
Vevo for Zero-shot VC, TTS, and More
100 · Controllable Zero-Shot Voice Imitation
-
Dia 1.6B
1.71k · Generate realistic dialogue from a script, using Dia!
-
Aero 1 Audio Demo
43 · Demo for Aero-1-Audio
-
Voila Demo
43 · Chat with a voice-clone AI
-
ACE Step
598 · A Step Towards Music Generation Foundation Model
-
Audio Difficulty Estimator
2 · Estimate piano difficulty from audio
-
TIGER Audio Extractor
108 · Extraction & Reconstruction for Efficient Speech Separation
-
Music2emo
15 · Towards Unified Music Emotion Recognition across Dimensional
-
SonicVerse
13 · Generate detailed music descriptions from audio clips
-
Auffusion
40 · Audio Gen, Audio Style Transfer and Audio InPainting
-
Chatterbox TTS
1.62k · Expressive Zeroshot TTS
-
PlayDiffusion
118 · Generate modified audio from text and voice
-
Voice Clone Arena
2 · Vote on the latest Voice Clone TTS models!
-
Conversational WebGPU
226
-
Song Generation
517 · Generate a custom song from lyrics and optional prompts
-
NotaGen
56 · Generate classical sheet music in ABC notation
-
Audio Flamingo 3 Demo
85 · Audio Flamingo 3 Demo
-
Audio Flamingo 3 Chat
32 · Audio Flamingo 3 demo for multi-turn multi-audio chat
-
MSR UTMOS
6 · Multiple sampling rate MOS prediction with SFI conv
-
Higgs Audio Demo
391 · Higgs Audio Demo
-
sidon_demo_beta
18 · Speech restoration demo of Sidon.
-
Canary 1b V2
66 · Transcribe and Translate in 25 European Languages
-
SonicMaster - Text-Guided Music Restoration & Mastering
18 · Enhance audio using text prompts
-
OLMoASR
6 · Open Models and Data for Training Robust Speech Recognition
-
VibeVoice-Large
85 · Generate a podcast audio from a script and voice samples
-
TaDiCodec TTS AR Qwen2.5 0.5B
10 · Generate speech from text with voice cloning
-
EchoX
8 · An end-to-end speech large language model.
-
VoxCPM 0.5B
43 · Generate expressive speech from text with optional voice cloning
-
FireRedTTS2
35 · Long-form multi-speaker dialogue generation
-
FireRedASR
4 · FireRedASR Demo
-
IndexTTS 2 Demo
552 · Generate expressive speech from text with emotion control
-
SongFormer
13 · State-of-the-art music analysis with multi-scale datasets
-
Voice Acting TTS
16 · TTS for any emotion, now with non-verbal sounds!
-
Omnilingual ASR Media Transcription
187 · Transcribe audio or video into text in multiple languages
-
Music Flamingo
57 · Upload audio or YouTube link to get detailed analysis
-
Maya1
106 · Demo of our new open source model maya1
-
Supertonic (TTS)
156 · Lightning-Fast, On-Device TTS
-
Dia2 2B
50 · Streaming conversational audio in realtime