SoulX Podcast 1.7B Dialect • Running on Zero • Realistic Long-form Podcasts Generation with Dialectal
OSUM-EChat: Enhancing End-to-End Empathetic Spoken Chatbot via Understanding-Driven Spoken Dialogue Paper • 2508.09600 • Published Aug 13
SenSE: Semantic-Aware High-Fidelity Universal Speech Enhancement Paper • 2509.24708 • Published Sep 29
SoulX-Podcast: Towards Realistic Long-form Podcasts with Dialectal and Paralinguistic Diversity Paper • 2510.23541 • Published Oct 27 • 13
SAC: Neural Speech Codec with Semantic-Acoustic Dual-Stream Quantization Paper • 2510.16841 • Published Oct 19
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens Paper • 2503.01710 • Published Mar 3 • 6
SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training Paper • 2412.15649 • Published Dec 20, 2024 • 1
SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs Paper • 2410.09503 • Published Oct 12, 2024
EAT: Self-Supervised Pre-Training with Efficient Audio Transformer Paper • 2401.03497 • Published Jan 7, 2024 • 1