- WavLLM: Towards Robust and Adaptive Speech Large Language Model
  Paper • 2404.00656 • Published • 11
- Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
  Paper • 2404.09956 • Published • 12
- Long-form music generation with latent diffusion
  Paper • 2404.10301 • Published • 27
- wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
  Paper • 2006.11477 • Published • 8

Collections including paper arxiv:2006.11477
- WhisperX: Time-Accurate Speech Transcription of Long-Form Audio
  Paper • 2303.00747 • Published • 5
- Custom Data Augmentation for low resource ASR using Bark and Retrieval-Based Voice Conversion
  Paper • 2311.14836 • Published • 2
- SONAR: Sentence-Level Multimodal and Language-Agnostic Representations
  Paper • 2308.11466 • Published • 1
- W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training
  Paper • 2108.06209 • Published • 1

- Robust Speech Recognition via Large-Scale Weak Supervision
  Paper • 2212.04356 • Published • 41
- Conformer: Convolution-augmented Transformer for Speech Recognition
  Paper • 2005.08100 • Published • 1
- wav2vec: Unsupervised Pre-training for Speech Recognition
  Paper • 1904.05862 • Published
- wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
  Paper • 2006.11477 • Published • 8

- facebook/wav2vec2-large-960h-lv60-self
  Automatic Speech Recognition • Updated • 47.1k • 155
- facebook/wav2vec2-large-960h
  Automatic Speech Recognition • Updated • 25.6k • 32
- facebook/wav2vec2-base-960h
  Automatic Speech Recognition • 94.4M • Updated • 2.53M • 383
- facebook/wav2vec2-base-100h
  Automatic Speech Recognition • Updated • 1.22k • 7

- UniAudio: An Audio Foundation Model Toward Universal Audio Generation
  Paper • 2310.00704 • Published • 21
- Structural Similarities Between Language Models and Neural Response Measurements
  Paper • 2306.01930 • Published • 2
- Streaming Transformer ASR with Blockwise Synchronous Beam Search
  Paper • 2006.14941 • Published • 2
- NU-GAN: High resolution neural upsampling with GAN
  Paper • 2010.11362 • Published • 2

- Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation
  Paper • 2211.06687 • Published • 4
- EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning
  Paper • 2401.17690 • Published • 5
- Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
  Paper • 2312.09911 • Published • 55
- Audiobox: Unified Audio Generation with Natural Language Prompts
  Paper • 2312.15821 • Published • 17