Collections

Collections including paper arxiv:2006.11477

Papers - Audio - Fine-tuning

WavLLM: Towards Robust and Adaptive Speech Large Language Model

Paper • 2404.00656 • Published Mar 31, 2024 • 11
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Paper • 2404.09956 • Published Apr 15, 2024 • 12
Long-form music generation with latent diffusion

Paper • 2404.10301 • Published Apr 16, 2024 • 27
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

Paper • 2006.11477 • Published Jun 20, 2020 • 8

Papers - Audio - STT - ASR

WhisperX: Time-Accurate Speech Transcription of Long-Form Audio

Paper • 2303.00747 • Published Mar 1, 2023 • 5
Custom Data Augmentation for low resource ASR using Bark and Retrieval-Based Voice Conversion

Paper • 2311.14836 • Published Nov 24, 2023 • 2
SONAR: Sentence-Level Multimodal and Language-Agnostic Representations

Paper • 2308.11466 • Published Aug 22, 2023 • 1
W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training

Paper • 2108.06209 • Published Aug 7, 2021 • 1

Automatic Speech Recognition Architectures

Robust Speech Recognition via Large-Scale Weak Supervision

Paper • 2212.04356 • Published Dec 6, 2022 • 41
Conformer: Convolution-augmented Transformer for Speech Recognition

Paper • 2005.08100 • Published May 16, 2020 • 1
wav2vec: Unsupervised Pre-training for Speech Recognition

Paper • 1904.05862 • Published Apr 11, 2019
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

Paper • 2006.11477 • Published Jun 20, 2020 • 8

A collection for the first release of Wav2Vec 2.0, a speech encoder that learns powerful representations from unlabelled audio data.

facebook/wav2vec2-large-960h-lv60-self

Automatic Speech Recognition • Updated May 23, 2022 • 47.1k • 155
facebook/wav2vec2-large-960h

Automatic Speech Recognition • Updated Apr 5, 2022 • 25.6k • 32
facebook/wav2vec2-base-960h

Automatic Speech Recognition • 94.4M • Updated Nov 14, 2022 • 2.53M • 383
facebook/wav2vec2-base-100h

Automatic Speech Recognition • Updated May 27, 2022 • 1.22k • 7

Papers - Audio - Speech Transcription

WhisperX: Time-Accurate Speech Transcription of Long-Form Audio

Paper • 2303.00747 • Published Mar 1, 2023 • 5
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

Paper • 2006.11477 • Published Jun 20, 2020 • 8

UniAudio: An Audio Foundation Model Toward Universal Audio Generation

Paper • 2310.00704 • Published Oct 1, 2023 • 21
Structural Similarities Between Language Models and Neural Response Measurements

Paper • 2306.01930 • Published Jun 2, 2023 • 2
Streaming Transformer ASR with Blockwise Synchronous Beam Search

Paper • 2006.14941 • Published Jun 25, 2020 • 2
NU-GAN: High resolution neural upsampling with GAN

Paper • 2010.11362 • Published Oct 22, 2020 • 2

there's many more on arxiv if you search for CLAP

Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation

Paper • 2211.06687 • Published Nov 12, 2022 • 4
EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning

Paper • 2401.17690 • Published Jan 31, 2024 • 5
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit

Paper • 2312.09911 • Published Dec 15, 2023 • 55
Audiobox: Unified Audio Generation with Natural Language Prompts

Paper • 2312.15821 • Published Dec 25, 2023 • 17

audio recognition

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

Paper • 2006.11477 • Published Jun 20, 2020 • 8
