metadata
title: Tiny Audio Demo
emoji: 🎤
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 4.44.0
python_version: '3.11'
app_file: app.py
pinned: false
license: mit
short_description: Efficient ASR with HuBERT encoder and Qwen3-8B decoder
models:
- mazesmazes/tiny-audio
tags:
- audio
- automatic-speech-recognition
- wav2vec2
- qwen3
- lora
suggested_hardware: cpu-upgrade
preload_from_hub:
- mazesmazes/tiny-audio
Demo Overview
This Space demonstrates an Automatic Speech Recognition (ASR) model that combines:
- HuBERT-large encoder for audio feature extraction
- Qwen3-8B decoder for efficient text generation
Features
- 🎙️ Record from microphone or upload audio files
- ⚡ Fast inference; parameter-efficient training with only ~2% of parameters trainable
- 🎯 Optimized for English speech-to-text transcription
- Lightweight model suitable for edge deployment
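To put the ~2% figure in absolute terms, here is a back-of-the-envelope calculation. It assumes the 317M encoder count listed under Model Architecture and roughly 8.2B parameters for Qwen3-8B (the approximate figure from the Qwen3-8B model card). The projection layer's own size is not stated, so it is left out of the total, and the result should be read as an order of magnitude only.

```python
# Back-of-the-envelope estimate of what "~2% trainable" means in absolute terms.
# Assumed parameter counts: 317M for HuBERT-large (from this README) and
# ~8.2B for Qwen3-8B (approximate, from the Qwen3-8B model card).
ENCODER_PARAMS = 317_000_000
DECODER_PARAMS = 8_200_000_000
TRAINABLE_FRACTION = 0.02  # "~2%", so treat the result as an order of magnitude


def implied_trainable_params(fraction: float, *component_counts: int) -> int:
    """Return the number of trainable parameters implied by a fraction of the total."""
    total = sum(component_counts)
    return round(fraction * total)


trainable = implied_trainable_params(TRAINABLE_FRACTION, ENCODER_PARAMS, DECODER_PARAMS)
total_b = (ENCODER_PARAMS + DECODER_PARAMS) / 1e9
print(f"total ~ {total_b:.2f}B, trainable ~ {trainable / 1e6:.0f}M")
```

With both the encoder and the decoder frozen, a budget on the order of 170M parameters would have to live in the remaining trainable modules, such as the projection layer.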
Model Architecture
The model bridges the audio and text modalities with three components:
- Audio Encoder: Frozen HuBERT-large encoder (317M params)
- Projection Layer: Custom mapping from audio features into the decoder's text embedding space
- Text Decoder: Qwen3-8B (frozen)
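The README does not spell out how the projection layer is built. A common pattern for connecting a speech encoder to an LLM decoder is to concatenate several consecutive encoder frames (shortening the sequence) and then apply a linear map into the decoder's embedding width. The sketch below shows that pattern in plain Python with toy sizes. The stacking factor `K`, the linear-only design, and the helper names are assumptions for illustration, not the model's confirmed implementation. For reference, HuBERT-large produces 1024-dimensional frames and Qwen3-8B uses a 4096-dimensional hidden size.

```python
from typing import List

Matrix = List[List[float]]


def stack_frames(frames: Matrix, k: int) -> Matrix:
    """Concatenate every k consecutive frames; a trailing partial group is dropped."""
    usable = len(frames) - len(frames) % k
    return [
        [x for frame in frames[i : i + k] for x in frame]
        for i in range(0, usable, k)
    ]


def linear(rows: Matrix, weight: Matrix, bias: List[float]) -> Matrix:
    """Compute rows @ weight + bias, where weight has shape (in_dim, out_dim)."""
    out_dim = len(bias)
    return [
        [sum(r[i] * weight[i][j] for i in range(len(r))) + bias[j] for j in range(out_dim)]
        for r in rows
    ]


def project(frames: Matrix, k: int, weight: Matrix, bias: List[float]) -> Matrix:
    """Hypothetical projector: (T, enc_dim) -> (T // k, dec_dim)."""
    return linear(stack_frames(frames, k), weight, bias)


# Toy sizes; the real model would use enc_dim=1024 (HuBERT-large) and dec_dim=4096 (Qwen3-8B).
T, ENC_DIM, DEC_DIM, K = 10, 3, 2, 4
frames = [[float(t + d) for d in range(ENC_DIM)] for t in range(T)]
weight = [[0.1] * DEC_DIM for _ in range(K * ENC_DIM)]  # shape (K * ENC_DIM, DEC_DIM)
bias = [0.0] * DEC_DIM

tokens = project(frames, K, weight, bias)
print(len(tokens), len(tokens[0]))  # 2 2
```

Stacking by `K` divides the number of positions the decoder has to attend over by `K`; a small learned MLP is a frequent alternative to the single linear map.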
Usage
- Upload an audio file (WAV, MP3, etc.) or record directly using your microphone
- Click "Transcribe" to convert speech to text
- The transcription will appear in the output box
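HuBERT-large was pretrained on 16 kHz audio, so uploaded files generally need to be decoded to mono samples at that rate before they reach the encoder. The sketch below covers only the WAV case using the standard-library `wave` module: it decodes 16-bit PCM, averages channels down to mono, and returns the sample rate so a caller can decide whether resampling is needed. It is an illustration under those assumptions; MP3 and other compressed formats need a separate decoder, and the Space's own preprocessing in `app.py` may differ. The helpers `read_wav_mono` and `make_wav` are hypothetical names.

```python
import io
import struct
import wave
from typing import List, Tuple


def read_wav_mono(data: bytes) -> Tuple[List[float], int]:
    """Decode 16-bit PCM WAV bytes into mono floats in [-1, 1] plus the sample rate."""
    with wave.open(io.BytesIO(data), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wf.getnchannels()
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    samples = struct.unpack(f"<{len(raw) // 2}h", raw)  # interleaved little-endian int16
    # Average the channels of each frame, then scale int16 to [-1, 1].
    mono = [
        sum(samples[i : i + channels]) / channels / 32768.0
        for i in range(0, len(samples), channels)
    ]
    return mono, rate


def make_wav(frames: List[Tuple[int, ...]], rate: int) -> bytes:
    """Build an in-memory 16-bit PCM WAV (used here only to exercise the reader)."""
    channels = len(frames[0])
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(b"".join(struct.pack(f"<{channels}h", *fr) for fr in frames))
    return buf.getvalue()


stereo = make_wav([(16384, 0), (-16384, -16384)], rate=16000)
mono, rate = read_wav_mono(stereo)
print(mono, rate)  # [0.25, -0.5] 16000
```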
Limitations
- Maximum audio length: 30 seconds
- Optimized for English
- Best performance with clear speech and minimal background noise
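The 30-second cap can be translated into encoder sequence length. At 16 kHz, 30 s is 480,000 samples. HuBERT-large uses the same convolutional feature extractor as wav2vec 2.0 (kernel sizes 10, 3, 3, 3, 3, 2, 2 and strides 5, 2, 2, 2, 2, 2, 2), which yields roughly one frame per 20 ms. The sketch below applies the usual per-layer output-length formula, floor((L - kernel) / stride) + 1, and rejects clips over the limit. The `check_clip` guard and its error message are hypothetical, not the Space's actual validation code.

```python
# Conv feature-extractor geometry shared by wav2vec 2.0 and HuBERT (16 kHz input).
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)
SAMPLE_RATE = 16_000
MAX_SECONDS = 30  # limit stated in this README


def encoder_frames(num_samples: int) -> int:
    """Number of encoder frames produced from num_samples of 16 kHz audio."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        length = (length - kernel) // stride + 1
    return length


def check_clip(num_samples: int) -> int:
    """Hypothetical guard: reject clips over the limit, else return the frame count."""
    if num_samples > MAX_SECONDS * SAMPLE_RATE:
        raise ValueError(f"clip is {num_samples / SAMPLE_RATE:.1f}s; max is {MAX_SECONDS}s")
    return encoder_frames(num_samples)


print(check_clip(30 * SAMPLE_RATE))  # 1499
print(check_clip(1 * SAMPLE_RATE))   # 49
```

A maximum-length clip therefore becomes about 1,500 encoder frames before any downsampling in the projection layer.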
Links
- 📦 Model on Hugging Face
- 💻 GitHub Repository
- Technical Details
Citation
If you use this model in your research, please cite:
@software{tiny_audio_2024,
author = {Kroman, Alex},
title = {Tiny Audio: Efficient ASR with HuBERT and Qwen3-8B},
year = {2024},
url = {https://github.com/alexkroman/tiny-audio}
}