metadata

title: Tiny Audio Demo
emoji: 🎤
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 4.44.0
python_version: '3.11'
app_file: app.py
pinned: false
license: mit
short_description: Efficient ASR with HuBERT encoder and Qwen3-8B decoder
models:
  - mazesmazes/tiny-audio
tags:
  - audio
  - automatic-speech-recognition
  - wav2vec2
  - qwen3
  - lora
suggested_hardware: cpu-upgrade
preload_from_hub:
  - mazesmazes/tiny-audio

Demo Overview

This Space demonstrates an Automatic Speech Recognition (ASR) model that combines:

HuBERT-large encoder for audio feature extraction
Qwen3-8B decoder for efficient text generation

Features

🎙️ Record from microphone or upload audio files
⚡ Parameter-efficient: only ~2% of the model's parameters are trainable
🎯 English transcription optimized for speech-to-text
📊 Lightweight model suitable for edge deployment
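To put the ~2% figure in rough absolute terms, here is a back-of-the-envelope sketch. It assumes the percentage is measured against the combined encoder and decoder, and it takes the decoder size as a nominal 8 billion from the model name; both are assumptions, not figures stated on this page. The function name `trainable_estimate` is hypothetical.

```python
# Rough estimate of the trainable-parameter budget implied by "~2%".
ENCODER_PARAMS = 317_000_000            # HuBERT-large, as listed under Model Architecture
DECODER_PARAMS_NOMINAL = 8_000_000_000  # assumption: nominal "8B" taken from the model name


def trainable_estimate(total_params: int, trainable_fraction: float) -> int:
    """Return the number of trainable parameters for a given fraction of the total."""
    if not 0.0 <= trainable_fraction <= 1.0:
        raise ValueError("trainable_fraction must be between 0 and 1")
    return round(total_params * trainable_fraction)


total = ENCODER_PARAMS + DECODER_PARAMS_NOMINAL
estimate = trainable_estimate(total, 0.02)
print(f"{estimate / 1e6:.0f}M trainable of {total / 1e9:.2f}B total")
```

Under these assumptions the trainable portion comes to roughly 166M parameters; the real number depends on the exact decoder size and on what the percentage is measured against.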

Model Architecture

The model uses a novel architecture that bridges audio and text modalities:

Audio Encoder: Frozen HuBERT-large encoder (317M params)
Projection Layer: Custom layer that maps audio encoder features into the text decoder's embedding space
Text Decoder: Qwen3-8B (frozen)
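The data flow through these three components can be sketched in terms of tensor shapes. The sketch below assumes 16 kHz mono input and the standard convolutional front-end of the published HuBERT-large checkpoints (hidden size 1024). The decoder width of 4096 and the single linear projection are assumptions for illustration only; the page does not describe the actual projection layer, and the function names are hypothetical.

```python
# Shape walk-through: raw samples -> encoder frames -> projected decoder inputs.

# (kernel, stride) of each 1-D conv layer in the wav2vec 2.0 / HuBERT feature extractor.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]

ENCODER_DIM = 1024  # HuBERT-large hidden size
DECODER_DIM = 4096  # assumption: embedding width of the Qwen3-8B decoder


def encoder_frames(num_samples: int) -> int:
    """Number of feature frames the encoder emits for a clip of num_samples samples."""
    if num_samples < 400:
        raise ValueError("clip is shorter than the encoder's 400-sample receptive field")
    length = num_samples
    for kernel, stride in CONV_LAYERS:
        length = (length - kernel) // stride + 1
    return length


def projected_shape(num_samples: int) -> tuple[int, int]:
    """Shape of the sequence handed to the decoder, assuming one vector per encoder frame."""
    return (encoder_frames(num_samples), DECODER_DIM)


def linear_projection_params(in_dim: int = ENCODER_DIM, out_dim: int = DECODER_DIM) -> int:
    """Weights plus biases of a single linear layer (a hypothetical minimal projector)."""
    return in_dim * out_dim + out_dim


thirty_seconds = 30 * 16_000
print(projected_shape(thirty_seconds))
print(linear_projection_params())
```

The encoder produces roughly 50 frames per second of audio, so even a full-length clip becomes a sequence of about 1,500 vectors for the decoder. A real projector may downsample frames or use more than one layer, which would change both the sequence length and the parameter count.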

Usage

Upload an audio file (WAV, MP3, etc.) or record directly using your microphone
Click "Transcribe" to convert speech to text
The transcription will appear in the output box

Limitations

Maximum audio length: 30 seconds
Optimized for English speech
Best performance with clear speech and minimal background noise
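If you prepare audio before uploading, the 30-second limit can be enforced client-side. The helper below is a minimal sketch that truncates a sequence of samples to the limit; whether the Space itself truncates or rejects longer clips is not stated here, and `clip_audio` is a hypothetical name rather than part of the demo.

```python
from typing import Sequence

MAX_SECONDS = 30.0  # maximum audio length listed under Limitations


def clip_audio(samples: Sequence[float], sample_rate: int,
               max_seconds: float = MAX_SECONDS) -> Sequence[float]:
    """Return at most max_seconds of audio, keeping the start of the clip."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    max_samples = int(max_seconds * sample_rate)
    return samples[:max_samples]


# Example: a 40-second clip of silence at 16 kHz is cut down to 30 seconds.
clip = [0.0] * (40 * 16_000)
trimmed = clip_audio(clip, sample_rate=16_000)
print(len(trimmed) / 16_000)
```

Clips that are already within the limit pass through unchanged.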

Citation

If you use this model in your research, please cite:

@software{tiny_audio_2024,
  author = {Kroman, Alex},
  title = {Tiny Audio: Efficient ASR with HuBERT and Qwen3-8B},
  year = {2024},
  url = {https://github.com/alexkroman/tiny-audio}
}