Unsloth Whisper Large V3 Turbo - Pruna 8bit Optimized

This model is a Pruna-optimized version of openai/whisper-large-v3-turbo with 8-bit quantization optimizations.

Optimizations Applied

  • Batcher Optimization: int8 enabled (whisper_s2t_int8: True)
  • Compiler: c_whisper
  • Batcher: whisper_s2t

Usage

Option 1: Standard Transformers (Recommended for most users)

from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

# Simple loading - no Pruna installation required
model = AutoModelForSpeechSeq2Seq.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-8bit")
processor = AutoProcessor.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-8bit")

# Transcribe a 16 kHz mono waveform (audio_array: 1-D float array)
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")
generated_ids = model.generate(inputs.input_features)
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
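Whisper models process audio in 30-second windows, so longer recordings must be split before being handed to the processor. A minimal NumPy sketch of such chunking (the 30 s window and 16 kHz rate are Whisper's standard expectations; `chunk_waveform` is a hypothetical helper, not part of this repository):

```python
import numpy as np

def chunk_waveform(waveform, sample_rate=16000, window_s=30):
    """Split a 1-D waveform into consecutive windows of window_s seconds.

    The final chunk may be shorter; the Whisper processor pads it to 30 s.
    """
    window = sample_rate * window_s
    return [waveform[i:i + window] for i in range(0, len(waveform), window)]

# Example: 70 seconds of silence -> three chunks (30 s, 30 s, 10 s)
audio = np.zeros(70 * 16000, dtype=np.float32)
chunks = chunk_waveform(audio)
```

Each chunk can then be passed to the processor and `model.generate` individually, and the decoded texts concatenated.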

Option 2: With Pruna Optimization (Maximum Performance)

from pruna import smash, SmashConfig
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor
import json

# Load model and processor
model = AutoModelForSpeechSeq2Seq.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-8bit")
processor = AutoProcessor.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-8bit")

# Load the smash_config.json shipped with this repository
with open("smash_config.json", "r") as f:
    config_dict = json.load(f)

# Recreate SmashConfig
smash_config = SmashConfig()
for key, value in config_dict.items():
    smash_config[key] = value

# Apply Pruna optimizations
smashed_model = smash(
    model=model,
    smash_config=smash_config
)

# Use the optimized model
result = smashed_model.inference(audio_input)
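To make the config round-trip above concrete, here is a self-contained sketch of the JSON load step using illustrative keys that mirror the optimizations listed earlier (`compiler: c_whisper`, `batcher: whisper_s2t`, `whisper_s2t_int8: True`); the actual smash_config.json in the repository is the source of truth, and its real keys may differ:

```python
import json
import os
import tempfile

# Illustrative config mirroring the optimizations this card lists;
# the repo's smash_config.json is authoritative.
config_dict = {
    "compiler": "c_whisper",
    "batcher": "whisper_s2t",
    "whisper_s2t_int8": True,
}

# Write and re-read the file the same way the snippet above does
path = os.path.join(tempfile.mkdtemp(), "smash_config.json")
with open(path, "w") as f:
    json.dump(config_dict, f)

with open(path) as f:
    loaded = json.load(f)
```

Each key/value pair from the loaded dict is then assigned onto a fresh `SmashConfig` via item assignment, as shown in the snippet above.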

Performance Benefits

  • Reduced memory usage from 8-bit weight quantization
  • Optimized inference pipeline with int8 batcher
  • Maintained audio transcription quality
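As a rough back-of-the-envelope check of the memory claim (assuming the 0.8B parameter count reported for this model, 2 bytes per parameter for F16 versus 1 byte for int8, and ignoring activations and runtime overhead):

```python
PARAMS = 0.8e9  # parameter count reported for this model

def weight_bytes(params, bytes_per_param):
    """Approximate weight memory only; activations and overhead ignored."""
    return params * bytes_per_param

fp16_gb = weight_bytes(PARAMS, 2) / 1e9  # ~1.6 GB
int8_gb = weight_bytes(PARAMS, 1) / 1e9  # ~0.8 GB
```

Actual memory use depends on batch size, sequence length, and which layers the quantization actually covers, so treat these as upper-level estimates for the weights alone.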

Base Model

This model is based on unsloth/whisper-large-v3-turbo, which itself is optimized from openai/whisper-large-v3-turbo. It retains all the capabilities of both base models while providing additional Pruna performance improvements.

Model Details

  • Format: Safetensors
  • Model size: 0.8B params
  • Tensor type: F16