Continue-TTS

Continue-TTS

Text-to-Speech Model Based on Continue-1-OSS

Introduction

We are thrilled to introduce Continue-TTS, a fine-tuned text-to-speech model based on the Continue-1-OSS architecture, developed by SVECTOR. This model is specifically trained for high-quality speech synthesis and delivers exceptional voice generation capabilities.

Continue-TTS is engineered to provide:

  • Natural Speech: Human-like intonation, emotion, and rhythm that rivals commercial solutions
  • 8 Unique Voices: Diverse voice options with distinct personalities and characteristics
  • Real-time Generation: Low-latency streaming for interactive applications (~200ms)
  • Emotional Expression: Built-in support for laughter, sighs, gasps, and other natural emotions
  • Open Source: Fully accessible under Apache 2.0 license for research and commercial use

This model is based on the Continue-1-OSS architecture and combines the power of large language models with neural audio codecs to generate exceptionally natural speech from text.

The sun was setting behind the mountains, painting the sky with soft shades of orange and violet.
She stood there quietly, breathing in the moment. <sigh>
Sometimes, the smallest moments are the ones that change everything.

<sigh>  
Not every journey is loud.  
Some begin quietly… inside.  
But once they begin, they never stop.  
We continue.

Model Specifications

  • Base Architecture: Continue-1-OSS
  • Type: Text-to-Speech (TTS) Model
  • Parameters: 3 Billion
  • Audio Codec: SNAC (24kHz)
  • Context Length: 131,072 tokens
  • Vocabulary: 156,940 tokens (including 28,672 audio tokens)
  • License: Apache 2.0
  • Voices: 8 (Nova, Aurora, Stellar, Atlas, Orion, Luna, Phoenix, Ember)

Requirements

To use Continue-TTS, install the required dependencies:

pip install transformers torch
pip install snac  # Audio codec
pip install vllm==0.7.3  # For fast inference (optional but recommended)

Quickstart

Basic Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "SVECTOR-CORPORATION/Continue-TTS"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# Prepare text with voice
text = "Hello! I am Continue-TTS, a text-to-speech model based on Continue-1-OSS."
voice = "nova"  # Choose: nova, aurora, stellar, atlas, orion, luna, phoenix, ember

# Format prompt (TTS format)
adapted_prompt = f"{voice}: {text}"
prompt_tokens = tokenizer(adapted_prompt, return_tensors="pt")
start_token = torch.tensor([[128259]], dtype=torch.int64)
end_tokens = torch.tensor([[128009, 128260, 128261, 128257]], dtype=torch.int64)
input_ids = torch.cat([start_token, prompt_tokens.input_ids, end_tokens], dim=1)

# Generate audio tokens
outputs = model.generate(
    input_ids.to(model.device),
    max_new_tokens=1200,
    temperature=0.6,
    top_p=0.8,
    repetition_penalty=1.3,
    eos_token_id=49158,  # TTS stop token
    do_sample=True
)

# Decode tokens (audio codes can be decoded using SNAC decoder)
generated_tokens = tokenizer.decode(outputs[0], skip_special_tokens=False)

Using Continue-TTS Package (Recommended)

For easier usage with audio generation, use the Continue-TTS package:

pip install continue-speech
from continue_tts import Continue1Model
import wave

# Initialize model
model = Continue1Model(model_name="SVECTOR-CORPORATION/Continue-TTS", max_model_len=2048)

# Generate speech
text = "Welcome to Continue-TTS! This model is built on Continue-1-OSS."
audio_chunks = model.generate_speech(prompt=text, voice="nova")

# Save to file
with wave.open("output.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(24000)
    for chunk in audio_chunks:
        wf.writeframes(chunk)

Available Voices

Continue-TTS includes 8 professionally designed voices:

Voice Gender Description
nova Female Conversational and natural, perfect for general use
aurora Female Warm and friendly, excellent for storytelling
stellar Female Energetic and bright, great for upbeat content
atlas Male Deep and authoritative, ideal for narration
orion Male Friendly and casual, perfect for conversational content
luna Female Soft and gentle, excellent for calm narration
phoenix Male Dynamic and expressive, great for engaging content
ember Female Warm and engaging, perfect for emotional expression

Advanced Features

Emotion Tags

Add natural emotions to your speech:

text = "This is incredible! <laugh> I can't believe how natural it sounds. <gasp>"

Supported emotions:

  • <laugh> - Natural laughter
  • <chuckle> - Light laugh
  • <sigh> - Expressive sigh
  • <gasp> - Surprised gasp
  • <cough> - Cough sound
  • <yawn> - Yawn
  • <groan> - Groan
  • <sniffle> - Sniffle

Custom Generation Parameters

Fine-tune generation quality:

audio = model.generate_speech(
    prompt="Your text here",
    voice="nova",
    temperature=0.6,        # Lower = more consistent, Higher = more varied
    top_p=0.8,             # Nucleus sampling threshold
    max_tokens=1200,       # Maximum audio length
    repetition_penalty=1.3 # Prevent token repetition
)

Use Cases

Continue-TTS excels at:

  • Audiobook Narration: Natural storytelling with emotional expression
  • Virtual Assistants: Conversational AI with personality
  • Accessibility: Text-to-speech for visually impaired users
  • Content Creation: Voiceovers for videos, podcasts, and presentations
  • Gaming: Dynamic character voices and dialogue
  • Education: Interactive learning materials with voice
  • Customer Service: Natural-sounding automated responses

Performance

  • Quality: State-of-the-art natural speech synthesis
  • Latency: ~200ms for streaming generation (GPU)
  • Speed: Real-time on GPU, slower on CPU
  • Memory: ~7GB GPU RAM (FP16), ~14GB (FP32)
  • Sample Rate: 24kHz (high quality audio)

Model Architecture

Continue-TTS is built on the Continue-1-OSS and combines:

  • Base Model: Continue-1-OSS (LLaMA-based, 3.3B parameters)
  • Audio Codec: SNAC multi-scale neural audio codec
  • Token Structure: 7 audio tokens per frame (hierarchical encoding)
  • Training: Fine-tuned on few hours of diverse speech data

The model generates audio tokens autoregressively, which are then decoded into waveforms using the SNAC neural codec.

Training

Continue-TTS was fine-tuned on the Continue-1-OSS using:

  • High-quality speech datasets covering diverse accents and styles
  • Multi-speaker recordings for voice diversity
  • Emotional speech data for expressive synthesis
  • Conversational and narrative content

Training utilized:

  • Continue-1-OSS as base
  • Custom tokenizer with 28,672 audio tokens
  • Multi-stage training (pretraining + fine-tuning)
  • Optimized for naturalness and emotion

Limitations

As with any TTS model, Continue-TTS has certain limitations:

  • Pronunciation: May struggle with unusual names, technical terms, or non-English words
  • Consistency: Long-form generation may have minor quality variations
  • Accents: Primarily trained on specific accent patterns
  • Compute: Requires GPU for real-time generation (CPU is slower)
  • Language: Currently optimized for English

Ethical Considerations

SVECTOR is committed to responsible AI development. Users should:

  • Transparency: Disclose when audio is AI-generated
  • Consent: Do not clone voices without explicit permission
  • Verification: Implement safeguards against deepfakes and misinformation
  • Attribution: Credit the model when used in public projects
  • Responsible Use: Avoid generating harmful, deceptive, or illegal content

License

This model is released under the Apache License 2.0. See the LICENSE file for complete details.

Acknowledgments

Continue-1-OSS builds upon advances in neural speech synthesis, large language models, and neural audio codecs. We thank the open-source community for their contributions to these foundational technologies.


Developed by SVECTOR

Downloads last month
58
Safetensors
Model size
4B params
Tensor type
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 1 Ask for provider support

Collection including SVECTOR-CORPORATION/Continue-TTS