Marketplace · 30+ models
Speech & Audio
Voice in, voice out
Automatic speech recognition, text-to-speech, speaker diarization, audio classification, and music generation — for contact centers, media, and voice products.
Editor's pick
The speech & audio model most teams reach for first.
Whisper Large v3
Multilingual ASR + translation across 99 languages; the open ASR default.
Spec sheet
- Family: OpenAI
- Parameters: 1.55B
- License: MIT
- Status: Live
- Best for: Voice in, voice out
- Sits in: Speech & Audio
Pricing and routing rank are visible on InferenceBench. Variants and quantisations appear in the Yobibyte deploy console.
The rest of the lineup
5 more in Speech & Audio. All deployable in one click.
8× faster than Large v3 with only minor accuracy loss; great for streaming.
Multilingual zero-shot voice cloning TTS from a 6-second reference sample.
Speech-to-speech, speech-to-text, and text-to-speech across 100 languages.
Showing 6 of 30+. The full catalog (with quantisations, hardware variants, and per-region pricing) lives in the Yobibyte console.
Quick start
A few lines to your first speech & audio call.
Every model in this category is reachable from the same Yobitel SDK. Swap the model name; the rest of the call shape stays identical. Calls are authenticated with your workspace key.
from yobitel import Inference

# Whisper Large v3: multilingual ASR
client = Inference(model="openai/whisper-large-v3")
result = client.transcribe(
    audio="board_meeting_2025_03_14.wav",
    language="en",
    diarise=True,  # who-spoke-when
    word_timestamps=True,
)
for seg in result.segments:
    print(f"[{seg.speaker}] {seg.text}")

Where teams ship this
Real speech & audio. In production.
Four use cases that customers run today. Pick a model from the lineup above, deploy on Yobibyte, plug it into the surrounding stack. Done.
- 01 Meeting and call transcription
- 02 Voice assistants and IVR
- 03 Contact-center analytics and QA
- 04 Content dubbing and accessibility
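For meeting and call transcription, diarised output like the segments printed in the quick start usually reads better once consecutive segments from the same speaker are merged into turns. The sketch below is one way to do that in plain Python. It does not call the Yobitel SDK: the Segment dataclass is a hypothetical stand-in for the items in result.segments, assuming only the speaker and text fields shown in the quick start, and merge_turns and render_transcript are illustrative helper names, not part of any SDK.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Segment:
    # Hypothetical stand-in for one item of result.segments; only the
    # `speaker` and `text` fields appear in the quick start.
    speaker: str
    text: str


def merge_turns(segments):
    """Collapse consecutive segments from the same speaker into one turn."""
    turns = []
    # groupby only groups adjacent items, so a speaker who talks again
    # later starts a new turn instead of being merged with the earlier one.
    for speaker, group in groupby(segments, key=lambda s: s.speaker):
        turns.append((speaker, " ".join(s.text.strip() for s in group)))
    return turns


def render_transcript(segments):
    """Format merged turns as one '[speaker] text' line per turn."""
    return "\n".join(f"[{spk}] {txt}" for spk, txt in merge_turns(segments))


if __name__ == "__main__":
    demo = [
        Segment("SPEAKER_00", "Let's start with the Q1 numbers."),
        Segment("SPEAKER_00", "Revenue is up."),
        Segment("SPEAKER_01", "By how much?"),
        Segment("SPEAKER_00", "About four percent."),
    ]
    print(render_transcript(demo))
```

If the real segment objects expose the same two attributes, passing result.segments straight to render_transcript should work unchanged; anything beyond speaker and text (timestamps, confidence scores) would need to be checked against the SDK's actual response shape.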
Frameworks
Bring what your team already knows
Yobitel handles the serving layer (GPU scheduling, KV cache, autoscaling, request batching) so your team focuses on the model and the product.
Learn about Yobibyte
Explore the rest
Other categories in the marketplace
Computer Vision
Object detection, image classification, segmentation
NLP & Language
Text generation, translation, sentiment, summarization
Generative AI
Image gen, text gen, code gen, multimodal
Data Analytics
Predictive analytics, forecasting, anomaly detection
Automation & RPA
Process automation, workflow AI, document processing
Industry-Specific
Vertical-specific models by industry
Recommendation
Recommender systems, personalization, content matching
Don't see what you need?
Bring your own model or fine-tune one of ours. Yobitel engineers can sit with your team and ship the right stack.