Marketplace · 30+ models

Speech & Audio · Voice in, voice out

Automatic speech recognition, text-to-speech, speaker diarization, audio classification, and music generation — for contact centers, media, and voice products.

The speech & audio model most teams reach for first.

Whisper Large v3

OpenAI · 1.55B params · MIT

Multilingual ASR + translation across 99 languages; the open ASR default.

Spec sheet

Family: OpenAI
Parameters: 1.55B
License: MIT
Status: Live
Best for: Voice in, voice out
Sits in: Speech & Audio

Pricing and routing rank visible on InferenceBench. Variants and quantisations appear in the Yobibyte deploy console.
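
As a rough guide to what the parameter counts and quantised variants mean in practice, the arithmetic below estimates the size of the weights alone at a few numeric precisions. It is a back-of-the-envelope sketch: the precision list is illustrative and not the set of variants the deploy console offers, and real memory use is higher once activations, audio buffers, and runtime overhead are included.

```python
# Bytes needed to store one parameter at each precision (illustrative list).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(params: float, precision: str) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    # Parameter counts as listed on this page.
    models = [("Whisper Large v3", 1.55e9), ("Whisper Turbo", 809e6)]
    for name, params in models:
        sizes = ", ".join(
            f"{p}: {weight_gb(params, p):.2f} GB" for p in BYTES_PER_PARAM
        )
        print(f"{name}: {sizes}")
```

At fp16, for example, 1.55B parameters come to about 3.1 GB of weights, and halving the bytes per parameter halves that figure.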

The rest of the lineup

5 more in Speech & Audio. All deployable in one click.

Whisper Turbo

OpenAI · 809M params · MIT

8× faster than Large v3 with only minor accuracy loss; great for streaming.

Coqui XTTS v2

Coqui · 467M params · Coqui Public Model

Multilingual zero-shot voice cloning TTS from a 6-second reference sample.

Pyannote 3.1

Pyannote · params: n/a · MIT

Speaker diarization pipeline: who spoke when, end-to-end.

SeamlessM4T v2

Speech-to-speech, speech-to-text, and text-to-speech across 100 languages.

Five lines to your first speech & audio call.

Every model in this category is reachable from the same Yobitel SDK. Swap the model name; the rest of the call shape stays identical. Calls are authenticated with your workspace key.

speech-audio-quickstart.py

from yobitel import Inference

# Whisper Large v3 — multilingual ASR
client = Inference(model="openai/whisper-large-v3")

result = client.transcribe(
    audio="board_meeting_2025_03_14.wav",
    language="en",
    diarise=True,           # who-spoke-when
    word_timestamps=True,
)

for seg in result.segments:
    print(f"[{seg.speaker}] {seg.text}")
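
Diarised output often splits one speaker's turn across several short segments. The sketch below shows one way to merge consecutive segments from the same speaker before printing or storing a transcript. It does not call the SDK: `Segment` is a hypothetical stand-in for the items in `result.segments`, modelling only the `speaker` and `text` fields used in the quickstart, and `merge_turns` is an illustrative helper, not part of any Yobitel API.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Segment:
    # Hypothetical stand-in for one item of result.segments; only the
    # `speaker` and `text` fields from the quickstart are modelled.
    speaker: str
    text: str


def merge_turns(segments):
    """Collapse consecutive segments from the same speaker into one turn."""
    turns = []
    # groupby only groups adjacent items, so a speaker who returns later
    # in the conversation starts a new turn.
    for speaker, group in groupby(segments, key=lambda s: s.speaker):
        text = " ".join(s.text.strip() for s in group)
        turns.append(Segment(speaker=speaker, text=text))
    return turns


if __name__ == "__main__":
    demo = [
        Segment("SPEAKER_00", "Let's start with the budget."),
        Segment("SPEAKER_00", "Q1 came in under plan."),
        Segment("SPEAKER_01", "Good news."),
    ]
    for turn in merge_turns(demo):
        print(f"[{turn.speaker}] {turn.text}")
```

With a real response, the same helper could be applied to `result.segments` directly, provided the segment objects expose `speaker` and `text` as the quickstart suggests.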

Where teams ship this

Real speech & audio. In production.

Four use cases that customers run today. Pick a model from the lineup above, deploy on Yobibyte, plug it into the surrounding stack. Done.

01 Meeting and call transcription
02 Voice assistants and IVR
03 Contact-center analytics and QA
04 Content dubbing and accessibility

Frameworks

Bring what your team already knows

faster-whisper · Coqui TTS · Pyannote · Triton

Yobitel handles the serving layer (GPU scheduling, KV cache, autoscaling, request batching) so your team focuses on the model and the product.

Explore the rest

Other categories in the marketplace

85+

Computer Vision

Object detection, image classification, segmentation

120+

NLP & Language

Text generation, translation, sentiment, summarization

95+

Generative AI

Image gen, text gen, code gen, multimodal

60+

Data Analytics

Predictive analytics, forecasting, anomaly detection

45+

Automation & RPA

Process automation, workflow AI, document processing

50+

Recommendation

Recommender systems, personalization, content matching

Don't see what you need?

Bring your own model or fine-tune one of ours. Yobitel engineers can sit with your team and ship the right stack.
