OCI Speech

OCI Speech provides speech-to-text and text-to-speech on Oracle Cloud Infrastructure, covering batch and real-time transcription and neural voice synthesis across multiple languages. It includes speaker diarization, profanity filtering, and integration with other OCI services, and its prebuilt models require no machine learning expertise. It is aimed at media companies, healthcare providers, and call centers that need secure, scalable audio AI.

Pricing

Starting at USD 0.35/mo

Category: Voice Generation & Conversion

Description

Key capabilities

Speech-to-text transcription (batch and real-time)
Text-to-speech synthesis (neural, human-like voices)
Multilingual support (English, Spanish, Portuguese, German, French, Italian, and Hindi, plus 57+ languages via OpenAI Whisper)

Core use cases

1. Closed captions and media indexing
2. Call analytics and customer feedback analysis
3. Medical dictation and real-time clinical notes
4. Accessibility features with TTS
5. Meeting transcription and diarization

Is OCI Speech Right for You?

Best for

Enterprises using OCI for scalable serverless processing
Digital media and healthcare (captions, dictation)
Meeting transcription and accessibility needs

Not ideal for

Users needing global TTS availability
High-volume or long-duration processing without quota increases
Standalone users outside OCI ecosystem

Standout features

Speaker diarization
Word-level confidence scores
Profanity filters (mask/remove/tag)
Text normalization
Low-latency real-time streaming
Prebuilt models requiring no ML expertise
Security: no customer audio storage for training
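
To make the three profanity filter modes listed above (mask, remove, tag) concrete, here is a minimal Python sketch of what each mode could do to a transcript. It is an illustration only, not OCI Speech's implementation or API: the function name `filter_profanity`, the placeholder `BLOCKLIST`, and the `<profanity>` tag format are all assumptions, and the service's actual output format may differ.

```python
import re

# Hypothetical placeholder blocklist; the real service uses its own lexicon.
BLOCKLIST = {"darn", "heck"}
MODES = {"mask", "remove", "tag"}


def filter_profanity(text: str, mode: str) -> str:
    """Apply one of three illustrative filter modes to a transcript string."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")

    def replace(match: re.Match) -> str:
        word = match.group(0)
        if word.lower() not in BLOCKLIST:
            return word                      # leave ordinary words untouched
        if mode == "mask":
            return "*" * len(word)           # keep length, hide the letters
        if mode == "remove":
            return ""                        # drop the word entirely
        return f"<profanity>{word}</profanity>"  # keep the word, but mark it

    filtered = re.sub(r"[A-Za-z']+", replace, text)
    # Collapse the double spaces left behind by removed words.
    return re.sub(r"\s{2,}", " ", filtered).strip()


if __name__ == "__main__":
    for mode in ("mask", "remove", "tag"):
        print(mode, "->", filter_profanity("well darn it", mode))
```

Masking keeps the transcript's word positions intact, removal shortens it, and tagging preserves the original word for downstream review.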

Pricing

First 5 Transcription Hours: USD 0

Greater than 5 Transcription Hours: USD 0.35
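
The two tiers above can be turned into a rough cost estimate. The sketch below assumes that USD 0.35 is a rate per transcription hour and that it applies only to hours beyond the first 5; the listing does not state the unit or the billing period explicitly, so treat both as assumptions. The function name `estimate_cost` is hypothetical.

```python
FREE_HOURS = 5.0            # first 5 transcription hours cost USD 0
RATE_USD_PER_HOUR = 0.35    # assumption: USD 0.35 is charged per hour above the free tier


def estimate_cost(transcription_hours: float) -> float:
    """Estimate the charge in USD for a given number of transcription hours."""
    if transcription_hours < 0:
        raise ValueError("transcription_hours must be non-negative")
    billable_hours = max(0.0, transcription_hours - FREE_HOURS)
    return round(billable_hours * RATE_USD_PER_HOUR, 2)


if __name__ == "__main__":
    for hours in (3, 5, 25, 105):
        print(f"{hours} h -> USD {estimate_cost(hours):.2f}")
```

Under these assumptions, usage at or below 5 hours costs nothing, and 25 hours would bill 20 hours at the paid rate.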

User Feedback Highlights

Most Praised

High accuracy and low-latency for enterprise use
Seamless integration with OCI services (Object Storage, Language, Generative AI)
Strong security and privacy (encrypted, no audio retention)
Easy to use via APIs, CLI, SDKs without ML expertise
Positive partner feedback (e.g., Kaltura for video subtitling)

Common Complaints

Limited availability for TTS and real-time transcription
Strict limits: 2 GB file size, 4-hour duration, 10 concurrent sessions
TTS restricted to US West (Phoenix)
Dependency on OCI ecosystem
Lack of independent user reviews or benchmarks
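
Because the limits listed above (2 GB file size, 4-hour duration, 10 concurrent sessions) are a common complaint, a pre-flight check before submitting audio can save failed jobs. The following is a minimal sketch, not part of any OCI SDK: the function `check_limits` and its parameters are hypothetical, 2 GB is read as 2 × 1024³ bytes (an assumption), and actual quotas may vary by tenancy or region.

```python
MAX_FILE_BYTES = 2 * 1024**3        # assumption: "2 GB" read as 2 GiB
MAX_DURATION_SECONDS = 4 * 60 * 60  # 4-hour duration limit
MAX_CONCURRENT_SESSIONS = 10        # concurrent session limit from the listing


def check_limits(file_bytes: int, duration_seconds: float,
                 active_sessions: int = 0) -> list[str]:
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    if file_bytes > MAX_FILE_BYTES:
        problems.append("file exceeds 2 GB size limit")
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append("audio exceeds 4-hour duration limit")
    if active_sessions >= MAX_CONCURRENT_SESSIONS:
        problems.append("no free session: 10 concurrent sessions already active")
    return problems


if __name__ == "__main__":
    # A 3 GiB, 5-hour recording submitted while 10 sessions are running.
    for problem in check_limits(3 * 1024**3, 5 * 3600, active_sessions=10):
        print(problem)
```

Files that fail the check could be split into shorter segments before upload, or a quota increase could be requested.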