Deepgram
deepgram.com
Build Difficulty: 5/5
Build a working replacement in a weekend with AI tools
Enterprise Voice AI: STT, TTS & Agent APIs
How to Replace Deepgram
Overview
Features
31 features across 11 categories
AI Model(6)
Recommended for large transcription volumes with high accuracy timestamps.
Lower word error rates than Base with high accuracy timestamps and keyword boosting capabilities.
Conversational speech recognition for real-time voice agents with built-in turn detection, natural interruption handling, and ultra-low latency.
Transcription models recommended for non-English transcription use cases.
Highest performing transcription model recommended for most use cases, especially audio with multiple languages, background noise, crosstalk and far field audio.
Highest performing model supporting multiple languages, recommended for diverse audio environments including background noise and crosstalk.
API(1)
Cloud-based Whisper model integration for transcription.
API Type(2)
HTTP-based API for speech-to-text, text-to-speech, and audio intelligence.
WebSocket protocol for real-time streaming and low-latency applications.
Analytics(1)
Extract actionable insights from conversational audio and text at scale.
Audio Intelligence(4)
Recognize user intent from conversational audio.
Analyze sentiment in conversational audio and text.
Automatically summarize audio and text content.
Automatically detect topics in conversational audio.
Core API(4)
Integrated language model orchestration within the Voice Agent API.
Convert audio to text with high accuracy. Supports 45+ languages with Nova models.
Generate natural, low-latency speech for voice assistants and conversational AI applications using Aura models.
Real-time conversational AI agents that handle interruptions, take complex actions, and deliver natural responses with built-in turn detection.
Deployment(4)
Pre-recorded audio processing for bulk transcription jobs.
Deploy Deepgram APIs on Deepgram's managed cloud infrastructure.
Deploy Deepgram on your own infrastructure for complete control.
Real-time streaming audio processing with low latency.
Enterprise(1)
Build and deploy custom voice AI models for enterprise-specific workflows.
Program(1)
Opt-in program to help improve Deepgram models with your usage data.
STT Add-on(5)
Automatically detect the language of incoming audio.
Boost accuracy for domain-specific jargon, product names, or acronyms important to your use case.
Automatically identify and remove sensitive PII such as social security numbers, credit cards, and phone numbers.
Automatically format punctuation, casing, dates, and currency for readability.
Detect multiple speakers and label who spoke when in the transcript.
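The add-ons above are typically switched on per request. Below is a minimal sketch of how that could look when building a request URL. The endpoint and the parameter names (`detect_language`, `keyterm`, `redact`, `smart_format`, `diarize`) are assumptions recalled from Deepgram's public documentation, not taken from this page, and the helper `build_listen_url` is hypothetical. Verify each name against the current API reference before relying on it.

```python
from urllib.parse import urlencode

# Assumed base URL for pre-recorded speech-to-text; verify against the docs.
LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_url(model="nova-3", detect_language=False, keyterms=(),
                     redact=(), smart_format=False, diarize=False):
    """Return a request URL with the selected add-ons as query parameters.

    Hypothetical helper: the parameter names are assumptions, see the note above.
    """
    params = [("model", model)]
    if detect_language:
        params.append(("detect_language", "true"))
    # Repeated keys: one entry per key term or redaction category.
    params += [("keyterm", term) for term in keyterms]
    params += [("redact", category) for category in redact]
    if smart_format:
        params.append(("smart_format", "true"))
    if diarize:
        params.append(("diarize", "true"))
    return f"{LISTEN_URL}?{urlencode(params)}"


url = build_listen_url(keyterms=["Aura-2"], redact=["pci", "ssn"],
                       smart_format=True, diarize=True)
print(url)
```

The sketch only builds the URL and sends nothing, so the flags can be inspected before a real request is wired in.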
TTS Model(2)
Text-to-speech model for voice assistants and conversational AI applications.
Natural, low-latency text-to-speech model for voice applications.
Pricing
Free
- ✓Access all endpoints in public models
- ✓Up to 100 concurrent connections for STT REST API
- ✓Up to 150 concurrent connections for STT WSS API
- ✓Up to 5 concurrent connections for Deepgram Whisper Cloud
- ✓Up to 45 concurrent connections for TTS REST API + WSS API
- ✓Up to 45 concurrent connections for Voice Agent API WSS
- ✓Up to 10 concurrent connections for Audio Intelligence REST API
- ✓Community & Discord support
- ✓Standard uptime SLA
- ✓No credit card required
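The concurrency caps above are enforced by the service, but a client can stay under them by limiting how many requests it keeps in flight. Below is a minimal sketch using `asyncio.Semaphore`. The limit values are copied from the Free plan list, while the dictionary keys, `run_limited`, and `fake_request` are hypothetical names used only for illustration. `fake_request` stands in for a real API call.

```python
import asyncio

# Free plan concurrency caps from the list above (key names are illustrative).
FREE_PLAN_LIMITS = {
    "stt_rest": 100,
    "stt_wss": 150,
    "whisper_cloud": 5,
    "tts": 45,
    "voice_agent_wss": 45,
    "audio_intelligence_rest": 10,
}


async def run_limited(jobs, limit):
    """Run zero-argument coroutine factories with at most `limit` in flight.

    Returns the results in input order plus the observed peak concurrency.
    """
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def guarded(job):
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await job()
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(guarded(job) for job in jobs))
    return results, peak


async def fake_request(i):
    """Placeholder for a real request; sleeps briefly and echoes its input."""
    await asyncio.sleep(0.01)
    return i


jobs = [lambda i=i: fake_request(i) for i in range(20)]
results, peak = asyncio.run(run_limited(jobs, FREE_PLAN_LIMITS["whisper_cloud"]))
print(f"completed {len(results)} jobs, peak concurrency {peak}")
```

A semaphore queues the excess work locally instead of letting the server reject it, which is usually simpler than retrying after a rate-limit error.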
Pay As You Go
- ✓Access all endpoints in public models
- ✓STT Flux: $0.0077/min
- ✓STT Nova-3 (Monolingual): $0.0077/min
- ✓STT Nova-3 (Multilingual): $0.0092/min
- ✓STT Nova-1 & 2: $0.0058/min
- ✓STT Enhanced: $0.0165/min
- ✓STT Base: $0.0145/min
- ✓TTS Aura-2: $0.030/1k characters
- ✓TTS Aura-1: $0.0150/1k characters
- ✓Voice Agent Standard: $0.0800/min
- ✓Voice Agent Standard - BYO TTS: $0.0600/min
- ✓Voice Agent Custom - BYO LLM: $0.0700/min
- ✓Voice Agent Custom - BYO LLM + TTS: $0.0500/min
- ✓Voice Agent Advanced: $0.1600/min
- ✓Voice Agent Advanced - BYO TTS: $0.1200/min
- ✓Audio Intelligence Summarization: $0.0003/1k input tokens - $0.0006/1k output tokens
- ✓Redaction add-on: $0.0020/min
- ✓Keyterm Prompting add-on: $0.0013/min
- ✓Speaker Diarization add-on: $0.0020/min
- ✓Smart Formatting: Included
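The per-minute rates above can be turned into a rough monthly estimate. The sketch below assumes add-on rates are simply added to the model's per-minute rate, which this page does not state explicitly. The dictionary keys and `estimate_stt_cost` are hypothetical names. The rates are the Pay As You Go figures listed above.

```python
from decimal import Decimal

# Pay As You Go rates in USD per audio minute, copied from the list above.
STT_RATES = {
    "flux": Decimal("0.0077"),
    "nova-3-monolingual": Decimal("0.0077"),
    "nova-3-multilingual": Decimal("0.0092"),
    "nova-1-2": Decimal("0.0058"),
    "enhanced": Decimal("0.0165"),
    "base": Decimal("0.0145"),
}
ADDON_RATES = {
    "redaction": Decimal("0.0020"),
    "keyterm_prompting": Decimal("0.0013"),
    "diarization": Decimal("0.0020"),
    "smart_formatting": Decimal("0"),  # listed as included
}


def estimate_stt_cost(minutes, model, addons=()):
    """Estimate USD cost for `minutes` of audio.

    Assumption: each add-on rate is added to the model's per-minute rate.
    """
    per_minute = STT_RATES[model] + sum(
        (ADDON_RATES[name] for name in addons), Decimal("0")
    )
    return per_minute * Decimal(minutes)


cost = estimate_stt_cost(10_000, "nova-3-monolingual", ["diarization"])
print(f"Estimated cost: ${cost:.2f}")
```

Under that assumption, 10,000 minutes of Nova-3 (Monolingual) with speaker diarization comes to $97.00. `Decimal` is used so that sub-cent rates do not pick up floating-point rounding noise.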
Growth
- ✓Access all endpoints in public models
- ✓Up to 20% savings with pre-paid credits
- ✓Up to 100 concurrent connections for STT REST API
- ✓Up to 225 concurrent connections for STT WSS API
- ✓Up to 5 concurrent connections for Deepgram Whisper Cloud
- ✓Up to 60 concurrent connections for TTS REST API + WSS API
- ✓Up to 60 concurrent connections for Voice Agent API WSS
- ✓Up to 10 concurrent connections for Audio Intelligence REST API
- ✓STT Flux: $0.0065/min
- ✓STT Nova-3 (Monolingual): $0.0065/min
- ✓STT Nova-3 (Multilingual): $0.0078/min
- ✓STT Nova-1 & 2: $0.0047/min
- ✓STT Enhanced: $0.0136/min
- ✓STT Base: $0.0105/min
- ✓TTS Aura-2: $0.027/1k characters
- ✓TTS Aura-1: $0.0135/1k characters
- ✓Voice Agent Standard: $0.0700/min
- ✓Voice Agent Standard - BYO TTS: $0.0500/min
- ✓Voice Agent Custom - BYO LLM: $0.0600/min
- ✓Voice Agent Custom - BYO LLM + TTS: $0.0400/min
- ✓Voice Agent Advanced: $0.1500/min
- ✓Voice Agent Advanced - BYO TTS: $0.1100/min
- ✓Audio Intelligence Summarization: $0.00024/1k input tokens - $0.00048/1k output tokens
- ✓Redaction add-on: $0.0017/min
- ✓Keyterm Prompting add-on: $0.0012/min
- ✓Speaker Diarization add-on: $0.0017/min
- ✓Smart Formatting: Included
- ✓Community & Discord support
- ✓Standard uptime SLA
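The Growth rates can be compared line by line with the Pay As You Go rates to see what discount each item actually carries. The sketch below uses a handful of rate pairs copied from the two lists on this page. `savings_percent` is a hypothetical helper, and the selection of rows is illustrative rather than complete.

```python
from decimal import Decimal

# (Pay As You Go rate, Growth rate) in USD, copied from the lists on this page.
RATES = {
    "STT Nova-3 (Monolingual), per min": ("0.0077", "0.0065"),
    "STT Nova-3 (Multilingual), per min": ("0.0092", "0.0078"),
    "TTS Aura-2, per 1k characters": ("0.030", "0.027"),
    "Voice Agent Standard, per min": ("0.0800", "0.0700"),
    "Summarization, per 1k input tokens": ("0.0003", "0.00024"),
}


def savings_percent(payg, growth):
    """Return how much lower the Growth rate is, as a percentage of the PAYG rate."""
    payg, growth = Decimal(payg), Decimal(growth)
    return (payg - growth) / payg * 100


for name, (payg, growth) in RATES.items():
    print(f"{name}: {savings_percent(payg, growth):.1f}% lower on Growth")
```

The discount is not uniform across products, so the effective saving depends on which endpoints make up most of the usage.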
Enterprise
- ✓Custom pricing for large volumes
- ✓Custom model development
- ✓Self-hosted deployment options
- ✓Dedicated support
- ✓Custom SLAs
- ✓HIPAA compliance with Business Associate Agreements
- ✓Data residency options
- ✓Advanced security features
Build vs Buy
Should you build a Deepgram alternative or buy the subscription? Estimate based on 31 features.
Buy Deepgram (Better Value)
Build Your Own
Buying Deepgram saves ~$36,960 over 3 years vs building.
Estimates based on 31 features and a BuildScore of 5/5. Actual costs vary.