Deepgram

deepgram.com
AI & Machine Learning
Weekend Project

Enterprise Voice AI: STT, TTS & Agent APIs

How to Replace Deepgram

Overview

Deepgram positions its platform as providing accurate, cost-effective APIs for speech-to-text, text-to-speech, and voice agents. The APIs are available in real-time and batch modes, with cloud and self-hosted deployment options. The Voice Agent API combines speech-to-text, text-to-speech, and LLM orchestration in a single API, which is intended to reduce complexity, latency, and cost.

Features

31 features across 11 categories

AI Model (6)

Base Model [AI]

Recommended for large transcription volumes with high accuracy timestamps.

Enhanced Model [AI, Premium]

Lower word error rates than Base with high accuracy timestamps and keyword boosting capabilities.

Flux Model [AI]

Conversational speech recognition for real-time voice agents with built-in turn detection, natural interruption handling, and ultra-low latency.

Nova-1 & Nova-2 [AI]

Transcription models recommended for non-English transcription use cases.

Nova-3 (Monolingual) [AI]

Highest performing transcription model, recommended for most single-language use cases, especially audio with background noise, crosstalk, and far-field audio.

Nova-3 (Multilingual) [AI]

Highest performing model supporting multiple languages, recommended for diverse audio environments including background noise and crosstalk.

Also in: AssemblyAI, DeepL

API (1)

Deepgram Whisper Cloud [AI]

Cloud-based Whisper model integration for transcription.

API Type (2)

REST API

HTTP-based API for speech-to-text, text-to-speech, and audio intelligence.

WebSocket (WSS) API

WebSocket protocol for real-time streaming and low-latency applications.
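As a rough illustration of the REST API entry above, the sketch below builds (but does not send) an HTTP request for pre-recorded audio. The endpoint URL, the `model` and `smart_format` query parameters, and the `Token` authorization scheme are assumptions that do not appear on this page; check Deepgram's API reference before relying on them. The helper name `build_transcription_request` is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint; not stated on this page. Verify against Deepgram's docs.
DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_transcription_request(audio_bytes: bytes, api_key: str,
                                model: str = "nova-3") -> Request:
    """Build, without sending, a POST request carrying raw audio bytes."""
    # Assumed query parameter names: model selection and smart formatting.
    query = urlencode({"model": model, "smart_format": "true"})
    return Request(
        url=f"{DEEPGRAM_LISTEN_URL}?{query}",
        data=audio_bytes,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",  # assumed auth scheme
            "Content-Type": "audio/wav",
        },
    )


# Placeholder bytes and key; nothing is sent over the network here.
request = build_transcription_request(b"RIFF", api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` with a real key and real audio. Streaming over the WebSocket API would need a WebSocket client library and is not sketched here.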

Analytics (1)

Audio Intelligence [AI]

Extract actionable insights from conversational audio and text at scale.

Also in: Hugging Face, Notion, Smartsheet

Audio Intelligence (4)

Intent Recognition [AI, Premium]

Recognize user intent from conversational audio.

Sentiment Analysis [AI, Premium]

Analyze sentiment in conversational audio and text.

Summarization [AI, Premium]

Automatically summarize audio and text content.

Topic Detection [AI, Premium]

Automatically detect topics in conversational audio.

Core API (4)

LLM Orchestration [AI]

Integrated language model orchestration within the Voice Agent API.

Speech-to-Text (STT) [AI]

Convert audio to text with high accuracy. Supports 45+ languages with Nova models.

Text-to-Speech (TTS) [AI]

Generate natural, low-latency speech for voice assistants and conversational AI applications using Aura models.

Voice Agent API [AI]

Real-time conversational AI agents that handle interruptions, take complex actions, and deliver natural responses with built-in turn detection.

Deployment (4)

Batch Processing

Pre-recorded audio processing for bulk transcription jobs.

Cloud Deployment

Deploy Deepgram APIs on Deepgram's managed cloud infrastructure.

Self-Hosted Deployment [Premium]

Deploy Deepgram on your own infrastructure for complete control.

Streaming API

Real-time streaming audio processing with low latency.

Also in: Kubernetes Dashboard, Hugging Face, Bitwarden

Enterprise (1)

Custom Models [AI, Premium]

Build and deploy custom voice AI models for enterprise-specific workflows.

Program (1)

Model Improvement Program

Opt-in program to help improve Deepgram models with your usage data.

STT Add-on (5)

Automatic Language Detection [AI]

Automatically detect the language of incoming audio.

Keyterm Prompting [AI, Premium]

Boost accuracy for domain-specific jargon, product names, or acronyms important to your use case.

Redaction [AI, Premium]

Automatically identify and remove sensitive PII such as social security numbers, credit cards, and phone numbers.

Smart Formatting [AI]

Automatically format punctuation, casing, dates, and currency for readability.

Speaker Diarization [AI, Premium]

Detect multiple speakers and label who spoke when in the transcript.

TTS Model (2)

Aura-1 TTS Model [AI]

Text-to-speech model for voice assistants and conversational AI applications.

Aura-2 TTS Model [AI]

Natural, low-latency text-to-speech model for voice applications.

Pricing

Free

$200 free credit
  • Access all endpoints in public models
  • Up to 100 concurrent connections for STT REST API
  • Up to 150 concurrent connections for STT WSS API
  • Up to 5 concurrent connections for Deepgram Whisper Cloud
  • Up to 45 concurrent connections for TTS REST API + WSS API
  • Up to 45 concurrent connections for Voice Agent API WSS
  • Up to 10 concurrent connections for Audio Intelligence REST API
  • Community & Discord support
  • Standard uptime SLA
  • No credit card required
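One way a client could stay under the concurrent-connection limits listed above is to cap in-flight requests with a semaphore. The following is a minimal sketch, not Deepgram's recommended approach: `transcribe_all` and `fake_transcribe` are hypothetical names, and the fake function stands in for a real API call.

```python
import asyncio

# Free-plan figure for the STT REST API, taken from the list above.
STT_REST_CONCURRENCY_LIMIT = 100


async def transcribe_all(jobs, transcribe_one, limit=STT_REST_CONCURRENCY_LIMIT):
    """Run transcribe_one(job) for every job with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job):
        # Each call waits here until one of the `limit` slots is free.
        async with semaphore:
            return await transcribe_one(job)

    # gather() returns results in the same order as the input jobs.
    return await asyncio.gather(*(guarded(job) for job in jobs))


async def fake_transcribe(job):
    # Hypothetical stand-in for a real transcription request.
    await asyncio.sleep(0.01)
    return f"transcript for {job}"


jobs = [f"call-{i}.wav" for i in range(5)]
results = asyncio.run(transcribe_all(jobs, fake_transcribe, limit=2))
print(results[0])
```

The `limit` argument would be set per API and per plan, for example 150 for the STT WSS API or 5 for Deepgram Whisper Cloud on the Free plan.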

Pay As You Go

Pay-as-you-go
  • Access all endpoints in public models
  • STT Flux: $0.0077/min
  • STT Nova-3 (Monolingual): $0.0077/min
  • STT Nova-3 (Multilingual): $0.0092/min
  • STT Nova-1 & 2: $0.0058/min
  • STT Enhanced: $0.0165/min
  • STT Base: $0.0145/min
  • TTS Aura-2: $0.030/1k characters
  • TTS Aura-1: $0.0150/1k characters
  • Voice Agent Standard: $0.0800/min
  • Voice Agent Standard - BYO TTS: $0.0600/min
  • Voice Agent Custom - BYO LLM: $0.0700/min
  • Voice Agent Custom - BYO LLM + TTS: $0.0500/min
  • Voice Agent Advanced: $0.1600/min
  • Voice Agent Advanced - BYO TTS: $0.1200/min
  • Audio Intelligence Summarization: $0.0003/1k input tokens - $0.0006/1k output tokens
  • Redaction add-on: $0.0020/min
  • Keyterm Prompting add-on: $0.0013/min
  • Speaker Diarization add-on: $0.0020/min
  • Smart Formatting: Included
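To make the per-minute rates above concrete, here is a small cost estimate using the Pay As You Go figures. It assumes add-on rates are simply added to the model's per-minute rate, which this page implies but does not state outright. The dictionary keys and the function name `estimate_stt_cost` are hypothetical.

```python
# Pay As You Go rates in USD per audio minute, copied from the list above.
STT_RATES = {
    "flux": 0.0077,
    "nova-3-monolingual": 0.0077,
    "nova-3-multilingual": 0.0092,
    "nova-1-2": 0.0058,
    "enhanced": 0.0165,
    "base": 0.0145,
}
ADDON_RATES = {
    "redaction": 0.0020,
    "keyterm_prompting": 0.0013,
    "speaker_diarization": 0.0020,
}


def estimate_stt_cost(minutes, model, addons=()):
    """Estimate USD cost, assuming add-ons are billed per minute on top."""
    per_minute = STT_RATES[model] + sum(ADDON_RATES[name] for name in addons)
    return minutes * per_minute


# 10,000 minutes of Nova-3 (Monolingual) with speaker diarization.
cost = estimate_stt_cost(10_000, "nova-3-monolingual",
                         addons=["speaker_diarization"])
print(f"${cost:.2f}")  # $97.00
```

Smart Formatting is listed as included, so it adds nothing to the estimate.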

Growth

$4k+/year
  • Access all endpoints in public models
  • Up to 20% savings with pre-paid credits
  • Up to 100 concurrent connections for STT REST API
  • Up to 225 concurrent connections for STT WSS API
  • Up to 5 concurrent connections for Deepgram Whisper Cloud
  • Up to 60 concurrent connections for TTS REST API + WSS API
  • Up to 60 concurrent connections for Voice Agent API WSS
  • Up to 10 concurrent connections for Audio Intelligence REST API
  • STT Flux: $0.0065/min
  • STT Nova-3 (Monolingual): $0.0065/min
  • STT Nova-3 (Multilingual): $0.0078/min
  • STT Nova-1 & 2: $0.0047/min
  • STT Enhanced: $0.0136/min
  • STT Base: $0.0105/min
  • TTS Aura-2: $0.027/1k characters
  • TTS Aura-1: $0.0135/1k characters
  • Voice Agent Standard: $0.0700/min
  • Voice Agent Standard - BYO TTS: $0.0500/min
  • Voice Agent Custom - BYO LLM: $0.0600/min
  • Voice Agent Custom - BYO LLM + TTS: $0.0400/min
  • Voice Agent Advanced: $0.1500/min
  • Voice Agent Advanced - BYO TTS: $0.1100/min
  • Audio Intelligence Summarization: $0.00024/1k input tokens - $0.00048/1k output tokens
  • Redaction add-on: $0.0017/min
  • Keyterm Prompting add-on: $0.0012/min
  • Speaker Diarization add-on: $0.0017/min
  • Smart Formatting: Included
  • Community & Discord support
  • Standard uptime SLA
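The Growth rates can be compared against the Pay As You Go rates with a short calculation. The sketch below covers a handful of the listed items; the names `RATES` and `growth_savings_percent` are hypothetical, and the figures are copied from the two plan lists on this page.

```python
# (Pay As You Go, Growth) USD rates copied from the two plan lists.
RATES = {
    "STT Nova-3 (Monolingual), per min": (0.0077, 0.0065),
    "STT Nova-1 & 2, per min": (0.0058, 0.0047),
    "TTS Aura-2, per 1k characters": (0.030, 0.027),
    "Voice Agent Standard, per min": (0.0800, 0.0700),
}


def growth_savings_percent(payg, growth):
    """Percentage by which the Growth rate undercuts Pay As You Go."""
    return (payg - growth) / payg * 100


for name, (payg, growth) in RATES.items():
    saving = growth_savings_percent(payg, growth)
    print(f"{name}: {saving:.1f}% lower on Growth")
```

Because Growth starts at $4k+/year, the lower rates only pay off once annual usage is high enough to use that commitment.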

Enterprise

Custom
  • Custom pricing for large volumes
  • Custom model development
  • Self-hosted deployment options
  • Dedicated support
  • Custom SLAs
  • HIPAA compliance with Business Associate Agreements
  • Data residency options
  • Advanced security features

Build vs Buy

Should you build a Deepgram alternative or buy the subscription? Estimate based on 31 features.

Buy Deepgram

Better Value
Monthly cost: Contact Sales
3-year total: Varies
Time to deploy: Days

Build Your Own

Development cost: $24,000
Maintenance: $360/mo
3-year total: $36,960
Dev time: ~2 months

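Building your own is estimated at ~$36,960 over 3 years. Whether buying Deepgram saves money depends on its usage-based costs, which this estimate does not include.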

Estimates based on 31 features and a BuildScore of 5/5. Actual costs vary.
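The build-side figures can be checked with simple arithmetic, and extended into a rough break-even point against usage-based pricing. This sketch assumes all usage is billed at the Pay As You Go Nova-3 (Monolingual) rate from the pricing section, and it ignores add-ons, TTS, and Voice Agent usage. The variable names are hypothetical.

```python
DEVELOPMENT_COST = 24_000    # USD, one-time, from the Build Your Own estimate
MAINTENANCE_PER_MONTH = 360  # USD per month
MONTHS = 36                  # 3 years
NOVA3_PAYG_RATE = 0.0077     # USD per minute, Pay As You Go

# One-time development cost plus 36 months of maintenance.
build_total = DEVELOPMENT_COST + MAINTENANCE_PER_MONTH * MONTHS
print(build_total)  # 36960

# Monthly transcription volume at which buying costs the same as building.
break_even_minutes_per_month = build_total / MONTHS / NOVA3_PAYG_RATE
print(round(break_even_minutes_per_month))
```

Under these assumptions the break-even lands at roughly 133,000 audio minutes per month. Below that volume, paying per minute costs less than the build estimate over the same 3 years.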