Use Speak AI with Twilio
Pipe Twilio Voice recordings into Speak AI for transcription, AI summaries, sentiment, and shareable insights. Production-ready in under 30 minutes with a 20-line middleware, or zero code via Zapier.
What you can do
Once Twilio recordings flow into Speak AI, every call becomes searchable, analyzable, and shareable. Sales, support, and contact-center teams use the same pipeline for outbound, inbound, conference, and Flex agent calls.
Auto-transcribe every Twilio call
Wire Twilio’s recordingStatusCallback to a small middleware. Completed recordings flow to Speak with speaker labels, timestamps, and 100+ language support. No manual download, no fixed-template licensing.
Run AI analysis on every recording
Topics, sentiment, keywords, named entities, custom Magic Prompts. Speak runs 80+ analysis tools on every Twilio recording so your CRM, dashboards, and BI tools always have rich structured data, not raw text.
Surface call insights inside Twilio Flex
Drop a custom panel into the Flex agent desktop showing the Speak summary, sentiment trend, and topics for the call that just ended. Every wrap-up becomes a coaching loop without leaving Twilio.
Search every call from Claude or ChatGPT
Speak’s official MCP server lets AI assistants search across your full call library. Ask Claude to surface every objection your team heard last week, or pull verbatim quotes for a QBR slide.
Set up in 3 steps
Pick the path that fits your team. Zapier for SMB and operators. Webhook + REST API for production sales pipelines and Flex. Twilio Studio HTTP widget for visual flows.
Sign up for Speak AI
Create a free account at app.speakai.co. You get a 7-day trial with full access. No credit card needed. Once you are in, go to Settings > API and copy your API key.
Pick your integration path
Zapier: Use Speak AI’s Zapier app at zapier.com/apps/speak-ai. Trigger on Twilio New Recording, action with Speak AI Upload Media. Five minutes, zero code.
Webhook + REST API: Point Twilio’s recordingStatusCallback at a small middleware (sample below) that forwards the Twilio recording URL to POST https://api.speakai.co/v1/media/upload. Production-ready in under 30 minutes.
Twilio Studio: For visual workflow builders, drop an HTTP Request widget after the Record widget. Production setups still need a small middleware to attach Twilio’s Basic auth credentials, since the Studio widget cannot embed an Auth Token in templated URLs.
MCP for Claude and ChatGPT: Already have Twilio recordings in Speak? Connect Claude Desktop with npx @speakai/mcp-server init, or add the remote MCP server to Claude.ai. Your AI assistant can then search, summarize, and analyze your Twilio call library through conversation.
Subscribe to media.analyzed
Speak fires a signed webhook when transcription and AI analysis complete (usually within 60 seconds for a 10-minute call). Register your endpoint via POST /v1/webhook, then push transcripts into your CRM, BI, or alerting system as they arrive.
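On the receiving side, the handler mainly needs to decide whether the event is a finished media.analyzed and, if so, which mediaId to fetch. A minimal Python sketch, assuming the callback body Speak documents for this event (eventType, state, mediaId) and the eventType/mediaId query parameters appended to your callback URL. The function names are hypothetical, and signature verification is left out because the signing scheme is not described on this page.

```python
from typing import Mapping, Optional
from urllib.parse import parse_qs, urlsplit

# Insight endpoint used elsewhere on this page.
INSIGHT_URL = "https://api.speakai.co/v1/media/insight/{media_id}"


def media_id_from_callback(callback_url: str, body: Mapping[str, str]) -> Optional[str]:
    """Return the mediaId to fetch for a media.analyzed callback, else None.

    Reads eventType/mediaId from the JSON body first and falls back to the
    query parameters appended to the callback URL.
    """
    query = {key: values[0] for key, values in parse_qs(urlsplit(callback_url).query).items()}
    event_type = body.get("eventType") or query.get("eventType")
    media_id = body.get("mediaId") or query.get("mediaId")
    if event_type != "media.analyzed" or not media_id:
        return None
    # Assumption: a missing "state" means processed; any other value is skipped.
    if body.get("state", "processed") != "processed":
        return None
    return media_id


def insight_url(media_id: str) -> str:
    """Build the GET URL for the transcript, sentiment, and speaker insight."""
    return INSIGHT_URL.format(media_id=media_id)
```

In a Flask handler this could be called as media_id_from_callback(request.url, request.get_json(silent=True) or {}), with the insight fetch queued for a background worker so the webhook returns quickly.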
Real workflows, real results
Four production patterns Speak customers ship with Twilio. Pick the one that fits your team and copy the recipe.
Auto-transcribe every outbound call
Configure Twilio’s Dial verb to record outbound calls. When the call ends, Twilio fires recordingStatusCallback at your middleware, which forwards the recording to Speak. Every call lands in your Speak workspace with transcription, sentiment, and your custom Magic Prompts already running.
<Response>
<Dial record="record-from-answer-dual"
recordingStatusCallback="https://your-middleware.example.com/twilio-recording"
recordingStatusCallbackEvent="completed">
+15551234567
</Dial>
</Response>
curl -X POST https://api.speakai.co/v1/media/upload \
-H "x-speakai-key: $SPEAK_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "Twilio call REc...",
"url": "https://USER:PASS@api.twilio.com/.../Recordings/REc....mp3",
"mediaType": "audio",
"sourceLanguage": "en-US",
"tags": "twilio,outbound"
}'
import express from "express";
// Node 18+ (global fetch), run as an ES module.
const app = express();
app.use(express.urlencoded({ extended: false }));
app.post("/twilio-recording", async (req, res) => {
  // Recording status callbacks carry RecordingUrl, RecordingSid, and CallSid; From/To are not included.
  const { RecordingUrl, RecordingSid, CallSid } = req.body;
  if (!RecordingUrl) return res.sendStatus(400);
  // Embed the Twilio credentials in the URL Speak will fetch (same form as the curl example).
  // Sending them as an Authorization header would deliver them to Speak's API, not to Twilio.
  const recordingUrl = new URL(`${RecordingUrl}.mp3`);
  recordingUrl.username = process.env.TWILIO_SID;
  recordingUrl.password = process.env.TWILIO_AUTH_TOKEN;
  const upload = await fetch("https://api.speakai.co/v1/media/upload", {
    method: "POST",
    headers: {
      "x-speakai-key": process.env.SPEAK_API_KEY,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      name: `Twilio call ${CallSid} (${RecordingSid})`,
      url: recordingUrl.toString(),
      mediaType: "audio",
      sourceLanguage: "en-US",
      tags: "twilio,outbound",
    }),
  });
  res.sendStatus(upload.ok ? 200 : 502);
});
app.listen(3000);
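import os
from urllib.parse import quote, urlsplit, urlunsplit
import requests
from flask import Flask, request
app = Flask(__name__)
@app.post("/twilio-recording")  # Flask 2.0+
def twilio_recording():
    data = request.form
    recording_url = data.get("RecordingUrl")
    if not recording_url:
        return ("", 400)
    # Embed the Twilio credentials in the URL Speak will fetch (same form as the curl example).
    # Passing auth= to requests would send them to Speak's API, not to Twilio.
    sid = quote(os.environ["TWILIO_SID"], safe="")
    token = quote(os.environ["TWILIO_AUTH_TOKEN"], safe="")
    parts = urlsplit(f"{recording_url}.mp3")
    authed_url = urlunsplit(parts._replace(netloc=f"{sid}:{token}@{parts.netloc}"))
    upload = requests.post(
        "https://api.speakai.co/v1/media/upload",
        headers={
            "x-speakai-key": os.environ["SPEAK_API_KEY"],
            "Content-Type": "application/json",
        },
        json={
            # Recording status callbacks carry CallSid and RecordingSid; From/To are not included.
            "name": f"Twilio call {data.get('CallSid')} ({data.get('RecordingSid')})",
            "url": authed_url,
            "mediaType": "audio",
            "sourceLanguage": "en-US",
            "tags": "twilio,outbound",
        },
        timeout=30,
    )
    return ("", 200 if upload.ok else 502)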
media.analyzed webhook
// Body Speak posts to your callback (verified 2026-05-07)
// URL also gets ?eventType=media.analyzed&mediaId=<id> appended.
{
"eventType": "media.analyzed",
"state": "processed",
"mediaId": "14c8dd9c0a89"
}
// To get transcript + sentiment + speakers, fetch:
// GET https://api.speakai.co/v1/media/insight/<mediaId>
// (response shape excerpt below)
{
"status": "success",
"data": {
"mediaId": "14c8dd9c0a89",
"name": "Acme Corp - Discovery",
"mediaType": "video",
"duration": { "inSecond": 1265, "start": "00:00:01.280", "end": "00:21:00.995" },
"sourceLanguage": "en-US",
"state": "processed",
"tags": ["meeting-assistant"],
"sentiment": [{
"document": {
"Compound": 25.10,
"Negative": 3.22,
"Neutral": 41.66,
"Positive": 55.10
},
"sentences": [...]
}],
"insight": {
"transcript": [...],
"speakers": [...],
"brands": [...]
},
"mediaUrl": "https://...",
"createdAt": "2026-05-06T17:22:51.269Z"
}
}
Same TwiML pattern works for inbound, outbound, conference, and Flex agent calls.
Score every inbound support call automatically
Capture inbound calls hitting your Twilio Voice number, run topic and sentiment analysis, then use a Magic Prompt to score each call on issue resolution, agent empathy, and first-call resolution (FCR). Multilingual support queues work without per-language setup since Speak supports 100+ languages.
<!-- Caller leg: greet the caller and place them in the support queue -->
<Response>
<Say>Thanks for calling. This call may be recorded for quality.</Say>
<Enqueue>support-queue</Enqueue>
</Response>
<!-- Agent leg: dequeue the next caller and record the bridged call -->
<Response>
<Dial record="record-from-answer-dual"
recordingStatusCallback="https://your-middleware.example.com/twilio-support">
<Queue>support-queue</Queue>
</Dial>
</Response>
# Magic Prompt is async. Step 1 fires the job; step 2 polls until completed.
curl -X POST https://api.speakai.co/v1/prompt/ \
-H "x-speakai-key: $SPEAK_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"mediaIds": ["86a36b59b38e"],
"prompt": "Score this support call on (1) issue resolution, (2) agent empathy, (3) first-call resolution. Return JSON with scores 0-10 and 1-sentence justifications.",
"isStream": false,
"isIndividualPrompt": true
}'
# Poll every ~1s (typical completion 2-3s):
curl "https://api.speakai.co/v1/prompt/messages?mediaIds=86a36b59b38e&pageSize=1" \
-H "x-speakai-key: $SPEAK_API_KEY"
# -> data.history[0].messages[0] = { state: "completed", answer: "..." }
// Magic Prompt is async: POST starts the job, GET polls for the answer.
async function scoreCall(mediaId) {
  const headers = { "x-speakai-key": process.env.SPEAK_API_KEY };
  const start = await fetch("https://api.speakai.co/v1/prompt/", {
    method: "POST",
    headers: { ...headers, "Content-Type": "application/json" },
    body: JSON.stringify({
      mediaIds: [mediaId],
      prompt: "Score this support call on (1) issue resolution, " +
        "(2) agent empathy, (3) first-call resolution. " +
        "Return JSON with scores 0-10 and 1-sentence justifications.",
      isStream: false,
      isIndividualPrompt: true,
    }),
  });
  if (!start.ok) throw new Error(`Magic Prompt request failed: ${start.status}`);
  // Poll about once a second, for up to 20 seconds.
  for (let i = 0; i < 20; i++) {
    await new Promise((resolve) => setTimeout(resolve, 1000));
    const resp = await fetch(
      `https://api.speakai.co/v1/prompt/messages?mediaIds=${mediaId}&pageSize=1`,
      { headers }
    );
    const json = await resp.json();
    const msg = json?.data?.history?.[0]?.messages?.[0];
    if (msg && msg.state === "completed") return msg.answer;
  }
  return null; // not completed within the polling window
}
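For Python services, the same fire-then-poll loop can be sketched as below. It assumes the GET /v1/prompt/messages response shape shown in the curl comment (data.history[0].messages[0] with state and answer) and, like the Node version, that the newest message on the media belongs to the prompt you just started. The fetch is passed in as a callable so the loop can be exercised without network access; latest_prompt_answer and poll_prompt are hypothetical names.

```python
import time
from typing import Callable, Optional


def latest_prompt_answer(payload: dict) -> Optional[str]:
    """Return the newest message's answer if its state is "completed", else None."""
    history = (payload.get("data") or {}).get("history") or []
    messages = (history[0].get("messages") or []) if history else []
    if messages and messages[0].get("state") == "completed":
        return messages[0].get("answer")
    return None


def poll_prompt(
    fetch_messages: Callable[[], dict],
    attempts: int = 20,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch_messages until an answer completes or attempts run out."""
    for _ in range(attempts):
        sleep(interval)
        answer = latest_prompt_answer(fetch_messages())
        if answer is not None:
            return answer
    return None


# Example wiring with requests (network call, shown for illustration only):
# import os, requests
# answer = poll_prompt(lambda: requests.get(
#     "https://api.speakai.co/v1/prompt/messages",
#     params={"mediaIds": media_id, "pageSize": 1},
#     headers={"x-speakai-key": os.environ["SPEAK_API_KEY"]},
#     timeout=10,
# ).json())
```

Injecting sleep as well keeps tests instant; in production the defaults give the same roughly 20-second window as the Node function.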
Tag inbound recordings with twilio,inbound,support on upload so library views and search filters can split sales vs support metrics.
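A small helper keeps those tags consistent when building the upload body. This Python sketch uses only the upload fields from the examples on this page (name, url, mediaType, sourceLanguage, tags); build_upload_payload and its direction/team arguments are hypothetical. The url is the raw Twilio recording URL with .mp3 appended, so add the Account SID and Auth Token to it, as in the curl example, before sending.

```python
from typing import Mapping


def build_upload_payload(
    form: Mapping[str, str], direction: str, team: str, source_language: str = "en-US"
) -> dict:
    """Build a POST /v1/media/upload body from a Twilio recording status callback.

    direction and team are your own labels (for example "inbound" and "support");
    Twilio does not send them.
    """
    recording_sid = form["RecordingSid"]
    return {
        "name": f"Twilio {direction} call ({recording_sid})",
        "url": f"{form['RecordingUrl']}.mp3",
        "mediaType": "audio",
        "sourceLanguage": source_language,
        # Comma-separated string, the same format as "twilio,outbound" in the curl example.
        "tags": ",".join(["twilio", direction, team]),
    }
```

Calling build_upload_payload(request.form, "inbound", "support") from the support callback yields the twilio,inbound,support tag string.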
Surface Speak insights in the Flex agent panel
Twilio Flex is the visual contact-center desktop. After a call ends, drop a custom panel into the agent’s UI showing the Speak summary, sentiment trend, and the topics you care about. Every call becomes a coaching loop without leaving Flex.
- Use the recordingStatusCallback flow from tab 1 to get every Flex call into Speak. Store the RecordingSid → Speak mediaId mapping in your middleware on upload.
- Build a Flex plugin (React) that calls GET /v1/media/insight/:mediaId when the agent opens a wrap-up panel.
- Render Speak’s topics, document-level sentiment, and the Magic Prompt response inline.
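The first step assumes the middleware keeps a RecordingSid → mediaId mapping, which the Flex code reads through mappingStore.lookup. A minimal Python sketch of such a store, using the standard-library sqlite3 module; the class and table names are hypothetical. It assumes your middleware can read the new mediaId from Speak’s upload response, which is not shown on this page, so check the API reference for the field name. The Flex plugin would reach this store through an HTTP endpoint on the middleware rather than directly.

```python
import sqlite3
from typing import Optional


class MediaMappingStore:
    """Hypothetical RecordingSid -> Speak mediaId store backed by SQLite."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS media_map ("
            "recording_sid TEXT PRIMARY KEY, media_id TEXT NOT NULL)"
        )

    def save(self, recording_sid: str, media_id: str) -> None:
        # INSERT OR REPLACE: a retried Twilio callback overwrites the row instead of failing.
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO media_map (recording_sid, media_id) VALUES (?, ?)",
                (recording_sid, media_id),
            )

    def lookup(self, recording_sid: str) -> Optional[str]:
        row = self.conn.execute(
            "SELECT media_id FROM media_map WHERE recording_sid = ?",
            (recording_sid,),
        ).fetchone()
        return row[0] if row else None
```

Pass a file path instead of ":memory:" so mappings survive middleware restarts.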
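// inside your Flex Plugin React component
// mappingStore: your own RecordingSid -> mediaId store, written by the middleware on upload.
// In production, route this request through your middleware so the Speak API key is not shipped to the browser.
async function fetchInsights(recordingSid) {
  const mediaId = await mappingStore.lookup(recordingSid);
  if (!mediaId) return null; // recording not in Speak yet
  const resp = await fetch(
    `https://api.speakai.co/v1/media/insight/${mediaId}`,
    { headers: { "x-speakai-key": SPEAK_API_KEY } }
  );
  if (!resp.ok) throw new Error(`Speak insight request failed: ${resp.status}`);
  // The insight endpoint wraps its payload in { status, data }.
  const { data } = await resp.json();
  // Magic Prompt answers come from GET /v1/prompt/messages, not from this endpoint.
  return { sentiment: data.sentiment, insight: data.insight };
}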
For larger teams, pair this with Speak’s MCP server (tab 4) so QA leads can ask Claude “show me every call this week where the agent missed the discovery question” without writing SQL.
Search every Twilio call from Claude or ChatGPT
Connect Speak’s official MCP server to Claude or ChatGPT and your team queries the entire Twilio call library through conversation. No SQL, no dashboards, no exports. Already piping recordings into Speak via tabs 1 or 2? You can use this today.
# Claude Desktop / Claude Code (auto-detects your installation)
npx @speakai/mcp-server init
# Paste your Speak API key when prompted. Setup takes about 2 minutes.
# Claude.ai (web) and ChatGPT MCP connector
# Settings > Integrations > Add MCP Server
# Remote URL: https://api.speakai.co/v1/mcp
# Auth header: x-speakai-key: YOUR_SPEAK_API_KEY
# Verify the connection responds (optional sanity check):
curl -s https://api.speakai.co/v1/mcp \
-H "x-speakai-key: $SPEAK_API_KEY" \
-H "Accept: application/json"
Example prompts your team can use today:
- “Show me every Twilio call from last week where a prospect mentioned pricing objections. Group by sales rep.”
- “Summarize the 5 longest support calls in my Twilio queue from yesterday. Flag any where sentiment dropped below neutral.”
- “Pull verbatim quotes from Twilio calls tagged enterprise where customers asked about SOC 2 or HIPAA.”
- “Compare discovery completeness across our top 3 SDRs. Use this week’s Twilio calls. Score each on a 0-10 rubric.”
- “Find the Twilio call from August 14 where the CEO joined and pull the action items into a follow-up doc.”
Works with Claude.ai, Claude Desktop, Claude Code, and ChatGPT MCP connectors.
Why Speak AI + Twilio
Twilio handles call control. Speak handles transcription and analysis. The combination scales from a 5-person sales team to a global Flex contact center on the same workspace.
Multi-source ingest, not just Twilio
Twilio is one of dozens of inputs. The same Speak workspace ingests Zoom, Teams, Meet, Webex, Vimeo, podcast audio, embedded recorder submissions, file uploads, and live recordings. One library, every conversation.
Custom Magic Prompts, not fixed reports
Twilio Voice Intelligence ships fixed report templates. Speak’s Magic Prompt runs your prompts on every call: discovery checklist scoring, competitor mention extraction, MEDDIC-stage detection, whatever your team needs. Save prompts once, run them on every recording forever.
MCP-native for Claude and ChatGPT
The official @speakai/mcp-server exposes 83 tools to AI assistants. Your team queries the Twilio call library in plain English from Claude Desktop or ChatGPT, without exporting transcripts to other tools.
Shareable embeds, not just internal dashboards
Every Speak transcript and analysis can be shared as a public or private embed. Send a Twilio call summary to a customer’s stakeholder without granting them dashboard access or exporting a PDF.
Teams trust Speak AI for their most important calls
4.9 on G2
“Speak AI has been instrumental in transforming how we handle qualitative data. The transcription accuracy is impressive, and the NLP insights save us hours of manual analysis.”
Research Director | Consulting Firm
“We switched from Otter.ai and the depth of analysis is on another level. Sentiment scoring, keyword extraction, and theme detection all happen automatically.”
Product Manager | SaaS Company
“The ability to search across all our customer calls and pull specific moments is a game-changer for our support team.”
Head of CX | Enterprise Tech
How to use Speak AI with Twilio for call transcription and analysis
Twilio is one of the most widely used programmable communications platforms. Sales teams run outbound dialers on Twilio Voice. Support teams route inbound queues through Twilio Flex. Operations teams build voice agents and IVR flows in Twilio Studio. Every one of those calls is a potential source of business intelligence, and every one of them passes through Twilio’s recording layer. Speak AI is what turns those recordings into structured, searchable, shareable data.
Where Speak fits in the Twilio call lifecycle
Speak runs after the call. Twilio handles call control (TwiML, Studio, Flex, Conversations). Speak handles transcription, AI analysis, search, and downstream automation. The handoff happens via Twilio’s recording webhook, the Twilio Studio HTTP widget, or Speak’s Zapier app. Recordings move from Twilio’s storage into your Speak workspace, where transcription begins immediately and AI analysis runs as soon as the transcript is ready.
Speak vs Twilio Voice Intelligence
Twilio Voice Intelligence covers basic transcription and a small set of fixed Operators (PII redaction, sentiment, summary). It is built into Twilio’s billing and well-suited if you only need transcription on Twilio calls.
Speak is a different product. Speak ingests Twilio recordings alongside Zoom, Teams, Meet, file uploads, podcasts, and embed recorder submissions, then runs custom analysis (your Magic Prompts, your tags, your team’s vocabulary), exposes everything via MCP for Claude and ChatGPT, and ships shareable embeds. Most customers using both let Twilio handle real-time call control and Voice Intelligence’s PII redaction, then send recordings to Speak for the deeper analysis layer.
Authentication for Twilio recording URLs
Twilio recording URLs are not public. They require HTTP Basic auth using your Twilio Account SID as username and your Auth Token as password. The middleware examples in the workflows section show how to pass those credentials to Speak along with the recording URL. If you cannot run middleware, use Twilio’s recording redaction or pre-signed URL features and pass the resulting public URL to Speak.
Real-time vs post-call
Speak is post-call. Recordings are processed when they finish, with first transcripts typically available within 60 seconds for a 10-minute call. If you need live during-call transcription (for example, real-time agent assist or live captions), pair Twilio Media Streams with Twilio Voice Intelligence or Deepgram for the real-time leg, and Speak for the post-call analysis layer.
How do I transcribe Twilio call recordings automatically?
Three production paths, ranked by lift:
- Webhook + REST API. 20-line middleware. Twilio’s recordingStatusCallback fires when a call ends, and your middleware forwards the URL to Speak’s POST /v1/media/upload. Highest control, lowest cost.
- Zapier. Trigger: Twilio New Recording. Action: Speak AI Upload Media. Setup time: 5 minutes. Best for SMB teams without engineering bandwidth.
- Twilio Studio HTTP widget. Visual flow builder. Add an HTTP Request widget after the Record widget. Production setups still need a middleware to attach the Twilio Basic auth credentials.
The same patterns work for inbound calls, outbound calls, conference recordings, voicemails, and Flex agent calls.
Use cases by role
Sales and RevOps teams use Speak AI with Twilio to auto-transcribe every outbound dialer call. Magic Prompts score discovery completeness, surface objections, and extract competitor mentions. See Speak AI for sales teams for the full RevOps stack.
Customer support and CX teams capture every inbound queue call, run sentiment and topic analysis, and feed structured QA scores into their dashboards. Multilingual queues work without per-language setup since Speak supports 100+ languages with automatic detection.
Training and enablement teams turn the Flex agent panel into a coaching loop. Speak’s analysis surfaces missed discovery questions, scripted-line drift, and sentiment dips for QA review. See Speak AI for training and development.
Customer research teams bring Twilio interview calls into the same workspace as their Zoom and embed recorder sessions. Code themes across the entire library, ask Claude for verbatim quotes, build research deliverables in hours instead of weeks. See Speak AI for qualitative researchers.
Frequently asked questions
Does Speak AI offer real-time transcription during a Twilio call?
No. Speak is post-call. Recordings are processed when they finish, with first transcripts typically available within 60 seconds for a 10-minute call. For live during-call transcription, pair Twilio Media Streams with a real-time provider and use Speak for the post-call analysis layer.
Do I need to write code, or can I use Zapier?
Zapier covers the simple Twilio New Recording into Speak Upload flow with zero code. For production sales-call pipelines, Flex agent panels, or multi-step routing, the 20-line Node middleware shown above gives you full control. Both paths use the same Speak workspace and the same analysis layer.
How does Speak handle Twilio’s authenticated recording URLs?
Twilio recording URLs require HTTP Basic auth using your Twilio Account SID and Auth Token. The middleware example above attaches those credentials when forwarding the URL to Speak. If you cannot run middleware, use Twilio’s pre-signed or redacted recording features and pass the resulting URL to Speak.
Is Twilio Voice Intelligence required, or does Speak replace it?
Voice Intelligence is optional. Many customers run it for PII redaction at the Twilio layer and use Speak for the deeper analysis (custom Magic Prompts, MCP access for AI assistants, shareable embeds, multi-source library search). Both can coexist on the same call.
What languages are supported?
Speak supports transcription in 100+ languages including English, Spanish, French, German, Portuguese, Italian, Dutch, Hebrew, Norwegian, Japanese, Arabic, Hindi, and dozens more. Set the source language with the sourceLanguage field on upload, or let Speak detect it automatically.
Can I try Speak AI for free with my Twilio account?
Yes. The 7-day trial includes 30 minutes of transcription, full API access, and the MCP server. No credit card required. Wire up the recording webhook against a test Twilio number and validate the full pipeline before committing.
Start using Speak AI with Twilio today
83 analysis tools. 100+ languages. Production webhook + REST API. Zero-code Zapier path. MCP-native for Claude and ChatGPT.
Try Speak AI free
Create your account, grab your API key, and connect your Twilio recording webhook. Full access for 7 days. No credit card required.
View the API docs
Full reference for the upload endpoint, webhook event types, and Magic Prompt API. Plus the Speak Zapier app and the official MCP server on NPM.





