Integration

Use Speak AI with Twilio

Pipe Twilio Voice recordings into Speak AI for transcription, AI summaries, sentiment, and shareable insights. Production-ready in under 30 minutes with a 20-line middleware, or zero code via Zapier.

Free 7-day trial. No credit card required. Works with Twilio Voice, Studio, Flex, and Conversations.
REST API + Zapier
17 Webhook Events
100+ Languages
Free to Try

Trusted by 250,000+ people and teams

What you can do

Once Twilio recordings flow into Speak AI, every call becomes searchable, analyzable, and shareable. Sales, support, and contact-center teams use the same pipeline for outbound, inbound, conference, and Flex agent calls.

Auto-transcribe every Twilio call

Wire Twilio’s recordingStatusCallback to a small middleware. Completed recordings flow to Speak with speaker labels, timestamps, and 100+ language support. No manual download, no fixed-template licensing.

Run AI analysis on every recording

Topics, sentiment, keywords, named entities, custom Magic Prompts. Speak runs 80+ analysis tools on every Twilio recording so your CRM, dashboards, and BI tools always have rich structured data, not raw text.

Surface call insights inside Twilio Flex

Drop a custom panel into the Flex agent desktop showing the Speak summary, sentiment trend, and topics for the call that just ended. Every wrap-up becomes a coaching loop without leaving Twilio.

Search every call from Claude or ChatGPT

Speak’s official MCP server lets AI assistants search across your full call library. Ask Claude to surface every objection your team heard last week, or pull verbatim quotes for a QBR slide.

Set up in 3 steps

Pick the path that fits your team. Zapier for SMB and operators. Webhook + REST API for production sales pipelines and Flex. Twilio Studio HTTP widget for visual flows.

Sign up for Speak AI

Create a free account at app.speakai.co. You get a 7-day trial with full access. No credit card needed. Once you are in, go to Settings > API and copy your API key.

Pick your integration path

Zapier

Use Speak AI’s Zapier app at zapier.com/apps/speak-ai. Trigger on Twilio New Recording, then add the Speak AI Upload Media action. Five minutes, zero code.

Webhook + REST API

Configure Twilio’s recordingStatusCallback to a small middleware (sample below). Forward the Twilio recording URL to POST https://api.speakai.co/v1/media/upload. Production-ready in under 30 minutes.

Twilio Studio HTTP widget

For visual workflow builders. Drop an HTTP Request widget after the Record widget. Production setups still need a small middleware to attach Twilio’s Basic auth credentials, since the Studio widget cannot embed an Auth Token in templated URLs.

MCP for Claude and ChatGPT

Already have Twilio recordings in Speak? Connect Claude Desktop with npx @speakai/mcp-server init, or add the remote MCP server to Claude.ai. Your AI assistant can now search, summarize, and analyze your Twilio call library through conversation.

Subscribe to media.analyzed

Speak fires a signed webhook when transcription and AI analysis complete (usually within 60 seconds for a 10-minute call). Register your endpoint via POST /v1/webhook, then route each finished transcript to your CRM, BI, or alerting system.

Real workflows, real results

Four production patterns Speak customers ship with Twilio. Pick the one that fits your team and copy the recipe.

For sales and RevOps teams · Webhook + REST API

Auto-transcribe every outbound call

Configure Twilio’s Dial verb to record outbound calls. When the call ends, Twilio fires recordingStatusCallback at your middleware, which forwards the recording to Speak. Every call lands in your Speak workspace with transcription, sentiment, and your custom Magic Prompts already running.

1. TwiML on your outbound calls
<Response>
  <Dial record="record-from-answer-dual"
        recordingStatusCallback="https://your-middleware.example.com/twilio-recording"
        recordingStatusCallbackEvent="completed">
    +15551234567
  </Dial>
</Response>
2. Forwarding middleware

cURL:
curl -X POST https://api.speakai.co/v1/media/upload \
  -H "x-speakai-key: $SPEAK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Twilio call REc...",
    "url": "https://USER:PASS@api.twilio.com/.../Recordings/REc....mp3",
    "mediaType": "audio",
    "sourceLanguage": "en-US",
    "tags": "twilio,outbound"
  }'
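
Node.js (Express, Node 18+ for the global fetch):
import express from "express";

const app = express();
app.use(express.urlencoded({ extended: false })); // Twilio posts form-encoded bodies

// Speak fetches the recording itself, so Twilio's Basic auth credentials go into the URL
// it receives (same USER:PASS@ form as the cURL example's url). This shares your Auth Token with Speak.
function withTwilioCredentials(recordingUrl) {
  const url = new URL(recordingUrl);
  url.username = process.env.TWILIO_SID;
  url.password = process.env.TWILIO_AUTH_TOKEN;
  return url.toString();
}

app.post("/twilio-recording", async (req, res) => {
  // Recording status callbacks include CallSid and RecordingSid, not From/To.
  const { RecordingUrl, RecordingSid, CallSid } = req.body;
  if (!RecordingUrl) return res.sendStatus(400);

  try {
    const upload = await fetch("https://api.speakai.co/v1/media/upload", {
      method: "POST",
      headers: {
        "x-speakai-key": process.env.SPEAK_API_KEY,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        name: `Twilio call ${CallSid} (${RecordingSid})`,
        url: withTwilioCredentials(`${RecordingUrl}.mp3`),
        mediaType: "audio",
        sourceLanguage: "en-US",
        tags: "twilio,outbound",
      }),
    });
    res.sendStatus(upload.ok ? 200 : 502);
  } catch (err) {
    // Network failure or timeout: report it instead of leaving Twilio's request hanging.
    console.error("Speak upload failed:", err);
    res.sendStatus(502);
  }
});

app.listen(3000);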
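
Python (Flask 2.0+):
import os
from urllib.parse import urlsplit, urlunsplit

import requests
from flask import Flask, request

app = Flask(__name__)


def with_twilio_credentials(recording_url):
    # Speak fetches the recording itself, so Twilio's Basic auth credentials go into the
    # URL it receives (same USER:PASS@ form as the cURL example's url). This shares your Auth Token with Speak.
    parts = urlsplit(recording_url)
    netloc = f"{os.environ['TWILIO_SID']}:{os.environ['TWILIO_AUTH_TOKEN']}@{parts.netloc}"
    return urlunsplit(parts._replace(netloc=netloc))


@app.post("/twilio-recording")
def twilio_recording():
    data = request.form  # Twilio posts form-encoded parameters
    recording_url = data.get("RecordingUrl")
    if not recording_url:
        return ("", 400)

    try:
        upload = requests.post(
            "https://api.speakai.co/v1/media/upload",
            headers={
                "x-speakai-key": os.environ["SPEAK_API_KEY"],
                "Content-Type": "application/json",
            },
            json={
                # Recording status callbacks include CallSid and RecordingSid, not From/To.
                "name": f"Twilio call {data.get('CallSid')} ({data.get('RecordingSid')})",
                "url": with_twilio_credentials(f"{recording_url}.mp3"),
                "mediaType": "audio",
                "sourceLanguage": "en-US",
                "tags": "twilio,outbound",
            },
            timeout=30,
        )
    except requests.RequestException:
        return ("", 502)
    return ("", 200 if upload.ok else 502)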
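Because the middleware endpoint is public, it is worth rejecting requests that did not come from Twilio. Twilio signs each webhook in the X-Twilio-Signature header: an HMAC-SHA1, keyed with your Auth Token, over the full callback URL followed by every POST parameter name and value sorted by name, then base64-encoded. The sketch below implements that check with the Python standard library only; the function names are ours, and in production Twilio's own helper library (`twilio.request_validator.RequestValidator`) is the safer choice.

```python
import base64
import hashlib
import hmac


def twilio_signature(auth_token: str, url: str, params: dict[str, str]) -> str:
    """Compute the X-Twilio-Signature value for a form-encoded POST."""
    # Full URL, then each parameter name + value, sorted by name, with no separators.
    payload = url + "".join(k + params[k] for k in sorted(params))
    digest = hmac.new(auth_token.encode(), payload.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def is_valid_twilio_request(auth_token: str, url: str,
                            params: dict[str, str], signature: str) -> bool:
    """Compare the expected signature with the header Twilio sent, in constant time."""
    expected = twilio_signature(auth_token, url, params)
    return hmac.compare_digest(expected.encode(), signature.encode())


# Hypothetical wiring inside the Flask route above:
#   if not is_valid_twilio_request(os.environ["TWILIO_AUTH_TOKEN"], request.url,
#                                  request.form.to_dict(),
#                                  request.headers.get("X-Twilio-Signature", "")):
#       return ("", 403)
```

Twilio signs the exact public URL it called, so behind a proxy or load balancer you may need to rebuild that URL (scheme and host) rather than relying on `request.url`.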
3. React to media.analyzed webhook
// Body Speak posts to your callback (verified 2026-05-07)
// URL also gets ?eventType=media.analyzed&mediaId=<id> appended.
{
  "eventType": "media.analyzed",
  "state": "processed",
  "mediaId": "14c8dd9c0a89"
}

// To get transcript + sentiment + speakers, fetch:
// GET https://api.speakai.co/v1/media/insight/<mediaId>
// (response shape excerpt below)
{
  "status": "success",
  "data": {
    "mediaId": "14c8dd9c0a89",
    "name": "Acme Corp - Discovery",
    "mediaType": "video",
    "duration": { "inSecond": 1265, "start": "00:00:01.280", "end": "00:21:00.995" },
    "sourceLanguage": "en-US",
    "state": "processed",
    "tags": ["meeting-assistant"],
    "sentiment": [{
      "document": {
        "Compound": 25.10,
        "Negative": 3.22,
        "Neutral": 41.66,
        "Positive": 55.10
      },
      "sentences": [...]
    }],
    "insight": {
      "transcript": [...],
      "speakers": [...],
      "brands": [...]
    },
    "mediaUrl": "https://...",
    "createdAt": "2026-05-06T17:22:51.269Z"
  }
}
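A minimal sketch of the receiving side in Python, using only the fields shown above: it acts on the callback only when eventType is media.analyzed and state is processed, reads mediaId from the body (falling back to the query string Speak appends), builds the insight URL, and pulls the document-level sentiment out of the insight response. The function names are hypothetical, treating any other state as "not ready" is an assumption, and the HTTP calls themselves are left to your framework.

```python
from typing import Any, Optional
from urllib.parse import quote

SPEAK_API = "https://api.speakai.co/v1"


def analyzed_media_id(query: dict[str, str], body: dict[str, Any]) -> Optional[str]:
    """Return the mediaId from a finished media.analyzed callback, or None to ignore it."""
    event_type = body.get("eventType") or query.get("eventType")
    # Assumption: state "processed" (as in the sample body) means results are ready.
    if event_type != "media.analyzed" or body.get("state") != "processed":
        return None
    # Prefer the JSON body; fall back to the mediaId appended to the callback URL.
    return body.get("mediaId") or query.get("mediaId") or None


def insight_url(media_id: str) -> str:
    """URL to GET (with the x-speakai-key header) for transcript, sentiment and speakers."""
    return f"{SPEAK_API}/media/insight/{quote(media_id, safe='')}"


def document_sentiment(insight: dict[str, Any]) -> Optional[dict[str, float]]:
    """Return data.sentiment[0].document from an insight response, if present."""
    sentiment = (insight.get("data") or {}).get("sentiment") or []
    return sentiment[0].get("document") if sentiment else None
```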

Same TwiML pattern works for inbound, outbound, conference, and Flex agent calls.
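If your app generates TwiML per call instead of serving static XML, the same recording attributes can be emitted programmatically. This sketch uses only Python's `xml.etree.ElementTree` (Twilio's helper library also offers a TwiML builder); the function name is illustrative, and it wraps the number in a `<Number>` noun rather than placing it directly inside `<Dial>`, which TwiML also accepts.

```python
import xml.etree.ElementTree as ET


def dial_with_recording(number: str, callback_url: str) -> str:
    """Build TwiML that dials `number`, records both legs, and notifies `callback_url`."""
    response = ET.Element("Response")
    dial = ET.SubElement(
        response,
        "Dial",
        {
            "record": "record-from-answer-dual",
            "recordingStatusCallback": callback_url,
            "recordingStatusCallbackEvent": "completed",
        },
    )
    ET.SubElement(dial, "Number").text = number
    # ElementTree escapes &, < and > in attribute values and text.
    return ET.tostring(response, encoding="unicode")
```

Pointing inbound and outbound calls at different callback paths (for example `/twilio-support` versus `/twilio-recording`) is one way to let the middleware tell them apart.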

For customer support and CX teams · Webhook + Magic Prompt

Score every inbound support call automatically

Capture inbound calls hitting your Twilio Voice number, run topic and sentiment analysis, then use a Magic Prompt to score each call on issue resolution, agent empathy, and first-call resolution (FCR). Multilingual support queues work without per-language setup, since Speak supports 100+ languages.

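1. TwiML on inbound calls
<Response>
  <Say>Thanks for calling. This call may be recorded for quality.</Say>
  <Dial record="record-from-answer-dual"
        recordingStatusCallback="https://your-middleware.example.com/twilio-support">
    <!-- Placeholder support line. <Dial><Queue> dequeues a waiting caller, so for
         queue-based routing, enqueue callers with <Enqueue> and put record and
         recordingStatusCallback on the agent-side <Dial><Queue> instead. -->
    <Number>+15557654321</Number>
  </Dial>
</Response>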
2. Run a Magic Prompt for QA scoring

cURL:
# Magic Prompt is async. Step 1 fires the job; step 2 polls until completed.
curl -X POST https://api.speakai.co/v1/prompt/ \
  -H "x-speakai-key: $SPEAK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "mediaIds": ["86a36b59b38e"],
    "prompt": "Score this support call on (1) issue resolution, (2) agent empathy, (3) first-call resolution. Return JSON with scores 0-10 and 1-sentence justifications.",
    "isStream": false,
    "isIndividualPrompt": true
  }'

# Poll every ~1s (typical completion 2-3s):
curl "https://api.speakai.co/v1/prompt/messages?mediaIds=86a36b59b38e&pageSize=1" \
  -H "x-speakai-key: $SPEAK_API_KEY"
# -> data.history[0].messages[0] = { state: "completed", answer: "..." }

Node.js (Node 18+ for the global fetch):
// Magic Prompt is async: POST starts the job, GET polls for the answer.
async function scoreCall(mediaId) {
  const headers = { "x-speakai-key": process.env.SPEAK_API_KEY };

  const start = await fetch("https://api.speakai.co/v1/prompt/", {
    method: "POST",
    headers: { ...headers, "Content-Type": "application/json" },
    body: JSON.stringify({
      mediaIds: [mediaId],
      prompt: "Score this support call on (1) issue resolution, " +
              "(2) agent empathy, (3) first-call resolution. " +
              "Return JSON with scores 0-10 and 1-sentence justifications.",
      isStream: false,
      isIndividualPrompt: true,
    }),
  });
  if (!start.ok) throw new Error(`Magic Prompt request failed: ${start.status}`);

  // Poll once a second for up to ~20 s, reading history[0].messages[0] as in the cURL example.
  for (let attempt = 0; attempt < 20; attempt++) {
    await new Promise((resolve) => setTimeout(resolve, 1000));
    const resp = await fetch(
      `https://api.speakai.co/v1/prompt/messages?mediaIds=${encodeURIComponent(mediaId)}&pageSize=1`,
      { headers }
    );
    if (!resp.ok) continue; // transient error: try again on the next tick
    const body = await resp.json();
    const msg = body?.data?.history?.[0]?.messages?.[0];
    if (msg?.state === "completed") return msg.answer;
  }
  return null; // timed out; the caller decides whether to retry later
}
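The same fire-then-poll pattern in Python, with the HTTP GET injected as a function so the loop can be exercised without network access. It reads `data.history[0].messages[0]` exactly as the cURL comment above describes, and the default 20 × 1 s budget mirrors the Node version. `extract_json` is an assumption about the answer format: it expects the JSON object the prompt asks for, possibly surrounded by prose, and which keys appear is up to your prompt.

```python
import json
import time
from typing import Any, Callable, Optional

# get_messages(media_id) returns the parsed JSON of
# GET /v1/prompt/messages?mediaIds=<id>&pageSize=1 (an HTTP helper you supply).
GetMessages = Callable[[str], dict[str, Any]]


def wait_for_answer(media_id: str, get_messages: GetMessages, attempts: int = 20,
                    delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Poll until data.history[0].messages[0] reports state "completed"."""
    for _ in range(attempts):
        sleep(delay)
        page = get_messages(media_id)
        history = (page.get("data") or {}).get("history") or []
        messages = (history[0].get("messages") or []) if history else []
        if messages and messages[0].get("state") == "completed":
            return messages[0].get("answer")
    return None  # gave up after roughly attempts * delay seconds


def extract_json(answer: str) -> Optional[dict[str, Any]]:
    """Best-effort: parse the outermost {...} in a model answer, else None."""
    start, end = answer.find("{"), answer.rfind("}")
    if start == -1 or end < start:
        return None
    try:
        parsed = json.loads(answer[start:end + 1])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None
```

With `requests`, `get_messages` could be `lambda mid: requests.get("https://api.speakai.co/v1/prompt/messages", params={"mediaIds": mid, "pageSize": 1}, headers={"x-speakai-key": key}, timeout=10).json()`.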

Tag inbound recordings with twilio,inbound,support on upload so library views and search filters can split sales vs support metrics.
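A small payload builder keeps that tagging consistent between the outbound and inbound routes. This is a sketch: the body fields and the two tag sets (twilio,outbound and twilio,inbound,support) come from the examples on this page, while the function name, the `direction` argument, and the naming scheme are illustrative choices.

```python
from typing import Any

# Tag sets used on this page; extend per queue or team as needed.
TAGS_BY_DIRECTION = {
    "outbound": ["twilio", "outbound"],
    "inbound": ["twilio", "inbound", "support"],
}


def speak_upload_payload(recording_url: str, recording_sid: str, call_sid: str,
                         direction: str, language: str = "en-US") -> dict[str, Any]:
    """Build the JSON body for POST /v1/media/upload, mirroring the fields used above."""
    if direction not in TAGS_BY_DIRECTION:
        raise ValueError(f"unknown direction: {direction!r}")
    return {
        "name": f"Twilio {direction} call {call_sid} ({recording_sid})",
        # Add Twilio credentials to this URL before sending if your account enforces HTTP auth.
        "url": f"{recording_url}.mp3",
        "mediaType": "audio",
        "sourceLanguage": language,
        "tags": ",".join(TAGS_BY_DIRECTION[direction]),  # comma-separated string
    }
```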

For training and enablement teams · Flex Plugin + REST API

Surface Speak insights in the Flex agent panel

Twilio Flex is the visual contact-center desktop. After a call ends, drop a custom panel into the agent’s UI showing the Speak summary, sentiment trend, and the topics you care about. Every call becomes a coaching loop without leaving Flex.

  1. Use the recordingStatusCallback flow from the outbound-call workflow above to get every Flex call into Speak. Store the RecordingSid → Speak mediaId mapping in your middleware on upload.
  2. Build a Flex plugin (React) that calls GET /v1/media/insight/:mediaId when the agent opens a wrap-up panel.
  3. Render Speak’s topics, the document-level sentiment (sentiment[0].document in the insight response), and the Magic Prompt answer inline.
Flex plugin: fetch insights for the just-ended call
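// inside your Flex Plugin React component
// mappingStore: your own RecordingSid -> mediaId lookup, written by the middleware on upload.
// In production, proxy these calls through your middleware so SPEAK_API_KEY never ships to the browser.
async function fetchInsights(recordingSid) {
  const mediaId = await mappingStore.lookup(recordingSid);
  const headers = { "x-speakai-key": SPEAK_API_KEY };
  const base = "https://api.speakai.co/v1";

  // Insight responses are wrapped in { status, data } (see the response excerpt above).
  const { data } = await (await fetch(`${base}/media/insight/${mediaId}`, { headers })).json();
  // Magic Prompt answers are read from the prompt messages endpoint.
  const url = `${base}/prompt/messages?mediaIds=${mediaId}&pageSize=1`;
  const prompt = await (await fetch(url, { headers })).json();
  return {
    sentiment: data?.sentiment?.[0]?.document,
    insight: data?.insight,
    magicPromptAnswer: prompt?.data?.history?.[0]?.messages?.[0]?.answer,
  };
}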
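The plugin assumes a RecordingSid → mediaId lookup that your middleware writes at upload time. A minimal sketch with Python's built-in `sqlite3`, assuming you can read the new mediaId from Speak's upload response (the exact response field is not shown on this page, so check the API reference); the class and table names are hypothetical.

```python
import sqlite3
from typing import Optional


class MediaMapping:
    """Persist RecordingSid -> Speak mediaId pairs (hypothetical helper)."""

    def __init__(self, path: str = "speak_mapping.db") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS mapping ("
            " recording_sid TEXT PRIMARY KEY,"
            " media_id TEXT NOT NULL)"
        )

    def save(self, recording_sid: str, media_id: str) -> None:
        # Upsert, so a repeated callback for the same recording does not hit the primary key.
        with self.conn:  # commits on success
            self.conn.execute(
                "INSERT OR REPLACE INTO mapping (recording_sid, media_id) VALUES (?, ?)",
                (recording_sid, media_id),
            )

    def lookup(self, recording_sid: str) -> Optional[str]:
        row = self.conn.execute(
            "SELECT media_id FROM mapping WHERE recording_sid = ?", (recording_sid,)
        ).fetchone()
        return row[0] if row else None
```

The plugin’s `mappingStore.lookup` would then call a middleware endpoint that wraps `lookup()`. SQLite connections are tied to the creating thread by default, so open one per worker or swap in your production database.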

For larger teams, pair this with Speak’s MCP server (covered in the next workflow) so QA leads can ask Claude “show me every call this week where the agent missed the discovery question” without writing SQL.

For analysts, RevOps, and team leads · Natural language

Search every Twilio call from Claude or ChatGPT

Connect Speak’s official MCP server to Claude or ChatGPT and your team queries the entire Twilio call library through conversation. No SQL, no dashboards, no exports. Already piping recordings into Speak with the outbound or inbound workflow above? You can use this today.

1. Install the MCP server

# Claude Desktop / Claude Code (auto-detects your installation)
npx @speakai/mcp-server init

# Paste your Speak API key when prompted. Setup takes about 2 minutes.

# Claude.ai (web) and ChatGPT MCP connector
# Settings > Integrations > Add MCP Server
# Remote URL:    https://api.speakai.co/v1/mcp
# Auth header:   x-speakai-key: YOUR_SPEAK_API_KEY

# Verify the connection responds (optional sanity check):
curl -s https://api.speakai.co/v1/mcp \
  -H "x-speakai-key: $SPEAK_API_KEY" \
  -H "Accept: application/json"

2. Example prompts your team can use today:

  • “Show me every Twilio call from last week where a prospect mentioned pricing objections. Group by sales rep.”
  • “Summarize the 5 longest support calls in my Twilio queue from yesterday. Flag any where sentiment dropped below neutral.”
  • “Pull verbatim quotes from Twilio calls tagged enterprise where customers asked about SOC 2 or HIPAA.”
  • “Compare discovery completeness across our top 3 SDRs. Use this week’s Twilio calls. Score each on a 0-10 rubric.”
  • “Find the Twilio call from August 14 where the CEO joined and pull the action items into a follow-up doc.”

Works with Claude.ai, Claude Desktop, Claude Code, and ChatGPT MCP connectors.

Why Speak AI + Twilio

Twilio handles call control. Speak handles transcription and analysis. The combination scales from a 5-person sales team to a global Flex contact center on the same workspace.

Multi-source ingest, not just Twilio

Twilio is one of dozens of inputs. The same Speak workspace ingests Zoom, Teams, Meet, Webex, Vimeo, podcast audio, embedded recorder submissions, file uploads, and live recordings. One library, every conversation.

Custom Magic Prompts, not fixed reports

Twilio Voice Intelligence ships fixed report templates. Speak’s Magic Prompt runs your prompts on every call: discovery checklist scoring, competitor mention extraction, MEDDIC-stage detection, whatever your team needs. Save prompts once, run them on every recording forever.

MCP-native for Claude and ChatGPT

The official @speakai/mcp-server exposes 83 tools to AI assistants. Your team queries the Twilio call library in plain English from Claude Desktop or ChatGPT, without exporting transcripts to other tools.

Shareable embeds, not just internal dashboards

Every Speak transcript and analysis can be shared as a public or private embed. Send a Twilio call summary to a customer’s stakeholder without granting them dashboard access or exporting a PDF.

Teams trust Speak AI for their most important calls

★★★★★
4.9 on G2

“Speak AI has been instrumental in transforming how we handle qualitative data. The transcription accuracy is impressive, and the NLP insights save us hours of manual analysis.”

Research Director | Consulting Firm

“We switched from Otter.ai and the depth of analysis is on another level. Sentiment scoring, keyword extraction, and theme detection all happen automatically.”

Product Manager | SaaS Company

“The ability to search across all our customer calls and pull specific moments is a game-changer for our support team.”

Head of CX | Enterprise Tech

How to use Speak AI with Twilio for call transcription and analysis

Twilio is the dominant programmable communications platform. Sales teams run outbound dialers on Twilio Voice. Support teams route inbound queues through Twilio Flex. Operations teams build voice agents and IVR flows in Twilio Studio. Every one of those calls is a potential source of business intelligence, and every one of them passes through Twilio’s recording layer. Speak AI is what turns those recordings into structured, searchable, shareable data.

Where Speak fits in the Twilio call lifecycle

Speak runs after the call. Twilio handles call control (TwiML, Studio, Flex, Conversations). Speak handles transcription, AI analysis, search, and downstream automation. The handoff happens via Twilio’s recording webhook, the Twilio Studio HTTP widget, or Speak’s Zapier app. Recordings move from Twilio’s storage into your Speak workspace, where transcription begins immediately and AI analysis runs as soon as the transcript is ready.

Speak vs Twilio Voice Intelligence

Twilio Voice Intelligence (formerly Operator) covers basic transcription and a small set of fixed Operators (PII redaction, sentiment, summary). It is built into Twilio’s billing and well-suited if you only need transcription on Twilio calls.

Speak is a different product. Speak ingests Twilio recordings alongside Zoom, Teams, Meet, file uploads, podcasts, and embed recorder submissions, then runs custom analysis (your Magic Prompts, your tags, your team’s vocabulary), exposes everything via MCP for Claude and ChatGPT, and ships shareable embeds. Most customers using both let Twilio handle real-time call control and Voice Intelligence’s PII redaction, then send recordings to Speak for the deeper analysis layer.

Authentication for Twilio recording URLs

Twilio recording URLs are not public. They require HTTP Basic auth using your Twilio Account SID as username and your Auth Token as password. The middleware examples in the workflows section show where to attach those credentials before the recording URL is forwarded to Speak. If you cannot run middleware, use Twilio’s recording redaction or pre-signed URL features and pass the resulting public URL to Speak.
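HTTP Basic auth is the base64 encoding of `username:password` sent in an Authorization header, or equivalently the same credentials placed in the URL’s userinfo (the form the cURL upload example uses). A standard-library sketch of both, with placeholder credentials and illustrative function names. Either way the Auth Token travels with the request or URL, so treat it as a secret.

```python
import base64
from urllib.parse import quote, urlsplit, urlunsplit


def basic_auth_header(account_sid: str, auth_token: str) -> str:
    """Authorization header value, for fetching a Twilio recording yourself."""
    raw = f"{account_sid}:{auth_token}".encode()
    return "Basic " + base64.b64encode(raw).decode()


def url_with_credentials(url: str, account_sid: str, auth_token: str) -> str:
    """Embed credentials as URL userinfo: https://SID:TOKEN@host/path."""
    parts = urlsplit(url)
    userinfo = f"{quote(account_sid, safe='')}:{quote(auth_token, safe='')}"
    return urlunsplit(parts._replace(netloc=f"{userinfo}@{parts.netloc}"))
```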

Real-time vs post-call

Speak is post-call. Recordings are processed when they finish, with first transcripts typically available within 60 seconds for a 10-minute call. If you need live during-call transcription (for example, real-time agent assist or live captions), pair Twilio Media Streams with a real-time speech-to-text provider such as Deepgram for the live leg, and use Speak for the post-call analysis layer.

How do I transcribe Twilio call recordings automatically?

Three production paths:

  • Webhook + REST API. 20-line middleware. Twilio’s recordingStatusCallback fires when a call ends, your middleware forwards the URL to Speak’s POST /v1/media/upload. Highest control, lowest cost.
  • Zapier. Trigger: Twilio New Recording. Action: Speak AI Upload Media. Setup time: 5 minutes. Best for SMB teams without engineering bandwidth.
  • Twilio Studio HTTP widget. Visual flow builder. Add an HTTP Request widget after the Record widget. Production setups still need a middleware to attach the Twilio Basic auth credentials.

The same patterns work for inbound calls, outbound calls, conference recordings, voicemails, and Flex agent calls.

Use cases by role

Sales and RevOps teams use Speak AI with Twilio to auto-transcribe every outbound dialer call. Magic Prompts score discovery completeness, surface objections, and extract competitor mentions. See Speak AI for sales teams for the full RevOps stack.

Customer support and CX teams capture every inbound queue call, run sentiment and topic analysis, and feed structured QA scores into their dashboards. Multilingual queues work without per-language setup since Speak supports 100+ languages with automatic detection.

Training and enablement teams turn the Flex agent panel into a coaching loop. Speak’s analysis surfaces missed discovery questions, scripted-line drift, and sentiment dips for QA review. See Speak AI for training and development.

Customer research teams bring Twilio interview calls into the same workspace as their Zoom and embed recorder sessions. Code themes across the entire library, ask Claude for verbatim quotes, build research deliverables in hours instead of weeks. See Speak AI for qualitative researchers.

Frequently asked questions

Does Speak AI offer real-time transcription during a Twilio call?

No. Speak is post-call. Recordings are processed when they finish, with first transcripts typically available within 60 seconds for a 10-minute call. For live during-call transcription, pair Twilio Media Streams with a real-time provider and use Speak for the post-call analysis layer.

Do I need to write code, or can I use Zapier?

Zapier covers the simple Twilio New Recording into Speak Upload flow with zero code. For production sales-call pipelines, Flex agent panels, or multi-step routing, the 20-line Node middleware shown above gives you full control. Both paths use the same Speak workspace and the same analysis layer.

How does Speak handle Twilio’s authenticated recording URLs?

Twilio recording URLs require HTTP Basic auth using your Twilio Account SID and Auth Token. The middleware example above attaches those credentials when forwarding the URL to Speak. If you cannot run middleware, use Twilio’s pre-signed or redacted recording features and pass the resulting URL to Speak.

Is Twilio Voice Intelligence required, or does Speak replace it?

Voice Intelligence is optional. Many customers run it for PII redaction at the Twilio layer and use Speak for the deeper analysis (custom Magic Prompts, MCP access for AI assistants, shareable embeds, multi-source library search). Both can coexist on the same call.

What languages are supported?

Speak supports transcription in 100+ languages including English, Spanish, French, German, Portuguese, Italian, Dutch, Hebrew, Norwegian, Japanese, Arabic, Hindi, and dozens more. Set the source language with the sourceLanguage field on upload, or let Speak detect it automatically.

Can I try Speak AI for free with my Twilio account?

Yes. The 7-day trial includes 30 minutes of transcription, full API access, and the MCP server. No credit card required. Wire up the recording webhook against a test Twilio number and validate the full pipeline before committing.

Start using Speak AI with Twilio today

83 analysis tools. 100+ languages. Production webhook + REST API. Zero-code Zapier path. MCP-native for Claude and ChatGPT.

Try Speak AI free

Create your account, grab your API key, and connect your Twilio recording webhook. Full access for 7 days. No credit card required.

View the API docs

Full reference for the upload endpoint, webhook event types, and Magic Prompt API. Plus the Speak Zapier app and the official MCP server on NPM.