
Frequently Asked Questions

These are the questions we get asked most at Speak Ai. 

When Was Speak Ai Founded?

Speak Ai was founded in January 2019 after several years of conceptualization. 

Why Was Speak Ai Started?

Speak Ai was founded by Tyler Bryden after he spent five years building websites, implementing marketing campaigns, and facilitating workshops with his first company SixFive. 

During that time, Tyler continuously saw companies facing the same problems. Many of the clients were trying to get their organization's message out but were struggling deeply. Tyler and his team realized that marketing is just the communication of information, ideally to the right people at the right time. The clients faced the same challenges:

  1. Organizations didn't know how to measure the results of their work
  2. Organizations didn't know whether to hire contractors, employees, or agencies
  3. Agencies can be extremely expensive and often fail to deliver a return on investment
  4. Creating great content that communicates an organization's story is time-consuming and expensive
  5. Getting discovered and reaching audiences virtually has become more complex and difficult than ever
  6. The applications we use to communicate with our audiences are fragmented 
  7. A lot of people have short attention spans and we only have a small amount of time to communicate effectively
  8. Every person is processing too much information every day
  9. People only want to engage with the things that matter to them
  10. As more audio and video is created, people have to be better communicators than ever

With those challenges in mind, Tyler started conceptualizing and then building Speak. He officially incorporated the company in January 2019 after being awarded the Ontario Centres of Excellence SmartStart Seed Fund grant. 

In the period leading up to founding Speak Ai, Tyler was recording longer-form audio and video podcasts. He was then transcribing those pieces of media and embedding them back on his website tylerbryden.com.

He would distribute that content through his WordPress site and an RSS feed. When he did that, he started to notice his search engine rankings skyrocket. A few of the terms were:

"Joe Rogan mac miller"

"turning 26 years old"

"Kids and competitive sports"

He realized how much of an impact the transcription was having on the rankings, so he started to look for a solution to transcribe the media he was creating. There were some transcription options, but most of them were missing important features: automatically publishing the transcription to the site, connecting the transcript and media together to make them interactive, extracting insights from the content to make it easier to navigate, and analytics to see what was working and what wasn't. With that awareness, Tyler continued to move forward with the development of Speak.

Additionally, Tyler was hired to give a talk on "How AI will impact the real estate industry". It was a big talk in front of over 500 real estate agents with completely original content. To prepare his speech, Tyler recorded himself on video and watched it back, quickly identifying the parts that were good and bad. This process gave Tyler a lightbulb moment where he realized how powerful video and audio analysis is. When the final speech turned out fantastic, he wanted to share this process with others.

Lastly, after struggling with mental health in his early twenties, Tyler had for years been documenting his thoughts and moods in software called Evernote. What he never received back were insights about himself, and he realized there was an opportunity here. If he could build software that captured his thoughts and provided insights back, he could get a better understanding of himself, his moods, and what he was journalling. With his knowledge of analytics, he saw a beautiful way to self-analyze while journalling privately or while creating content that would be public.

Tyler connected with Vatsal Shah who had recently graduated from Western University with a Masters in Computer Science. Together, they started building the first version of Speak.

From the beginning, their goal has been to help people and teams understand themselves better and improve communication both online and in the real world. We are taking cutting-edge technology and democratizing it for our users so they can speak from the heart and be heard, whether that is online on search engines or social media networks, or in the room during a presentation. 


Is it Speak or Speak Ai?

Speak Ai is the officially incorporated company name, but we call our first product Speak.

Who Is Speak Made For?

Speak is made for both researchers and marketers who are creating, capturing, analyzing, and sharing media. We are learning from both groups and building an intuitive product that helps both types of users dramatically reduce the time and cost of their work while improving its effectiveness.

How Do You Use Speak?

You can sign up here! For now, you can sign up for free and pay per minute. The price is calculated automatically based on the length of your audio or video file and whether you want just the transcript or the transcript with insights extracted.

How Accurate Is Automated Transcription?

With good audio quality and a clear, articulate speaker, you can get an 85% to 98% accurate transcription. Poor audio quality, industry-specific terms, and accents can reduce the accuracy of both transcription and speaker identification. Speak will analyze the file and clean up telephony audio or noisy recordings. We continue to improve our technology and increase our automated analysis accuracy.
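One common way to put a number on transcription accuracy is to compare the automated transcript with a human-corrected reference and compute one minus the word error rate. The sketch below illustrates that calculation only. The function name `word_accuracy` is hypothetical, and it is an assumption that the percentages above are measured this way.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Return 1 - word error rate, floored at 0.0.

    Assumption: accuracy is measured at the word level, ignoring case.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        return 1.0 if not hyp else 0.0

    # Dynamic-programming edit distance computed over words, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the transcript
                curr[j - 1] + 1,     # extra word in the transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr

    word_error_rate = prev[-1] / len(ref)
    return max(0.0, 1.0 - word_error_rate)


if __name__ == "__main__":
    human = "the quick brown fox jumps"
    automated = "the quick brown box jumps"
    print(f"{word_accuracy(human, automated):.0%}")
```

With one wrong word out of five, this example reports 80% accuracy.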

What File Types Do You Take?

Speak is built for ease-of-use. We are capable of analyzing most popular video files including MP4, QuickTime, FLV, WebM and AVI. We also support mainstream audio files including MP3, FLAC, AAC and WAV.
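If you want to check a batch of files before uploading, a small script along these lines could sort them by type. This is a minimal sketch: the extension sets simply mirror the formats listed above, the mapping of QuickTime to `.mov` is an assumption, and `media_kind` is a hypothetical helper, not part of any Speak API.

```python
from pathlib import Path
from typing import Optional

# Extensions mirroring the formats listed above (".mov" for QuickTime is assumed).
VIDEO_EXTENSIONS = {".mp4", ".mov", ".flv", ".webm", ".avi"}
AUDIO_EXTENSIONS = {".mp3", ".flac", ".aac", ".wav"}


def media_kind(filename: str) -> Optional[str]:
    """Return "video", "audio", or None when the extension is not listed."""
    suffix = Path(filename).suffix.lower()
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    return None


if __name__ == "__main__":
    for name in ["interview.MP4", "podcast.flac", "notes.txt"]:
        print(name, "->", media_kind(name))
```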

What Is The Going Rate For Speech-To-Text?

As speech recognition grows, several companies have built speech-to-text technology. Most automated transcription companies charge between $0.10 USD and $2.00 USD per minute. We are competitively priced and, unlike transcription-only companies, we also analyze your video or audio, which provides additional value through export options and insights like topics, keywords, and brands extracted by our machine learning algorithms. Soon, you will be able to access our automated analysis at any time with our intuitive web and mobile application.

How Do We Receive Our Transcription & Insights?

When you create an account, you can easily upload audio and video files through a web interface. As soon as your transcription is done, you will get an interactive media player. You can navigate your file and edit the media there, or export to a Word Doc (.doc), PDF (.pdf), SRT, or VTT.
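For readers unfamiliar with subtitle files, SRT is a plain-text format: each cue has a running number, a start and end time written as HH:MM:SS,mmm, and the spoken text. The sketch below shows how time-stamped transcript segments map onto that layout. The segment tuples and the function names `srt_timestamp` and `to_srt` are hypothetical and only illustrate the format, not how Speak produces its exports.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}"
        )
    # Cues are separated by a blank line.
    return "\n\n".join(cues) + "\n"


if __name__ == "__main__":
    example = [
        (0.0, 2.5, "Welcome to the show."),
        (2.5, 6.0, "Today we are talking about transcription."),
    ]
    print(to_srt(example))
```

WebVTT is very similar: the file starts with a `WEBVTT` header line and timestamps use a period instead of a comma before the milliseconds.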

Can I Get A Demo?

Of course! Please book a demo with our team here.

Do You Have Good Security?

Yes, we do. We care about this a lot because we process audio and video. We are working towards HIPAA and GDPR compliance. We use Stripe for payment processing.

What Problems Does Speak Solve?

It is hard enough to communicate in person. Now we need to do it online too. If we don’t get heard and communicate well, we can’t:

  1. Resonate with customers to drive leads and sales
  2. Share information with team members and stakeholders
  3. Improve public awareness, sentiment and policy
  4. Better interpersonal and professional relationships

Bad communication is damaging personal and professional lives and making organizations fail. Communication is complex and rapidly changing, and when it breaks down it:

  1. Causes delays or failures to complete projects
  2. Decreases morale and increases stress
  3. Sabotages sales, marketing, and research

How Long Does The Transcription & Analysis Take?

Although it can range depending on how optimized your audio and video files are and how busy our servers are, Speak aims to deliver a 1:1 ratio. A 10-minute video should take 10 minutes to get back after upload. Audio is often much quicker. 

What Is Speak?

Software that helps people and organizations communicate better online and in the real world. Extract deep insights from your communication channels.

  1. Improve communication through Speak insights
  2. Easily navigate to important moments in analyzed files
  3. Export and share analyzed media in multiple ways
  4. Integrate into people's and teams' existing workflows


How Does Speak Work?

Seamlessly create, analyze and share audio, video and text. From our web app, you can easily record or import audio, video and text. Speak will instantly analyze the content and generate an interactive widget that you can use yourself or share with others. 

  1. Access your phone and computer camera.
  2. Easily upload audio and video files and links.
  3. Get deep insights from your media.

What Features Does Speak Have?

Automated Speech to Text

Language identification

Speak automatically detects languages and is capable of accurately analyzing multi-lingual audio and video. 

Automated transcription

Speak gives you the ability to easily convert speech to text in 10 languages. With high-quality audio and video, Speak can immediately deliver a time-stamped transcript with up to 98% accuracy.

Speaker identification

Speak labels and timestamps speakers so you can easily understand who spoke when. 


With Speak, you can easily export your audio and video files into three popular subtitle formats: WebVTT, TTML, or SRT. 

Automatic Punctuation

Speak automatically adds punctuation such as commas, question marks, and periods to transcriptions using our machine learning models.


Immediately translate the transcription and insights into more than 7 languages.

Video Analysis

Object identification

Speak automatically detects and labels items (for example, a person, table, or ball) when they appear in the video.

Face detection

Speak’s technology detects and displays faces identified in the uploaded video. 

Celebrity identification

Our software automatically recognizes public figures, displays their biography, and allows users to see when they are present in the video. 

Custom face identification

Tag unknown people in your videos. If they are seen again, our technology will automatically recognize them and show where that person is in the video. 

High-Quality Thumbnail extraction

Automatically extract the best face images for thumbnails.

Audio Analysis

Keyword extraction

Find the most prevalent keywords mentioned by speakers in each audio or video file.
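To give a feel for what "most prevalent keywords" means, the simplest possible approach is to count how often each meaningful word appears in a transcript. The sketch below does only that. It is an illustration with a tiny hand-picked stop-word list and a hypothetical `top_keywords` function, and it is an assumption-free stand-in rather than the machine learning models Speak uses.

```python
import re
from collections import Counter

# A deliberately small, illustrative stop-word list.
STOP_WORDS = {
    "the", "and", "for", "that", "this", "with", "are", "was", "but",
    "you", "our", "has", "have", "not", "from", "they", "needs",
}


def top_keywords(transcript: str, limit: int = 5) -> list:
    """Return the most frequent non-stop-words in a transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(
        word for word in words
        if len(word) > 2 and word not in STOP_WORDS
    )
    return [word for word, _ in counts.most_common(limit)]


if __name__ == "__main__":
    text = (
        "Marketing is communication. Good marketing needs good "
        "communication and good content."
    )
    print(top_keywords(text, limit=3))
```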

Topic inference

Identify the main topics based on speech content in the video or audio file.

Brand mentions

Track brand mentions in spoken content or displayed on the screen during videos.

Sentiment analysis

Compare instances of positive and negative sentiments within audio and video content. 

Emotion detection

Identify emotions in analyzed content using words, vocal signals and facial expressions. 

Multi-channel Recognition

In recordings with several people where they are on different channels (like a phone call or video conference), Speak will analyze each channel separately, recognize speakers, and then merge the transcripts so they are accurate. 
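The merge step can be pictured as interleaving the per-channel transcripts by start time. The sketch below assumes each channel has already been transcribed into time-ordered segments. The `Segment` type and `merge_channels` function are hypothetical names used for illustration and do not describe Speak's internal implementation.

```python
import heapq
from typing import Iterable, List, NamedTuple


class Segment(NamedTuple):
    start: float   # seconds from the beginning of the recording
    speaker: str   # label for the channel's speaker
    text: str


def merge_channels(*channels: Iterable[Segment]) -> List[Segment]:
    """Interleave per-channel transcripts into one timeline.

    Each channel must already be sorted by start time.
    """
    return list(heapq.merge(*channels, key=lambda segment: segment.start))


if __name__ == "__main__":
    caller = [
        Segment(0.0, "Caller", "Hi, is this support?"),
        Segment(5.2, "Caller", "My upload is stuck."),
    ]
    agent = [
        Segment(2.1, "Agent", "Yes, how can I help?"),
        Segment(8.0, "Agent", "Let me take a look."),
    ]
    for segment in merge_channels(caller, agent):
        print(f"[{segment.start:5.1f}s] {segment.speaker}: {segment.text}")
```

Because each channel carries a single speaker, the speaker label comes for free from the channel, and the merged transcript reads in conversation order.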

Noise reduction

Speak will analyze the file and clean up telephony audio or noisy recordings.

Why Should We Work With Speak Ai?

Talented Team

We are a growing team of talented and experienced marketers, developers, videographers, business people and strategists.

Unique Experience

Each team member brings a unique perspective that differentiates us and establishes competitive barriers.

Radically Efficient

Our team is focused on hyper-growth and has been relentless in building a powerful product as inexpensively and quickly as possible.

Powerful Roadmap

Because of our unique experience, we are building paradigm-shifting features, many of which have been requested by our users.

What Is The Future Of Speak?


Share and embed your content in many valuable ways.


Deep integrations into all your favourite tools.


Stream and create powerful content in real-time.


Expansion of insights provided beautifully through Speak.