Office Hours 07 – Transcript Editor & QA, Custom Categories & Insights, Live Speech-to-Text Creation

Tim Tyler Vatsal Lorne Speak Ai Office Hours 04

In This Discussion:

We talked about:
– The Improved Transcript Editor & Quality Assurance Process
– New Powerful Custom Categories & Insights
– Live Colour & Image Speech-to-Text System
Feel inspired every day to work on this with our team.

What Is Speak Ai Office Hours?

The Speak Ai team is doing routine virtual get-togethers that anyone can join! We share updates, have lively discussions, answer questions, and figure out how to solve exciting and complex problems together.

Join us next week at our weekly office hours at 12:00 PM EST.

What Does Speak Ai Do?

Over 540 individuals and teams use Speak to automatically capture, analyze and transform media into incredible assets for growth.

Customers are harnessing Speak’s media capabilities to save time, increase productivity, improve research, optimize well-being, grow search engine rankings and more.

If You Would Like To Contribute

Sign up for free at (and share your feedback – we value it a lot):

Follow us on LinkedIn:

Follow us on Instagram:

Join our Slack Community

Check out our Public Product Roadmap:

Follow our Indie Hackers profile:

1 – 0:00:00
Alright, recording. We’re now getting up to office hours #7. We are missing one of the key team members today: that’s Vatsal, our CTO, technology leader, IoT magician and all-around good guy. He has a wonderful friend who just moved here from India and they’re spending some time down at the water in Toronto. So, crisp sunny day but also still cold.

So I hope that they are OK, and we’re happy to be here. I was saying that I was prepared to do this completely alone, but I’m blessed to be in the presence of Timothy and Lauren. Very happy to hear Lauren has his eye is falling off. Saying that, we have a couple of things that we want to talk about.

1 – 0:00:48
This is, I hope this isn’t getting redundant for anyone, but we’re going to start with the transcript editor because it’s all on our minds right now. We had another delivery of transcripts this week and there have been some significant improvements that we’ve been making, both internally within the application and also for the transcribing team. And Lauren happens to be having fun doing some transcripts right now too, as well as some QA, quality assurance. So Timothy, I know this is top of mind for you.

Let’s let you kick it off and let me know if you have any thoughts specifically,

3 – 0:01:17
you want to go through? Sure, Tyler, thank you. It’s good to be here. Every thoughtful service. Um, so I’m leading the development of this transcript editor. It has essentially been an in-app editor, such that you would start editing a sentence in your media transcript that is already automatically transcribed for you with the analysis.

It has grown out of this and come to be part of our one-stop solution for your media analysis. As we’ve started transcribing more and more media, we need to scale this up.

Um, we found beautiful platforms to rely on and connected with so many people, so many transcribers, professionals in their field. We’re providing them with access to this small application inside of one of our products.

Well, inside of Speak Ai, but now it has grown out of it, and Vatsal just recently created a new codebase, which is... not gonna say how many codebases we do have,

2 – 0:02:50
but it’s one more. I’ve started it, and it’s a totally new part of our product.

3 – 0:03:00
Now we have a completely dedicated application for transcript editing, and we do hope that this will not only enable us to scalably transcribe hundreds of audio files in less than several days for our customers, but also that we will have a much larger audience of customers from this. Tyler, what would you add?

2 – 0:03:34
That was great I I’m trying.

1 – 0:03:36
I’m trying to think what to add. We even had a wonderful team that I’ve been talking with for years, and they have a big qualitative research project coming up. What was really interesting is they want to use our system to help them with the qualitative research process, but they actually have some students... I keep saying “actually”. Filler word signal. They have students and trainees that they want to help do the qualitative analysis, and that includes cleaning up the transcript.

So it was really interesting. We’ve provided the platform through and through, which helps give them the first run-through with automated analysis and transcript. Without even having to rely on us to organize the transcribers, they have an internal team of students who can use our product to clean it up to 100% very quickly and efficiently. So that was really exciting.

Like you said, it’s almost a separate product, and the amount of work that has gone into this transcript editor deserves its own codebase. It deserves its own sort of seat in the product suite that we have. I think Lauren and Vatsal and I talked about this a little bit in our last one, but the first experiences of our transcript editor were completely miserable, so it’s slowly getting better. There are some really interesting things that I’m finding are different between the embed that we’re sending to our transcription team and what’s actually in the app. Lauren just touched on one of them, which is the find and replace of all speakers. You know, this has been a really interesting challenge.

Speaker identification is a difficult technical task to complete. So one of the questions is: if we want to ensure that the human-augmented transcription is fully accurate, do we let the automatically generated speaker identification sit there? That was one stage we did, but then we found out we weren’t getting speakers back properly. Then we had one where it was “replace all” if you changed one, and even when it was lined up we still didn’t get back accurate speaker labels.

Now we’ve gone and made that a much more manual process, and the accuracy we’ve seen has taken a significant jump. So sometimes you can rely on machines, sometimes you cannot, and we’re really trying to bridge that human-computer symbiosis that we’re trying to create through this system. Again, it sounds sort of technical and boring, but I find it quite romantic and exciting.

3 – 0:06:01
Lauren has been assigned work for the past couple of hours.

1 – 0:06:05
How’s the experience, Lauren? Yeah, I feel like... well, it’s been pretty good. I think it’s just the minor noise that has to come up every time,

4 – 0:06:14
but I think what could help would be, just at the very beginning when you’re about to start transcribing, to ask, “Oh, we have speaker one, speaker two. What would you like to name them?” And then we’ll just switch that all the way through, ’cause every time you go through, you know, you see that all these names are wrong, you switch it, and then it’s like, “Oh, do you wanna switch all of them?”

It’s like, nope, already did that. If I do that now, that’s just going to screw everything up. Another idea is just keeping track of when changes happen for the names, ’cause when you’re transcribing you’re going through a sea of words constantly, so having to keep an eye out for one word change, like a name changes and that’s wrong, makes it a bit difficult. I can understand that you can just breeze way past it when you’re on a roll.

So maybe another way we can get around that is to color code each of the individual squares so you’ll notice that. Maybe don’t make it super bright or super noticeable, ’cause there could be problems with color blindness, or maybe you can’t actually see what’s being written there just because the colors are too intense or whatever. But we can look into that some more, just so that whenever the speaker changes, you’ll see, OK, I’d better pay attention and make sure that this is the right color at that point.
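
The two suggestions above, naming speakers once before editing begins and color coding each speaker so changes stand out, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how the Speak Ai editor is built: the `Segment` structure, the function names and the palette choice are all hypothetical. The palette is a subset of the Okabe-Ito colors, which are widely used because they stay distinguishable under common forms of color blindness.

```python
from dataclasses import dataclass, replace

# Subset of the Okabe-Ito palette (an assumption, not the editor's real colors).
PALETTE = ["#E69F00", "#56B4E9", "#009E73", "#0072B2", "#D55E00", "#CC79A7"]


@dataclass(frozen=True)
class Segment:
    start: float   # seconds from the beginning of the media
    speaker: str   # label produced by automatic speaker identification
    text: str


def rename_speakers(segments, names):
    """Apply a label -> name mapping chosen once, before editing starts."""
    return [replace(s, speaker=names.get(s.speaker, s.speaker)) for s in segments]


def speaker_colors(segments):
    """Give each speaker a stable color, in order of first appearance."""
    colors = {}
    for s in segments:
        if s.speaker not in colors:
            colors[s.speaker] = PALETTE[len(colors) % len(PALETTE)]
    return colors


def speaker_changes(segments):
    """Return the indexes where the speaker differs from the previous segment."""
    return [i for i in range(1, len(segments))
            if segments[i].speaker != segments[i - 1].speaker]
```

Asking for the mapping once, for example `{"1": "Tyler", "2": "Lauren"}`, would avoid the repeated “switch all of them?” prompt, and `speaker_changes` points the editor at exactly the rows worth a second look.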

1 – 0:07:36
So just making sure you’re not paying attention to the little words here and there. Yeah, now that’s beautiful. I like that we’re all becoming users of our own product, so we could do user interviews on each other. We talked quickly before this about a Control-Z option, which I think would be wonderful. Technically, we haven’t figured out how to accomplish that yet, but it would be fantastic. And then the one thing that you’re sort of touching on, Lauren, that I’m questioning too is this.

There’s a term that appears in the transcript that you change once, and you see instances of that same spelling error, or the machine didn’t understand it the first time and there are 25 instances of that. You shouldn’t really have to change those manually throughout, so how can you do it? Do we color code that, like, “Hey, you’ve changed this word before,” with a little light highlight, and maybe it’s a click and it automatically changes it? I don’t know if a full automatic replace would be too aggressive. But I really love the perspective of our team, where it’s relying on technology and humans and then automating as much as we can to make it as efficient and powerful as possible for the actual end user.
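
One way to read the “light highlight” idea above is to separate finding repeats from applying them: the editor fixes a term once, the tool lists the other occurrences, and each one is accepted with a click rather than replaced silently. The sketch below assumes a transcript held as a list of text lines; the function names are hypothetical.

```python
import re


def _word_pattern(term):
    """Whole-word, case-insensitive pattern for a term the editor corrected."""
    return re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)


def suggest_repeats(lines, wrong):
    """List (line_index, character_offset) for every occurrence of `wrong`.

    A UI could highlight these positions and offer a one-click change.
    """
    pattern = _word_pattern(wrong)
    return [(i, m.start()) for i, line in enumerate(lines)
            for m in pattern.finditer(line)]


def accept(lines, line_index, wrong, right):
    """Apply the correction only to the line the editor clicked on."""
    updated = list(lines)
    # A function replacement keeps backslashes in `right` literal.
    updated[line_index] = _word_pattern(wrong).sub(lambda m: right, lines[line_index])
    return updated
```

Keeping `accept` per line is a deliberate middle ground between 25 manual edits and a full automatic replace.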

3 – 0:08:47
Yep, yep, that’s a beautiful thought and it’s a difficult task as well for the technical team. The implementation of this is really our thinking, but end customers are the ones who will be using the system, and it produces so much impact as well.

1 – 0:09:06
Yeah, and I think we got some good insight, got some good stuff out of there already. And again, it’s a very specific audience who likes hearing about this stuff, but I swear they’re out there and they will enjoy this. I’m wondering about the second part, which Tim maybe hasn’t had as much exposure to, but you and I have also been doing some quality assurance, so I’d love to hear any feedback you have on that process. I said one thing to you yesterday which really has helped me, which is putting it on two times speed or 1.5 speed.

And then just basically listening back. Because our transcribers are doing such a great job, there’s minimal error, at least in the ones that are getting completed successfully. Even reviewing at 1.5 to two times speed, it’s quite easy, and it takes a 30-minute file down to, if I’m doing the math right, 15 minutes or so. So I thought that was a beautiful process. One thing that would be nice, and I don’t know if that’s possible, is that we have the auto scroll with the grey line over the text.

Like when you listen back in the app and you can see which line you’re on really easily. Sometimes when I was going too fast, I got a little bit lost, so I had to restart at a spot. There’s one beautiful part that’s different between the in-app editor and the actual transcript embed.

When you put your cursor on a line or you start typing, it will stop playing, which is great, especially when you’re listening at two times speed, ’cause you’re like, “Oh,” click, stop. It gives you the time to process and then you can make the necessary change.
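
The arithmetic behind the playback-speed trick mentioned above is simple division: listening time is the media length divided by the speed. A small sketch, with a hypothetical function name, that ignores any time spent pausing to make corrections:

```python
def review_minutes(media_minutes, speed):
    """Minutes needed to listen to the media once at a given playback speed."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return media_minutes / speed


for speed in (1.0, 1.5, 2.0):
    print(f"{speed}x: {review_minutes(30, speed):.0f} minutes")
```

For a 30-minute file this gives 15 minutes at 2x and 20 minutes at 1.5x.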

4 – 0:10:50
I like that a lot too. It would be nice though, when you... I’m not sure if I can remember how exactly it works, but after you click on the line at the actual spot that you want to edit and then you click play, I don’t think it plays back directly from where you stopped it originally. I could be wrong though, maybe I’m just trying to remember something else,

1 – 0:11:17
but... It’s becoming a blur. He’s seen a lot of things. Yeah, you know, I mean there’s still a lot of room for improvement, but again, I just think of the first experiences and the pain and the suffering that was involved and how much better it is. If I have a short video or audio clip, I’m very happy to sit there and edit it myself now because it’s so quick. And maybe there’s some bias, but it gives me joy to do it.

And you, you know, you’re a writer and you also do accounting, so you like seeing things well done. To me, there’s nothing better than when I’m doing some quick QA, not quick, thorough QA, and I can just go for like 5 minutes without making a change. That makes me so happy. I just feel good.

That means we’ve made a system that was good, the transcriber did an awesome job, and we’re delivering a final end product that our customer is going to be very happy with. So, quite exciting. Things are weird when you get older, Tim. You get a little older and weird things excite you. You know, I never thought I’d be growing up just so excited about things like this.

But is there anything else that we want to add? Transcript editor, quality assurance, anything else you guys are thinking about?

4 – 0:12:34
There’s still a lot of thought to be done about quality assurance, ’cause really the only thing we can do is just, you know, read through and check and make sure it’s good. But there should probably be some sort of process for that. I’m sold on the whole check mark idea that gives you a percentage of how confident you are.

1 – 0:12:54
That’s 100%. Yeah, and the other part is the scale of it. Like I’ve said too, and it’s one of these topics that we might talk about a little bit later if we have time, one of my fears is actually, say the salesperson is successful and they onboard an organization, and that organization has 400 hours per month that they need human transcribed. That’s a lot, Lauren. You don’t have 400 hours in a week, you know. So you and I cannot do that QA.

So really scaling up our quality assurance mechanism is important, especially if an organization is putting their trust in us to switch over from a previous transcriber or something that they had used, and then making sure that we’re actually delivering high quality. That’s a big responsibility that we’re taking on, and we’re going to need to figure out how to do that at a large scale, both from the transcriber perspective and from a quality assurance perspective.

4 – 0:13:51
Not enough time for all that,

1 – 0:13:53
so we have to hire some more people too. We’ve got a late joiner. Looks like Vatsal himself. I don’t know. I saw him run. I saw him grabbing the charger.

I didn’t know he was planning on joining, but that’s great. Yes. Vatsal, you made it.

2 – 0:14:15
Came running back from the water. I was like, OK, I have to make it somehow, so I came directly from the harbor front, sweating. How was the harbor front? It was awesome.

5 – 0:14:32
It’s always nice to be near the water. There’s something there, so it always feels good after going there.

2 – 0:14:38
So your friend here still ’cause he got.

1 – 0:14:42
OK, so we were pretty deep in a conversation because it was sort of top of mind: transcript editor, transcript delivery, quality assurance. I was saying, I know we’ve had a couple of discussions for the very specific audience who likes these types of things, but we talked about something more than the technical changes that we’ve made. I don’t know if there’s too much more we want to add, or if there are just any thoughts that you have, you know, after this week, or as we move forward. One of the things that I said just last was: what happens if, for example, our sales representative is really successful and brings in an organization who has 200 to 400 hours a month that they want transcribed?

What happens then? I don’t know if you have any thoughts, but I’ll let you bring any insights in.

5 – 0:15:27
That’s OK, that’s already a good point to put in the equation. The point is, if we came across that many hours, we do have the scalable system, so that is not an issue. The more work we probably have to do is around finding the resources for the human transcription. We already built that scalable system. I don’t know if we already talked about it in that context, but we have that system even if someone dumped in at least 5200 of video in a day. Yes, we are moving in that direction, and at one point we will have a scalable system where we don’t even have to worry about anything. The system, the software which we are building, will manage the end-to-end solution for the users.

I think we already have this in our system, but even if users have feedback, like, “Here are the places where I think I need some more correction,” or, “The speaker management is a little off in this 10%,” we send it back to the transcriber, the transcriber updates it, and we receive it back. So that’s what we’re trying to do. I mean, yes, right now we have a little bit of a manual process on the validation side.

But we are moving in that direction, which will be the whole automated solution for that many hours. So to be honest, I’m not much worried, even if that is 2000 hours or any number of hours. I see 2000 to 20,000. That’s what I see by the end of the year. That’s a target. So the 200, it’s 10%

1 – 0:16:59
of what I’m targeting. So that’s what I said. Yeah, I look forward to that and I will feel relief. Again, I don’t want to spend too much more time on this, but just, you know, that experience that we talked about, Lauren, you and I, from the first time we ever tried to transcribe something to now: there’s a lot less suffering going on,

2 – 0:17:22
so I’m very happy about that. So, coming in here, we’ve really only gotten through one topic, alright.

1 – 0:17:29
We got on this. There are a couple of things that I had on a list, I don’t know if you saw. I’ve added a couple. You know what, I’m just going to jump into this to see what you guys think, because this one is exciting to me and is blowing my mind. One second, I’ve got a switch to make before I do. I’m doing two things at once, which I shouldn’t do.

This is custom categories. Am I sharing the right screen? I hope so. The other one I’m reading is Reddit, so, good.

2 – 0:18:02
So this is, I mean,

1 – 0:18:03
this looks like probably way too much data for anyone who’s looking at this screen. But this is one of the things that has been most exciting for myself, for the whole team, and also for some of our end users who are really trying to do deeper analysis. Of course, there’s the transcription part which we’ve talked about, but we have text, audio and video, and there is also a wonderful ability to extract almost anything that you want out of this media, anything that’s language driven.

So what we’ve been trying to do for a long time has been building these custom categories that give our users the ability to really surface the things that are valuable to them. This week, with a bunch of research, and I only want to take about 20 minutes on it, we had some of these prebuilt categories. Just a quick look: there are the emotions, and then the words connected to each emotion that, when spoken or when written and put in text, should be grabbed and displayed back to the user. We really built out some wonderful categories here, and then in the end, when you look in the insights panel, you can actually start to see them. So “hate” was said in this moment.

Hey, there we go. So that “hate” was from last week. It really depends on what industry you’re in and what you’re trying to accomplish. You can build in your own custom categories, but one of the things that we’re also working on, and Vatsal’s doing a good job at doing this quickly, is taking the custom categories that we’ve built, building a global set of them, and then helping deploy them to users very quickly, so that they don’t need to build out custom categories that already exist in high quality. So I’ll stop there for a second and see if you guys have anything to add to this. Very exciting from my perspective.
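
The custom categories described above, a named list of words that is matched against the transcript and shown back as clickable moments, can be approximated with plain keyword matching. The sketch below is an assumption about the general shape, not Speak Ai’s implementation: the `Insight` structure, the `find_insights` function and the example word lists are hypothetical, and real categories would be far larger.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Insight:
    category: str
    term: str
    start: float  # seconds, so a UI can jump back to the moment


# Hypothetical category definitions for illustration only.
CATEGORIES = {
    "emotions": ["hate", "love", "excited"],
    "filler words": ["actually", "um", "like"],
}


def find_insights(segments, categories):
    """Scan (start_seconds, text) segments for every category term."""
    insights = []
    for category, terms in categories.items():
        # One alternation per category, longest term first so phrases win.
        ordered = sorted(terms, key=len, reverse=True)
        pattern = re.compile(
            r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b",
            re.IGNORECASE,
        )
        for start, text in segments:
            for match in pattern.finditer(text):
                insights.append(Insight(category, match.group(1).lower(), start))
    # Stable sort: moments in time order, categories in definition order.
    return sorted(insights, key=lambda i: i.start)
```

Calling `find_insights([(12.0, "I actually hate waiting.")], CATEGORIES)` would surface both an emotion and a filler word at the 12-second mark.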

5 – 0:19:45
Tyler put in tons of hours finding these categories. I see the excitement sitting next to him: as soon as he finds some unique category he’s never seen, it’s like, “Yes, I found this new segment of custom categories.” That makes me excited, and the beautiful part with our system is that we can deploy across the system within minutes. Right now it is still a back-end process, but we are going in the same direction: “Oh, you want to add this XYZ category?” Just select it and it drops into your account. The connected point on custom categories is: what if I already analyzed hundreds of media files and I want to add a new category called X, which has 50 words?

So should I go into each video or audio or text and click on re-analyze? No. What we are also working on, what we have in our pipeline, is “re-analyze all your media”, whatever the audio, video and text, and that can be done within half an hour, depending obviously on the length and number of media. So I think those are the two connected points.
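
The “re-analyze all your media” idea above amounts to running only the new category over every stored transcript and merging the result into what is already there. A minimal sketch, assuming transcripts are kept as text keyed by a media id; `reanalyze_library` and the data shapes are hypothetical:

```python
import re


def count_category(text, terms):
    """Count whole-word hits of any term in one transcript."""
    if not terms:
        return 0
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    return len(pattern.findall(text))


def reanalyze_library(library, name, terms, results=None):
    """Add one new category to the stored results of every media item.

    `library` maps a media id to its transcript text; `results` maps a media
    id to {category name: hit count}. Existing categories are copied over
    untouched, so only the new category costs any work.
    """
    merged = {media_id: dict(cats) for media_id, cats in (results or {}).items()}
    for media_id, text in library.items():
        merged.setdefault(media_id, {})[name] = count_category(text, terms)
    return merged
```

Because each media item is processed independently, the loop could be spread across workers if a library grows large.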

1 – 0:20:53
What I see with that... Tim, Lauren, anything you’re thinking? Yeah, those custom categories,

4 – 0:20:59
Does that only appear on your own account then?

1 – 0:21:01
Or is that updated throughout every account? Those right now are in the success account, so success is sort of the main harbinger, I hope that’s not... yeah, of the most up-to-date version. Then, for example, a new customer comes in and they say they want emotions and filler words. All we have to do is, like Vatsal said, go to the back end, click, and it drops into their account. By default it’s not in there, but I think the stage that we get to is when you log in, you can just hit checkboxes for the categories you want to import and they will automatically be there.

3 – 0:21:34
Sort of the same story is something that we’re doing very manually. Tyler is actually looking through websites and reading Reddit the whole day just for that. But this manual process results in having these categories, such as the emotional ones as well as more technical ones such as actionable items. It’s interesting, because the emotions are very much

3 – 0:22:01
very unique to each person. Understanding what we consider, at least in the English language, to be indicative of one emotion rather than another is already quite a breakthrough in terms of understanding our users, just reflecting it back in the insights page on media. I mean, I’m quite excited about this, as well as the opportunity to quantitatively compare yourself to your previous self.

It’s like Time Machine on a Mac: you’re looking back through a file system and how it changed. Imagine having a morning reflection, recording it with our audio/video recorder every morning or evening, and having those regular updates from yourself on your life, your profession, your career, your tasks at hand, or whatever you’re doing in your area of expertise.

And being able to compare that in the timeline is something majestic. It enables you to understand yourself better, and also enables the system to understand you better.

And that means that eventually we’ll have a symbiosis, that’s the second time this word is being used in these hours, between our users and Speak Ai,

1 – 0:23:46
yeah, I like when Tim gets romantic. Very nice, the beautiful Russian love comes out in the passion for the product. Lauren, any thoughts that you have on here, or something before this? I guess there’s one thing that I’m really interested in, which I think you probably have more expertise in than I do, but I’m just trying to figure it out. And Tim, maybe this is interesting to you too. Before, I had thought about emotions and things like that, but I hadn’t necessarily thought about what came out for me this week, which was unsupported attributions. This is basically when you attribute something to someone, but you don’t make that attribution concrete, so it’s like, “Research has shown.”

But what research? There’s basically this ability for people to understand logical fallacies, or to break down when someone’s having a conversation whether they were, for example, editorializing: were they talking objectively, or were they doing it in a way that added their own sort of flavor or opinion onto it? This was just a breakthrough for me of what’s actually possible with these categories.

And what was amazing is that these were defined through research and time and understanding of conversations and debates. Even the idea of puffery here: when you add “renowned” or “brilliant”, you’re throwing an individualized, subjective meaning onto something, and it’s no longer objective. It’s detached from that. I don’t know what you guys think, but it was just a breakthrough for me in what’s actually possible with the analysis and the categories that we have within Speak.

The thing with unsupported attributions,

3 – 0:25:30
though, is that it’s actually an introductory course to the IELTS test, if anybody comes to the West or Canada, or just prepares for an English exam, and I’m sure that in other languages it’s the same story. I’ve just recently passed my IELTS once again, got nine on part of it. Unsupported attribution is what we are being taught in the books.

You start a thesis from “Most people feel that summer is worse than winter,” and you build the thesis on whatever they are trying to... All of the IELTS essays are prompts. They start from prompts and they give you this sort of argument to support, disprove, discuss, or otherwise.

It’s very weird, but yeah, we’re being taught to use unsupported attributions. Is that a good or a bad thing?

1 – 0:26:38
I wish I knew the answer. That’s very interesting. I didn’t know that. Vatsal looks like he has something to say. I didn’t know, yeah.

5 – 0:26:46
It’s a very good, interesting lens to look through, and when I see the unsupported attributions, the intents and entities, the first thing that comes to my mind is the lens of the enterprise users. For enterprise users who are doing more research, that can be more interesting, because they’re doing it for some end result. They’re expecting some result at the end. That can be research, that can be a small project. So that makes me excited about looking at the angle of the enterprise user.

What happens? For example, you are doing interviews, Tyler, a lot of them, and Tim is doing interviews under the same organization. But when you go to different people in different fields, how does that come together at the end? What does that look like?

If you are going to conclude as an enterprise that user research number 1, number 2 and number 3 say something, what is that? It’s like, “Most scientists say blah blah.” What does that conclusion look like, and how many users say that? “I know that,” or, “Someone says that,” but who is that person? Are they pointing to the same person under the hood of the research, or is it a different person every time? That may be interesting, and so is creating connections with the people and brands: are they talking about the same people or brands in the same user interview? When someone says that “many, many brands do this or do that”, are they talking about the same brands?

So if I look at it from that angle, that is very interesting and very questionable, and a lot of things can be explored out of that research.

3 – 0:28:27
We have an application for this too. We were thinking about the entity linking part. The SEO that Speak Ai could provide is just optimization of the indexing of articles published and pushed through the Speak Ai pipeline: the ability to ask, “Is it referenced?”, the ability to monitor for these claims and cornerstone cases in language, and to make suggestions for the improvement of your content.

1 – 0:29:01
That’s something. And I want to hear Lauren’s perspective on this. But the part that I was just thinking of is: I bet if you, for example, analyze a marketer versus an accountant, what is the puffery percentage used? Right, the marketer: “Great, everything, you know, incredible.” And then we’ve got the accountant, who’s just objective, fact-based, numbers, right? So there are some really powerful things there. I mean, we’re doing the interview process right now.

And after we interview someone, we’re uploading it into Speak. Just as one example, tentativeness is a category that’s now in. So when someone says “I guess”, or “you know”, “maybe” or “perhaps”, or “I mean”, or “I personally”, like...

It’s not that there’s an assessment made, from my perspective, but I can click back in here on that moment and it takes me to a moment where there was a sign of uncertainty or a lack of confidence. Actually, I’ve clicked on that a lot and half the time it was me saying it. But it’s very interesting to help you pick out some things or some moments of interest depending on who you are and the outcome that you’re looking for from an interview or conversation. It just becomes this powerful mechanism, and now that these categories are built in, and I had that experience even with our own interviewing this week, I just could not replace this system for that process. So that felt like a really big breakthrough that I’m excited about.

Vatsal, Tim, Lauren, what do you think?
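
The “puffery percentage” comparison of a marketer and an accountant raised above could be measured as the share of words in a transcript that belong to a puffery word list. This is a rough sketch, not a validated measure: the function name is hypothetical and the four-word list is only the handful of examples mentioned in the discussion.

```python
import re

# Tiny illustrative list taken from words used in the conversation.
PUFFERY = {"renowned", "brilliant", "great", "incredible"}


def puffery_percentage(text, terms=PUFFERY):
    """Share of words in `text` that are puffery terms, as a percentage."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in terms)
    return 100.0 * hits / len(words)


marketer = "Great product, incredible team, brilliant results"
accountant = "Revenue was four million in the third quarter"
print(puffery_percentage(marketer), puffery_percentage(accountant))
```

Normalizing by word count keeps a long interview and a short one comparable.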

1 – 0:30:29
Is the literary expert here? Come on, Lauren. Yeah, I like how there’s a whole bunch of different angles you can be looking from, just based on

4 – 0:30:39
Like who exactly is looking for it, so like say like you said markers versus account. You’re probably looking for someone making more of appeals to emotions and accounts, or just saying bye within. I would say account really science, but just any sort of what’s the term again. Was that for? Was that four letter acronym for people and in school?

We’re in like science, math stem stem stem degrees. There probably more looking on too. The different aspects of rhetoric that’s been brought before I can like I was thinking that maybe we have too many categories. Maybe it can be. Shrunk down into some way and then categories for categories.

So there are those appeals to logic. There’s logos, ethos and pathos. So when they’re saying, like, “scientists say” or “studies show”, this is kind of an appeal to logos, an appeal to someone’s idea of logic. And then there’s pathos. I kind of get confused between pathos and ethos.

I think pathos is an appeal to emotion, so that’s if you’re using flowery words like “I loved this or that” and “this makes me feel this way”. The other is ethos, the appeal to ethics, which is probably going to be the most difficult to actually get ahold of. I can’t really think of any words that are, you know, an appeal to ethics. Maybe at that point it’s becoming more about clauses, which comes back to that one government grant we’re thinking of applying to, where we’re looking to extract whole clauses as opposed to just words.

But yeah, I’m curious to see, among who we’re working with, which of these categories they’re actually most interested in. Because I’m assuming that for people doing qualitative analysis, probably therapists and whatnot, they’re mostly concerned with emotions and stuff like that. I don’t know who would even care about categories of ethics and thinking about what is right and what is wrong.

1 – 0:32:47
I don’t find them.

2 – 0:32:49
Do you like politicians? They would probably be most interested in.

4 – 0:32:53
In swaying someone based on ethics.

1 – 0:32:55
No, I think so too. And the one experience that I had this week was with an organization that’s doing mental health research, and the way that they basically said it, which was amazing, was that they’re looking for epiphanies that users are having. So how do you find epiphanies within a medium-to-large amount, say 15 hours, of audio or video? How do you find an epiphany? And what an amazing question, right? So through some of the categories that we’ve built, there’s that possibility, but that’s all.

In the last two weeks, you added the sound waves, and so you can see a big spike in the sound wave. Typically that has almost always been laughter or, at the counter, the sadder version, which is like anger or someone yelling. So all of a sudden those spikes are a visual representation of where you’re more likely to find an epiphany, and then the categorical records or instances are another one as well. Last part, Lauren, is what you said about subcategorization. I completely agree on the category level.

Right now we don’t have a hierarchy. All those emotions are just equal in that, but there should be an emotion main category, with the subcategories of different emotions within it.

4 – 0:34:06
We get to a point where we have way too many categories and it’s just kind of information overload.

1 – 0:34:11
If we don’t really know what we’re looking for, it’s a lot to look at already, and even now the insights panel in our specific account, because we have so many categories, is longer than the actual transcript. So one of the things that we’re now getting asked is: how can you order them? The one company was saying, we value these categories over other categories, so we want these to display first in the insights panel. And then additionally, can we just filter? For example, with a checkbox, select the emotion category and all the emotion subcategories, and now you’re only displaying that in the insights panel. So a filtering mechanism is needed to display only the information that actually matters.
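The three requests in this stretch of the conversation (a main category with subcategories, a customer-chosen display order, and a checkbox filter) can be sketched together. This is a minimal sketch under assumed names; `Category`, `build_panel` and the sample counts are illustrative and do not describe how Speak’s insights panel is built.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    name: str
    count: int = 0                      # hits found in the transcript
    children: list = field(default_factory=list)

    def total(self):
        """Own hits plus the hits of every subcategory."""
        return self.count + sum(child.total() for child in self.children)


def build_panel(categories, priority=(), only=None):
    """Return top-level categories for display.

    priority: names to show first, in the given order.
    only: if set, keep just these top-level names (the checkbox filter).
    """
    visible = [c for c in categories if only is None or c.name in only]
    rank = {name: i for i, name in enumerate(priority)}
    # Prioritized names come first; everything else keeps its original order
    # because Python's sort is stable.
    return sorted(visible, key=lambda c: rank.get(c.name, len(rank)))


emotion = Category("emotion", children=[Category("joy", 4), Category("anger", 1)])
puffery = Category("puffery", 7)
tentativeness = Category("tentativeness", 3)
all_categories = [puffery, tentativeness, emotion]

ordered = build_panel(all_categories, priority=["emotion", "tentativeness"])
filtered = build_panel(all_categories, only={"emotion"})
```

Filtering on a parent name keeps its subcategories with it, which matches the “emotion category and all the emotion subcategories” request.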

3 – 0:34:52
But that’s the part where you’re building the relationships between categories, and categories themselves can be prioritized over the others. It’s the part of patterning between these, and the patterns are what define the concepts in which particular groups of words are present. For example, logos, pathos, ethos.

This is a pattern where you can fit in different groups of words. For example, critical, analytical and computational thinking from James Pennebaker is also a pattern, and you can fit almost the same words, or slightly different groups of those words, into it and get good results. And it’s a very natural process that humans have evolved to do: to have multiple meanings for every image, point or reference, and it’s the relationships between them that define it. So the next iteration is to have the patterns based on the words, the custom categories, that we have now.

5 – 0:36:08
I would love to see one more layer down, because what we’re talking about here is the words throughout the audio or video clip. We’re not talking about, for example, if Tyler is interviewing person X, how many puffery words person X used, or how many Tyler used. So what does that comparison look like? We’re missing that. But once we get that label, then we can make a pattern. OK, the individual accounts are one thing, but when we talk about the enterprise side, those accounts need the separation of the speakers.

It’s something we do have on the back end, but we still don’t do the insights categorization at that level. So on the insights panel we should have speaker 1, 2, 3. I don’t want to dig down into the UI side, but the overall idea is that if you choose speaker 1, that’s Tyler, then it should show everything Tyler says in terms of the insights. That can be custom categories or anything else on the insights panel, and it can be the same for the sentiment analysis. Or maybe a sentiment analysis comparison side by side: how does that flow go, and does it depend on the first speaker and the second speaker? If the first speaker starts talking with negative sentiment, does speaker 2 also start? You can see that pattern, whether it’s going down or going up. So those sorts of comparisons can be between default categories, custom categories and sentiment analysis. And the same differentiation on the dashboard too: these are the words you spoke last month versus these words.

But this is by speaker 1, speaker 2, speaker 3, speaker 4.

1 – 0:37:46
Like, so yeah, the comparison part is just so beautiful. You know, all four of us have this conversation, and then Vatsal, you came in later, so you have less talk time and less, you know, output. But we analyze it all, we label the speakers properly, and then we can see. Again, I’m probably going to be the leader in puffery.
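A per-speaker comparison like this one reduces to counting category hits per labeled speaker and normalizing by how much each person said. The sketch below does that per sentence; the word list, the sample turns and the function name are assumptions for illustration only.

```python
import re
from collections import defaultdict

# Hypothetical "puffery" word list, for illustration only.
PUFFERY_WORDS = {"amazing", "incredible", "beautiful", "awesome"}


def category_rate_by_speaker(turns, category_words):
    """Return {speaker: category hits per sentence} for labeled turns."""
    hits = defaultdict(int)
    sentences = defaultdict(int)
    for speaker, text in turns:
        # Naive sentence split on ., ! and ?; empty pieces are dropped.
        parts = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        sentences[speaker] += len(parts)
        words = re.findall(r"[a-z']+", text.lower())
        hits[speaker] += sum(1 for w in words if w in category_words)
    return {s: hits[s] / sentences[s] for s in sentences if sentences[s]}


turns = [
    ("Tyler", "That is amazing. Truly incredible stuff!"),
    ("Lauren", "The numbers are in the report."),
    ("Tyler", "What a beautiful question."),
]
rates = category_rate_by_speaker(turns, PUFFERY_WORDS)
```

Normalizing per sentence rather than using raw counts keeps a late joiner with less talk time comparable with everyone else.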

2 – 0:38:09
I don’t know who would be, like... There will be some amazing breakdowns that could happen there. Tyler, right, number one in puffery, 25% more puffery per sentence than Tim,

1 – 0:38:20
Vatsal in 3rd, and Lauren at the lowest level with the least amount of puffery, right? So there’s like, I don’t know.

2 – 0:38:28
I know some of that stuff might just seem fun, but there are real applications to this,

1 – 0:38:33
like in so many ways too. So again, that’s just a fun example,

2 – 0:38:37
and we’ve had people who say I want to start recording conversations with my friends.

1 – 0:38:43
And I could see them getting pure joy out of this and then at a business level, there’s like real data and analysis that can come out that can change business outcomes on a massive scale. So yeah, I just want to add on that thing on the recording,

5 – 0:38:57
When you say the friends: we have a couple of users who signed up this week, and I was just doing some analysis on why they signed up, what they’re doing here and where they come from. So I was doing a little bit of research, and a couple of things I got from that connect the dots here. A couple of people just signed up for the embeddable recorder. They’re saying, I want to put it up on my website, I want to get user feedback. So is there any pattern they can find between the different users and their roles and locations?

So if people in Canada talk about the website, the same website with the same layout, the perspective and the angle of looking at the same website is different from someone sitting in California, or someone sitting in Russia or India or something like that. So how does that change based even on the locations, with the same thing?

1 – 0:39:48
Same words, same website. So there are just two ways I’m thinking of how we look at it, ’cause I’ve been thinking about this a lot too. One is: does a user self-represent? Right now we ask for name and email, but you don’t see any categories. Should there be a dropdown for age, gender, etc.? If you want an embeddable recorder that can grab those qualities, you just put those as inputs and then they self-represent. The second option, which would not be as accurate, is using a system like Google Analytics, tying it to the embeddable recorder, and then pulling those demographic profiles together. And then there’s one other layer which is so fascinating to me. This is hard to articulate fully, but I was helping one company with an advertising campaign and doing analytics, and they wanted to see how many technophiles had visited their site, and so I was able to pull that through a Google Analytics report. Now the question that I’m starting to think about is based on categorizations.

Now this is a jump, so challenge me on this. If we did an interview with someone, could we actually help categorize them into some of those prebuilt categories? So, for example, people who are technophiles talk about technology 25% more than someone who’s not a technophile, and through a conversation or a customer research interview or something, could you actually categorize them into a market audience kind of thing? Vatsal, do you have any thoughts on that? That is more,

5 – 0:41:22
I think, sort of a direction towards finding the user segment, if I’m correct. It’s like finding the user segment: where does that user particularly fit?

1 – 0:41:32
Is that the product? Like researchers, I’ll just say, like Google Analytics or LinkedIn. For example, you have a list of 100 industries or skill sets or affinities or interests that they have, say the one listed in LinkedIn. And then say you’re a company and you do 100 interviews with 100 different people. Could you then use those interviews to partition them into those categories almost automatically, based off analysis of what was said in them? So we did 100 interviews.

25% of them use a lot of puffery and they’re X, so they fit into the marketing category, whereas X fits into technology. I don’t... I haven’t heard anything. When companies do the interview, then they already know that.

5 – 0:42:19
What is their role. So for example, if you are doing the interview with Lauren, we know that it is going to be much more about the real numbers, the real story, or the financial side.

2 – 0:42:30
So isn’t it like because they already know that? There’s like. Yeah, no, and that’s why I’m just.

1 – 0:42:39
I’m just thinking of that as I pulled that report for them and saw all these categories, and how valuable it was to this company, trying to take audiences that they didn’t know. So, like you said, in a lot of customer research and stuff they have actually personally recruited people who fit categories, so they already have those insights. But sometimes those insights are coming back where they don’t have those kinds of words. So say you had the embeddable recorder, you didn’t have anything picked for demographics or interests or anything like that, and now you’re trying to figure out what category they fit into.

How could you make that happen, and how could you make that happen as quickly and efficiently as possible through just analysis of what was said? And again, I think there are some big leaps there, and probably a really big error rate that is possible, but it’s just something really interesting to think about. I’ll shut up right now. I’ll come back with more. Come back and see me in three weeks.

2 – 0:43:32
In office hours number 10. Have that thought that bored out at 10 Lauren any thoughts on that? I know we’ve been going here for a bit,

1 – 0:43:38
Personally, this has been one of my favorite conversations we’ve had yet, so I’ve been doing this a lot. Anything you think on those two points or anything?

2 – 0:43:46
Tim Moore.

1 – 0:43:48
No? OK, alright, I’m going to get to one more thing ’cause I’m deep into this, and then we can end this. I’m sorry, this has some ego part for me, and it’s also like you guys hear me talk about this.

2 – 0:44:01
Do it, you know no,

1 – 0:44:02
no, it’s just this video that I recorded, which I don’t need to play the whole of. Tyler. Angry. Sad. This is one of those, so just for some context: this is the live present mode in Speak, and this was an MVP that Vatsal

put together over a weekend about a year ago, to manipulate color, sound, images and more. Mind-blowing.

5 – 0:44:39
This is my favorite part. I got I got sorry, didn’t mean to play it again. I got a lot of feedback and comments back,

1 – 0:44:49
so one of the things, first of all, was just like: there have been times where this has sat in our Speak platform, and people have looked at it and been like, what the hell is this?

2 – 0:45:01
Why is this here? What does this do? Is this ever going to?

1 – 0:45:04
You know? It’s almost like this raw, unfinished thing in our system, and just because of priorities and things, we have never really spent too much more time refining it. But to me it is one of the more powerful opportunities that we have with the foundation that we’ve laid at Speak. I think it is something that not only can be this powerful creative content creator tool, to create original content in real time with almost no friction, but it also becomes this self-reflection mechanism that can actually help you understand yourself better and heal. And so I just want to take a minute to sit on this. First of all, any thoughts by anyone here on this?

Let me know.

5 – 0:45:48
Taking the opportunity here, I just wanted to go back a year. It looks pretty awesome now, but there were many technical challenges in fitting the real-time transcription with the images, which also had to fit with the colors. You can also play video clips. It’s an all-in-one solution for content creation or some creativity. But I would love to connect this with what we have now, which is the in-app recorder. And what we can probably have in a couple of months is the link between the embeddable recorder and the present mode. So even if you are doing the present mode, you can put the whole video clip of your presentation back into Speak and use it in many ways; the whole output channel is already there. And the second point is that the real-time transcription plus the present mode is just mind-blowing.

2 – 0:46:41
That’s one part. That is you didn’t add.

1 – 0:46:43
That is actually one of the most beautiful things, which we can actually even connect to this at one point. Now it’s all connected, but it was somewhat subconscious. It’s also the hyperlinking part. Now that we’ve got a database of entities that we can grab references to, and then also images, you could basically, without having to hardcode or have user-defined information, retrieve images and more automatically by understanding the meaning of phrases or keywords, which is just beautiful. It’s really awesome,

5 – 0:47:18
but on another scale there’s the complexity and the challenges on the technical side to fit those videos, or how to choose. If you choose Toronto, you have your own image; that can be Harbourfront. That is a different image for Lauren, and maybe a different image for Tim. It’s like, how do you pick for those particular words? I mean, Toronto is very generic, but if you go a little bit further down and you’re talking about a very particular word, a computer or something like that, how does that fit? How can we make a call on behalf of Lauren

that Toronto would be this,

2 – 0:47:55
not what you think? How to answer those questions is, I think, in my view, the challenging thing. The one that stuck out to me

1 – 0:48:03
was like “dog”, because if you have a dog you want to see your own dog. You don’t want to see someone else’s dog. So there’s like the challenge.

1 – 0:48:14
I think there’s that system you can do with a manual override: for words that are really important to you, you’re going to manually override to display what you truly care about. The ones that stick in my mind, and just to share the screen, are a couple of comments that people left after I shared this. This was an amazing comment: “Do we lose subtleties through this?” What an amazing question this is, right?

So because I posted this and I said “happy” and it flicks automatically just to yellow, she’s basically asking: are we losing subtleties of human emotion or of the human experience? What a powerful question that she asked, and just something that I wanted to address quickly. I think this is important, Lauren, as you’re talking about the ethics of technology: this is actually a big responsibility

that I see here. And just a quick thing: “happy” switches to yellow very quickly, but depending on the velocity of happiness that was analyzed or understood, there should be a difference in how much yellow is displayed or how quickly the yellow actually appears on the screen. Does it gradually fade in, or does it come in at a high velocity,

depending on what you said? Yeah, that’s just one thought that I had, if you have anything to add to this. Yeah, it seems like these changes can really be

4 – 0:49:41
brought about from, like, one-word actions. Really, is there any way that this screen could change based on whole, you know, again, clauses, whole sentences, whole ideas, instead of just one word? That’s kind of the idea with the whole of Speak too: we’re just extracting one-word bits and then throwing those into the analytics. Is there any way that we can go a bit farther and get together a whole string of words? Then there are different meanings based on the different collections of words. One word has one meaning, but then you put two words together and it has a different meaning.

And a whole sentence together holds a whole different type of meaning.

2 – 0:50:19
So you can look at look at Tim and that’s why we’re automating jumping. Meaning making the same yeah.

3 – 0:50:30
Well, that’s the part, I mean, customization: we need to understand the user first. It’s the part where Speak becomes part of your life. It needs to know your dog by its name and by the photo, by its face, you know, by the paw. It’s like you leave a fingerprint of everything that matters to you, your friends, your people. And, you know, a social network is just like that: you leave your fingerprint. But we’re training it, as we’ve said, to sort of create relationships out of what you see in this creative process, because what Speak is really attempting

to accomplish here with the presentation mode is to enable creativity live, as you go. You can leverage your own speech and create what you are thinking about, and we actually should potentially connect the in-app recorder and the present mode together. Yeah, but the idea is that you’re mentioning things like your dog, like Tyler, like Lauren, and you’re mentioning the things you’re working on as well.

I’m working on this task or that task. I’m going to this place. I’m going to hike there. I’m going to ski here. It’s all of these different types of entities that we enable our users to

surface through our named entity recognition and the insights panel on the text editor or the media. Right now it’s quite objective and we’re gathering it from an objective source. I think it’s OK to say that we’re using Wikipedia as a data source to compare against. But consider tapping into, for example, your photos library. I have it on the Macintosh, and it’s a beautiful photo library application that comes with the system, and it categorizes your photos.

It has a very nice way of presenting the images from this summer, from this season, from this city, with these people, and it groups them and presents them to you. So if we could tap into things like these that already exist in the Apple ecosystem, Google ecosystem and Microsoft ecosystem, which already learn about you and personalize their content to you, we could personalize your content to you through Speak. It’s the part where any system must know you so well. It’s the part of being

a second brain to you. Your second brain, not somebody else’s one.

1 – 0:53:21
I think you’re jumping ahead a little bit too much as far as, like,

4 – 0:53:26
thinking about, you know, say I have a sentence. I say that, and then Tyler says the same sentence. But you’re saying that the system has to be able to differentiate between different people, so that we’ll have two different meanings from the same sentence spoken by two different people. But, you know, can we just

2 – 0:53:43
That’s why I’m saying jumping ahead and can we just focus on one side and 1st and then after that we can focus on the individual? You’re deaf and 10 spirit. I love where you’re going with him. For one thing, though, to answer your original question,

1 – 0:53:57
you can actually program whole sentences into Speak and use that as a trigger. The problem is that if you vary from that sentence, the system is not intelligent, and Vatsal talks more about this with, like, Alexa intents. Basically you program a sentence, it’s almost hard-programmed in, so with any deviation from that sentence you will not get the result that you’re looking for. The next level that we need to get to is the intelligent intent system that allows us to say this has the same meaning and then display the actual image there. And just to add, Tim: I know that you went a little into the future there, but I actually think that’s genius, because if you look at how Google is analyzing the photo library, you type in a word and it shows all the images that represent that word.

So if you have a picture of a mushroom and you type in “mushroom”, it shows all the pictures of mushrooms. That can tap into your library and show your favorite mushroom, right? So I actually think that’s beautiful. I agree it’s a little out there, but I love it.
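The gap between a hard-programmed sentence trigger and a more tolerant one can be shown with a small sketch. The tolerant version below uses word-overlap (Jaccard) similarity, which is only a crude stand-in for the “intelligent intent system” mentioned above and is not how Speak or Alexa intents work. The trigger sentences, image names and threshold are hypothetical.

```python
import re


def tokens(text):
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"[a-z']+", text.lower()))


def exact_trigger(spoken, triggers):
    """Hard-programmed matching: fires only on an identical sentence."""
    return triggers.get(spoken.strip().lower())


def fuzzy_trigger(spoken, triggers, threshold=0.5):
    """Pick the trigger whose words overlap most with what was said.

    Uses Jaccard similarity (shared words / all words). This is a crude
    stand-in for real intent understanding, chosen for brevity.
    """
    spoken_tokens = tokens(spoken)
    best_action, best_score = None, 0.0
    for sentence, action in triggers.items():
        trigger_tokens = tokens(sentence)
        union = spoken_tokens | trigger_tokens
        score = len(spoken_tokens & trigger_tokens) / len(union) if union else 0.0
        if score > best_score:
            best_action, best_score = action, score
    return best_action if best_score >= threshold else None


# Hypothetical trigger sentences mapped to hypothetical image names.
triggers = {
    "show me my dog": "my_dog.jpg",
    "take me to toronto": "harbourfront.jpg",
}
spoken = "Please show me my dog"
```

Here `exact_trigger(spoken, triggers)` finds nothing because of the extra word, while `fuzzy_trigger(spoken, triggers)` still selects the dog image. Word overlap cannot tell that two differently worded sentences mean the same thing, so a real intent system would need something stronger.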

2 – 0:54:57
In fact you though to understand the sentence,

3 – 0:55:00
it is a very fundamental part, and the meaning-making that was just being alluded to is where everybody expresses themselves differently, and everybody expresses emotions differently as well. Something that we can do already, and eventually will, is to create connections, to create the maps of meaning, by the words that you say and what you talk about the most. The correlations between entities are something that we are each very unique in, because we even use words in the same patterns all the time. That pattern of using connecting words is analytical thinking, more or less, because you’re trying to actualize the concept and make it a fact or something grounded.

The way that you connect two ideas and use a particular word from a particular custom category: we have this data. We just need to build those relationships and have the opportunity to correlate it with all the information that we have, meaning that this sentence means that I’m sad at this part of the day.

2 – 0:56:26
But it doesn’t mean that I’m sad next time of the day.

1 – 0:56:31
That’s all you got to say here.

5 – 0:56:33
I definitely agree, Tim, but the point where we don’t have enough is the data about the user, on the technical side. I mean, theoretically I 100% agree with what you’re saying, but what we don’t have enough of is: what is the definition of “dog” for you? What did you do last summer? What notes did you write on Speak last summer, so that the system is intelligent enough to tell you, “Tim, last summer you went to ski at this place. How about this week?” Maybe it’s the comparison between when you write a note or record

a video: your emotions or your keywords were like this last summer versus today, because you went today and you recorded the clip into Speak, right? So the future is always going to be amazing. And what is the context, even with the present mode, finding the custom category, finding the meaning? And Lauren, I agree with you about having the two words, or having the meanings, that sort of meaning-making machine we are working on, even with the custom categories and even finding the entities between your notes and media.

But what we don’t have enough of is the data about the user. So the more you write on Speak or record, the more you are empowering your own system for understanding. That’s one side of it,

1 – 0:57:55
but even improving on the other side. And also, I know you gotta go, so you can. You’re going? Yeah, OK. How are you doing here? We’re almost done. I have one other point that I think is really important here. It sounds like Timothy has something to say as well. There are two things I’m thinking about that are also not happening in that present mode.

That video that I shared looked magical, but if you noticed, I was taking pauses between the words that I was saying. The reason is that we’re not looking within a sentence to grab a keyword out; you actually need to hit it at the right time. I need to say “Tyler”, then make a pause, and then say “happy”. If I don’t do that...

2 – 0:58:39
OK, that’s alright.

5 – 0:58:41
Two ways. Right now in the existing system, even if you say “my name is Tyler”, the system is intelligent enough to understand that you said Tyler in the sentence. If you say something complex and you are too quick, it might miss it in the sentence. But if you’re like, “hey, this is Tyler, blah blah”...

So the system right now is not that intelligent, but there’s a 50 to 70% probability that it will pick it from the sentence. But if you’re going too fast, as we normally speak, as I do here,

1 – 0:59:11
that is a little tricky. Yeah, and then the last part that I was fascinated by, and I’m done now, I swear, is: if someone says “I’m happy”... no, someone says “I’m sad”. What do you display? Do you display blue to represent their mood, or display yellow to cheer them up? So I think this is such an interesting thing. There needs to be a toggle for reflect or contrast or counteract.

I just think what an amazing technical concept. But like I don’t know just such a beautiful thing to think about. So that’s my last point. And that actually came up in one of the responses from. People tell.
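The reflect-or-contrast toggle, together with the earlier point that a stronger emotion should show more color and show it faster, can be sketched as one small function. The palette, the opposite-emotion table, the 0 to 1 intensity scale and the fade timing are all assumptions for illustration; nothing here describes the actual present mode.

```python
# Hypothetical palette; which color means which emotion is an assumption and
# would likely need to be user-configurable.
PALETTE = {"happy": (255, 221, 0), "sad": (0, 90, 200)}
OPPOSITE = {"happy": "sad", "sad": "happy"}


def plan_color_change(emotion, intensity, mode="reflect", max_fade_seconds=3.0):
    """Return (rgb, fade_seconds) for a detected emotion.

    intensity: 0.0 to 1.0, how strongly the emotion was expressed.
    mode: "reflect" shows the emotion's own color, "contrast" shows the
          opposite emotion's color (for example, yellow to cheer someone up).
    """
    if mode not in ("reflect", "contrast"):
        raise ValueError("mode must be 'reflect' or 'contrast'")
    intensity = min(max(intensity, 0.0), 1.0)
    shown = emotion if mode == "reflect" else OPPOSITE[emotion]
    base = PALETTE[shown]
    # Blend from white toward the full color as intensity rises.
    rgb = tuple(round(255 + (channel - 255) * intensity) for channel in base)
    # Stronger emotion appears faster; weak emotion fades in slowly.
    fade_seconds = max_fade_seconds * (1.0 - intensity)
    return rgb, fade_seconds
```

For example, `plan_color_change("sad", 1.0)` gives full blue with no fade, while `plan_color_change("sad", 1.0, mode="contrast")` gives full yellow instead. Keeping the palette in a plain dictionary leaves room for each person to define what a color means to them.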

5 – 0:59:52
Just wanted to add before that: when Timothy talks about how the meaning of emotions varies from person to person, when you presented that with “happy”, the first comment was like, green or blue is happy for me. Black is happy for me sometimes, right? Because your favorite color varies from person to person. You might like blue, Tim might like gray, or Lauren might like white. So it’s like, how can you decide? There’s supposed to be a system

for that. Right now, with what we have, it’s like: what is your understanding of the color? And you talked about the gradient or the mixing of colors. What does that look like? Because the rainbow has seven colors, but each of the seven colors has unique characteristics.

1 – 1:00:41
And when you mix that that is just another level of power. Are colorblind also? So I just thought about that.

2 – 1:00:51
Light up. Room, Timothy yeah, don’t even get me too fired up on that.

1 – 1:00:55
Well, it’s coming. So this Speak present mode that we’re showing on screen in a software interface is a connection point for what will be connections into hardware, to be able to do some truly incredible things. See that puffery there: “truly incredible”. Puffery.

2 – 1:01:12
I don’t ever set up video like our last point.

5 – 1:01:15
I think we should also do a setup video on how you can create your own present mode with your own understanding of the colors and pictures, and how you can do the same thing as what Tyler shared yesterday on LinkedIn and Twitter.

1 – 1:01:26
So no, I’m with you. It’s such a beautiful point about the colors and how different colors mean different things to different people. Profound stuff. So OK, I’m good in that regard. I don’t think there was anything else. Anything else that anyone has to say before we wrap this up? This is a fun one. I am someone yeah. Tim good line lines.

Gotta get back to some transcribing.

2 – 1:01:51
That’s why you’re good. I use the Flash 15, which bridge fantastic conversations about the transcript.

5 – 1:01:56
Later that the present mode and everything,

1 – 1:01:59
so it’s awesome. Jam. OK. Cool. OK guys, great to see you all. This is office hours #7; #8 next week.

2 – 1:02:08
Thank you for joining me. We ended up with a full crew. I thought I was going to be completely alone, so this is great.

1 – 1:02:17
Great, great to see you guys. Have a good rest of the day. Thank you.
