AI has undeniably allowed for more creative freedom and new forms of expression to flourish. At Resemble AI, they are taking the least served modality, which is audio, and applying deep learning technology to it to create custom AI voices.
In this episode, Ron Levy sits down with Zohaib Ahmed, founder and CEO of Resemble AI, to give us a broader view of the company, known as the enterprise-ready generative voice AI toolkit. Zohaib talks about the origin of the company along with the growth of AI, what generative AI means, and the transformation of text-to-speech to speech-to-speech. On protecting users, he dives deep into deepfakes, regulation, and security. He also discusses machine learning, raising money, and audio watermarking. With tools such as Resemble AI, we are seeing the creative industry take its art to a different level.
Tune in and discover more about how AI is changing the game.
- Resemble AI focuses on creating custom AI voices using proprietary deep learning models for realistic speech synthesis.
- Explore the potential of conversational AI to transform how we access information, potentially replacing conventional search interfaces with natural language interactions.
- Introduces the use of PerTh (Perceptual Threshold) technology to safeguard intellectual property. By embedding inaudible watermarks in audio data, this breakthrough technology enables robust IP protection, aligning with the global trend towards enhanced data security.
- “I find myself to be a creator of sorts and it’s very rewarding for me. It just so happens that I’m creating content within AI at the moment. That’s why I got into programming in the first place.”
- “The art of describing what a video should look like and that process being automated with AI sounds extremely fascinating. We are getting so close to that sci-fi world now where the modality in which we are consuming information is about to change. It’s not going to be traditional.”
- “The more you create, it’s like you are an artist. The more you write, it sounds like a different voice.”
Empowering The Future Of AI Voices With Zohaib Ahmed Of Resemble AI
This is Zohaib Ahmed of Resemble AI, leading the edge of generative voice in artificial intelligence. I’m on the Edge of AI, the leading show bringing you authentic voices at the edge of what’s possible.
Here’s what’s to come on this journey. Find out how one audio-based AI company is facilitating a whole new wave of creativity and utility at your command. Find out why our guest is inspired by leaders from corporations as large as Microsoft, as well as by people outside the system doing independent work. Finally, you might be surprised to find out that this intro is a deepfake created with our host’s permission using our guest’s groundbreaking tech. All this and more. Go ahead and take your seat.
I’m Ron Levy, and I will be your captain for this exciting voyage to the Edge of AI. Just like most of you, I have embraced the spirit of exploration and entrepreneurship throughout my whole life. From starting my own business before graduating from high school to traversing the world’s most challenging terrain, I have always sought new frontiers and adventure.
I have conquered legal battles and built award-winning homes. Now, I lead a public company dedicated to pushing tech boundaries and unlocking our full potential. Together, we will navigate uncharted territories in AI. The guiding star on this quest is going to be to ask great questions. I will endeavor to do so. Buckle up and get ready to embark on an amazing adventure. Let’s set sail.
This episode features Zohaib Ahmed, the founder and CEO of Resemble AI, known as the enterprise-ready generative voice AI toolkit. It’s a mouthful, but it’s a big toolkit that they are building. After identifying an opportunity to create entire movies, TV shows, and video games with AI-generated voices, Zohaib founded Resemble AI with his former roommate and fellow gamer, Saqib Muhammad. They built Resemble AI’s generative voice tech rooted in consent and transparency to allow for more creative freedom and new forms of expression. We are here to learn what else he has going on.
Zohaib led engineering teams at Magic Leap, Deepen AI, Hipmunk (which was later acquired by SAP Concur), and BlackBerry. At Hipmunk, he was the lead engineer of the first AI assistant for travel. He built it using modern patented NLP techniques. Zohaib graduated from the University of Toronto with a degree in Computer Science. Resemble AI creates custom AI voices using proprietary deep learning models that produce realistic speech synthesis using text-to-speech and speech-to-text.
It’s going to be great. We have all used text-to-speech at this point, and speech-to-text, quite honestly. We are going to go deep and find out the foundation of how it has been working and, much to the point of this show, where it’s going and the amazing impact it’s going to continue to have. Let’s start with some background on Zohaib. When did you get into AI, and how did that lead to the creation of Resemble?
Thanks for having me. Resemble started several years ago, but the idea was seeded years before then. It’s baffling now, but if you looked at AI in 2015, deep learning wasn’t convincing per se. The analogy that I like to make is that if you roll back the clock far enough, you have the telephone, which Alexander Graham Bell was putting out. Famously, he was rejected for the idea. Initially, there was a lot of pushback: “Why would you use a telephone? You need all this infrastructure. The other person needs a telephone.” Believe it or not, AI had a very similar thing. As we were getting started with deep learning in the early 2010s and it started becoming a little more mainstream, a lot of the tasks were recommendation engines.
You go to an eCommerce site like Amazon and it would recommend items. It could estimate a house price. It could do some form of time series analysis, but it was not until 2015 to 2016 that we started seeing the earliest form of what is now called generative AI. At that time, the model architecture was called Generative Adversarial Networks.
There were some interesting projects like DeepDream, which was by Google. DeepDream made these terrible-looking images that had tons of repetitions, but they looked pretty cool and they looked like dreams. That was the flicker, at least, of what AI was going to become and what it was going to enable. That is essentially this process of creation: not just analysis, but creating things.
Resemble took that idea, and then we found the least served modality, which is audio. We thought, “What if you could take audio and apply deep learning technology to it? What could you enable?” Everything else in the world already had tools. You had writing assistance: Microsoft Word was a powerful word processor. You had Photoshop and Illustrator, so you had image processing. CGI was already a thing.
A lot of the deep learning techniques at that time for those modalities weren’t quite up to par. One thing that was quite obvious is that if we are doing this show and we have audio streams, there’s no way to edit the audio. There’s no way to change what I am saying or fix mistakes after it’s been recorded. That’s where we started Resemble, with this idea: “What if we apply deep learning techniques to audio? How would that allow creators to manipulate, change, or edit audio on the fly so that it works in their use cases?” That way, audio isn’t such a painful medium to experience and create content with.
I’m going to try to define this for the audience, for those who may not have thought about these words before but would like some foundational knowledge. To the degree I get it wrong or you can better it, please do. Take the term generative AI. There’s AI and generative AI, amongst other things. AI is the ability of a computer to gather information that exists and create something that already exists, whereas generative AI creates something new that didn’t exist before. Is that accurate?
That’s pretty accurate. Artificial intelligence is a very complicated word. There’s a joke amongst engineers that the hardest problem that you can do is to name something properly. Artificial intelligence is anything from when you are playing Pac-Man and the little enemy or little blob tries to find you. That’s artificial intelligence. It’s a naive form of it, but path-finding is artificial intelligence.
When we refer to AI now, you are correct. Essentially, there are machine learning techniques, which are a subset of artificial intelligence that allow you to do time series analysis. There’s also deep learning, which enhances that even further and it’s able to predict results or generate results based on the data it’s learned from. It’s basically predicting net new results instead of looking at the data and just telling you what it sees.
This breakdown is important. For those of us who didn’t go out and get degrees in this or haven’t worked on it our whole lives, unless you know at least at a foundational level how these things are built, you won’t know what to look for. When you are using AI in any fashion, generative or otherwise, and you think, “I wish it did this or had this,” you will understand where to look so you can improve it. Ultimately, there’s not one solution. There are more solutions than we can imagine, and they are coming out every hour at this point, so it’s great. If you broke it down, how would you say Resemble lets someone experience generative voice AI beyond the text-to-speech we are used to?
When we started Resemble, our core premise was that text-to-speech had already existed, but it was only utilized for accessibility purposes. That was because the voices weren’t natural enough to use in movies, games, or anything creative per se. The first thing that we focused on and this is the bread and butter of Resemble, is voice cloning as a service.
The idea is that you can give Resemble unstructured voice data, like your own voice, and it’s able to reproduce that voice and put a text-to-speech engine on top of it. Then there are layers to that. The text-to-speech engine, or this model, learns voice characteristics. It learns your pitch, your prosody, your naturalness, your inflections, and your intonation. It learns how you roll your Rs, every little nuance that exists in your voice.
You then have a text engine on top where you type in something, and the model understands how to correlate what you type with what you say. That is how we have advanced text-to-speech from when we started the company until now. We also thought about other ways of creating speech. It doesn’t necessarily even have to be text-driven.
We thought, “What if we just use voice as the input and output?” We introduced something called speech-to-speech, which is essentially: my voice comes in and your voice comes out the other side. We thought that was interesting because you have so much more granular control over the output. What I just did was stutter. You can’t do that with text-to-speech. It’s difficult to type out ums and uhs and how elongated they are. It’s painful, but it’s much more natural when you just talk and then get voice coming out the other side.
We introduced that. We also introduced language dubbing. Once we have a model that’s learned your voice characteristics, we can allow you to speak other languages. We have a model that’s learned 62 languages at the moment. It’s able to transfer your voice into a different language altogether, so you can speak Swahili, Afrikaans, or Korean.
That opens a whole new set of ideas and boundaries for creators. I’m assuming a large share of your audience speaks Spanish, and there may be a subsection that speaks Spanish fluently but isn’t very comfortable with English. It opens up whole new audiences and new areas of development for us.
What you just described is mind-blowing. When you think of the use cases and how to utilize them, there have been sci-fi books and movies about people traveling through the universe. Sci-fi stuff where the language gets translated automatically. That’s what you are talking about. Being able to communicate with someone directly with inflection and immediate translation. It’s amazing. What state is that product in for you?
It’s in production. This product is being used, initially, by contact centers of all places, and even within entertainment for dubbing games. It’s out there. One of the exciting things about Resemble is how much we think, plan, and put out, and then how it all gets torn apart because the use cases are so different from what we imagine. One of my favorite startup stories is YouTube. YouTube started as a dating website. I couldn’t imagine it being a dating website when I first came across it.
What we have learned from YouTube, and the path we want to follow, is to put all these tools, like dubbing and speech-to-speech, into creators’ hands and then let them be creative. Let them figure out where to apply this technology. Our goal is always to not keep things too academic: to wrap the technology in a product, get it to folks who are responsible and have ideas they want to execute, and let them do all the creative work.
We do have to be careful with irresponsible actions. One came to mind as you were talking, when you said you could set it up so that I could speak and it would come out in someone else’s voice. There’s some great joke in there about a spouse. I don’t exactly know how to frame it, but you can imagine all the variables that come up with technology like that.
It’s important to deploy this in a safe manner. We have spent a ton of time on this. AI is so challenging. All of this is groundbreaking stuff. It has so much net positive, but we have to be careful how we deploy this technology. As you said, something can go from a joke to very serious extremely quickly. The laws don’t quite reflect what is real anymore. You have SNL and you have parodies, but those are clear parodies. You can get away with impersonation because, to the audience that’s watching, it’s obviously a parody. When you apply some of these technologies, it may not be that obvious anymore.
For us, it’s been about deploying these voices as safely as possible. We only let you clone your own voice at the moment. Our paying customers can clone voices for which they have explicit consent. We have built in things like watermarking and deepfake detection. We have been trying to roll out a whole toolkit, as you said initially, of voice AI tools that not only handle the generative side of things but also make sure we are rolling it out in a way that’s safe and friendly for everybody.
The way you are operating, and the way you described it, is so important. It matches what we have found throughout the industry. Most companies advancing real projects, the ones that are going to grow with the adoption of AI, are doing something similar to what you describe. They are very conscious of the damage it can do and are trying to protect against it, and it’s amazing. I’m going into opinion now, but I think it’s going to be very difficult for the government to write rules and regulations around it because of the pace at which it’s being developed and changing.
By their very nature, the agencies trying to regulate it don’t move at that pace; far from it. So the better the industry can police itself and design counterbalances to bad actors within it, the better. I know you have worked on a deepfake audio detector. Why don’t you tell me a little more about that, how it works, and address it from that standpoint?
You said something interesting there: with government regulation, you don’t want to stop innovation from occurring. It’s hard to put laws in place when you also want the opposite. If you think about it, this is like the web or the internet all over again. When the internet first came out, if you recall, every computer recommended that you install some antivirus software. It seemed almost unimaginable at that time not to have McAfee or Norton installed on your computer when you first booted it up. The first thing you did was install antivirus, because the internet was highly unregulated.
It was open. It was meant to be open, and that was the beauty of it. You could make it safer by closing it down, but safer is a subjective word, and closing it down stifles innovation and makes it more difficult. You wanted everyone to be able to publish anything on the internet. You wanted as many people to participate as possible. However, with participation, you needed a safe way of doing it. Over time, antivirus software graduated into the norm. It’s not like antivirus software has disappeared; it’s just built into your computer. The consumer doesn’t think about it anymore.
That goes all the way to your browser having HTTPS or SSL. There’s a little padlock on the left-hand side of the URL. Most browsers will give you a warning if a website is not secure. Initially, it was just a warning. Now it’s even more intrusive: it tells you, “You have to understand what you are about to do here. You are going to a website that is not behind a secure SSL layer.” With deepfake detection, we went down the same route. We looked at it as the antivirus for AI.
The concept is, “Can we have a general purpose machine-learning model that is able to identify deepfakes, not only from Resemble, but from every other cloud provider out there, every open source repository out there?” We imagined it to work exactly like antivirus software works. We’d have to reinvent the whole wheel. Just like antivirus software has constant updates, you see more things, you have more training data, you train a model, you improve over time, and it catches more vulnerabilities.
Our first goal was to go from a world where there was no protection to a world where there was some protection. At the moment, we have achieved two levels of security. In the wild, we are able to detect audio deepfakes about 87% of the time. If we train on certain datasets, for example, if you come to us and say, “I’m concerned about my voice,” we can add your voice to the pool, and 98% of the time, we can identify deepfakes of your voice. Once the AI has seen certain voices, or your voice, it does a far better job of figuring out whether you are real or fake in the wild. That’s the thought process behind it.
Your focus on all this is audio. That’s your lane. That’s what you guys are staying in.
That’s the bread and butter. We have collected terabytes worth of audio data over the last several years. We have built various models for pitch tracking and vocoders, which are essentially models that are able to produce waveforms. We have done work around non-verbal vocalizations. We published academic work around generating coughs, laughs, and all of this non-speech stuff. With all of this data, we were well positioned to create the opposite model as well, which is the detection layer.
I’m going to circle back. I want to know more about you. Your resume is impeccable. It’s fantastic and speaks for itself, but you were just like the rest of us, a guy, young, trying to figure out which way he wanted to go and what you wanted to do. What brought you here?
I find myself to be a creator of sorts. The art of creating things resonates and it’s very rewarding for me. It just so happens that I’m creating content within AI at the moment. That’s why I got into programming in the first place. It’s this concept of you type in a piece of code and you see it on the computer and it’s real. It does what you want it to do. It feels like a superpower. A greater superpower than that is web development. It’s this concept, you write something, and all of a sudden, millions of people in the world have access to it. They can see it and they can interact with it. There’s a certain adrenaline rush that occurs when someone clicks on something that you built.
I found that early. When I was in high school, I started getting into web development and building things. Most of them were utter failures, but at that point, failure wasn’t a word. I didn’t build things at that time to generate revenue, unlike folks now who start development with the goal of creating revenue streams for themselves. Back then, other people played video games. This was just like a video game for me. I type something. I instruct this computer to do something. It does it.
One of the first websites I created, back in high school, was a news aggregator. Around 2006 or 2007, news aggregators were the hot thing to build. Google was just getting started, and everything was moving in that direction. A lot of early programming for me was a monkey see, monkey do type of thing. In my first year of college, I built a Groupon clone for college students.
Essentially, you take the thing that everyone is gravitating towards. I was like, “What if I just take this? Can I rebuild it myself? If I have this audience, can they use it?” A lot of it was just a creative process. AI and machine learning work the same way: the more you create, the more it’s like you are an artist. It’s the same logic. The more you draw and the more you write, the more you are drawn toward things that don’t look traditional anymore.
When you read and write a lot, the interesting part is probably writing that doesn’t sound like you wrote it. It sounds like a different voice. AI and ML were very different from that perspective. I went from creating web and mobile applications to this area of, “Here’s a piece of software that’s data-driven.” It takes an arbitrary amount of data, it learns, and then it’s able to produce results.
When you code a lot, you write out a lot of this logic: if this thing happens and you see these patterns, then it’s this. What clicked for me with deep learning and machine learning was this concept of learned algorithms. These models started with a seed and then learned from there by themselves. That was impactful and interesting to me.
My first crack at machine learning was in 2015 with the AI travel assistant. I would say that was a pretty big failure. I thought it would be relatively easy to do, and then it was like, “These models need to be written in this language with this framework. They can only run on these computers.” First of all, you need powerful computers. Where do you get those from? Then you need a lot of data, and you are like, “I don’t have a lot of data.”
With all five of those things, you get stuck, and that’s where everything goes back to the fundamentals of programming again. You have these five problems. How do you build from the ground up to solve them? Generally, it’s just problem-solving and creativity. One thing leads to another. I’m pretty sure Resemble fits right into that realm, or that path, of creative programming.
Some of my takeaways from all that: first of all, you started with travel. You are revisiting it now, because translating someone’s voice into a different language is a plugin for a travel company in a heartbeat, and it’s a pretty big deal. You are going to end up right back at travel. What I got out of everything you just said is that you loved the process, you loved learning, you loved asking questions, and you loved taking it to the next step without necessarily having an end goal at any given moment. The process itself has been an absolute joy for you. When I look at the years, you mentioned 2007. As you said, Google was new back then.
Look what’s happened. Look what we have done. You are with Resemble AI. You are at the first step of what’s next. It’s brilliant to see, and it’s great to see a personality like yours in it and embracing it. I would suggest that people in emerging tech these days are not typically people who want to hold things close. They want to open source. They want to put it out there.
They are not threatened by that. They know everybody is going to contribute, and the more they do, the more opportunities there are. I heard that through and through in what you have just described. It’s amazing. I’m not the only one who’s impressed, because Javelin Venture Partners has placed its faith in you. Why don’t you talk about that a little bit?
When you build stuff, you are very close to what you are building. It’s very precious, so you want to find the right people to join you who share your perspective. Alex is the partner at Javelin who led the round. From the first conversation, he was on the same page as us in terms of where we are going and what we are heading toward. Creating AI responsibly is the short story of our mutual relationship with Javelin at the moment. I am so glad that I found the right firm and the right partner to lead our Series A. It’s still very early, but they have been phenomenal already.
They understand our perspective and our throughput. They understand the product we are building. They understand it’s so early that we are not quite done yet. We are not even near being done. I don’t think you could ever be done, but we are not even near that stage yet. I can’t express my gratitude enough to the team for taking that leap of faith with me. It was mutual.
I’m going to put numbers to it because it’s public knowledge anyway: they led an $8 million round, which you have received. In this world of venture capital and people backing things, you hear about companies getting $300,000 and you hear about companies getting $300 million. It’s all over the map. Every time I see that, I see an entrepreneur with an idea who is going to their next step. Whether it’s a few hundred thousand dollars, $100 million, or anything in between, it’s a big step. It’s a pat on the back saying, “We approve of what you have done in the past to get here. We have enough faith in you that we are going to place our money there and think things are going to go right in the future.” It’s a big deal.
$8 million is a super good number. I’m assuming it allows you to head for your next steps. Sometimes, people will say, “That’s the last round I need to do,” and you can grow organically, but sometimes companies grow that way for years and years, and $8 million turns into $12 million turns into $20 million. You know the drill. It’s pretty amazing when that step happens and I can’t pat you on the back enough, especially as glowingly as you spoke about those partners.
As an entrepreneur, the general advice that I have gotten is that it’s important to understand how much capital you need. In the past, there were companies out there raising hundreds of millions of dollars. What I have learned, coming in somewhat naive, is that raising more capital doesn’t necessarily mean success. There’s a threshold that’s like the point of no return. You just have to be careful and honest with yourself about what you can achieve. That makes for a healthier partnership altogether.
Without a doubt. All money is not the same money. You can have people write you checks that end up not being great partners, dominating, or sending you in a direction you didn’t want to go. All that can happen, but you can also get great partners that will let you see your vision through and back you all the way. You know that based on your actions. I’m just throwing it out there for the audience that it’s not all built the same.
For those of you who haven’t been down that road before, value yourself when you sit down at that table. That money is not more valuable than you. As tempting as it may be sitting there, make sure it’s right for the long run. Just a little personal takeaway on that. How are you using generative AI? I’m not talking about Resemble AI itself, but outside tools that you or Resemble are using to help at this stage. Tie it to your principles and ethical statements as well, so we can keep it in that vein.
The company and I use generative AI almost daily, across different tools. With all of the language models that are out there, you can build some interesting things to automate workflows. Everyone in the company has a piece of software called Copilot; we buy it for all of our engineers. Copilot, from a company called GitHub, is a piece of software that essentially writes code for you. That’s the salesy one-liner pitch. When I say it writes code for you, you have to start writing the code, and it’s smart enough to fill in the rest of the gaps. It’s also great for debugging, for understanding where your code isn’t working properly.
We have found it instrumental in keeping our velocity high, which is always critical as you are building out a company. Just for yourself personally, you want that feedback. With remote work in general, ChatGPT and LLMs are instrumental. They don’t get seen from this angle very often, but you and I probably remember that one of the most fascinating things about sitting in an office is the fact that you get to talk to other people. You get to say your ideas out loud and hear feedback.
In some cases, there’s this concept called rubber ducking, which is essentially that you have a rubber duck in front of you. You talk to the rubber duck and solve your problem while you are talking to it. It has probably happened to everyone at some stage in their life. They have a problem, they are telling someone about it, and then they realize the answer while they are explaining it to somebody else. I think generative AI is good at that stuff. It’s good at echoing things back at you. In other words, it’s extremely good at helping you ideate. It’s extremely good at breaking down concepts with you. Day-to-day, that’s a constant thing.
When we use Copilot and various other technologies in the generative AI space, we do try to keep that ethical mindset. We are careful about which LLMs we use. We handle sensitive data, so where generative AI should be used depends on the task. We have a policy on where these technologies should and shouldn’t be used, just like you have policies for data handling. If you go work with Amazon and AWS, you hope that Amazon and AWS are storing your data in a safe way. It’s very similar with generative AI tools. You don’t want to pollute and cross-pollute between different tasks. Automation is huge. We have found it so useful across the board in the company.
What I love to hear is that you are talking about AI security solutions and protecting your customers. That has been constant throughout this interview. It tells you it’s at the forefront of what and how you design, what mission you are going to go down, and everything else.
As you said, if we are using AWS, we assume our data is being protected and everything is correct. With this next generation of products, including your own, it’s good to see how much they value that, because we all know that with one big breach, your company is over. Let’s go from the practical side before we get into the decency side. I think that’s a big deal. Talk to me about your neural speech AI watermark.
We named it PerTh, Perceptual Threshold. Again, naming is hard. When engineers name things, they have to be creative. The team is particularly proud of naming it after a real city in the world.
Not the easiest one to get to, I might add.
Very difficult to get to. I’ll be transparent about what we initially thought the watermark was for and what ended up happening, because it’s very interesting. We thought of this watermark as a way to detect our own deepfakes. Resemble generates content. Can we detect what Resemble produces, and can we detect it ourselves?
That was the initial idea, so we came up with the concept of a deep learning model that can inject a watermark into audio. Most of the audio files that you listen to on Spotify already have watermarks in them, but they are very naive watermarks. They are very easy to remove. You can sometimes see them visually. They are not very robust against attacks.
Our concept with a deep learning watermark is this: can we train a model that knows how to insert the watermark in a way that’s inaudible? You can’t hear it and you can’t see it. It’s so well placed that if you try to remove the watermark, you distort the audio. The interesting thing is that the only thing that can detect the watermark is the AI itself. The AI can both place the watermark and detect it. That was our thought process.
What’s interesting is that we are now viewing the watermark through a very different lens. One of the core problems with generative AI is data sourcing. Websites such as Reddit, Stack Overflow, and Twitter have tons of user-generated content and data. Their IP is the data they have collected.
What you see is APIs being locked down. Twitter famously locked down its API a long time ago. Reddit locked down its API, which caused several applications to shut down. The premise is that they need to protect their data in some way. There are good actors in the world who are creating value from that data, but there are also people who are scraping it and creating value for themselves. You can assume that a lot of OpenAI’s models and open-source models collect data from Wikipedia and all these sources.
At the moment, it’s impossible for Reddit to say that a given text model was created using its data. What we found with audio watermarking is that if a company takes a catalog of audio and applies our watermark, the watermark persists through training. For example, if this conversation were watermarked and someone trained a model on it, the resulting model would retain the watermark. We could trace it all the way back to its data source.
That’s interesting for us because it suddenly enables IP protection. That’s extremely important, and the world is already going in this direction. You have OpenAI and tons of academic papers introducing similar concepts: can they insert watermarks into images or text such that the watermarks persist through training and can be identified afterward? It’s a very similar concept to ours. It’s definitely an exciting area, but it’s one of those things where you put something out there and then observe how people react to it. It’s a huge advancement in terms of keeping everyone safe.
As a real-world example, let’s say I subscribe to that service right now and use it on every one of these podcasts, so it’s all being watermarked. If someone creates a deepfake of me saying something I would never have said, it would be very easy for me to say, “Everything I have done is watermarked. You can look at all of it. That one is not.” Is that the case?
It is. More importantly, suppose your data is used to train a model that creates another person’s voice. Right now, it isn’t even evident that your data was used to train that model. What we are promising is that the output of an AI model that consumed your data to create another voice could be traced. We could trace it back to your show, and you could say, “I never gave away the rights to my data for training purposes.”
It’s all mind-blowing, and it’s happening so quickly. I used to say this about the blockchain industry. It used to be a single subject, and you had experts in it. Over the last several years, somewhere around the halfway point, it got to where no one was an expert in the whole thing; you were an expert in a segment of it. It’s just too big and too wide. AI is even more so. It’s expanding in every direction. Are there any use cases you are most excited about outside of what you guys are doing? Is there anything coming out there that’s mind-blowing for you?
There’s so much. The entire area of creativity around text-to-video is very interesting. That’s going to be a game-changer for quite a few industries. The art of describing what a video should look like and that process being automated with AI sounds extremely fascinating. Even with text modeling, I can imagine we are getting so close to that sci-fi world now where the modality in which we are consuming information is about to change. It’s not going to be traditional. It’s hard to predict. These are all predictions.
I worked in the chatbot space early on, and the promise there was that you would just talk naturally to an assistant who would naturally talk back to you, and you could have a conversation. That opens up a whole new way of experiencing information and therefore the internet. Imagine you go into a store, like a supermarket, and you have to look for certain items, which isn’t foreign because everyone has done it.
The next step up is an app that lets you search or browse on your mobile phone, with everything neatly categorized. You are not physically walking through aisles anymore to find the right product. Recommendations are placed everywhere so that you can go from one product to similar products, but we are heading toward a space where that may not be the case either. It might be a world where you just conversationally describe, “I’m looking for this vague thing. I have this shirt and I want to find the right jacket.” It knows what you are talking about. You didn’t describe the color of the shirt or anything about it, but it is able to figure that out. It’s very interesting in terms of how we consume information.
What’s also interesting is the second brain concept; I have seen a couple of startups working on it already. It’s like Iron Man. Iron Man in his suit can talk to an assistant who can do any task possible. It’s like a dream, and we are getting very close to that dream of having a second brain: something that looks at exactly what you are doing, learns from all the data your brain is learning from, and that you can converse with, with a longer memory than yours. That’s very interesting. There’s a company called Rewind.ai that is doing exactly that, a second brain that makes people’s lives easier. Augmenting humans in general is extremely powerful and extremely interesting.
It’s all happening so quickly. Any last thoughts regarding generative AI, deep learning, or speech automation before we leave this segment? Is there anything else that came to mind that you’d like to cover, or are we good?
We are good. We covered quite a bit.
I do want to come back to this. Resemble AI, you guys operate out of Canada, is that accurate?
I’m Canadian and my co-founder is Canadian. We started the company in Toronto, but the startups I worked for previously were all in the Bay Area. The Bay Area has some elements of magic, and it’s being revived at the moment. We have an office in San Francisco and one in Toronto.
It’s time for what we call, “AI wants to know.” AI is curious and so are we. These are ten quick questions designed to uncover the intriguing mysteries that AI longs to comprehend but can’t quite grasp. It’s a snack break in our journey. Keep the answers quick, but the seatbelt sign is off. Let’s explore more about who you are and what makes you tick. Are you ready for this?
Let’s go for it.
Number one, what’s the first thing you ever remember being proud of?
When I was in middle school, I designed a logo, and I remember that was the first time anyone paid me for something I designed: $200 for that logo. That felt surreal.
What is it that you need help with that you wish you did not?
There are so many things. Memory is one of them; memory is the core thing. There is so much going on, especially in the role I have right now. Every entrepreneur probably feels the same way, overwhelmed by the amount of information that’s thrown at us, and maybe other people do too. I wish I didn’t have to remember so much, or that there was something to help me remember. There’s constantly this guilt of losing knowledge. If you don’t write for a little while, you lose the ability to write to a certain degree, and I wish that weren’t the case.
Your next product could be an external hard drive for your brain, with straps around your chest or something, you never know. That was a fascinating answer. I loved it. What do others often look to you for help with?
I hope I can be helpful in any way possible. Anything, honestly. I have some skills. I’m not the smartest person in the room, and I’m not the most creative person in the room either, but I lend a hand wherever I can. Having a different perspective on things is typically why people come to me, especially within the company. I try to make sure I can lend a hand in every aspect of what we are doing, whether it’s marketing or sales. There’s a perspective that comes from each individual. I often try to help wherever I can.
Number four, what do you treasure most about your human abilities?
How high resolution they are. Have you ever taken a picture and thought, “It looks so much better in real life”? The picture looks terrible. It does no justice to what you are seeing. It’s humbling to understand how far we have come and how far we have to go in terms of capturing something like a waterfall. The waterfall in the picture looks very different from the waterfall in real life.
What a powerful answer, and so accurate. Mother Nature comes in, and I’m sure you could do virtual reality with this stuff, but it’s still not the same as standing in front of that waterfall, feeling the mist coming off of it, with every other sense firing. Number five, throughout your whole life, what is the most consistent thing about you?
The most consistent thing about me is that I can’t sit still. I have to keep doing something which is probably how I ended up where I am right now. As far as I can remember, that’s been the constant.
It’s funny because you almost answered the next question, which is the opposite, with the same answer. Number six, throughout your whole life, what has changed the most? I would say where you are sitting, but go ahead.
Where I’m sitting. The mindset is still the same when I look back. It’s just where I’m sitting that has changed the most. There’s a level of maturity, I hope. The biggest thing I have learned, and this is more reflective than anything, is that you can’t control outcomes. That’s a huge part of machine learning. When I got into machine learning, my mindset changed from “I write this piece of code and it does exactly what I want it to do” to “I have this algorithm, and I don’t know exactly what it’s doing.” There’s a randomness to the outcome, and you just have to accept it.
Number seven, what do you find strangest about reality?
It’s funny because I’m answering this question with the previous one. The randomness factor in reality is extraordinary. It’s difficult to explain in words, but the further you go into artificial intelligence, the more you realize how complicated we are as humans. We have biases. We are not rational. AI is fairly rational. It learns from data. It knows what it thinks is right and what it thinks is wrong. We humans are irrational. We make decisions, say things, and do things that are sometimes not rational at all. We do it deliberately, not without knowing. It’s a strange part of being human.
We have all had those moments in life where we feel alive, be it an event or a moment, but something happens and we feel incredibly alive. What’s the most recent example of that for you?
That one’s a tough one to answer. I probably will skip that one.
Happy to move on. Your most unique trait.
The longer I have been doing everything, the more I have realized that I’m very good at breaking down problems. That’s such an important skill set and trait to have. No matter what you are doing, the concept of breaking things down into fundamentals is extremely interesting. I can now appreciate and understand why people love woodworking, construction, and physical labor in general. It’s because there’s a certain satisfaction in being able to break down a complicated structure into, “Here’s a block I need to put somewhere to get started.” That’s fairly difficult to grasp.
I’m in the middle of a book right now called Shop Class as Soulcraft, and it covers exactly what you are talking about. As you all know, a lot of shop classes don’t have the funding and have been shut down. Kids don’t get that in school, but there is something to be said for physically putting things together, planning them out, and making them work. That is a learning process your brain needs. You are preaching to the choir on that one. This is number ten, and then you are done with the quiz part of the show. If you weren’t human, what would you be?
I’d take any bird, honestly. Have you ever seen a bird fly and thought, “How does that work? Where does the momentum come from? How is it just instant?” A plane doesn’t fly that way. A plane needs momentum to lift off, but a bird doesn’t, or needs very little.
Perspective. You didn’t use that word, but that’s what you said in essence. Birds get different perspectives. They can get above things, look down, and widen out the picture. It’s a beautiful thing to watch. Congratulations, you made it through those ten questions. Next, we want to talk about AI leaders and influencers, because you are so deep in it and you are doing some super cutting-edge and amazing things. There have to be some leaders and influencers who have motivated you, whom you have learned from, or who have sent you in this direction. Do any come to mind?
There are a couple. The first one is a gentleman who is probably extremely different from me: Satya Nadella, the CEO of Microsoft. I know everyone speaks very highly of Satya. He’s an extremely good leader. Within the AI space, he has made some of the best calls possible. The reason he’s an inspiration is that it’s extremely difficult to unlearn. It’s extremely difficult to be in the thick of things for decades, with a culture and, from a business perspective, products that work in a certain way,
and then take a step back and say, “Let’s scrap everything. If we introduced AI, what could we do better?” I think Satya has gone down that track. With everything from Microsoft’s acquisition of GitHub to the adoption of AI within Microsoft and its suites of tools, he has turned Microsoft around. All of a sudden, it’s a company that developers love and the community loves. By all of a sudden, I mean ever since he took over; the perception was extremely different before, and he’s made the right calls.
The second person is a guy named Pieter Levels. There are so many indie developers out there who are so inspirational. They are solo, or a couple of people sitting in a room, and they are doing what a startup could only dream of doing many years ago. That goes to show the resourcefulness that AI and technology in general allow, given that you have the right mentality and the right persistence. There are more of these tools at your disposal now than ever. He has built something quite large with very few people and very few resources. They are two very different people, but both have done extraordinary work.
I love what you said about Satya, because Microsoft should have died many times. In a tech world that’s changing like that, your old technology usually becomes your anchor. You end up dragging it along, and others can move more nimbly and pass you, but they have been consistent. It is truly amazing. The other thing people in the industry, and people getting into it, are hungry for is resources. Do you have a resource list or any top-level places you go to keep current on what’s going on?
There are a couple of places you can go and a couple of resources I look toward. The first is GitHub. GitHub is a social network for tech people; that’s exactly what it is. It has transformed from a place where people collaborate and put code into a place where people share ideas. There are tons of things to be inspired by on GitHub and tons of interesting people to connect with. It’s a little difficult to reach them because GitHub doesn’t act like a social network, but if you are persistent enough, you will figure it out.
The second is a podcast I listen to fairly frequently, by a gentleman named Lex Fridman. It started years ago as the AI Podcast. He has some of the most phenomenal people on to talk for two-, three-, or four-hour sessions on deep topics. There’s a lot to learn. The conversations get very specific because the guests are so good at what they do that they have no choice but to go deep into their topics. That’s the fun of it. It’s interesting to hear from people who are general technologists and programmers and who are just phenomenal.
Even within the AI space, it turns out that people who were doing great work in programming and software before will see their work transition into AI. I think the fundamentals are pretty much the same: reasoning, logic, programming, and math. I would say the podcast and GitHub are probably where I would go.
Both are great. Could you leave our audience with a tip on using ChatGPT or some of the other mainstream AI tools? Years ago, people weren’t using them, and now they are. As a professional and an expert, are there any tips you could throw out for the general population on how to get the most out of them?
You have to play a lot. The first tip is to just play. These tools might be a little difficult to grasp, but they are much easier than the tools that existed before. To a certain degree, it’s much easier to become a master of ChatGPT than it is to master Photoshop. I’d recommend doing that. The other thing is that once you understand how these tools work, you can fit them into your workflow.
The more you play with these tools, the more you realize that the relationship between you and the tool is like the one between you and an employee. You can instruct the tool what to do, and it will go do it. To a certain point, it may no longer be worthwhile to learn how to code and how to program. It’s better to just learn how these tools work.
One thing I have seen is that when people start using it, they ask a single question and can’t believe the answer. It doesn’t feel natural to then go deeper, but you don’t get charged by the word. Twist it: “I want it half as long,” “I want it twice as long,” “Add in this component.” The more you do that, the more every step teaches you to use the tool better. It’s all brand new, and we are all just exploring it. I would just say go for it and play. There may be people you want to give a shout-out to: people who have helped you along the way or who help you now and whom you want to show some appreciation. Maybe we can take a moment and let you do that now.
I will keep it to a collective group of people: people doing open-source work in general, even before AI. As I said, I learned programming with a monkey-see, monkey-do attitude. From my earliest days of learning how to code, it was just looking at resources and open-source code that had been published, trying to understand how they did it, and trying to learn.
It’s the same way you learn how to write. If you want to be a better writer, start copying writers and you will become a better writer. So, the collective group of folks who are open-source contributors, creating not only code but also tutorials, books, and other material. There’s a lot of great work out there.
One of the things happening is that I’m seeing work go from being academic and published as papers to being published as open-source work, which tells you a lot about where the world is heading. It’s not about coming up with novel ideas in papers and pursuing a PhD. We are at the point now where you can take a bunch of things, glue them together, and have a novel idea.
In one particular case, there’s a GitHub repository called AutoGPT. AutoGPT is the concept of large language models like ChatGPT talking to themselves: given a problem, multiple instances talk to each other and instruct each other what to do next. That’s fascinating because it’s fairly novel. Generally, we are seeing this collective group of open-source contributors do great work, and they have been inspirational.
Resemble AI is amazing, and I’m saying it because I truly am blown away by what you guys are doing. I have no doubt that, as it grows, it’s going to be more and more a part of the fabric. You guys are the only ones I know of, anyway, doing this with audio. It’s critically important, as all of us are worried that our voices are going to get bootlegged, for lack of a better term. You guys are so far ahead; it’s fantastic. Where would you send our audience if they want to learn more about Resemble AI, follow what you are doing, keep updated, or even reach out to you?
They can visit our website, Resemble.ai. We are on Twitter, @ResembleAI. It’s where we post all of our updates. If they want to reach out to the team, it’s pretty simple, Team@Resemble.AI. If it’s me, it’s Zohaib@Resemble.AI. I’m fairly open. If there’s anyone that wants to talk more and connect, I’m happy to do so.
Zohaib, you have been amazing. It is time for another safe landing at the outer edges of the AI universe. This is your Captain, Ron Levy. On behalf of our guests and the entire crew, I’d like to thank you for choosing to voyage with us. We wish you a safe and enjoyable continuation of your journey. When you come back aboard, make sure to bring a friend. Our starship is always ready for more adventures.
Head over to Spotify or iTunes right now, rate us, and share your thoughts, your support, and feedback. It means the world to us. Don’t forget to visit EdgeofAI.co where you can learn more. Connect with us on all the major social platforms by searching @Edgeof_AI. Lots going on there online. Before we sign off, mark your calendars for our next voyage where we will continue to unravel the mysteries and advancements of AI. Until then, bye. Zohaib, I can’t thank you enough.