Bryan Barletta: Hey, Bryan Barletta of Sounds Profitable here. After coming back from two conferences in Europe, announcing our first ever event for our sponsors on the Tuesday of Podcast Movement in Dallas this year, and prepping for our first two public research reports with my new partner, Tom Webster, I realized that I fully exhausted our backlog of recorded podcast episodes at the worst possible time. But with the influx of new subscribers to the newsletter and podcast, I realized now was the perfect time to highlight four amazing episodes from this season of Sounds Profitable: Adtech Applied that I'm positive you'll love.
Oh, and if you'd like to learn more about our free, livestreamed research presentations in June and August or attending our sponsored event, check out the episode description below. When we started using Veritone Voice to expand Sounds Profitable into Spanish, the whole goal was to show people how accessible it was even with a small team, which Sounds Profitable is. The success of that attempt led to The Download, our Friday podcast, which you absolutely should check out, being localized by our hosts, Manuel and Gabe, into Spanish in their own words and their own voices. This is a must-listen episode with Sean King of Veritone Voice, where we talk about why expanding into Spanish is critical for your success in podcasting.
Sean, thank you so much for joining me today.
Sean King: It's a pleasure to be here, Bryan.
Bryan Barletta: So it's really cool because we've been talking about synthetic voice for a long time, you and I, and I've always enjoyed collaborating with both Veritone and Veritone One, the tech side and the agency side. And this product that we ended up working on, this process that we're working on here, is all about taking my written newsletter, translating it to Spanish, or it can be any language, and then you created a synthetic version of my voice to speak the Spanish version. So we could have a Spanish language podcast, which sounds like I'm speaking in Spanish. And how long have you guys been working on that?
Sean King: Well, we've been experimenting and working on that for quite some time. We brought it to market officially in May of this past year, but just something within our expertise on the Veritone side, within media management, content management, workflows, and everything that go along with it. And then on the Veritone One side, their expertise in all things audio and voice, we just felt that we are in a unique position with MARVEL.ai to be able to help not just create synthetic voice, but really help companies and individuals being able to help in the deployment of it, the usage of it, and even the monetization of it.
Bryan Barletta: Yeah. That's the interesting part there. I think there's so many technologies out there about synthetic voice that are just like, "Cool, you can build this thing." And then it's like, "Well, now what?" Right? And the momentum specifically in podcast advertising and podcasting in general is just keep moving forward. You can throw bodies at a problem, operations work when they're easy, but not a lot of people are able to take the step back and create a new process or onboard people to a new process when they can accomplish it with manpower. So taking the time to build this out, so that this is actually something that can be a whole workflow for people, is really fantastic. It makes it accessible even for a team with me and Ian doing this for Sounds Profitable.
Sean King: No, absolutely. And again, the part about it and again our understanding on how the entire ecosystem functions and works and all the people that are involved in it, it is very much driven by time of individuals and being able to make sure that you can help at least from a voice standpoint of making someone extensible. I mean, for example, you may get sick tomorrow, Bryan, and you may need to record something. And that voice being sick is going to be very different than how it is today when we're talking. And just being able to make sure that you can keep moving forward and keep going on.
And again, having that ability to also make it more extensible where you can personalize it a bit, localize it a bit is just providing more opportunities to get the message that you're trying to get across and into more people. And even in the ability to be able to hear it in a language that they're accustomed to.
Bryan Barletta: Yeah. With the synthetic voice, it's so neat because I've tried out a few of these other tools and we're seeing some of them in advance, but the initial version of a lot of them where you had to read a script and the hard part is that the longer you read a script that you hate, the more you mess up, the more you sound mad and the harder it is to get a good control. And so you just asked for audio files, they weren't even isolated. You dug out my voice, you created something that sounded really good. My wife is honestly the only person who tells me that it doesn't sound like me, everybody else is like, "Yeah, that's great." So that's awesome, close enough. I'll take it. I'm glad it doesn't fool my wife. But what was neat is that I had to consent every step of the way.
And I reflected on that when I started having to sign that and provide my voice authorization for that is that your website doesn't have the Obama example. There's some companies out there where they just grab famous people's voices that super haven't consented to it and just show a slider to let you listen to the original speaker and then layer somebody else over to it. So one of the big things we talked about before we even started on this was consent. And you had really teed it up in a really smart way. There's a big difference between synthetic voice, which is what we're doing, and a deep fake. And I'd love to hear you break that down for everybody.
Sean King: Sure. It fundamentally comes down to consent. At the end of the day, and there was a big expose on CBS where they talked about synthetic media and deep fakes, and one of the reporters said the technology is neutral. It's how you use it. And I couldn't agree more. For us, just because we can create synthetic voices of anyone doesn't mean we necessarily should. And for us, the best way to say it is that this is where AI gets very close to home and very close to an individual. And for some individuals, that could be very scary that, "Oh my goodness, this is a computer or this is someone else and it's speaking, but that's my voice."
Well that's very scary when you're not aware that's happening, but if you've given consent for a clone of your voice to be created, and now that that's made, you've given the authorization of that clone to be used by specific individuals for specific use cases, it becomes infinitely less scary.
You're aware that it's there, you know where that technology resides, you know who has access to it, and you know how they're using it. And for us, I mean, it all fundamentally starts with consent. I mean, we can create a voice for anyone. We're not going to do that unless we have written consent that says that they are giving us the okay to it. They've given us the appropriate training data and showing that they have the rights to be able to provide us said training data. And thirdly, that vocal recording of them saying, "Yes, this is me and I allow my clone voice to be sent," because we want to make sure that matches the training data that we have. And we want to make sure that we have those three checkpoints because we can recognize that we want this to be done for good and the best way to start it is protecting those individuals.
Bryan Barletta: I think it's so smart. And when you were talking about it, you said if my voice went away, what an interesting way to look at it, right? Whether it's text to speech or speech mapping, where I speak in another voice is layered over me. If I can still speak, but my voice sounds different. If there was an AI version of my voice, if there was a synthetic version of my voice, you can layer it over me so I can sound like I do now, even if there's damage to it.
That could be amazing for recording artists. That could be amazing for voice talent. You're securing and protecting that part of your business, that part of your identity. And then the other aspect of it is people are always worried about loss of jobs. And I tend to think about if there's an individual that's so valuable and we want to do more with their voice than they can do, I think we open it up to more jobs because you need a talented person who can mimic their speech patterns.
We had Evo talking over my voice, or his voice mapped to mine. And it sounds like Evo trying to imitate me. It would never fool anybody that it is Evo if you listen to a few sentences because I have a little bit of a Boston accent and Evo doesn't, but if I get really excited and I can mimic aspects of Evo, but you need someone who's talented enough to do that. If they have accents, you need someone who can mimic an accent successfully.
And so that is creating a new layer of work for someone who maybe doesn't have the right sound to their voice, but has all the talent to drive that voice, to play that instrument and that, ah, man, that's so cool. Being able to have a small team that could license someone's voice at a cheaper rate to do something, to do a demo or a mock and have sophisticated talent to execute it. It sounds awesome. That sounds way better. It sounds like it's creating opportunity.
Sean King: Well, it does. And also from that side, think about it from like voice fatigue. If you're a voiceover talent and you have to get into the studio and you have to do 500 different reads. The quality at read five versus read 455, it's going to be very different. So being able to make sure that you can have and control a consistency of your voice. At the end of the day, that brand or that company, or that movie studio or that producer, they're still wanting your voice and they're still wanting you and your talent and your inflection and your tonality, but having to not just be able to have that, but that safeguard that you're getting the best product forward that you can. I mean, you're providing another layer or another tool in your instrument that you can take out there and have available to the masses.
Bryan Barletta: Yeah. I agree with that. And so on the ethics line, is there an organization out there that like... So we have the IAB for podcast measurement and all that. Is there an organization out there that focuses on the ethics of this type of technology?
Sean King: Yes. I mean, we have partnered with and are founding members of the OVN, it's the Open Voice Network. So it's a group of a lot of industry executives that's from the content community, that's from the talent communities, from the technology community that are really centered around education and the right controls and governance that need to be in place to make sure that there's safe and ethical usage of it.
Bryan Barletta: That's awesome. And is it easy for someone to apply? Is that something I could participate in?
Sean King: Absolutely. Absolutely. And happy to make sure we can share that contact with your audience.
Bryan Barletta: I think that's super cool because I think that this is an opportunity for the companies in this space that are trying to go about it the right way that do not want to be associated with deep fake technology, that do want to drive this technology and this usage forward, this is a great place for everybody to be in there and there's so many different ways to go about it. There's not going to be one size fits all. There's always reasons for competition and new partners in there and new innovation, but if we can all agree on a framework for that ethics, I think it gets a little bit safer.
At the end of the day, the tool is a tool, but we are deciding as humans, as ethical people, how we're going to use it and I like that. And the more visibility that that has, the more I think this will be comfortable for people. With Sounds Profitable, I've decided for my audience I want to declare it at the top. This was translated by Veritone MARVEL.ai. This is a synthetic voice provided by them. I make that very, very clear in everything I do at the beginning, because I want people to be able to opt out if they want to. And I think that has actually gotten more people comfortable with it. Right?
Sean King: Agreed. You don't want to trick the consumer at the end of the day.
Bryan Barletta: Which is so weird because Jon Favreau, which, as me and you are both Star Wars guys, he was making a comment about the Luke Skywalker thing. And he was just like, "Ah, it's movie magic. You don't need to know." And I was like, "I don't know. I think I want to know. I don't think it's going to ruin it for me if I know. I get you're saying there's the reveal there, but we got to find a way that I can know because I want to have that option." I want to be able to say like, "Oh, this is really cool technology." Not like, "Oh my God, I've been tricked."
Sean King: Absolutely. And look, Star Wars is a great example. I mean, that was created by our friends at Respeecher. And I happen to know they asked for a lot of consent as well in the work that they do. And that's a great example of what we were talking about: Mark Hamill's voice in '85 is very different than Mark Hamill's voice today in 2020. But being able to, again, have him still voice his iconic character with the right tonality and the right cadence, but being able to hear his voice from '85, it helps make a more impactful moment.
Bryan Barletta: Yeah. I guess maybe playing the Joker in all the DC animated stuff has probably pushed his normal voice closer to Joker than Luke Skywalker as he gets older. Not mad about that, but it's such a cool technology and that's such a great example of it, but this is the interesting part. The ethics were held forward by the people involved in it. He gave his consent to it. He walked through all the steps. It was used in the appropriate manner, but now we have to discuss the next part, what visibility is needed. And that's why I'm excited that you're a part of this and you're a founding member of Open Voice Network. That's a really cool thing to invite people to, because this is the cutting edge of this. We're just starting to define it and this is a great time for people to get involved in it.
Sean King: Yeah. And I think, again, I a hundred percent agree because look, we're interacting with voice today, all around. We're talking with Siri, we're talking with our IOT devices. It's in our cars and coming to our cars. It is something that's going to be... It's snowballing at a much faster rate. But again, here's a great opportunity where you can really start to personalize these interactions in our life and add new personalities to what we're working with. But again, to that same point, just because you can, doesn't mean you should. So let's just make sure it's done ethically with right consent and with appropriate disclosures so that everyone can feel a lot more comfortable about what we're doing.
Bryan Barletta: Yeah. And with the language thing, I thought it was so cool because what excites me about that is that it's translation. You are translating my American English words into a specific Spanish dialect and it hits home. It becomes more accessible for an audience that could not consume it. It explains that this is not me writing it out there and it's specifically translation. It's not localization, but what it does is it helps me test the water on a few things, right?
It's not someone taking my concept and when I say lipstick on a pig and breaking it down to something that resonates better to people in Mexico. It doesn't do that part because that's not what it's supposed to do. It's supposed to make it accessible, my words accessible, but it helps me test it out.
It shows is the interest there for it to be in another language? It shows are there people there that are hungry to help me expand that find interest in it and want to help with the localization aspect of it. And what's more exciting and I think that you've made it pretty clear you're open to collaborating with anybody who wants to dig into this and do more with it, I truly hope that somebody listens to or reads the Spanish version of Sounds Profitable and says, "I could do this better in a Spanish speaking language, talking to people that Bryan can never communicate easily with because of language barrier and I want that content in English. I want to compete with Bryan on English stage."
That would make me so happy because we talk a lot about English going to other languages, but Squid Game is one of the top shows right now. And half the shows I've watched on Netflix over the last two years that have been astounding have been not primarily English. And that's exciting to me because we need those different cultural points of view. We need those different experiences. And we need to think about not just as English out, but other languages in.
Sean King: A hundred percent agree. I mean, just looking at some podcast stats. I mean, you look at consumption stats and, taking English outward first, look how much consumption of English podcasts there is globally. No matter where you're at, you're getting the same English language that everyone is hearing, as if you were hearing it here. But being able to somewhat personalize that content by saying, "Look, I'm downloading this in Italy. I could hear Bryan in Italian." How much more impactful is that going to be? How much bigger is the audience going to be? How much more expansive is my messaging going to be?
And then to that same point, there's a lot of global content out there. And if anything, what the pandemic and the studio shutdowns showed is that people are hungry for content. And anything that we can do to help localize and personalize that content so it can be consumed by the individual is going to help the person become more informed on a topic they're looking for. It's going to help that show creator or that educator that's trying to get their message out there. It's going to expand to the masses and that's really exciting.
Bryan Barletta: I could not agree more. I think that is the most exciting part of it. We need different points of view, right? We need to appeal to a wider audience and that needs to be more accessible. And you guys are providing this tool to me as a sponsor, but I'm aware of what the pricing would've been for my specific use case. And it was absolutely within Sounds Profitable's budget. It would've been a worthwhile expense. And that's a really cool thing that I'm excited to be able to say because as the technology becomes more and more used, as it becomes better and over time, the price is only going to go down. It's not going to become more expensive. The price to licensing cool voices like Mark Hamill reading Sounds Profitable is probably going to be pretty expensive. Maybe that's 10 year anniversary present.
We'll figure that out. This is becoming accessible. And I think that we're going to be able to learn from people and languages that we've missed out on that have such great stories and education to share with us. And I'm really, really excited to see you guys be such a big part of that. And I'm really happy to be a demo and experience to that. And what would be your dream language that we expand Sounds Profitable into it? We did Spanish and I think that was a hit, but what one has excited you?
Sean King: Oh geez. Oh God. There's so many. I mean, look, anytime we get to create a new one and we get to hear that person speak that and you get that visceral reaction of someone going like, "Oh my." And then you get that validation from someone. Japanese, throwing that out there. Bryan speaking Japanese and having a Japanese speaker go, "Yep. Nope. That's word for word. That is great." And just getting to see everyone's visceral reaction to that, that's exciting. I mean, it invigorates me. It's like, "Oh wow. We're onto something amazing." We get to see this application of this technology and see how it's impactful for everyday people. I mean, I would love nothing more than that we have Sounds Profitable and any other podcast and other media, doesn't matter, in so many languages that we're having to go into different dialects within sub regions.
Bryan Barletta: That's going to be really fun.
Sean King: So I think that would be unique because look, think about education for a second. And there's a lot of education that's created online and the subtle nuances of someone speaking with a Boston accent, someone in California, someone with a Southern accent or Southern drawl. And we're just talking about different variations of English language. But if you're trying to be educated or you're a student that is wanting to learn more, maybe you can actually absorb more if you're hearing it in a different accent. And so if we can get to the point where we're going to that level of personalization in it, that's going to be awesome.
Bryan Barletta: I'm excited for that. I think that's going to be a really cool thing to explore. Well, thank you so much for joining me. I always end by asking what is the podcast that you're listening to right now?
Sean King: Oh, I am a huge fan of Guy Raz and How I Built This. So that is always the one that I'm listening to, it's a motivating one for myself.
Bryan Barletta: I listen to him way more on Wow in the World because I have a three year old, but he's fantastic and yeah, great show. Well, thanks, Sean, so much for joining me.
Sean King: Take care. Thanks for having me.
Bryan Barletta: Thank you for listening to this conversation. For the full original episode, which includes my conversation with my co-host, Arielle Nissenblatt, please check out the whole episode in the episode details. Sounds Profitable: Adtech Applied will be on break until mid-July, but we're really excited to bring you a whole slew of new content and guests. In the meantime, check out The Download, our Thursday podcast that covers everything you need to know about the business of podcasting and why it should matter to you in 10 minutes or less. And if you haven't already, please subscribe to the Sounds Profitable newsletter at soundsprofitable.com.