Spotify has bigger plans for the technology behind its new AI DJ feature following a positive consumer response. Launched just before the company’s Stream On event in LA last week, the AI DJ curates a personalized selection of music combined with spoken commentary delivered in a realistic-sounding AI-generated voice. Under the hood, the feature leverages the latest AI technologies and large language models, as well as generative voice, all layered on top of Spotify’s existing investments in personalization and machine learning.
These new tools don’t necessarily have to be limited to a single function, Spotify believes, which is why it’s now experimenting with other uses of the technology.
While the highlight of Spotify’s Stream On event was its mobile app refresh, which now focuses on TikTok-esque discovery feeds for music, podcasts and audiobooks, the AI DJ is now a prominent part of the streaming service’s new experience. Introduced to Spotify’s Premium subscribers in the US and Canada at the end of February, the DJ is designed to get to know users well enough to play what they want to hear at the touch of a button.
With the app’s revamp, the DJ appears at the top of the screen below the music subfeed for subscribers, serving as both a convenient way to stream favorite music and a means of pushing free users to upgrade.
To create the commentary that accompanies the music the DJ streams, Spotify says it has drawn on the knowledge and insight of its own in-house music experts. Using OpenAI’s generative AI technology, the DJ can then scale that commentary to the app’s end users. And unlike ChatGPT, which tries to create answers by distilling information found on the wider web, Spotify’s narrower database of musical knowledge ensures that the DJ’s commentary is ultimately both relevant and accurate.
The actual music selections the DJ chooses come from its existing understanding of a user’s tastes and interests, mirroring what would previously have been programmed into personalized playlists such as Discover Weekly.
The AI DJ’s voice, meanwhile, was created using technology Spotify acquired from Sonantic last year and is based on that of Spotify’s Head of Cultural Partnerships Xavier “X” Jernigan, host of Spotify’s now-defunct morning show podcast, “The Get Up.” Surprisingly, the voice sounds incredibly realistic and not robotic at all. (At Spotify’s live event, Jernigan spoke alongside his AI doppelgänger and the differences were hard to spot. “I can listen to my voice all day,” he joked.)
“The reason why it sounds so good – that’s actually the purpose of the Sonantic technology, the team we acquired. It’s about the emotion in the voice,” explained Ziad Sultan, head of personalization at Spotify, speaking to businessupdates.org after Stream On wrapped. “When you hear the AI DJ, you hear where the breathing space is. You hear the different intonations. You can hear excitement for certain types of genres,” he said.
A natural-sounding AI voice isn’t new, of course – Google stunned the world years ago with its own human-sounding AI creation. But its implementation within Duplex led to criticism, as the AI called businesses on behalf of the end user, initially without revealing that it wasn’t a real person. There shouldn’t be a similar concern with Spotify’s feature, considering it’s explicitly called an “AI DJ.”
To make Spotify’s AI voice sound natural, Jernigan went into the studio to produce high-quality voice recordings while working with experts in voice technology. There he was instructed to read several lines with different emotions, which were then fed into the AI model. Spotify wouldn’t say how long this process takes or detail the specifics, pointing out that the technology is evolving and referring to it as its “secret sauce.”
“From that high-quality input that has many different permutations, [Jernigan] then doesn’t have to say anything more – now it’s purely AI generated,” Sultan says of the generated voice. Still, Jernigan sometimes pops into Spotify’s writers’ room to give feedback on how he would read a line, so that he continues to have input.
But while the AI DJ is built using a combination of Sonantic and OpenAI technology, Spotify is also investing in internal research to better understand the latest AI and large language models.
“We have a research team working on the latest language models,” Sultan tells businessupdates.org. A few hundred people work on personalization and machine learning. In the case of the AI DJ, the team is using the OpenAI model, Sultan notes. “But overall we have a large research team that understands all the possibilities of large language models, generative voices and personalization. This is going fast,” he says. “We want to be known for our AI expertise.”
However, Spotify may or may not use its own in-house AI technology to enable future developments. It may decide that it makes more sense to work with a partner, as it does now with OpenAI. But it’s too early to say.
“We are constantly publishing papers,” says Sultan. “We will invest in the latest technologies – as you can imagine, in this industry, LLMs are that kind of technology. So we are going to develop the expertise.”
This foundational technology will allow Spotify to move into other areas involving AI, LLMs and generative AI. As to what those areas might be in terms of consumer products, the company isn’t ready to say just yet. (However, we’ve heard that a ChatGPT-like chatbot is one of the options being experimented with. Nothing has been decided about a launch yet, as it’s just one of many experiments.)
“We haven’t announced the exact plans when we might expand into new markets, new languages, etc. But it’s a technology that is a platform. We can do it and we hope to share more as it evolves,” says Sultan.
Early consumer feedback for the AI DJ is promising, Spotify says
The company hadn’t wanted to develop a full line of AI products because it wasn’t sure what consumer reaction to the DJ would be. Would people want an AI DJ? Would they engage with the feature? None of that was clear. After all, Spotify’s voice assistant (“Hey Spotify”) was shut down due to lack of adoption.
But there were early signs that the DJ feature could do well. Spotify had tested the product internally among employees before launch, and the usage and re-engagement metrics were “very, very good.”
Public adoption so far matches what Spotify saw internally, Sultan tells us. That means there is potential to launch future products on the same underlying foundations.
“People spend hours a day with this product… it helps them make choices, it helps them discover, it tells them the next music they should listen to, and it explains to them why… so the response – if you look at different social media, you’ll see it is very positive, it is emotional,” says Sultan.
In addition, Spotify shared that, on the days users tuned in, they spent 25% of their listening time with the DJ, and that more than half of first-time listeners returned the next day to use the feature. These stats are early, however, as the feature hasn’t yet fully rolled out across the US and Canada. But they are promising, the company believes.
“I think it’s a great step in building a relationship between really valuable products and users,” says Sultan. But he warns that the challenge ahead will be to “find the right application and then build it correctly.”
“In this case, we said this was an AI DJ for music. We built the writers’ room for it. We put it in the hands of users to do exactly the job it was intended to do. It works super well. But it’s definitely fun to dream about what else we could do and how fast we could do it,” he adds.