Earplay, deltaDNA and the Rising Opportunity of 'Voice-First' Interactive Audio

Audio-first and audio-only games have to date occupied a fascinating micro-niche within the gaming medium.

In the late 1990s the visionary musician and game designer Kenji Eno released Real Sound: Kaze no Regret, a game with barely any visuals, for the Sega Saturn and later the Dreamcast. The idea was that blind and sighted players could enjoy the same experience. A few years later Nintendo debuted SoundVoyager for the Game Boy Advance, which challenged you not to look at the screen at all while playing; and if you did, you'd only get a glimpse of some faded, minimal visuals.

Then in 2013 the mobile game Papa Sangre explored the notion of building 3D environments purely from sound, using binaural audio processing. In essence, playing Papa Sangre you could hear a 3D soundscape around you, but not see it. As such, you had to rely on audio cues like a creaking sign or the wind to guide you through a given space. The game, by Somethin' Else, did feature some very limited onscreen UI, which is why 'audio-first' is used to describe such games rather than the more absolute 'audio-only'.

The 1997 Saturn release of the pioneering audio-first game Real Sound: Kaze no Regret.

Pushing the concept even further, the 2009 Global Game Jam creation 4 Minutes 33 Seconds Of Uniqueness featured almost no visuals, no audio at all, and no interaction. And yet it could still be played; or just about. Based on the divisive 1952 John Cage experimental 'music' composition '4′33″ of Silence' - which featured no sound at all - the game asked the player to sit and simply watch, waiting for the screen to turn from black to white. If any other player started a game anywhere else in the world, your game immediately ended. The aim? Watch the screen shift in tone for the full four minutes and 33 seconds. All you needed to do to win was be the only human on Earth 'playing' it. The game is still available as a free download.

To point to 4 Minutes 33 Seconds Of Uniqueness as relevant is something of a digression from interactive audio, but the fact is that 'video' games are too broad a medium to be constrained by definitions that state they have to be seen or heard. And since the boom in voice-controlled smart speakers and assistants such as Alexa, Google Assistant and Siri, there have been ample opportunities to create and play audio-only games, from simple classic word games to fairly elaborate narrative-led examples that take the form of sonic 'text adventures'.

The major shift since the days of Real Sound: Kaze no Regret, of course, is that the human voice can now be a controller. 'Audio-first' is shifting to 'voice-first'.

"The human voice is a powerful instrument for expression; you can do a lot more with it than you can with a mouse or a controller, and we are all already experts at using it," enthuses Dave Grossman, chief creative officer at audio-first gaming platform Earplay, and a veteran of the LucasArts adventure games team. "And audio occupies a big place in our lives; we're constantly listening to music and podcasts and audiobooks on one device or another while we cook or exercise or ride the bus. The fact that those devices can now listen to us in return offers an amazing range of possibilities, for games, stories, news, and who knows what else. Where we go from here is limited only by the imagination of the designers creating the experiences."

Certainly, for game designers and players, the opportunity for distinct and engrossing content is clear. But could it be a commercial opportunity of any size, and where do data and analytics come into it?

Before that, though, it's worth understanding a little of what Earplay is.

A sound platform

"Earplay is like a Unity or Unreal for voice user experiences," offers the company's CEO Jonathan Myers, who recently oversaw the integration of technology from Unity-owned data and analytics powerhouse deltaDNA into Earplay. "The cloud-based technology can facilitate team collaboration in a full end-to-end pipeline for the prototyping, designing, producing, publishing, and then live updating of interactive audio voice user experiences. The software includes a user experience engine with a core feature set that is highly applicable to building and running any voice user experience as a web service. Teams who use Earplay can focus on developing the mechanics, user interface, and content to be delivered to their audience, rather than coding all middleware, tools, and external service integrations themselves."

On top of that, Earplay has worked on various voice-controlled projects, such as the Jurassic World Revealed audio adventure for Alexa. And it all started with Myers's notion that there was potential in building something akin to 'interactive radio dramas'.

Now deltaDNA has plugged its popular offering into Earplay's platform, the idea being that creators of voice-first games can harness the same abilities common to the creation and maintenance of so many titles in mobile and beyond. That means optimising the player experience based on data and insight, while endeavouring to lift engagement and drive in-content spending. Furthermore, deltaDNA will be rolling out more tools and services to Earplay in the future.

Earplay's Jurassic World Revealed Alexa app flyer

And the potential really is in place. A recent NPR and Edison Research study found that roughly one in four people in the USA now own a smart speaker, with the number of smart speakers in US households surging from 119 million to 157 million in roughly the past year. There is, however, both an opportunity and a challenge for those trying to establish interactive audio gaming as an everyday, truly mass-market pursuit: conventions and standards are yet to be set for the ecosystem, pipeline and infrastructure used to create, distribute and maintain voice-first games.

"Companies working today in voice and interactive audio are attempting to define benchmarks, set goals, and commercialise these new experiences," explains Myers. "As mobile app marketplaces matured over the past decade, several new companies seized upon that opportunity for massive growth. In most of those cases, data collection and optimisation of the games as live services were prevailing factors of success. We anticipate a similar trajectory with voice and interactive audio. Those who can deploy customised analytics solutions and then immediately iterate in response to incoming data will better optimise and improve quality, engagement, and monetisation. The Earplay and deltaDNA integration enables game developers to do that faster and more efficiently."

Certainly, a lesson here is that interactive audio and voice-first gaming - while distinct in plenty of ways - have much to learn from the realm of free-to-play and mobile games as a service.

"Games used to be made using gut instinct but that doesn't work in 2020," states Mark Robinson, GM of deltaDNA at Unity Technologies. "The developers that achieve success are those that have learned to listen to the data. This is new ground so experimenting and optimising quickly will make sure the development process is de-risked even in these new environments. We are embarking on a voyage of discovery which is exciting and fascinating.

"There are many parallels with where the free-to-play market was only three-to-four years ago. We know that live-ops is as important as game development and the sophistication of player management has increased substantially. The interactive audio space, in collaboration with products like deltaDNA and Earplay, will start to uncover how to make compelling experiences and ensure content is created that is engaging for players. Data analysis will also allow developers to establish what healthy player behaviours look like, and facilitate strategic engagement to encourage such play."

Robinson isn't alone in thinking there are clear lessons to be learned from the realm of mobile.

"There is plenty of carryover from the usual study of core metrics," confirms Myers. "It's critical to know new and returning user stats, DAU, MAU, et cetera, as well as daily, weekly, and monthly retention and churn. However, as a new interactive medium with a new type of user interface, best practices for interactive audio games are not yet fully established. Engagement metrics like voice interactions per session may be a key performance indicator for some games, while in others it may be session length, user path completions, or variety of speech utterances used.

"It's more elusive, as the user experience under examination is more intangible, driven by a frictionless interface of speech. Although the methods of studying the data are similar, like using step funnels or A/B tests, applying that to interactive audio requires unique approaches that newcomers will discover and standardise."
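To make the metrics Myers mentions a little more concrete, here is a minimal Python sketch of how DAU, day-N retention, voice interactions per session and a simple step funnel might be computed from a raw event log. Everything in it is an assumption for illustration: the event tuple layout, the event names (`session_start`, `voice_interaction`, `chapter_complete`) and the sample data are hypothetical, and none of it reflects the actual Earplay or deltaDNA APIs or schemas.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical event log: (user_id, session_id, day, event_name).
# Names and layout are illustrative only, not Earplay or deltaDNA identifiers.
EVENTS = [
    ("u1", "s1", date(2020, 3, 1), "session_start"),
    ("u1", "s1", date(2020, 3, 1), "voice_interaction"),
    ("u1", "s1", date(2020, 3, 1), "voice_interaction"),
    ("u1", "s1", date(2020, 3, 1), "chapter_complete"),
    ("u2", "s2", date(2020, 3, 1), "session_start"),
    ("u2", "s2", date(2020, 3, 1), "voice_interaction"),
    ("u3", "s3", date(2020, 3, 1), "session_start"),
    ("u1", "s4", date(2020, 3, 2), "session_start"),
    ("u1", "s4", date(2020, 3, 2), "voice_interaction"),
    ("u1", "s4", date(2020, 3, 2), "voice_interaction"),
    ("u1", "s4", date(2020, 3, 2), "voice_interaction"),
]


def daily_active_users(events):
    """DAU: number of distinct users with at least one event on each day."""
    users_by_day = defaultdict(set)
    for user, _session, day, _name in events:
        users_by_day[day].add(user)
    return {day: len(users) for day, users in users_by_day.items()}


def day_n_retention(events, cohort_day, n):
    """Share of users first seen on cohort_day who are active again n days later."""
    first_seen = {}
    active = defaultdict(set)
    for user, _session, day, _name in events:
        active[day].add(user)
        if user not in first_seen or day < first_seen[user]:
            first_seen[user] = day
    cohort = {u for u, d in first_seen.items() if d == cohort_day}
    if not cohort:
        return 0.0
    returned = cohort & active.get(cohort_day + timedelta(days=n), set())
    return len(returned) / len(cohort)


def voice_interactions_per_session(events):
    """Mean count of voice_interaction events per session (silent sessions count as 0)."""
    per_session = defaultdict(int)
    for _user, session, _day, name in events:
        per_session[session] += name == "voice_interaction"
    return sum(per_session.values()) / len(per_session) if per_session else 0.0


def step_funnel(events, steps):
    """Users reaching each step, given they also reached every earlier step.

    Simplification: this checks which events a user has, not their order.
    """
    seen = defaultdict(set)
    for user, _session, _day, name in events:
        seen[name].add(user)
    counts, remaining = [], None
    for step in steps:
        remaining = seen[step] if remaining is None else remaining & seen[step]
        counts.append(len(remaining))
    return counts


dau = daily_active_users(EVENTS)
retention = day_n_retention(EVENTS, date(2020, 3, 1), 1)
per_session = voice_interactions_per_session(EVENTS)
funnel = step_funnel(EVENTS, ["session_start", "voice_interaction", "chapter_complete"])
```

On this toy data, three users are active on the first day and one returns the next, so day-1 retention is one in three; the four sessions average 1.5 voice interactions; and the funnel narrows from three users to two to one. A production pipeline would more likely lean on an analytics platform's own event schema and dashboards than on hand-rolled functions like these.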

Ultimately, we are yet to see how thriving the voice-first and interactive audio market will become. Certainly, though, highly accessible games that can be played - like a podcast - while the user is doing something else have tremendous potential. That's especially true now the smart speaker has become synonymous with everyday life in many parts of the world.

Perhaps all that's needed, then, is development of the ecosystem, technology and knowledge base that powers these innovative gaming forms. Earplay and deltaDNA, it seems, are already hard at work trying to make that happen.