This post was contributed by Henney Oh, the CEO and Co-founder of G’Audio Lab.
Audio Engineering Society (AES) Conventions are the world’s premier professional audio education and networking events. I’ve been attending AES Conventions since 2000, and the 142nd AES Berlin International Convention did not disappoint. The format hasn’t changed too drastically over the years — there are still large-scale special events, workshops, detailed paper presentations, and tutorials to educate and entertain attendees. These provide a variety of opportunities to interact with and learn from top experts in the industry. The noticeable changes year in and year out revolve around content. This year, the convention promised to dive into “immersive audio formats and the latest developments in production and content delivery to the consumer.” They weren’t entirely wrong about this, but they should have said that VR audio, in particular, was going to be a hot topic of discussion. It dominated conversations and workshops over the course of the entire weekend.
One of the workshops that explored this topic in greater detail at this year’s convention was “Current Workflows in Audio for VR,” and I was thrilled to participate in it as a panelist. The workshop was billed as a discussion about different approaches in this “immature field,” reflecting an idea shared by many — VR is so young, nobody can possibly have it figured out yet. In my individual talk before the panel discussion, I shared the story behind G’Audio’s rich audio background as well as a detailed explanation of the tools that are already effective in VR audio workflows today. I was also able to shed some light on emerging trends and a few issues that will need to be tackled in the near future. My slides from that presentation can be found HERE.
Deep roots in the audio industry
Everyone loves a good origin story, but not many people have had the chance to learn about G’Audio’s. I led a team that contributed the binaural rendering technology to the MPEG-H 3D Audio standard back in 2014. That March and April, at the 108th MPEG meeting in Valencia, Spain, the binaural rendering technology was accepted into the Committee Draft (CD). This marked a key step in the overall adoption process and influenced our team in a profound way. While visiting the region, we were introduced to the groundbreaking, innovative architecture of the renowned Spanish architect Antoni Gaudí. Our team saw its own bold spirit and ambition mirrored in his work, and so Project Gaudi was born.
Moving forward with the same bold vision as its namesake, Project Gaudi sought to expand the binaural renderer from the MPEG-H standard with some powerful new features — namely, interactivity. Even at that early stage of the VR industry (Oculus had been acquired by Facebook only a month earlier), I believed this new technology could be successfully applied to VR. The team got to work right away, building the renderer, then production tools, then a unique delivery format, and then the entire workflow that G’Audio Lab provides today to drive the VR audio industry forward.
Fixing the fractured audio formats
Over the course of the convention and during the “Current Workflows in Audio for VR” workshop, the conversation frequently turned back to questions of audio format. Since the dawn of the VR industry, no single, uniform audio format has been used across the board in VR experiences. Mono and stereo signals have been used in some cases, and Ambisonics has surged in popularity over the last year, but each signal type falls short in some regard. My colleagues and I at G’Audio Lab have spent much of our time researching best practices for this new medium. What we’ve found is that a sound scene is best represented with a combination of object, channel, and Ambisonics signals, since each has a unique advantage over the others.
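To make the idea of combining signal types concrete, here is a minimal sketch — not G’Audio’s actual pipeline — of how a mono object could be encoded into a first-order Ambisonics scene, summed with an Ambisonics bed, and rotated to follow the listener’s head. The conventions (AmbiX channel order, SN3D normalization) and the function names are assumptions chosen for illustration.

```python
import numpy as np

def encode_foa(mono, azimuth, elevation):
    """Encode a mono object into first-order Ambisonics.
    AmbiX convention assumed: ACN channel order (W, Y, Z, X), SN3D norm.
    Angles in radians; azimuth counter-clockwise from straight ahead."""
    gains = np.array([
        1.0,                                  # W: omnidirectional
        np.sin(azimuth) * np.cos(elevation),  # Y: left-right
        np.sin(elevation),                    # Z: up-down
        np.cos(azimuth) * np.cos(elevation),  # X: front-back
    ])
    return gains[:, None] * mono[None, :]     # shape (4, n_samples)

def rotate_yaw(foa, yaw):
    """Rotate a first-order scene around the vertical axis, e.g. to
    compensate for the listener's head orientation in a 360 player."""
    w, y, z, x = foa
    c, s = np.cos(yaw), np.sin(yaw)
    return np.stack([w, c * y + s * x, z, c * x - s * y])

# Mixing an object into an Ambisonics bed is just a sample-wise sum:
samples = np.sin(2 * np.pi * 440 * np.arange(480) / 48000)  # 10 ms tone
bed = np.zeros((4, 480))                                    # silent bed
scene = bed + encode_foa(samples, azimuth=0.0, elevation=0.0)
```

The point of the sketch is the division of labor: the object path keeps a precise direction per source, while the bed carries diffuse ambiance, and both end up in one scene that a binaural renderer can then play to headphones.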
Object tracks provide pinpoint localization while Ambisonics signals add meaningful depth and ambiance to a VR environment, so to replicate lifelike sound we had to create an entirely new format, GAO. This format supports any combination of object, channel, and Ambisonics audio signals to deliver 360 video with the best possible sound quality and accurate localization. Projects made in professional Digital Audio Workstations like Pro Tools can be exported to GAO and played back with our renderer SDK, which can be embedded in any 360 video player. When the two are paired, the sound quality on the consumption side is guaranteed to match the quality from the creation process. Creators can finally put aside the quality-control frustrations that come from supporting the market’s many different platforms and hardware, each with its own specs.
VR audio crystal ball
When you are as deeply involved in VR audio as I am, discerning the industry’s major trends is inevitable. I have always believed that collaboration is key to the success of our new (and growing) industry, so I was happy to share everything I’ve learned this year. We also dove into some riveting discussion about how we should handle these changes together as a community.
The first trend I shared was a rapid shift toward six-degrees-of-freedom (6DOF) content. More content is becoming user-centric, where the user has greater agency and can interact with the storyline. Linear content still exists, but we are seeing more cinematic experiences being created with 3D game engines, or at least utilizing 3DOF in creative ways. People at the highest levels of the industry, like Tony Parisi, the Global Head of VR and AR at Unity, are recognizing this shift. When he introduces himself, he is sure to let everyone know that he is helping people adopt Unity beyond gaming. With all these changes in mind, audio engineers who used to work on films and in the commercial domain are eager to learn 3D game engines. That eagerness is being tempered, however, as they anxiously wait for a solution that bridges the gap between DAWs and those engines.
The second major trend I discussed was the growing demand for live 360 video. People are already familiar with live streaming on social networks, so extending that to 360 video for consumer social media is a natural next step. Meanwhile, journalism, sports, and music concerts have all been actively experimenting with live streaming in VR as well. Through this experimentation, audio experts are learning that each genre carries a different set of expectations, which translates into different audio requirements and a unique workflow for each application.
I also talked at length about the emergence of VR music. In live 360 music experiences like concerts, creators will have to consider the proper placement of audio elements to reflect the most realistic possible listening environment. However, music in VR could be more than just recording a live concert with a 360 camera. The listening paradigm in VR has shifted from loudspeakers to headphones, and sound design is moving from stereo-based formats to object-based ones along with it. With both shifts happening at once, we could witness completely new styles of music. Fundamentally, the virtual world can overcome some of reality’s limitations. We see this in experiences where, for example, certain audio elements grow louder than everything else depending on the user’s head orientation. Music is at the forefront of leveraging virtual reality audio, and artists are taking advantage of that to create some of the most innovative VR stories. This should come as no surprise — after all, VR is just another tool for storytelling.
The VR audio industry is headed in incredibly exciting directions, and AES Berlin 2017 provided a wonderful platform to explore those together. I can’t wait to see what monumental changes transpire by the time the next convention rolls around.