In our previous posts regarding the audio formats used for VR, we went over the channel and Ambisonics approaches. Today, we’ll be going over the final format — object. Object-based audio is the method of expressing sound sources as individual signals. An object signal is captured directly at the sound source, as opposed to Ambisonics, which captures the entire sound field at the listener’s location — a sound sink rather than a sound source.
Object-based audio is not a brand new idea that is emerging strictly because of the rise of VR. Even before the VR market exploded, the BBC showcased its own object-based broadcasting. At the same time, Dolby showcased ATMOS, which also utilizes object-based audio. Even before that, the MPEG-4 international standard already included an object-based multimedia element in 1998. Unfortunately, at that time, the technology was still too immature to unlock the full potential of object-based experiences.
10 years later, in 2008, MPEG revisited object-based audio with an object-based compression encoding technology called MPEG-D SAOC (Spatial Audio Object Coding). Then in 2015, the MPEG-H 3D Audio standard incorporated object, channel, and Ambisonics formats — recognizing that all three are important for the future of audio. In MPEG-H 3D Audio, the signals are mixed, delivered, and rendered in a process that can be applied to the next generation of UHD broadcasting and virtual reality.
What’s wrong with the current immersive sound solutions for virtual reality? Well for starters, no home, theater, or even studio actually positions the loudspeakers in the ideal 5.1 channel configuration that ITU-R prescribes. (Your mom would never allow that much space to be used in the living room). So when the speaker layout is less than perfect, you actually need flexible delivery, something that object-based audio provides. In fact, that flexibility is one of the key reasons why object-based audio was included in the MPEG-H 3D Audio international standardization.
The freedom that object-based audio ensures for creators is present throughout the entire workflow. You cannot extract the original stem or object once it’s mastered into the output signal. If, however, you export the objects to an object-based audio format, then they are all delivered to and through the renderer just as they were originally present in the mix. This ensures that everything the creators do in the DAW gets presented as intended to the end listener.
In post-production, each stem on an individual track in a Digital Audio Workstation (DAW) is actually an object before it gets mastered to the channel bed. The mastered output can be stereo, 5.1, or Ambisonics, which is produced through bouncing or exporting. Reversing this process — trying to go from Ambisonics and/or stereo back to objects — is not possible. However, object-based audio does require post-production to account for diverse factors such as room reverberation or wall reflection.
What’s even more fascinating about object signals is that they can easily be converted into channel or Ambisonics formats. In fact, audio engineers have already been pseudo-transforming object to channel by using a panner in their DAWs. Depending on the target speaker layout, each individual sound object in 3D space can be allocated to a certain speaker output by modifying the gain for each channel. If this sounds familiar, that’s because we briefly discussed this concept in a previous post when we were exploring VBAP.
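As a rough illustration of what a DAW panner does under the hood, here is a minimal constant-power pan of a mono object between a left/right speaker pair. This is a simplified stand-in for pairwise gain laws like VBAP, and the function name and the ±45° azimuth range are our own assumptions for the sketch, not part of any standard:

```python
import math

def constant_power_pan(sample, azimuth_deg):
    """Pan a mono object sample between two speakers with constant-power gains.

    azimuth_deg ranges from -45 (full left) to +45 (full right); this is a
    simplified two-speaker version of the gain allocation described above.
    """
    # Map -45..+45 degrees onto a 0..90 degree quadrant so that
    # left_gain**2 + right_gain**2 == 1 (constant perceived loudness).
    theta = math.radians(azimuth_deg + 45.0)
    left = math.cos(theta) * sample
    right = math.sin(theta) * sample
    return left, right
```

Dead center (0°) yields equal gains of about 0.707 (-3 dB) on each side, while -45° sends the object entirely to the left speaker — exactly the per-channel gain modification described above, just for two channels instead of a full 3D layout.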
In order to transform object to Ambisonics, you need the virtual coordinates of where you want to place the sound object in three-dimensional space. The object then needs to be delivered in a spherical-harmonics representation, or B-format. G’Audio calls this process “object to B-format conversion,” or simply “O2B.” O2B can also involve converting into Higher-Order Ambisonics, as long as the appropriate channels are transmitted.
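To make the idea concrete, here is a sketch of a first-order encode: a mono object at a given azimuth and elevation is projected onto the four B-format components (W, X, Y, Z) using the classic first-order spherical-harmonic equations. This is a generic textbook encoding (FuMa-style, with W attenuated by 1/√2), not G’Audio’s actual O2B implementation:

```python
import math

def encode_first_order_bformat(sample, azimuth_deg, elevation_deg):
    """Encode a mono object sample into first-order B-format (W, X, Y, Z).

    Azimuth is measured counter-clockwise from straight ahead, elevation
    upward from the horizontal plane.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)               # omnidirectional component, -3 dB
    x = sample * math.cos(az) * math.cos(el)  # front-back figure-of-eight
    y = sample * math.sin(az) * math.cos(el)  # left-right figure-of-eight
    z = sample * math.sin(el)                 # up-down figure-of-eight
    return w, x, y, z
```

A source directly in front (azimuth 0°, elevation 0°) lands entirely on W and X; raising the elevation shifts energy into Z. Higher-Order Ambisonics works the same way, just with more spherical-harmonic channels per object.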
When you step into the realm of gaming, almost all sound mixes are object-based out of necessity. Every sound source must be independent so its characteristics can change depending on the gameplay and user interactions. Nowhere, though, is an object-based format more important than in VR audio. A user’s head orientation and location (made variable by 6DOF) have to be taken into consideration, and this is only possible with object-based audio that is delivered without compromise.
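Head tracking, the simplest part of that 6DOF picture, can be sketched as a rotation of each object’s coordinates into the listener’s frame before rendering. The function below handles only yaw (a full renderer would also handle pitch, roll, and translation), and its name and sign convention are assumptions for illustration:

```python
import math

def rotate_object_yaw(x, y, listener_yaw_deg):
    """Rotate an object's horizontal position into the listener's frame.

    When the listener turns their head by listener_yaw_deg (counter-clockwise
    positive), every object rotates by the opposite angle relative to them,
    so the scene stays fixed in the world while the head moves.
    """
    yaw = math.radians(-listener_yaw_deg)
    xr = x * math.cos(yaw) - y * math.sin(yaw)
    yr = x * math.sin(yaw) + y * math.cos(yaw)
    return xr, yr
```

Because each object keeps its own coordinates, this per-frame update is trivial for object-based audio; with a pre-mixed channel bed, the individual source positions needed for this rotation are already gone.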