There are three different audio formats that are currently utilized for VR Audio: channel, Ambisonics and object. In our first post, we’ll dive into channel-based audio since it is the traditional audio format.
Increasing the channel configurations
Channel-based surround sound formats have attempted to deliver truly immersive environments with ever-increasing numbers of speakers. If you increase the number of channels, the resolution of a space increases, leading to audio that more closely reflects real life. In perhaps the most ambitious example of this, NHK developed the 22.2 channel surround sound setup displayed below, attempting to develop the ideal immersive audio configuration for a channel-based format.
While increasing the channel number can increase space resolution, channel format itself has plenty of downsides. Even in the impressively large format above, there is still a discrepancy between the actual location of the sound source and certain speaker locations. While this has been acceptable for traditional methods of consumption, it breaks the illusion of full immersion in VR if a sound source doesn’t actually match its visual location.
Vector Base Amplitude Panning
If the source is in between the speakers, a combined sound created by the speakers creates virtual sound location. Vector Base Amplitude Panning (VBAP) is one of the most common of these methods for positioning virtual sources to multiple loudspeakers. However, whenever you are manipulating the sound like this, you run the risk of producing an artifact, or audio material that is accidental or unwanted.
Some methods have been explored to make up for the various drawbacks of VBAP, but creating virtual sound while there is speaker and source discrepancy is going to be imperfect by nature.
For VR audio, even more specific issues arise from VBAP. If channel-based signals are played through headphones, it means that sound sources in 3D space are first converted to limited speaker locations through VBAP. These channels are then converted again through a binaural filter. When there are these many conversions, loss is inevitable and creating truly immersive experiences becomes increasingly difficult.