The key to creating realistic audio for this is to synchronise sounds according to the user’s head orientation and view in real time. This helps replicate an actual human hearing mechanism, which makes the listening experience more realistic. Producing truly immersive sound requires several steps. First, you must capture the audio signals, then mix the signals and finally render the sound for the listener.
Ambisonics is a technique that employs a spherical microphone to capture a sound field in all directions, including above and below the listener. This requires placing a soundfield microphone (also known as an Ambisonics or 360 microphone) somewhere near the position where you intend to listen to. Keep in mind that these microphones will record a full sphere of sound at the position of the microphone, so be strategic with where you place them. It’s also important that your mic is not spotted in the scene, so we encourage placing the microphone directly below the 360 camera.
In addition to capturing audio from a soundfield microphone, content creators also need to acquire sounds from each individual object as a mono source. This enables you to attach higher fidelity sounds to objects as they move through the scene for added control and flexibility. With this object-based audio technique, you can control the sound attributed to each object in the scene and adjust those sounds depending on the user’s view.
Combining object, Ambisonics and channels (like traditional 2.0 if needed) and balancing them plays an important role in mixing and mastering 3D audio. If you captured the object and the Ambisonics together, be sure that the Ambisonics signal already contains the objects. You may need an additional process to remove or balance those object tracks to ensure they aren’t counted twice.
With VR and 360 video content, you not only need to consider the actor’s mouth movements but also carefully place the sound according to the position of the actor on the 360 screen, which requires a new and more dedicated sound mastering tool. Specifically, it’s now important to use a tool that lets you edit as you watch, so that while watching the visuals, you can match the sounds accordingly in both space and time.
Historically, content creators relied on DAWs for everything from mixing to mastering into a target layout. So the output of a DAW was a pre-rendered sound bed. However, with VR, sound rendering must take place on the listeners’ end, which, in this case is the actual VR hardware and is most frequently a head-mounted display (HMD). All of the possible scenarios have to be processed through HMD devices, which can require a huge amount of additional processing power. As such, while it still maintains higher quality, minimising latency as well as the amount of computation power needed when rendering is key.
A benefit of the renderer being on the listener’s end is the possibility for unprecedented levels of personalisation. Keep in mind that with a conventional pre-rendered bed, you can’t variate its rendering according to each user. However, personalisation is still a long way out as measuring an individual’s personalized HRTF is still an expensive and time-consuming process.