Ambisonics has been around for quite some time, but due to past restrictions in technology, it’s only now becoming better understood and more widely discussed. Ambisonics is a way of recording and reproducing surround sound, in both the horizontal and vertical planes, from a single point in space. Essentially a snapshot of 360 sound at one point, Ambisonics is also sometimes referred to as scene-based audio.
One key element in understanding Ambisonics is beam patterns. Each channel of an Ambisonics signal corresponds to a particular beam pattern — a directional pickup shape — and the combination of these beam patterns represents the entire audio sphere. The representation of the whole sphere from these beam patterns is called Ambisonic B-format. Ambisonic A-format, meanwhile, is the state of a signal before it has been converted into B-format: the raw capsule signals produced when spatial sound is captured by a microphone at a single location. B-format material is necessary for VR applications because it can be rotated into place before decoding, depending on where the user’s head is pointing.
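To make the A-format/B-format distinction concrete, here is a minimal sketch of the classic conversion for an idealized first-order tetrahedral microphone. It assumes the four capsules point front-left-up (FLU), front-right-down (FRD), back-left-down (BLD), and back-right-up (BRU); real converters also apply capsule-matching filters, which are omitted here.

```python
def a_to_b_format(flu, frd, bld, bru):
    """Convert the four capsule signals of an idealized tetrahedral
    A-format microphone into first-order B-format (W, X, Y, Z).
    This sketch uses only the basic sum/difference matrix; production
    converters also compensate for capsule spacing and response."""
    w = flu + frd + bld + bru  # omnidirectional component
    x = flu + frd - bld - bru  # front-back figure-of-eight
    y = flu - frd + bld - bru  # left-right figure-of-eight
    z = flu - frd - bld + bru  # up-down figure-of-eight
    return w, x, y, z
```

The same matrix can be applied sample-by-sample or to whole buffers; the key point is that B-format is just a fixed linear combination of the A-format capsule signals.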
In order to increase spatial resolution, you need to increase the order of the Ambisonics signal. First-Order Ambisonics (FOA) uses four audio channels to represent the sound field, from which any number of loudspeaker signals can be derived. Orders above FOA are known as Higher-Order Ambisonics (HOA), and each HOA signal inherently includes the lower-order signals within it. The image below shows how many signals are needed per order for B-format — that is, how many signals must be delivered to the receiver. In general, order N requires (N + 1)² signals: First-Order Ambisonics needs 4, 2nd order needs 9, 3rd order needs 16, and so on. Since the word “channel” has traditionally been used as the unit for sending audio signals from a sender to a receiver, it is used interchangeably in this context as well. This means that for 2nd Order Ambisonics, 9 channels need to be sent.
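The channel counts above follow directly from the (N + 1)² rule for full-sphere Ambisonics, which a few lines of code can reproduce:

```python
def ambisonic_channels(order):
    """Number of B-format channels needed for full-sphere (3D)
    Ambisonics of the given order: (order + 1) squared."""
    return (order + 1) ** 2

# Order 1 -> 4 channels, order 2 -> 9, order 3 -> 16, ...
for n in range(1, 4):
    print(f"Order {n}: {ambisonic_channels(n)} channels")
```

Horizontal-only (2D) Ambisonics needs fewer channels per order (2N + 1), but the full-sphere count is the one that applies to the VR use cases discussed here.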
Once you have a B-format audio stem, it’s possible to manipulate it in various ways. As noted previously, since the signals represent a sphere, they can easily be rotated to reflect yaw, pitch, and roll. One way to rotate the sphere is with a rotation matrix, which moves all the material in the sound field in a specific direction. This flexibility is why audio signals captured by an Ambisonics microphone are often converted to B-format first and then delivered or edited. Also, because Ambisonics signals are already captured at the listening position, they don’t need much post-production or modification to reflect how the scene sounds from that specific position in that specific environment.
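As a sketch of the rotation idea: for first-order B-format, a yaw (horizontal) rotation leaves W and Z untouched and applies an ordinary 2-D rotation matrix to X and Y. Higher orders need larger matrices, and pitch and roll mix Z in as well, but the principle is the same.

```python
import math

def rotate_foa_yaw(w, x, y, z, yaw_radians):
    """Rotate a first-order B-format sample around the vertical axis.
    W (omni) and Z (height) are unaffected by yaw; X and Y are mixed
    by a standard 2-D rotation matrix."""
    c, s = math.cos(yaw_radians), math.sin(yaw_radians)
    x_rot = c * x - s * y
    y_rot = s * x + c * y
    return w, x_rot, y_rot, z
```

In a head-tracked VR renderer, the same operation runs per audio block with the yaw angle taken from the headset, counter-rotating the scene so it stays fixed in the world as the user turns.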
All of this makes Ambisonics well suited to 360 video — it reflects the user’s spot in the sound scene and can easily be rotated. Capturing 360 sound live also lends itself well to an Ambisonics microphone. However, VR experiences are rapidly shifting toward six-degrees-of-freedom (6DOF) experiences, and it is there that Ambisonics struggles the most. Ambisonics has fundamental limits when reacting to user movements such as walking around, or when the axis of the scene moves dynamically. Ultimately, Ambisonics is useful for 3DOF or 360 video formats but doesn’t support the most engaging elements of VR experiences.