Spatial Perception of Audio

J. D. (jj) Johnston, Neural Audio Corporation

What is present in the original room?
• Direct sound from every source
• Early reflections (which are naturally good for locating walls, etc., but which do not generally add to audio quality)
• Diffuse sound from the performance environment
• Note: It makes no difference whether these are natural (i.e. recorded live) or synthetic.
• With a caveat: if you synthesize, you really should have independent reverbs for each channel of audio in order to get a real diffuse sensation.

And in the playback venue
• Direct sound from each speaker. There is no “indirect” sound from almost any loudspeaker on the market. Dipole and bipolar speakers are “kinda-sorta” indirect, but no more.
• Yes, you can make a diffuse loudspeaker. But right now, they are rare.
• Reverberation from the playback venue
• First reflections that do nothing at all but interfere badly with what you want to convey
• Reverberation that provides a timbre cue for the direct signals.

What are the primary cues that the ear can resolve?
• Difference in arrival time between ears (interaural time difference, ITD)
• Difference in timbre between ears (interaural level difference, ILD)
• The differences above arise because of Head Related Transfer Functions (HRTF, or HRIR, where “IR” is impulse response, another way to represent the transfer function)
• Lack of correlation between left and right ears (diffuse signal)
• Comparison between direct and diffuse signals in the playback environment
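The first two cues can be illustrated numerically. The following is a minimal sketch, not a model of the ear: it assumes the two ear signals are already available as equal-length sample lists, takes the ITD as the lag that maximizes their cross-correlation, and takes the ILD as a broadband RMS ratio in dB (a real ILD varies with frequency). The function name `estimate_itd_ild` and the test burst are hypothetical.

```python
import math

def estimate_itd_ild(left, right, sample_rate, max_lag):
    """Return (itd_seconds, ild_db) for two equal-length sample lists.

    ITD: the lag within +/- max_lag samples that maximizes the
    cross-correlation. Positive means the right ear receives the sound later.
    ILD: broadband level of the left ear relative to the right, in dB.
    """
    n = len(left)
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    rms_left = math.sqrt(sum(x * x for x in left) / n)
    rms_right = math.sqrt(sum(x * x for x in right) / n)
    ild_db = 20.0 * math.log10(rms_left / rms_right)
    return best_lag / sample_rate, ild_db

# Demo: a Hann-windowed 500 Hz burst that reaches the right ear
# 20 samples later and at half the amplitude.
SAMPLE_RATE = 48000
burst = [math.sin(2 * math.pi * 500 * n / SAMPLE_RATE)
         * 0.5 * (1 - math.cos(2 * math.pi * n / 199))
         for n in range(200)]
left_ear = [0.0] * 50 + burst + [0.0] * 50
right_ear = [0.0] * 70 + [0.5 * s for s in burst] + [0.0] * 30

itd, ild = estimate_itd_ild(left_ear, right_ear, SAMPLE_RATE, max_lag=40)
print(f"ITD = {itd * 1e6:.0f} microseconds, ILD = {ild:.2f} dB")
```

A source off to the left produces exactly this pattern: the left ear hears it earlier and louder.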

So, what kind of speaker placement?
• (yes, we’re leaving a semester’s course out here)
• HRTFs from the front vary mildly.
• HRTFs from the side have a huge emphasis at higher frequencies
• HRTFs from the back have a huge dip at higher frequencies

Placement, continued:
• If you put a speaker at the side, you will hear the HRTF with the large bump at high frequencies for the direct sound.
• You automatically compare that with the playback room’s reverberation, and hear something from the side.
• If you put a speaker in the rear, you will hear the HRTF with the loss at high frequencies for the direct sound.
• But the playback room’s reverberation will still have the same spectrum, so you’ll sense the sound coming from the back.

So?
• If you have 5.1 with side speakers, it’s very hard to get proper back sensation
• If you have 5.1 with back speakers, you will not get any side sensation
• 7.1 with side and rear speakers allows you to fill this gap.
• 7.1 with side and rear speakers also allows you to build a simulation of a diffuse field (from the source material, NOT the playback room) much better than 5.1. N.B. If you want to do this, you need independent reverb functions for each channel!
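Why independent reverbs matter can be shown with a small experiment. This is a minimal sketch under stated assumptions: the reverb tail is modeled as exponentially decaying Gaussian noise (a common toy model, not any particular product's reverb), and all function names are hypothetical. Feeding the same dry signal through one shared reverb gives fully correlated channels, while independent reverbs give weakly correlated channels, which is the lack of correlation the ear reads as diffuse.

```python
import math
import random

def noise_reverb_ir(length, decay, seed):
    """Toy reverb tail: Gaussian noise under an exponential decay envelope."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) * math.exp(-decay * n) for n in range(length)]

def convolve(x, h):
    """Direct-form convolution of two sample lists."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def correlation(a, b):
    """Normalized zero-lag correlation of two equal-length sample lists."""
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den

# One dry source signal, sent to two output channels.
source_rng = random.Random(1)
dry = [source_rng.gauss(0.0, 1.0) for _ in range(200)]

# Same reverb on both channels (same seed, so the same impulse response).
shared_a = convolve(dry, noise_reverb_ir(400, 0.005, seed=10))
shared_b = convolve(dry, noise_reverb_ir(400, 0.005, seed=10))

# Independent reverb per channel (different seeds).
indep_a = convolve(dry, noise_reverb_ir(400, 0.005, seed=10))
indep_b = convolve(dry, noise_reverb_ir(400, 0.005, seed=11))

same_reverb = correlation(shared_a, shared_b)
independent = correlation(indep_a, indep_b)
print(f"shared reverb: {same_reverb:.3f}, independent reverbs: {independent:.3f}")
```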

The summary:
• 5.1 cannot provide both side and back localization
• 7.1 with side and back speakers can provide both side and back localization
• It is generally recognized that 7 channels of main audio (L, C, R, LS, RS, LR, RR) is the minimum setup to get 360-degree planar sensation for a wide listening environment, or when the listener moves their head.

Now, Dr. Tuffy will discuss the issue of THX-Neural surround in more detail. At the end, I’ll explain more or less how Neural-THX processing works.

How does it work?
• There are two systems under consideration
• One of them is 5-2-5, meaning that 5 channels are encoded into 2, and then decoded to 5. This is not an issue for modern game machines, but would be a solution for titles for older stereo machines.
• The other is 7-5-7. That is the system we will discuss today.

How do we do 7-5-7 encoding?
• First, the front 3 channels, and the LFE channel, if present, are not affected. Only the side and rear channels are involved.

The steps in encoding:
• Encoding is done on a frequency domain basis. The frequency bands are determined by psychoacoustic knowledge of the ear’s frequency analysis method.
• All of the following steps are done independently in each frequency band.
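The slides do not say which psychoacoustic band layout is used. As an illustration only, the sketch below assumes the Glasberg and Moore ERB-rate scale, a standard approximation of the ear's frequency resolution, and derives band edges spaced about one ERB apart up to the Nyquist frequency. The function names and the `bands_per_erb` parameter are hypothetical.

```python
import math

def hz_to_erb_rate(f_hz):
    """Glasberg and Moore ERB-rate scale (number of ERBs below f_hz)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

def erb_rate_to_hz(erb_rate):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) / 0.00437

def band_edges(sample_rate, bands_per_erb=1.0):
    """Band edges in Hz from 0 to Nyquist, equally spaced on the ERB-rate scale."""
    nyquist = sample_rate / 2.0
    top = hz_to_erb_rate(nyquist)
    count = math.ceil(top * bands_per_erb)
    return [erb_rate_to_hz(top * k / count) for k in range(count + 1)]

edges = band_edges(48000)
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
print(f"{len(widths)} bands; first is {widths[0]:.0f} Hz wide, last is {widths[-1]:.0f} Hz wide")
```

The bands are narrow at low frequencies and wide at high frequencies, and the layout depends only on the sampling rate, so an encoder and decoder can agree on it without side information.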

First, in each frequency band, the most audible signal among LS/RS/LR/RR is determined, as well as its direction of arrival. The direction of arrival is not limited to 1 channel, but is encoded as explained below.
• This most audible signal is then put into both channels, but with amplitude and phase modifications between the channels that provide a 2-dimensional source direction.

The Encoding Plane
[Figure: a two-dimensional encoding plane with axes "Amplitude" and "Inter-channel Phase". Labeled positions: Both Surrounds, Left Surround, Right Surround, Both LS and LR, All Channels, Both RS and RR, Left Rear, Right Rear, Both Rear.]

Decoding:
• Since the frequency bands are fixed for any given sampling rate:
• The two channels are analyzed for both signal content and direction
• Each channel is rendered appropriately.
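The encode and decode steps can be sketched as a round trip for a single frequency band. This is a toy illustration and not the Neural-THX algorithm: it assumes that channel energy stands in for "most audible", that the amplitude ratio between the two transmitted channels carries left/right position, and that the inter-channel phase carries side/rear position. The pan range, the phase range, and every function name are hypothetical choices.

```python
import cmath
import math

CHANNELS = ("LS", "RS", "LR", "RR")

def encode_band(band):
    """Fold one band's four complex coefficients into a (left, right) pair."""
    energy = {name: abs(band[name]) ** 2 for name in CHANNELS}
    total = sum(energy.values())
    if total == 0.0:
        return 0j, 0j
    # Crude stand-in for "most audible": the highest-energy channel.
    dominant = band[max(CHANNELS, key=lambda name: energy[name])]
    # Direction as two balances in [-1, 1], so it can lie between channels.
    x = (energy["RS"] + energy["RR"] - energy["LS"] - energy["LR"]) / total
    y = (energy["LR"] + energy["RR"] - energy["LS"] - energy["RS"]) / total
    theta = math.pi / 4 + x * math.pi / 8  # amplitude pan, never fully one-sided
    phi = (y + 1.0) * math.pi / 2          # phase: 0 for side, pi for rear
    left = dominant * math.cos(theta) * cmath.exp(1j * phi / 2)
    right = dominant * math.sin(theta) * cmath.exp(-1j * phi / 2)
    return left, right

def decode_band(left, right):
    """Estimate direction from the pair and render it to four channels."""
    if left == 0 and right == 0:
        return {name: 0j for name in CHANNELS}
    theta = math.atan2(abs(right), abs(left))
    phi = abs(cmath.phase(left * right.conjugate()))
    x = max(-1.0, min(1.0, (theta - math.pi / 4) / (math.pi / 8)))
    y = max(-1.0, min(1.0, 2.0 * phi / math.pi - 1.0))
    # Undo the pan gains and phase offsets to recover the dominant signal.
    signal = (left * cmath.exp(-1j * phi / 2) * math.cos(theta)
              + right * cmath.exp(1j * phi / 2) * math.sin(theta))
    w_left, w_right = (1.0 - x) / 2, (1.0 + x) / 2
    w_side, w_rear = (1.0 - y) / 2, (1.0 + y) / 2
    return {
        "LS": signal * math.sqrt(w_left * w_side),
        "RS": signal * math.sqrt(w_right * w_side),
        "LR": signal * math.sqrt(w_left * w_rear),
        "RR": signal * math.sqrt(w_right * w_rear),
    }

# Round trip: a signal present only in the left rear channel.
original = {"LS": 0j, "RS": 0j, "LR": 0.5 + 0.5j, "RR": 0j}
decoded = decode_band(*encode_band(original))
```

In this toy, a band whose energy sits in one channel comes back in that channel, and a band shared equally by LS and RS comes back centered between the two side channels. A band with several unrelated signals keeps only the dominant one, which is why the real system works on narrow bands.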
