Surround From Stereo

David Griesinger, Lexicon, Dgriesinger@lexicon.com, www.world.std.com/~griesngr

Main Message • Two channel audio is ubiquitous and obsolete. • Reproducing two channels through multiple loudspeakers in the listening room (or a car) is a big improvement, even if it is poorly done. • Fully automatic two-channel-to-surround processors are widely available as consumer products. • We need to make a processor that works well. • Operator-controlled two-channel-to-surround processors are an important component in the re-mix process for creating discrete recordings from multichannel or two channel masters. • The better we can make this component, the better the ultimate product. • The design of a processor that works well can teach us how to make better discrete recordings. • If a machine can make a better surround mix than the typical sound engineer, perhaps we have something to learn! • The design of a superior two-channel-to-surround processor requires an understanding of room acoustics and human perception. This knowledge can be applied to recording (and processing) technique.

Introduction • Two channel sound reproduction (stereo) was introduced about 50 years ago. • The improvement in emotional involvement provided by stereo over mono is compelling and easy to demonstrate. • Since then the basic principles of stereo reproduction have not changed. • Improvements in S/N or frequency response are welcome, but not compelling. • It seems likely that CDs replaced LPs largely because they are more convenient, although the improvement in sound quality was also easily heard. • The improvements in envelopment and sound stage provided by multichannel sound are easy to appreciate and demonstrate – but they are seldom compelling. • Current surround recordings improve the reproduction of the original hall, but are otherwise very similar to a two channel recording. • It takes an innovative and risky mix to show what surround sound can do emotionally, and these mixes are rare. • The major market for surround recordings in the future will probably be for playback in automobiles. • Automobile playback probably requires at least a five-to-seven channel conversion as part of the playback system. • This playback venue can reveal problems with common mixing techniques.

Home playback of surround will include video. • Standard video and DVD are also obsolete. • Standard NTSC video was also introduced 50 years ago, and has changed very little. • Improvements in color rendition and S/N are welcome, but result in little change in emotional involvement. • Standard video and DVD have insufficient resolution and (usually) too small a screen to fully engage human visual perception. • Competent directors compensate with frequent close-ups, fast editing, etc. • The result is distracting and ultimately boring, as the director is always forcing us to watch a particular aspect of the show. • At best such a presentation is good for one viewing only. • Videos of music performances show the heads or fingers of particular performers, forcing us to listen to only one musical line. • The intelligent interplay between musical lines (the essence of Western music) is lost.

What is the Solution? • We need to upgrade both the audio and the video if we want to create a product that is more compelling than current standards. • Translation: We need to develop a product that will generate substantial sales. • High-Definition video combined with multichannel audio gives this opportunity. • The audio can be made backwards compatible through an active downmixer, as we will see in this talk. • The video is backwards compatible if the high-def image is cropped and scanned, and this is becoming routine.

High-Definition Demo • Brahms F minor Piano Quintet • Performed by the faculty of the Point-Counter-Point Summer camp. • Video is high-definition (with some artifacts.) • Audio is two channel, single microphone pick-up. • Played here (after post production) with two-channel to five-channel processing.

Five channel to Two channel Encoding • It is widely believed that it is impossible to automatically mix a five channel recording into a two channel recording. • Standard methods of downmixing have several serious errors in balance. • These errors can be analyzed and corrected with an active downmixer. • The results can be surprising. Very often the downmixed 5 channel recording is better than the manually mixed two channel recording.

Desired properties of an active downmixer • Most importantly, the effective energy of each signal source should be preserved in the two channel output. • There is no passive mixer that can achieve this goal. • Next, the two channel mix should reproduce inner voices in the same positions as the 5 channel mix. • This also requires an active mixer, with compensation for errors in the sine/cosine pan law. • We also must be careful to preserve the stereo width of decorrelated signals applied to the rear inputs. • Finally, we can make some adjustments to the mix based on dynamic analysis of the input signals. • Rear signals that are mostly reverberation should be attenuated in the mix by 3dB. • Relatively low level rear signals that have prominent onsets can be briefly brought up in the mix, to give them a better chance of being heard and properly decoded.

Center channel energy • The balance between the center vocal and the surrounding instruments is critical. • Downmixing this essential component is difficult, as different engineers can mix it in different ways. • The usual center downmix mixes the center equally into both output channels with an attenuation of 3dB. • This mix works well if the vocals are entirely phantom (no center channel) or hard center (no phantom). • However it is very common to mix the vocals equally in all front channels. • When this configuration is downmixed the vocals will be more than 2dB too strong unless some correction is made. • Likewise an instrument panned half-way between center and left or right will have about a 2dB amplitude error. • How can we correct for this common error?
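
The size of this error can be checked with a few lines of arithmetic. The sketch below is not from the talk. It assumes that loudness follows the sum of the channel powers, that one source is coherent across the front channels, and that the downmix adds the center to both outputs at -3dB. The function name `energy_error_db` is made up for illustration.

```python
import math

def energy_error_db(left, center, right, center_gain=math.sqrt(0.5)):
    """Downmix energy error in dB for one coherent source.

    left, center, right are the amplitudes of the same source in the
    three front channels. center_gain is the assumed passive center
    coefficient (-3 dB by default).
    """
    energy_in = left**2 + center**2 + right**2
    out_left = left + center_gain * center
    out_right = right + center_gain * center
    energy_out = out_left**2 + out_right**2
    return 10.0 * math.log10(energy_out / energy_in)

# Vocal in the center channel only (hard center).
hard_center = energy_error_db(0.0, 1.0, 0.0)
# Vocal as a pure phantom image (no center channel).
phantom = energy_error_db(1.0, 0.0, 1.0)
# Vocal mixed equally into left, center and right.
all_front = energy_error_db(1.0, 1.0, 1.0)
# Instrument half-way between left and center (sine/cosine pan at 45 degrees).
half_left = energy_error_db(math.cos(math.pi / 4), math.sin(math.pi / 4), 0.0)

for name, value in [("hard center", hard_center), ("phantom", phantom),
                    ("all front", all_front), ("half left", half_left)]:
    print(f"{name}: {value:.2f} dB")
```

Under these assumptions the hard-center and phantom cases come out with no error, the all-front vocal is about 2.9dB too strong, and the half-left instrument is about 2.3dB too strong, which is consistent with the figures quoted on the slide.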

Rear channel energy • The same problem occurs in the rear channels. • The basic operation for the rear channels is to mix the left rear input (x0.91) to the left output, while also mixing it (x-0.38) to the right output. • The result is to preserve the loudness of a discrete rear input. • But when the two rear inputs are driven in phase, the result is an extra 2dB to 3dB increase in the loudness of the output. • In a passive encoder it is not possible to have the correct balance for discrete rear inputs while also having the correct balance when the rear inputs are driven in phase. • All available passive encoders (Dolby Surround, etc.) attenuate the rear inputs by 3dB to compensate for this error. • This makes discrete rear inputs (left rear only) 3dB too weak. • In addition there is a directional error as the signals pan from left rear only to left and right rear together.
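
The rear energy problem can be illustrated numerically. The slide gives only the left rear coefficients (0.91 and -0.38). The sketch below assumes the right rear input uses the mirror-image pair (+0.38 to the left output, -0.91 to the right output), so that an in-phase rear pair encodes as an antiphase surround signal. That sign convention is an assumption, not something stated in the talk, and the 90 degree phase shift networks are ignored because they do not change the energy of a rear-only signal.

```python
import math

def rear_encode(left_rear, right_rear):
    """Passive rear encode with assumed mirror-image coefficients."""
    out_left = 0.91 * left_rear + 0.38 * right_rear    # 0.38 sign is assumed
    out_right = -0.38 * left_rear - 0.91 * right_rear  # -0.91 sign is assumed
    return out_left, out_right

def rear_gain_db(left_rear, right_rear):
    """Energy gain of the encoded pair relative to the rear inputs, in dB."""
    out_left, out_right = rear_encode(left_rear, right_rear)
    energy_in = left_rear**2 + right_rear**2
    energy_out = out_left**2 + out_right**2
    return 10.0 * math.log10(energy_out / energy_in)

discrete_db = rear_gain_db(1.0, 0.0)  # left rear only
in_phase_db = rear_gain_db(1.0, 1.0)  # both rear inputs driven in phase

print(f"discrete rear: {discrete_db:.2f} dB")
print(f"in-phase rears: {in_phase_db:.2f} dB")
```

With these assumed coefficients a discrete rear input is preserved to within about 0.1dB, while the in-phase pair gains a little over 2dB. Attenuating both rear inputs by a fixed 3dB would roughly fix the second case at the expense of the first, which is the trade-off described above.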

A complaint • This problem has vexed me from the beginning of my work on matrix audio. • Early L7 encoders measured the phase coherence between the front and rear channels, and adjusted mix coefficients on an ad-hoc basis. • The result was adequate for the rear energy problem, but the panning errors remained. • Panning errors were also problematic in the front, particularly when vocals were placed in multiple channels. • It is really very difficult to determine how the mix engineer has placed the vocals – and yet you must do so to correctly downmix the piece. • Mix engineers who put the vocals in all five channels are really asking for trouble. • And if they add random time delays to the center channel the result becomes impossible to downmix at all. • (The difference these delays make to the sound is almost entirely imaginary.)

Solution: • But there is a solution! Elegant, simple, and thoroughly obvious. • We can correct the output signal based on an energy comparison. • We measure the energy of the input signals, and compare this to the energy of the output signals. • It is important to frequency contour the measurement to correspond to human loudness sensitivity. • The difference signal can be used in a simple feedback loop to correct the amplitudes of the mix. • As an added advantage it is possible with this technique to correct for panning errors, and for the difference between the sine/cosine pan law and subjective pan positions. • (patent applied for)
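
The feedback idea can be sketched in a few lines. This is only an illustration under simplifying assumptions, not the method covered by the patent application: it corrects a single center mix coefficient, uses unweighted energies (no loudness contouring), and treats each loop pass as one measurement block. The names `update_gain` and `settle`, and the `step` value, are hypothetical.

```python
import math

def update_gain(gain, energy_in, energy_out, step=0.5):
    """Move the mix gain a fraction of the way toward energy balance."""
    if energy_in <= 0.0 or energy_out <= 0.0:
        return gain  # nothing to compare, leave the gain alone
    error_db = 10.0 * math.log10(energy_in / energy_out)
    return gain * 10.0 ** (step * error_db / 20.0)

def settle(side_amp, center_amp, blocks=200):
    """Run the loop for a source with amplitude side_amp in left and right
    and center_amp in the center channel, and return the final center gain."""
    gain = math.sqrt(0.5)  # start from the usual -3 dB center coefficient
    for _ in range(blocks):
        energy_in = 2.0 * side_amp**2 + center_amp**2
        out = side_amp + gain * center_amp  # same in both output channels
        energy_out = 2.0 * out**2
        gain = update_gain(gain, energy_in, energy_out)
    return gain

print(f"hard center gain: {settle(0.0, 1.0):.4f}")
print(f"all-front vocal gain: {settle(1.0, 1.0):.4f}")
```

For a hard center source the loop leaves the -3dB coefficient where it started. For a vocal mixed equally into all front channels it pulls the center coefficient down until the output energy equals the input energy.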

Encoder Block Diagram Here you can see the basic structure of the active encoder. Note the center is mixed to the output with two variable coefficients, ml and mr (usually mr = ml = -3dB). The rear channels are mixed with the basic 0.91 and -0.38 ratio to the outputs, but they are also attenuated by the coefficients mi and ms, to preserve the correct output power under all conditions (usually mi = 1, ms = 0). The use of 90 degree phase shift networks is common in all encoders, as it allows arbitrary pans from front to rear.

Surround from Two channel • Two choices: • We can create a whole new mix – repositioning the musical forces, adding reverberation, etc. • This type of upmix requires the thoughtful participation of a skilled operator. • Ideally these operations are done by the musical copyright holder, for distribution on CD or DVD. • Or we can preserve the original recording as much as possible, while using multichannel reproduction to: • Enlarge the listening area • Increase envelopment • Transform the listening room to a larger, more comfortable space. • This type of upmix should be fully automatic, and can be part of a consumer playback system. • Automatic processing is particularly beneficial in small spaces, such as cars. • This talk will concentrate on automatic processing. • We want to reproduce the original recording, while improving the listening area and the overall impression. • In this talk we will deliberately limit ourselves by not discussing adding anything to the original, such as reverberation or early reflections. Potentially at least, the original recording can be reconstructed from the converted outputs.

The default • The MOST AUDIBLE difference between the various currently available 2-5 processors is what they do when there is NO detectable direction in the input sound. • When there is no detectable directional cue we reproduce the sound in a default condition. • The default condition is the most common condition in most music, and how we treat this case is of utmost importance. • If the default behavior of the processor can improve the acoustic performance of the original without altering the original sound stage, we will have succeeded. • Our major goal is to “do no harm” to the original mix, while improving listener envelopment and increasing the usable listening area.

What does “default” mean? • In the default condition the input channels are decorrelated – in other words there is no common signal between the two channels. • An example might be an orchestral recording with violins on one side and cellos on the other. • Or any complex mix with a lot of reverberation or high left/right separation. • In a recording mastered for two loudspeaker reproduction the default playback condition is normal stereo – a phantom image spread between the front two speakers. • Two channel stereo reproduction allows the listener to identify the direction of the sound sources through the (buried) amplitude cues in the original recording. • If we are unsure about what to do in converting this recording to multichannel, our best option is to preserve the stereo image. • This seems obvious, but it is not standard practice

Standard 2-to-4 decoders • The usual default condition of multichannel decoders is Dolby Surround: • The center and rear channels are as loud as the main channels, causing the sound image to move strongly toward the center of the room. • Image width is reduced by about ½. • The envelopment in the room is also reduced since the rear energy is added entirely in the medial plane (so there is no lateral sound energy).
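
The loudness claim follows from simple power arithmetic. The sketch below assumes the conventional passive matrix, with the center taken as (L+R)/√2 and the surround as (L-R)/√2, and fully uncorrelated inputs, so that the power of a sum or difference is the sum of the powers. The function name `passive_output_powers` is hypothetical.

```python
def passive_output_powers(power_left, power_right):
    """Output powers of an assumed passive 2-to-4 matrix for
    uncorrelated inputs with the given powers."""
    gain_squared = 0.5  # (1/sqrt(2))**2, the assumed sum/difference gain
    combined = gain_squared * (power_left + power_right)
    return {
        "left": power_left,
        "right": power_right,
        "center": combined,    # (L + R) / sqrt(2)
        "surround": combined,  # (L - R) / sqrt(2)
    }

powers = passive_output_powers(1.0, 1.0)
print(powers)
```

With equal uncorrelated inputs every output carries the same power, so the center and surround are as loud as the main channels.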

Standard 5 channel decoders • The most obvious extension of Dolby Surround to 5 channels uses a type of matrix to supply the left and right rear speakers. • If the input signals contain a buried sound that is encoded to the rear, this matrix will send the signal predominantly to the appropriate speaker. • However when there is NO buried encoded signal (which is almost always the case) this matrix results in a center channel that is too strong, and rear channels that are out of phase!

Antiphase rear channels • The tendency of the rear channels to be out of phase is an inherent problem with phase/amplitude encoding and decoding. • As we pan a signal from left to left surround the input signal is positive phase in the left input, and negative phase in the right input. • If we pan from right to right surround the signal is positive in the right channel, and negative in the left. • So what happens when we want to be fully to the rear? Should the input signals be negative on the left or the right? • If we choose to always give the right channel negative phase, then in the default condition the loudspeakers will be out of phase. • The best solution to this problem is to incorporate a variable phase shift network in the rear channels, which will actively flip the phase when there is a sound that is strongly in the rear.

Example – a “music” decoder X-Y plot of the rear outputs of a popular “music” decoder when the inputs are driven by uncorrelated pink noise. The correlation coefficient is -0.25.

Example – a “film” decoder The problem can be corrected to some degree by reversing the phase of the right rear channel. The rear channels are now decorrelated – but pans from right front to right rear might be a little peculiar. The correlation coefficient is +0.25.

Example – a decorrelated default It is possible to design a decoder where the default results in decorrelated rear channels. The result is sonically superior. The correlation coefficient is 0.
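
For reference, a correlation coefficient like the ones quoted on these slides can be measured from the two rear output signals as a normalized cross-correlation at zero lag. The sketch below is a generic definition, not the measurement procedure used for the plots, and it assumes the signals have no DC offset.

```python
import math

def correlation(left_rear, right_rear):
    """Normalized zero-lag cross-correlation of two equal-length signals."""
    cross = sum(a * b for a, b in zip(left_rear, right_rear))
    energy_left = sum(a * a for a in left_rear)
    energy_right = sum(b * b for b in right_rear)
    if energy_left == 0.0 or energy_right == 0.0:
        return 0.0  # a silent channel has no defined correlation
    return cross / math.sqrt(energy_left * energy_right)

signal = [1.0, -1.0, 1.0, -1.0]
in_phase = correlation(signal, signal)
out_of_phase = correlation(signal, [-x for x in signal])
unrelated = correlation(signal, [1.0, 1.0, -1.0, -1.0])
print(in_phase, out_of_phase, unrelated)
```

Identical signals give +1, polarity-inverted signals give -1, and signals with nothing in common give 0, which is the target for the decorrelated default.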

Block diagram for a decorrelated default If we create the rear channels by delaying and frequency contouring the front channels, the full separation in the input channels is maintained in the rear. This results in higher envelopment around the listeners and a more comfortable sound. The frequency contouring is vital – and contains shelving as well as rolloff. The effect is highly audible, particularly in a small listening room or an automobile. However, a decorrelated default makes the decoder design more complex, as there is no inherent cancellation of a center speaker in the rear channels.
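
A minimal sketch of this default, under strong simplifications: each rear channel is derived only from the front channel on the same side, using a sample delay followed by a one-pole low-pass filter as a stand-in for the rolloff. The shelving filter is omitted, and the delay length and filter coefficient are arbitrary illustrative values, not the ones used in any real decoder.

```python
def rear_from_front(front, delay_samples=2, rolloff=0.5):
    """Delay one front channel and apply a one-pole low-pass rolloff."""
    front = list(front)
    kept = max(0, len(front) - delay_samples)
    delayed = [0.0] * (len(front) - kept) + front[:kept]
    out = []
    state = 0.0
    for sample in delayed:
        # One-pole low-pass: blend the new sample with the previous output.
        state = (1.0 - rolloff) * sample + rolloff * state
        out.append(state)
    return out

def default_rears(left, right):
    """Left rear comes only from left, right rear only from right."""
    return rear_from_front(left), rear_from_front(right)

impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
silence = [0.0] * 5
left_rear, right_rear = default_rears(impulse, silence)
print(left_rear, right_rear)
```

Because no cross-mixing takes place, a sound present only in the left input never appears in the right rear output, so the separation of the input channels carries over to the rear.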

Decorrelated rear steering • The next most audible difference between current decoders is their behavior when playing surround recordings where the original rear channels were decorrelated. For example: • Crowd noise in music CDs • Orchestra sounds in the rear on a film • Backup chorus in music CDs • When these recordings are encoded to two channels, these decorrelated signals result in an output with net negative correlation. • It is desirable to decode these signals such that the original decorrelation is restored. • Most available decoders do poorly on this test, and the result is highly audible (particularly in small rooms).

Examples: Decorrelated rear steering [Figure panels: “Film” decoder (Corr ~0.5); “Music” decoder; Decorrelated decoder, “Music” and “Film”]

Listening Examples • Decorrelated rear steering during crowd noise in “Hotel California” • Note high positive or negative correlation leads to a flat, two-dimensional sound • Example with pink noise • (must play from CD)

How does a decoder work with directional signals? • Once we have a great default, we need to consider sounds which have a distinct direction: • We need to detect the direction of these sounds • And then we need to adjust mix coefficients to direct these sounds to the appropriate speakers. • Frequency contouring of the rear channels is essential, and must be made to depend on the degree of steering. • A high frequency rolloff is important when reproducing sounds in the front, but it should vanish smoothly when signals move rear. • In addition a shelving filter at about 300Hz allows the rear speakers to reproduce bass frequencies while not drawing attention to themselves. • This filter should also disappear when signals move rear. • The type of frequency contouring used in upmixing is also very useful in making discrete recordings!

Block diagram of a 5 channel decoder This diagram does not include shelving and rolloff filters, which are essential in a practical design.

The matrix elements in a decoder designed for maximum decorrelation • The output of the decoder is entirely determined by the way the ten matrix elements depend on l/r and c/s • We can graph the surface formed by the matrix element on the l/r c/s plane • By symmetry we need to graph only 5 of the 10 matrix elements © Lexicon - a Harman International Company

Matrix elements in a decorrelated decoder • Left Input • Right Input • Center Output • Left Front Output • Left Rear Output

The Left input to Left Front output matrix element (LFL), inverted back to front to reveal the trough at left rear • The peak in front keeps loudness correct for center signals

The Right input to Left front output matrix element (LFR) • The peak in front keeps loudness correct for mild front steering

The Left input to Left Rear output matrix element (LRL) • The ridge in the rear keeps separation high as sound moves back

The Right input to Left Rear output matrix element (LRR) • Note the ridge along the back - which keeps separation high in the rear

The Left input to Center output matrix element (CL) • Note the rapid increase in level as the steering moves forward. This preserves stereo separation while making a hard center

Decorrelated Decoder Conventional Decoder • This element shows particular attention to the center of the plane. • These surfaces are defined by the edges, not the center.

For directional signals we must: • 1. Analyze the original recording to determine the directions of the original instruments. • 2. Adjust the mixing parameters of our processor to reproduce these sounds through a different loudspeaker arrangement in precisely the same positions.

Original recording analysis • We must look for cues in the original recording that will allow us to determine the directions of all the instruments • We have to determine these directions quickly enough and accurately enough to mimic the properties of human localization. • If we can closely approximate human hearing, the perceived results will be flawless. • Human sound perception is based on detecting “Sound Events”. • So our processor needs a “Sound Event Detector” with accurate estimations of probable sound direction. • Human perception uses several directional cues: • Direct localization of sound events through amplitude and time differences between the two ears. • These localization cues are mostly supplied through amplitude panning in a recording. • Overall “center of gravity” localization, done through level differences (for left-right) combined with head rotation (for front-back) • This is how we can be aware that a group of instruments are behind us, even if there are no discrete sound events. • If we determine that the original was surround encoded (Logic 7 or Dolby Surround) we need to correctly determine rear directions.

Amplitude Panning • Nearly all current recordings employ amplitude panning • Phantom images are moved from one speaker location to another by varying the relative amplitude between the speakers. • As a consequence the intended position of a sound source can be detected if we compare the amplitudes of the two input channels. • Both applications require that we know the perceptual result of various amplitudes. • We need to know the “pan-law”

The sine/cosine pan-law • The most common pan-law is the sine/cosine pan-law: • Left_output = cos(p)*input • Right_output = sin(p)*input • Where p is assumed to vary from 0 degrees (full left) to 90 degrees (full right), with center at p = 45 degrees. Note that then: • (Left_output)^2 + (Right_output)^2 = (input)^2
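
The pan-law translates directly into code. The short sketch below implements the two equations above and checks the constant-power property. The function name `pan` is chosen for illustration.

```python
import math

def pan(sample, p_degrees):
    """Sine/cosine pan: 0 degrees is full left, 90 degrees is full right."""
    p = math.radians(p_degrees)
    return math.cos(p) * sample, math.sin(p) * sample

for p_degrees in (0.0, 22.5, 45.0, 90.0):
    left, right = pan(1.0, p_degrees)
    power = left**2 + right**2
    print(f"p = {p_degrees:5.1f}: left = {left:.4f}, right = {right:.4f}, power = {power:.4f}")
```

At p = 45 degrees both outputs are equal (about 0.707 of the input, or -3dB each), and the total power stays equal to the input power at every pan position.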

Sine-Cosine drawing If the sine-cosine pan law is accurate, and we set p=22.5 degrees, we should hear a sound image half-way between center and left.

How does a surround decoder work? • Compatibility with Dolby Surround requires a phase/amplitude decoder • thus direction is determined by evaluating |Lin|/|Rin| (lr) and |Lin+Rin|/|Lin-Rin| (cs). • Stereo compatibility requires that the front localization roughly follows a sine/cosine pan law. • Thus for strongly steered signals • Lin = cos(A) • Rin = sin(A) • As angle A varies from 0 to 90 degrees the sound pans from left front to right front • Strongly steered rear signals can be encoded by allowing angle A to increase from 90 degrees to 180 degrees.

The left/right and center/surround signals • Define l/r = arctan(|Lin|/|Rin|) - 45 degrees • Define c/s = arctan(|Lin+Rin|/|Lin-Rin|) - 45 degrees • When there is no dominant direction in the input signal, both l/r and c/s are approximately 0. • A signal cannot be both left and center at the same time! Thus the sum of |l/r| and |c/s| is bounded: • |l/r| + |c/s| <= 45 degrees

The l/r and c/s signals are bounded • l/r and c/s are not independent for strongly steered signals, where Lin = cos(A), Rin = sin(A). • In this case c/s ~= A and l/r ~= 45 - A (for A between 0 and 45 degrees, i.e. pans from left to center). • Allowed values for l/r and c/s fall within a diamond in the l/r c/s plane. • A circularly panned signal will produce l/r and c/s values which lie on the boundary of the diamond.
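
The definitions of l/r and c/s can be evaluated directly for a strongly steered signal. The sketch below applies them to single amplitude values with Lin = cos(A) and Rin = sin(A); a real decoder would work on smoothed signal levels instead, and the function names `steering` and `encode` are hypothetical. `math.atan2` is used so that a zero denominator does not cause a division error.

```python
import math

def steering(lin, rin):
    """Return (l/r, c/s) in degrees from the two input amplitudes."""
    lr = math.degrees(math.atan2(abs(lin), abs(rin))) - 45.0
    cs = math.degrees(math.atan2(abs(lin + rin), abs(lin - rin))) - 45.0
    return lr, cs

def encode(angle_degrees):
    """Strongly steered signal: Lin = cos(A), Rin = sin(A)."""
    a = math.radians(angle_degrees)
    return math.cos(a), math.sin(a)

for angle in range(0, 181, 45):
    lr, cs = steering(*encode(angle))
    total = abs(lr) + abs(cs)
    print(f"A = {angle:3d}: l/r = {lr:6.1f}, c/s = {cs:6.1f}, |l/r|+|c/s| = {total:.1f}")
```

As A runs from 0 to 180 degrees the point (l/r, c/s) moves from full left through center, full right and full rear, and |l/r| + |c/s| stays at 45 degrees throughout, which traces the boundary of the diamond.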

A circularly panned noise signal as seen by a phase/amplitude decoder • Music signals will fall inside this boundary

A Histogram of a typical stereo piece (first 30 seconds of Jennifer Warnes) • Note the bulk of the power is not steered (uncorrelated), but it moves from the middle to the front

Sound event direction detection, newer circuit • Jennifer Warnes – Bird on a wire, whole song

A Histogram of an encoded 5-channel piece (first 30 seconds of Boyz II Men) • Note the extensive use of the rear directions. Here we need full stereo width in the front and in the rear!

A Histogram of a classical 5-channel piece (30 seconds of 1812 recorded by Eargle) • Note the music has no net left/right bias. It is stereo, but the voices are mostly in the rear.

Event Detection • A MAJOR problem in the design of a two-channel to multichannel processor is determining the true direction of the incoming sounds • When the recording includes only a single sound there is no problem. • But most music is a complex mix of many sounds, with reverberation. Reverberation provides a random directional signal that can easily confuse the processor. • An event detector uses information in the amplitude envelope to search for loudness patterns that represent notes in music or syllables in speech • Depending on the rise-time and fall-time of the detected sound events the time constants of the sound processor are adjusted to maximize the speed with which sounds can be re-directed to the correct positions • While NOT rapidly steering on more continuous music.
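
One common way to find onsets in an amplitude envelope is to compare a fast and a slow envelope follower and flag the moment the fast one jumps well above the slow one. The sketch below uses that generic technique as a stand-in. It is not the event detector described in the talk, it does not adjust any steering time constants, and the smoothing coefficients and threshold ratio are arbitrary illustrative values.

```python
def detect_events(samples, fast=0.5, slow=0.02, ratio=2.0):
    """Return the sample indices where a new sound event appears to start."""
    fast_env = 0.0
    slow_env = 0.0
    in_event = False
    onsets = []
    for index, sample in enumerate(samples):
        level = abs(sample)
        # One-pole envelope followers with different time constants.
        fast_env += fast * (level - fast_env)
        slow_env += slow * (level - slow_env)
        rising = fast_env > ratio * slow_env
        if rising and not in_event:
            onsets.append(index)
        in_event = rising
    return onsets

# Two bursts of sound separated by silence.
signal = [0.0] * 50 + [1.0] * 50 + [0.0] * 200 + [1.0] * 50
onsets = detect_events(signal)
print(onsets)
```

The detector reports one onset at the start of each burst and stays quiet during the sustained part of the sound and during silence. A decoder could use such onsets to steer quickly on new events while moving slowly on continuous material.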
