MPEG-4 AUDIO OVERVIEW Roberta Eklund Consultant
MPEG-4 Audio Overview • Natural Audio • T/F • CELP • PARA • Structured Audio • SAOL • SASL • SASBF • MIDI, DLS version 2 • TTS • Cross-Tool (Algorithm) Functionality • Pitch/tempo change • Bitrate scalability • Computational complexity scalability • Error robustness • Audio-related effects • Acoustic virtualization
MPEG-4 Audio Tools PROFILES • Object Profile - defines the bitstream syntax for a single Object that represents a meaningful entity in the audio or visual scene (an elementary bitstream). • Composition Profile - defines which Object Profiles can be combined in the audio or visual scene (a combination of elementary bitstreams).
Block Diagram of CELP Decoder • Excitation signal generator: • codebook • regular pulse excitation (RPE) • multi-pulse excitation (MPE)
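The decoder path named on this slide can be sketched in a few lines. This is a minimal illustration, assuming a fixed codebook excitation only (RPE and MPE are not modeled) and a plain all-pole linear-prediction synthesis filter; the function name, the toy codebook, and the coefficient convention are assumptions made for the example, not the MPEG-4 bitstream syntax.

```python
def celp_decode_frame(codebook, index, gain, lpc, state=None):
    """Decode one frame: scaled codebook vector through the synthesis filter 1/A(z).

    lpc holds a_1..a_p with A(z) = 1 + sum(a_k z^-k), so
    y[n] = e[n] - sum(a_k * y[n-k]).  (Hypothetical convention for this sketch.)
    """
    order = len(lpc)
    # history[0] is y[n-1], history[1] is y[n-2], and so on
    history = list(state) if state is not None else [0.0] * order
    out = []
    for sample in codebook[index]:
        e = gain * sample                      # excitation sample
        y = e - sum(a * h for a, h in zip(lpc, history))
        out.append(y)
        history = ([y] + history)[:order]      # shift the filter memory
    return out, history

# Toy two-entry codebook, invented for illustration
CODEBOOK = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, -1.0],
]

frame, state = celp_decode_frame(CODEBOOK, index=0, gain=2.0, lpc=[-0.5])
print(frame)  # [2.0, 1.0, 0.5, 0.25]
```

The returned filter memory can be passed as `state` to the next call so that consecutive frames join without a discontinuity.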
PARA is Two Codecs in One • Two operating modes • HVXC: harmonic and noise components • for speech coding at 2 to 4 kbps • HILN: harmonic and individual sinusoidal components plus noise • for coding music signals with low-complexity content (e.g. single instruments) at 4 to 16 kbps • Combination of both modes • supported by the syntax, with a defined transition • automatic mode selector • cross-fade from one signal to the other
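As a rough illustration of the signal model behind the second mode and of the cross-fade used when switching modes, the sketch below sums harmonics of a fundamental, a few individual sinusoids, and noise, then fades linearly between two signals. It assumes white noise and a linear fade for simplicity; the function names and parameter values are invented for the example and do not reflect the normative decoder.

```python
import math
import random

def sines_plus_noise(f0, harmonic_amps, lines, noise_gain, n,
                     sample_rate=8000, seed=0):
    """Sum harmonics of f0, individual sinusoids, and white noise (illustrative)."""
    rng = random.Random(seed)
    out = []
    for i in range(n):
        t = i / sample_rate
        # harmonic component: partials at f0, 2*f0, 3*f0, ...
        s = sum(a * math.sin(2 * math.pi * f0 * (k + 1) * t)
                for k, a in enumerate(harmonic_amps))
        # individual lines: (frequency, amplitude) pairs not tied to f0
        s += sum(a * math.sin(2 * math.pi * f * t) for f, a in lines)
        # noise component
        s += noise_gain * rng.uniform(-1.0, 1.0)
        out.append(s)
    return out

def crossfade(a, b):
    """Linear fade from signal a to signal b over their common length."""
    n = min(len(a), len(b))
    out = []
    for i in range(n):
        w = i / (n - 1) if n > 1 else 1.0
        out.append((1.0 - w) * a[i] + w * b[i])
    return out

tone = sines_plus_noise(100.0, [1.0, 0.5], [(730.0, 0.2)], 0.0, 4)
faded = crossfade([1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
print(faded)  # [1.0, 0.5, 0.0]
```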
Text-to-Speech • Phonemic (language-independent) syntax • Prosody, timing cues • Language, dialect, gender, age parameters • Automatic synchronization with FBA (Face and Body Animation) • The TTS synthesis method itself is non-normative; only the interface is specified
Structured Audio • Structured Audio - sound coding using structured descriptions • Structured Audio decoder - music and sound-effect synthesis • MMA (MIDI Manufacturers Association), Microsoft, and EMU are now collaborating on MIDI DLS version 2 in MPEG-4
SAOL • Downloadable BNF synthesis grammar • Header contains descriptions of several synthesizers and effects processors, control algorithms, and routing instructions for audio and control flow • SAOL has 100 primitive processing instructions, signal generators, and operators that fill wavetables with data.
SASL and MIDI • SASL (Structured Audio Score Language) • New format for describing control parameters • Basically a scheduler of audio events • Designed to interface well with SAOL • New control language similar to MIDI • MIDI (Musical Instrument Digital Interface) • Simpler format for describing control • Included as an alternate control method • Leverages existing authoring tools • Gives “backwards compatibility” to SA
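The "scheduler of audio events" idea can be sketched as a time-ordered queue that hands events to instruments. This is a minimal illustration only: the class, its methods, and the event fields are hypothetical and do not reproduce SASL syntax or the normative scheduler.

```python
import heapq

class ScoreScheduler:
    """Time-ordered queue of score events (hypothetical, for illustration)."""

    def __init__(self):
        self._queue = []
        self._count = 0  # tie-breaker: equal times keep insertion order

    def add(self, time, instrument, *params):
        heapq.heappush(self._queue, (time, self._count, instrument, params))
        self._count += 1

    def run_until(self, now):
        """Pop and return every event whose time is <= now, earliest first."""
        fired = []
        while self._queue and self._queue[0][0] <= now:
            time, _, instrument, params = heapq.heappop(self._queue)
            fired.append((time, instrument, params))
        return fired

sched = ScoreScheduler()
sched.add(1.0, "piano", 60, 0.8)   # time, instrument name, pitch, amplitude
sched.add(0.0, "bass", 36, 0.9)
first = sched.run_until(0.5)
later = sched.run_until(2.0)
print(first)  # [(0.0, 'bass', (36, 0.9))]
print(later)  # [(1.0, 'piano', (60, 0.8))]
```

A heap keeps insertion cheap when events arrive out of order, which is the usual case when a score is streamed.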
DLS Level 2 • Aims at consistent synthetic audio playback across a wide range of platforms • Defines a simple wavetable synthesizer • Bitstream includes sound samples • Score expressed in MIDI • Growing support from both software and hardware developers • DLS is part of DirectMusic in Microsoft’s DirectX 6.0
DLS-2 synthesizer model • Simple yet powerful structure, much like many existing synthesizers on the market (e.g. in PC soundcards) • Uses loopable samples as sound sources (wavetable) • Variable routing of control sources • 2 envelopes for amplitude control • 2 low-frequency oscillators • 1-pole dynamic low-pass filter • Standardized response to MIDI controllers
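The building blocks listed on this slide can be chained in a toy voice: a looped sample, an amplitude envelope, a low-frequency oscillator, and a one-pole low-pass filter. The sketch below is heavily simplified (one linear attack envelope, one LFO used as tremolo, fixed routing, nearest-sample lookup); every name and parameter is an assumption for illustration, not the DLS Level 2 articulation model.

```python
import math

def render_voice(wave, loop_start, n, step=1.0, attack=1,
                 lfo_rate=0.0, lfo_depth=0.0, cutoff_coeff=1.0,
                 sample_rate=8000):
    """Render n samples of a looped wavetable voice (illustrative sketch)."""
    out, pos, lp = [], 0.0, 0.0
    loop_len = len(wave) - loop_start        # requires loop_start < len(wave)
    for i in range(n):
        s = wave[int(pos)]                   # nearest-sample wavetable lookup
        pos += step
        if pos >= len(wave):                 # wrap back into the loop region
            pos = loop_start + (pos - len(wave)) % loop_len
        env = min(1.0, (i + 1) / attack)     # linear attack, then sustain
        trem = 1.0 + lfo_depth * math.sin(2 * math.pi * lfo_rate * i / sample_rate)
        lp += cutoff_coeff * (s * env * trem - lp)   # one-pole low-pass
        out.append(lp)
    return out

# One-shot sample at index 0, then a two-sample loop
voice = render_voice([0.0, 1.0, -1.0], loop_start=1, n=6)
print(voice)  # [0.0, 1.0, -1.0, 1.0, -1.0, 1.0]
```

With `cutoff_coeff=1.0` the filter passes the signal unchanged; smaller values between 0 and 1 smooth it progressively.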
AudioBIFS • Synchronization with visual! • [Figure: AudioBIFS scene graph. Node labels: BIFS stuff, AudioMix (x2), HRTF, AudioFX (x3), AudioDelay, AudioSource (x3). Source labels: finger snaps (parametric), bass (SA), piano (SA). Signal label: audio channels.]
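The idea of composing audio nodes into a graph can be sketched with three small classes that borrow the node names from the slide. The wiring below is illustrative, not taken from the figure, and the `render` interface and gain handling are assumptions rather than the normative AudioBIFS node semantics.

```python
class AudioSource:
    """Leaf node holding decoded samples (hypothetical interface)."""
    def __init__(self, samples):
        self.samples = list(samples)

    def render(self, n):
        return (self.samples + [0.0] * n)[:n]   # zero-pad or truncate to n

class AudioDelay:
    """Delays its child by a whole number of samples."""
    def __init__(self, child, delay):
        self.child, self.delay = child, delay

    def render(self, n):
        return ([0.0] * self.delay + self.child.render(n))[:n]

class AudioMix:
    """Weighted sum of its children."""
    def __init__(self, children, gains):
        self.children, self.gains = children, gains

    def render(self, n):
        bufs = [c.render(n) for c in self.children]
        return [sum(g * b[i] for g, b in zip(self.gains, bufs))
                for i in range(n)]

snaps = AudioSource([1.0, 0.0, 0.0])
bass = AudioSource([0.5, 0.5, 0.5])
scene = AudioMix([AudioDelay(snaps, 1), bass], gains=[1.0, 2.0])
mixed = scene.render(3)
print(mixed)  # [1.0, 2.0, 1.0]
```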
Conclusion • MPEG-4 Audio attempts to offer solutions across the full spectrum of sound. • Some of the tools are more stable, while others are still in research and development. • MPEG-2 AAC is the best multi-channel lossy audio compression standard to date.
Acknowledgements • I would like to thank the authors of the references for providing the material presented here today.
Definitions • T/F Time/Frequency (MDCT) • AAC Advanced Audio Coding • PARA Parametric • CELP Code Excited Linear Prediction • SA Structured Audio • PNS Perceptual Noise Substitution • HVXC Harmonic Vector eXcitation Coding • HILN Harmonic and Individual Lines plus Noise • SAOL Structured Audio Orchestra Language • SASL Structured Audio Score Language • MIDI Musical Instrument Digital Interface • TTS Text to Speech
More Definitions • CD Committee Draft • IS 13818-7 Advanced Audio Coding • LC Low Complexity • BSAC Bit Sliced Arithmetic Coding • SSR Scalable Sample Rate • VBR Variable Bit Rate • TLSS Tools for Large Step Scalability • SNHC Synthetic/Natural Hybrid Coding • DLS Downloadable Sounds
AAC Decoder Complexity Evaluation • MPEG AAC decoder complexity • 2-channel Main Profile: 40% of a 133 MHz Pentium • 2-channel Low Complexity: 25% of a 133 MHz Pentium • 5-channel Main Profile: 90 sq. mm die, 0.5 micron CMOS • 5-channel Low Complexity: 60 sq. mm die, 0.5 micron CMOS
AAC Test Results • Tests at BBC and NHK according to ITU-R BS.1116 • triple-stimulus/hidden-reference/double-blind • ITU-R 5-point impairment scale • 95% confidence intervals • MPEG AAC provides “indistinguishable” quality at 320 kb/s for five channels • MPEG AAC at 320 kb/s outperforms MPEG-2 BC Layer II at 640 kb/s for five channels • Recent stereo tests at NHK showed that MPEG AAC provides “indistinguishable” quality at 128 kb/s for two channels
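Dividing each quoted total by its channel count puts the figures on a common scale. The short calculation below does only that; it is a plain average and assumes the bits are spread evenly across channels, which a real encoder does not guarantee. The function name and labels are invented for the example.

```python
def per_channel_kbps(total_kbps, channels):
    """Average bitrate per channel, assuming an even split."""
    return total_kbps / channels

configs = {
    "AAC, five channels": (320, 5),
    "BC Layer II, five channels": (640, 5),
    "AAC, two channels": (128, 2),
}
rates = {name: per_channel_kbps(total, ch) for name, (total, ch) in configs.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.0f} kb/s per channel")
# AAC, five channels: 64 kb/s per channel
# BC Layer II, five channels: 128 kb/s per channel
# AAC, two channels: 64 kb/s per channel
```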
References • M. Bosi, E. Scheirer, B. Edler, P. G. Schreiner, MPEG-4 Seminar, Fribourg, Switzerland, 1997 • S. Quackenbush, “Coding of Natural Audio in MPEG-4”, Proc. IEEE ICASSP, Seattle, 1998 • B. Grill, B. Edler, I. Kaneko, Y. Lee, M. Nishiguchi, E. Scheirer, and M. Väänänen (Eds.), ISO 14496-3 (MPEG-4 Audio) Committee Draft, MPEG document N1903 • E. Scheirer, “The MPEG-4 Structured Audio Standard”, Proc. IEEE ICASSP, Seattle, 1998 • J. Herre, “Updated Description for Perceptual Noise Substitution Tool”, MPEG document M2692 • E. Scheirer, R. Väänänen, J. Huopaniemi, “AudioBIFS: The MPEG-4 Standard for Effects Processing”, AES, SF, 1998 • Overview: http://www.cselt.it/mpeg/standards/mpeg-4/mpeg-4.htm