
Audio

Hao Jiang

Computer Science Department

Boston College

Oct. 11, 2007

CS335 Principles of Multimedia Systems

Digital Audio
  • Audio comes from different sources:
    • Speech.
    • Sounds of instruments, Music.
    • Sounds of all other kinds (the sound of wind, train and ocean).
  • Audio needs new methods for coding and processing.
  • Audio processing is a key task in multimedia systems:
    • Audio coding (MPEG audio, MP3, AAC and others)
    • Authoring and representation (composition)
    • Analysis and searching (retrieval and database)
    • 3D sound, etc.
  • We will focus on basic audio processing, MPEG audio and related topics.

Audio Processing
  • Audio authoring

  • Audio file formats: waveform files and MIDI.
  • MIDI (Musical Instrument Digital Interface): instead of storing the waveform samples, a MIDI file has a sequence of commands to control an audio device to generate a specified note with given properties.
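To make the contrast with waveform storage concrete, here is a minimal Python sketch of the kind of command a MIDI stream carries. It builds the three-byte Note On and Note Off channel messages; the helper names `note_on` and `note_off` are hypothetical and chosen only for illustration.

```python
def _check(channel, note, velocity):
    """Validate the ranges allowed in a MIDI channel message."""
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("channel must be 0-15; note and velocity must be 0-127")


def note_on(channel, note, velocity):
    """Return the 3-byte Note On message: status 0x90 | channel, note, velocity."""
    _check(channel, note, velocity)
    return bytes([0x90 | channel, note, velocity])


def note_off(channel, note, velocity=0):
    """Return the 3-byte Note Off message: status 0x80 | channel, note, velocity."""
    _check(channel, note, velocity)
    return bytes([0x80 | channel, note, velocity])


# Middle C (note number 60) on channel 0 with velocity 100.
message = note_on(0, 60, 100)
print(message.hex())  # 903c64
```

For comparison, one second of 16-bit mono waveform audio at 44.1 kHz takes 88,200 bytes, while the note above is described by a handful of command bytes.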

Audio Processing Using Matlab
  • To load a WAV file in Windows:

audat = wavread('filename.wav');

Or open the file directly and load a stream of "words" (2 bytes) or bytes, depending on the WAV format.

  • To play a sound, use sound(audat, samplingrate).
  • To display the spectrogram, use specgram.
  • Audio analysis is done in frames 20 ms to 40 ms long.
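The "open the file directly" route and the frame-based analysis can be sketched in Python with the standard `wave` module. This is an illustration under stated assumptions, not the course code: it handles only 16-bit mono PCM, and the names `read_wav_16bit` and `split_frames` are hypothetical.

```python
import io
import struct
import wave


def read_wav_16bit(source):
    """Read a 16-bit mono PCM WAV file; return (samples, sampling_rate)."""
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono audio")
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    count = len(raw) // 2  # two bytes per "word"
    return list(struct.unpack("<%dh" % count, raw)), rate


def split_frames(samples, rate, frame_ms=20):
    """Cut the signal into consecutive analysis frames of frame_ms milliseconds."""
    frame_len = rate * frame_ms // 1000
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


# Build a tiny WAV in memory so the example needs no file on disk.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(8000)
    wav.writeframes(struct.pack("<800h", *range(800)))
buffer.seek(0)

samples, rate = read_wav_16bit(buffer)
frames = split_frames(samples, rate, frame_ms=20)
print(rate, len(frames), len(frames[0]))  # 8000 5 160
```

At 8 kHz a 20 ms frame holds 160 samples, so the 0.1 s test signal yields five frames.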

Frequency Domain Analysis
  • The Fourier transform can be used to decompose a signal into a sum of sinusoidal waves.
  • In Matlab, we can use fft (Fast Fourier Transform) for frequency domain analysis.

[Figure: the time-domain waveform, with period T, and its frequency-domain components. Base frequency = 1/T.]
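The relation between the period T and the base frequency can be checked with a direct discrete Fourier transform. The sketch below is plain Python and deliberately uses the slow O(n²) definition instead of an FFT; the function names and the test signal (a 100 Hz tone plus a weaker 300 Hz harmonic) are assumptions made for illustration.

```python
import cmath
import math


def dft(signal):
    """Discrete Fourier transform computed straight from the definition."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def dominant_frequency(signal, rate):
    """Return the frequency (Hz) of the strongest non-DC component."""
    spectrum = dft(signal)
    n = len(signal)
    best = max(range(1, n // 2), key=lambda k: abs(spectrum[k]))
    return best * rate / n


rate = 800                       # samples per second
period = 0.01                    # T = 10 ms, so the base frequency is 1/T = 100 Hz
signal = [
    math.sin(2 * math.pi * t / (rate * period))
    + 0.5 * math.sin(2 * math.pi * 3 * t / (rate * period))
    for t in range(80)
]
print(dominant_frequency(signal, rate))  # 100.0
```

With 80 samples at 800 Hz the frequency bins are 10 Hz apart, so the 100 Hz base frequency falls exactly on bin 10.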

MP3 and Others
  • MPEG (Moving Picture Experts Group) and ISO (International Organization for Standardization) have published several standards for digital audio coding.
    • MPEG-1 Layer 1, 2 and 3 (MP3)
    • MPEG-2 AAC
    • MPEG-4 AAC and TwinVQ
  • Other standards
    • Dolby AC-3
  • They are widely used in consumer electronics, digital audio broadcasting, DVD, movies, etc.

Perceptual Coding in MPEG

[Figure: block diagram of the perceptual coder. Blocks: audio input, FFT, masking threshold, dynamic bit allocation, encoder, MUX, bit stream.]

Simultaneous Masking
  • A strong audio component can mask its nearby frequency components.

[Figure: sound pressure level (dB) versus frequency (20 Hz to 20000 Hz, with 1000 Hz marked), showing a masker, its masking threshold, and the threshold in quiet.]
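The slide gives the threshold in quiet only as a curve. As a rough numerical stand-in, the sketch below uses a widely quoted closed-form approximation (usually attributed to Terhardt); that formula is not from this lecture, and the function names are hypothetical. A component is treated as audible only if it exceeds both the threshold in quiet and any masking threshold raised by a nearby masker.

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Approximate threshold in quiet (dB SPL); an assumed textbook formula."""
    khz = freq_hz / 1000.0
    return (
        3.64 * khz ** -0.8
        - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
        + 1e-3 * khz ** 4
    )


def is_audible(level_db, freq_hz, masking_threshold_db=float("-inf")):
    """True if the component lies above both thresholds at its frequency."""
    floor = max(threshold_in_quiet_db(freq_hz), masking_threshold_db)
    return level_db > floor


# A 20 dB component at 1 kHz is audible in quiet...
print(is_audible(20.0, 1000.0))
# ...but not when a strong nearby masker raises the threshold to 40 dB.
print(is_audible(20.0, 1000.0, masking_threshold_db=40.0))
```

The approximation reproduces the shape sketched on the slide: high thresholds at the low and high ends of the 20 Hz to 20000 Hz range and the greatest sensitivity at a few kilohertz.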

Masking and Quantization

[Figure: sound pressure level (dB) versus frequency (20 Hz to 20000 Hz) for critical band A and a neighboring critical band, showing the masker, the signal-to-mask ratio, the minimum masking threshold for band A, and the SNR of an m-bit and an (m+1)-bit quantizer.]

A critical band defines the “resolution” of the hearing at some frequency location.
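The link between the signal-to-mask ratio and the quantizer word length can be sketched numerically. The code below assumes the usual rule of thumb that each extra bit of a uniform quantizer adds about 6.02 dB of SNR; the function name and the example levels are hypothetical. The idea is to pick the smallest number of bits whose SNR reaches the signal-to-mask ratio, so that the quantization noise stays below the minimum masking threshold of the band.

```python
import math

DB_PER_BIT = 6.02  # assumed SNR gain per extra bit of a uniform quantizer


def bits_for_band(signal_db, min_mask_db):
    """Smallest word length whose SNR covers the signal-to-mask ratio."""
    smr = signal_db - min_mask_db
    if smr <= 0:
        return 0  # the band is fully masked and needs no bits
    return math.ceil(smr / DB_PER_BIT)


# Signal at 60 dB, minimum masking threshold at 35 dB: SMR = 25 dB.
print(bits_for_band(60.0, 35.0))  # 5
```

With 4 bits the SNR is about 24 dB, so noise would rise above the masking threshold; the fifth bit pushes it back below, which is the m versus m+1 comparison drawn in the figure.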

Temporal Masking

[Figure: amplitude versus time, showing the pre-masking curve and the post-masking curve.]

MPEG Perceptual Model
  • A Matlab demo.

MPEG Audio Layer 1
  • MPEG-1 and MPEG-2 audio support sampling rates of 44.1, 48, 32, 22.05, 24 and 16 kHz.
  • MPEG filters the input audio into 32 subbands.

[Figure: Layer 1 pipeline. A frame of 384 audio samples is filtered and downsampled into 12 samples per subband; each 12-sample block is normalized by a scale factor and passed to the perceptual coder.]
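The frame arithmetic (32 subbands times 12 samples gives 384 samples) and the normalization step can be sketched as follows. This is a simplification under stated assumptions: the block's peak magnitude is used directly as the scale factor, whereas the actual coder selects one from a fixed table, and the function names are hypothetical.

```python
SUBBANDS = 32
SAMPLES_PER_SUBBAND = 12
FRAME_SAMPLES = SUBBANDS * SAMPLES_PER_SUBBAND  # 384 input samples per frame


def frame_duration_ms(sampling_rate_hz):
    """Duration of one Layer 1 frame in milliseconds."""
    return 1000.0 * FRAME_SAMPLES / sampling_rate_hz


def normalize_block(block):
    """Scale a 12-sample subband block into [-1, 1].

    Assumption: the peak magnitude serves as the scale factor here;
    a real coder would pick the nearest entry of a scale factor table.
    """
    scale = max(abs(s) for s in block)
    if scale == 0:
        return 0.0, list(block)
    return scale, [s / scale for s in block]


print(FRAME_SAMPLES)             # 384
print(frame_duration_ms(48000))  # 8.0

block = [0.5, -2.0, 1.0, 0.25, 0.0, -0.5, 1.5, 0.75, -1.0, 0.1, 0.2, -0.3]
scale, normalized = normalize_block(block)
print(scale, max(abs(v) for v in normalized))  # 2.0 1.0
```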

MPEG Audio Layer 2
  • Layer 2 is very similar to Layer 1, but groups three 12-sample blocks together in coding (36 samples per subband).
  • It also improves scale factor quantization and groups three audio samples together in bit assignment.

[Figure: Layer 2 pipeline. A frame of 3x384 audio samples is filtered and downsampled into 36 samples per subband; each block is normalized by a scale factor and passed to the perceptual coder.]
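Why grouping three samples helps can be shown with a small sketch. It assumes quantizers with 3, 5 or 9 levels and packs three quantized samples into a single base-`levels` code word; the exact packing used by the standard should be checked against the specification, and the function names are hypothetical.

```python
def pack_triplet(x, y, z, levels):
    """Combine three quantized samples (each in 0..levels-1) into one code."""
    return x + levels * y + levels * levels * z


def unpack_triplet(code, levels):
    """Recover the three quantized samples from a packed code."""
    return code % levels, (code // levels) % levels, code // (levels * levels)


def bits_needed(value_count):
    """Bits required to index value_count distinct values."""
    return max(1, (value_count - 1).bit_length())


for levels in (3, 5, 9):
    separate = 3 * bits_needed(levels)
    grouped = bits_needed(levels ** 3)
    print(levels, separate, grouped)
# 3 6 5
# 5 9 7
# 9 12 10

code = pack_triplet(2, 0, 1, 3)
print(code, unpack_triplet(code, 3))  # 11 (2, 0, 1)
```

Because 3, 5 and 9 are not powers of two, coding each sample separately wastes part of every code word; coding the triplet jointly recovers one or two bits per group.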

Overlapped Transform and MDCT

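[Figure: overlapping windows 1 to 4, each 2N samples long.]

In an overlapped transform, each window of 2N samples is transformed to N elements.

[Figure: in the reverse transform, the outputs for windows 1 to 4 are overlapped and added to give the reconstructed result.]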
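The overlap-and-add reconstruction can be verified with a small pure-Python MDCT. This is a sketch, not the course's Matlab code: it assumes a sine window applied before the forward transform and again after the inverse transform, a hop of N samples between windows, and an inverse scaling of 2/N that matches this windowing convention. All function names are hypothetical.

```python
import math


def sine_window(length):
    """Sine window; w[n]^2 + w[n + N]^2 = 1 for length 2N."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block):
    """Transform 2N windowed samples into N coefficients."""
    n = len(block) // 2
    return [
        sum(block[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for t in range(2 * n))
        for k in range(n)
    ]


def imdct(coeffs):
    """Map N coefficients back to 2N time-aliased samples."""
    n = len(coeffs)
    return [
        2.0 / n * sum(coeffs[k] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                      for k in range(n))
        for t in range(2 * n)
    ]


def analyze_synthesize(signal, n):
    """MDCT every 2N-sample window (hop N), invert, then overlap-add."""
    window = sine_window(2 * n)
    # Pad with N zeros on both sides so every sample is covered by two windows.
    padded = [0.0] * n + list(signal) + [0.0] * n
    padded += [0.0] * (-len(padded) % n)
    output = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n + 1, n):
        block = [padded[start + t] * window[t] for t in range(2 * n)]
        restored = imdct(mdct(block))
        for t in range(2 * n):
            output[start + t] += restored[t] * window[t]
    return output[n:n + len(signal)]


signal = [math.sin(0.3 * i) + 0.5 * math.cos(1.1 * i) for i in range(50)]
rebuilt = analyze_synthesize(signal, n=8)
error = max(abs(a - b) for a, b in zip(signal, rebuilt))
print(error < 1e-9)  # True
```

Each individual inverse transform contains aliasing, but the aliasing terms of neighboring windows have opposite signs and cancel in the sum, so N coefficients per hop of N samples are enough to rebuild the signal.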

Some Matlab Codes
  • The program compares DCT and MDCT in audio processing.
  • Code is available on the course website as a tarball, mdct_and_dct.tar.

MP3
  • MP3 (MPEG audio Layer 3) is another layer built on top of MPEG audio Layer 2.
  • MP3 additionally applies an MDCT to each subband and encodes the MDCT coefficients.
  • MP3 then uses Huffman coding to further compress the bit stream losslessly.
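The lossless final stage can be illustrated with a generic Huffman coder. Note the assumption: this sketch builds its code table from the data it is given, whereas MP3 selects among predefined tables, so it shows the principle, not the format. The function names and the sample coefficient list are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_table(symbols):
    """Build a prefix code that gives frequent symbols shorter code words."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(f, next(tie), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def encode(symbols, table):
    return "".join(table[s] for s in symbols)


def decode(bits, table):
    inverse = {code: symbol for symbol, code in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out


# Quantized coefficients are mostly zero, which is what makes this pay off.
coeffs = [0, 0, 0, 0, 0, 1, 0, -1, 0, 0, 2, 0, 1, 0, 0, 0]
table = huffman_table(coeffs)
bits = encode(coeffs, table)
print(len(bits))                      # 22, versus 32 with a fixed 2-bit code
print(decode(bits, table) == coeffs)  # True
```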

File Format

MPEG audio puts a header in each frame, so that frames can be decoded separately.

[Figure: frame layout. Each frame (Frame 1, Frame 2, ...) consists of: Header | CRC | Bit allocation | Scale factors | Subband data.]
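Because every frame starts with its own header, a decoder can pick up the stream at any frame boundary. The sketch below parses the first fields of the 32-bit header. The bit layout used here (11 sync bits, then version, layer, protection bit, bitrate index, sampling rate index, padding) is stated from memory and should be checked against the specification before being relied on; the function name and the returned keys are hypothetical.

```python
VERSIONS = {0b11: "MPEG-1", 0b10: "MPEG-2"}
LAYERS = {0b11: 1, 0b10: 2, 0b01: 3}
SAMPLE_RATES = {
    "MPEG-1": (44100, 48000, 32000),
    "MPEG-2": (22050, 24000, 16000),
}


def parse_header(data):
    """Decode the leading fields of an MPEG audio frame header (4 bytes)."""
    if len(data) < 4:
        raise ValueError("a frame header is 4 bytes long")
    word = int.from_bytes(data[:4], "big")
    if word >> 21 != 0x7FF:
        raise ValueError("sync word not found")
    version = VERSIONS.get((word >> 19) & 0b11)
    layer = LAYERS.get((word >> 17) & 0b11)
    rate_index = (word >> 10) & 0b11
    if version is None or layer is None or rate_index == 0b11:
        raise ValueError("reserved or unsupported field value")
    return {
        "version": version,
        "layer": layer,
        "has_crc": (word >> 16) & 1 == 0,  # protection bit 0 means a CRC follows
        "bitrate_index": (word >> 12) & 0xF,
        "sampling_rate": SAMPLE_RATES[version][rate_index],
        "padding": (word >> 9) & 1,
    }


header = parse_header(bytes.fromhex("fffb9000"))
print(header["version"], header["layer"], header["sampling_rate"], header["has_crc"])
# MPEG-1 3 44100 False
```

The sampling rates in the lookup table are the six rates listed for MPEG-1 and MPEG-2 audio earlier in the deck.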

Other Audio Coding Standards
  • MPEG-2 and MPEG-4 AAC (Advanced Audio Coding)
    • Not backward compatible
    • Use MDCT without bandpass filtering
  • Dolby AC-3
    • MDCT-based codec
    • Similar to MPEG AAC but uses a different quantization and coding scheme
    • A de facto standard for DVD and digital audio in movies.

Realtime Audio Systems

[Figure: an audio I/O process and an audio processing unit connected by two circular queues, an audio input circular queue and an audio output circular queue, each with a write pointer and a read pointer.]
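A circular queue with separate read and write pointers can be sketched in a few lines. This single-threaded Python version is only an illustration: a real-time system would add synchronization between the I/O process and the processing unit, and the class and method names here are hypothetical.

```python
class CircularQueue:
    """Fixed-size sample queue with a read pointer and a write pointer."""

    def __init__(self, capacity):
        self._buffer = [0] * capacity
        self._capacity = capacity
        self._read = 0    # index of the next sample to read
        self._write = 0   # index of the next free slot
        self._count = 0   # number of samples currently stored

    def write(self, samples):
        """Store as many samples as fit; return how many were written."""
        written = 0
        for sample in samples:
            if self._count == self._capacity:
                break  # queue full: the producer must wait or drop samples
            self._buffer[self._write] = sample
            self._write = (self._write + 1) % self._capacity  # wrap around
            self._count += 1
            written += 1
        return written

    def read(self, n):
        """Remove and return up to n samples in arrival order."""
        out = []
        while self._count > 0 and len(out) < n:
            out.append(self._buffer[self._read])
            self._read = (self._read + 1) % self._capacity  # wrap around
            self._count -= 1
        return out


queue = CircularQueue(4)
print(queue.write([1, 2, 3]))   # 3
print(queue.read(2))            # [1, 2]
print(queue.write([4, 5, 6]))   # 3, the write pointer wraps past the end
print(queue.read(4))            # [3, 4, 5, 6]
```

In the arrangement on the slide, the audio I/O process would write captured samples into the input queue and read from the output queue, while the processing unit does the reverse.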
