Basic Features of Audio Signals ( 音訊的基本特徵 )


  1. Basic Features of Audio Signals (音訊的基本特徵)
     Jyh-Shing Roger Jang (張智星), http://mirlab.org/jang
     MIR Lab, CSIE Dept., National Taiwan Univ., Taiwan

  2. Audio Features
     • Four commonly used audio features
       • Volume, pitch, zero crossing rate, timbre
     • Our goal
       • These features can be perceived (more or less) subjectively.
       • Our goal is to compute them quantitatively (and objectively) for further processing and recognition.

  3. General Steps for Audio Analysis
     • Frame blocking
       • Frame duration of about 20 to 40 ms
     • Frame-based feature extraction
       • Volume, zero-crossing rate, pitch, MFCC, etc.
     • Frame-based analysis
       • Pitch vector for QBSH (query by singing/humming) comparison
       • MFCC for HMM evaluation
       • …

  4. Frame Blocking
     • Sample rate = 16 kHz
     • Frame size = 512 samples
     • Frame duration = 512/16000 = 0.032 s = 32 ms
     • Overlap = 192 samples
     • Hop size = frame size – overlap = 512 – 192 = 320 samples
     • Frame rate = 16000/320 = 50 frames/sec
     • (Figure: waveform zoomed in to show consecutive frames and their overlap)
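
The arithmetic on this slide can be reproduced with a minimal Python sketch. The function name `frame_params` is hypothetical and chosen only for illustration; the numbers are the ones given on the slide.

```python
def frame_params(sample_rate, frame_size, overlap):
    """Return (frame duration in seconds, hop size in samples, frame rate in frames/s)."""
    hop_size = frame_size - overlap
    if hop_size <= 0:
        raise ValueError("overlap must be smaller than frame_size")
    frame_duration = frame_size / sample_rate  # seconds covered by one frame
    frame_rate = sample_rate / hop_size        # frames produced per second
    return frame_duration, hop_size, frame_rate


# Values from the slide: 16 kHz sample rate, 512-sample frames, 192-sample overlap
duration, hop, rate = frame_params(16000, 512, 192)
print(duration, hop, rate)
```

A larger overlap shortens the hop size and therefore raises the frame rate, at the cost of more frames to process.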

  5. Audio Features in Time Domain
     • Time-domain audio features present in a frame (analysis window)
       • Fundamental period
       • Intensity
       • Timbre: waveform within a fundamental period (FP)

  6. Audio Features in Frequency Domain
     • Frequency-domain audio features in a frame
       • Energy: sum of the power spectrum
       • Pitch: distance between harmonics
       • Timbre: smoothed spectrum
     • (Figure labels: energy, pitch frequency, first formant F1, second formant F2)
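
As a rough illustration of "energy as the sum of the power spectrum", the sketch below computes a power spectrum with a naive DFT and checks it against the time-domain energy (Parseval's relation). The function name `power_spectrum` and the 64-sample test frame are assumptions made for this example; a practical implementation would use an FFT.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive O(N^2) DFT; returns |X[k]|^2 for k = 0..N-1."""
    n = len(frame)
    spectrum = []
    for k in range(n):
        acc = 0j
        for t, x in enumerate(frame):
            acc += x * cmath.exp(-2j * cmath.pi * k * t / n)
        spectrum.append(abs(acc) ** 2)
    return spectrum


# Hypothetical test frame: 64 samples holding exactly 4 cycles of a sinusoid
N = 64
frame = [math.sin(2 * math.pi * 4 * t / N) for t in range(N)]
ps = power_spectrum(frame)

freq_energy = sum(ps) / N                # energy from the power spectrum
time_energy = sum(x * x for x in frame)  # energy from the samples themselves
peak_bin = max(range(N // 2), key=lambda k: ps[k])  # strongest bin below Nyquist
print(freq_energy, time_energy, peak_bin)
```

The two energy values agree up to rounding, and the peak falls in the bin that matches the number of cycles in the frame.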

  7. Frame-based Manipulation
     • For simplicity, we usually pack frames into a frame matrix for easy manipulation in MATLAB:
       • [y, fs, nbits] = wavread('file.wav');
         (wavread has been removed from newer MATLAB releases; audioread is its replacement)
       • frameMat = enframe(y, frameSize, overlap);
     • frameMat = [Frame 1, Frame 2, …, Frame n], with one frame per column
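
The `enframe` call on this slide is not defined in the transcript. A minimal Python sketch of the same idea follows, assuming that trailing samples which do not fill a whole frame are dropped; the name `enframe_sketch` is hypothetical, and the result is a list of frames rather than a MATLAB matrix.

```python
def enframe_sketch(signal, frame_size, overlap):
    """Split a signal into overlapping frames.

    Assumption: samples at the end that do not fill a complete frame are dropped.
    """
    hop = frame_size - overlap
    if hop <= 0:
        raise ValueError("overlap must be smaller than frame_size")
    frames = []
    start = 0
    while start + frame_size <= len(signal):
        frames.append(signal[start:start + frame_size])
        start += hop
    return frames


# Toy signal of 10 samples, frames of 4 samples with an overlap of 2
frames = enframe_sketch(list(range(10)), 4, 2)
for f in frames:
    print(f)
```

Each later feature (volume, ZCR, pitch) can then be computed by mapping a function over `frames`.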

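  8. Volume (I)
     • Loudness of audio signals
       • Visual cue: amplitude of vibration
       • Also known as energy or intensity
     • Two major ways of computing volume for a frame of samples $s_1, \dots, s_n$:
       • Volume: $\text{volume} = \sum_{i=1}^{n} |s_i|$
       • Log energy (in decibels): $\text{energy} = 10 \log_{10} \sum_{i=1}^{n} s_i^2$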
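
A minimal sketch of the two volume measures, assuming they are the sum of absolute sample values and 10·log10 of the sum of squared samples. The function names and the `floor` parameter (which guards against taking the log of zero on a silent frame) are assumptions of this example.

```python
import math


def volume_abs_sum(frame):
    """Volume as the sum of absolute sample values."""
    return sum(abs(s) for s in frame)


def volume_log_energy(frame, floor=1e-12):
    """Log energy in decibels; `floor` avoids log10(0) on an all-zero frame."""
    energy = sum(s * s for s in frame)
    return 10.0 * math.log10(max(energy, floor))


frame = [0.5, -0.5, 0.5, -0.5]
louder = [10 * s for s in frame]  # same waveform, 10 times the amplitude
print(volume_abs_sum(frame), volume_log_energy(frame))
print(volume_abs_sum(louder), volume_log_energy(louder))
```

Multiplying the amplitude by 10 multiplies the absolute-sum volume by 10 but adds 20 dB to the log energy, which is closer to how loudness differences are usually reported.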

  9. Volume (II)
     • Perceived volume is influenced by
       • Frequency (example shown later)
       • Timbre (example shown later)
     • Computed volume is influenced by
       • Microphone types
       • Microphone setups

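  10. Volume (III)
      • To avoid DC bias (or DC drift)
        • DC bias: the vibration is not centered around zero
      • Computation for a frame of samples $s_1, \dots, s_n$:
        • Volume: $\text{volume} = \sum_{i=1}^{n} |s_i - \operatorname{median}(s)|$
        • Log energy (in decibels): $\text{energy} = 10 \log_{10} \sum_{i=1}^{n} (s_i - \operatorname{mean}(s))^2$
      • Theoretical background (how to prove them?): the median minimizes the sum of absolute deviations, and the mean minimizes the sum of squared deviations.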
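
A minimal sketch of DC-bias removal before computing volume, assuming the offset subtracted is the median for the absolute-sum measure and the mean for the log-energy measure. These are the constants that minimize the sum of absolute deviations and the sum of squared deviations, respectively. The function names and the `floor` parameter are assumptions of this example.

```python
import math
import statistics


def volume_abs_sum_dc_removed(frame):
    """Sum of absolute deviations from the frame median."""
    med = statistics.median(frame)
    return sum(abs(s - med) for s in frame)


def log_energy_dc_removed(frame, floor=1e-12):
    """Log energy (dB) of the frame after subtracting its mean."""
    mean = sum(frame) / len(frame)
    energy = sum((s - mean) ** 2 for s in frame)
    return 10.0 * math.log10(max(energy, floor))


frame = [0.5, -0.5, 0.5, -0.5, 0.0]
biased = [s + 0.3 for s in frame]  # same vibration with a DC bias of 0.3
print(volume_abs_sum_dc_removed(frame), volume_abs_sum_dc_removed(biased))
print(log_energy_dc_removed(frame), log_energy_dc_removed(biased))
```

Both measures give the same value (up to rounding) for the original and the biased frame, which is the point of removing the offset first.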

  11. Volume (IV)
      • Functions for computing volume
        • Example: volume01
        • Example: volume02
        • Example: volume03
      • Volume depends on…
        • Frequency
          • Equal loudness test
        • Timbre
          • Example: volume04

  12. Zero Crossing Rate
      • Zero crossing rate (ZCR)
        • The number of zero crossings in a frame.
      • Characteristics:
        • Zero-justification (removing the DC bias so that the waveform is centered at zero) is required.
        • Noise and unvoiced sounds have high ZCR.
        • ZCR is commonly used in endpoint detection, especially in detecting the start and end of unvoiced sounds.
        • To distinguish noise/silence from unvoiced sounds, we usually add a shift before computing ZCR.

  13. ZCR Computations
      • Two types of ZCR definitions
        • If a sample with zero value is counted as a zero crossing, the ZCR is higher. Otherwise it is lower.
        • The distinction diminishes when using a higher bit resolution.
      • Other considerations
        • ZCR with shift can be used to distinguish between unvoiced sounds and silence. (How to determine the shift amount?)
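
The two ZCR definitions and the shift can be sketched as follows. This is one possible reading of the slides, not the deck's own code: a crossing is detected from the sign of the product of neighboring samples, the `count_zero` flag selects whether a zero-valued sample counts, and `shift` is subtracted from every sample first (the sign convention of the shift is an assumption). The function name `zcr` and its parameters are hypothetical.

```python
def zcr(frame, count_zero=False, shift=0.0):
    """Count zero crossings between consecutive samples of (frame - shift).

    count_zero=False: only strict sign changes count (product < 0).
    count_zero=True:  a product of exactly 0 also counts as a crossing.
    """
    shifted = [s - shift for s in frame]
    count = 0
    for a, b in zip(shifted, shifted[1:]):
        product = a * b
        if product < 0 or (count_zero and product == 0):
            count += 1
    return count


frame = [1, -1, 0, 1, 2, -2]
print(zcr(frame), zcr(frame, count_zero=True))

# Low-amplitude background noise hovering around zero
noise = [0.01, -0.01, 0.01, -0.01]
print(zcr(noise), zcr(noise, shift=0.05))
```

With a shift larger than the noise amplitude, the noise frame no longer crosses the reference level, while a stronger unvoiced sound still would. That is why the shift helps separate the two.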

  14. ZCR
      • ZCR computing
        • Example: zcr01
        • Example: zcr02
      • To use ZCR to distinguish between unvoiced sounds and environmental noise
        • Example: zcrWithShift

  15. Pitch
      • Definition
        • Pitch is also known as fundamental frequency, which equals the number of fundamental periods per second. The unit used here is Hertz (Hz).
      • Unit
        • More commonly, pitch is expressed in semitones, which can be converted from pitch in Hertz: $\text{semitone} = 69 + 12 \log_2(\text{freq} / 440)$
      • Demo: piano roll via HTML5
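
A minimal sketch of the Hertz-to-semitone conversion, assuming the MIDI note-number convention in which A4 = 440 Hz maps to semitone 69 and each octave spans 12 semitones. The function name `hz_to_semitone` is hypothetical.

```python
import math


def hz_to_semitone(freq_hz):
    """Convert a frequency in Hz to a (fractional) semitone number.

    Assumes the MIDI convention: 440 Hz (A4) corresponds to semitone 69.
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


print(hz_to_semitone(440.0))   # A4
print(hz_to_semitone(880.0))   # one octave above A4
print(hz_to_semitone(261.63))  # approximately middle C
```

Because the scale is logarithmic, doubling the frequency always adds 12 semitones regardless of the starting pitch.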

  16. Pitch Computation (I)
      • Pitch of tuning forks (code)

  17. Pitch Computation (II)
      • Pitch of speech (code)
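
The code referenced by the two pitch-computation slides is not reproduced in this transcript. As a stand-in, the sketch below estimates the pitch of one frame by picking the lag that maximizes the autocorrelation function (ACF). The ACF method, the function name `acf_pitch`, the search range of 50 to 1000 Hz, and the synthetic 400 Hz test tone are all assumptions of this example, not the deck's method.

```python
import math


def acf_pitch(frame, sample_rate, min_hz=50.0, max_hz=1000.0):
    """Estimate pitch (Hz) as sample_rate / lag, where lag maximizes the ACF.

    Only lags corresponding to pitches between min_hz and max_hz are searched.
    """
    n = len(frame)
    min_lag = int(sample_rate / max_hz)
    max_lag = min(int(sample_rate / min_hz), n - 1)
    best_lag, best_value = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag


# Synthetic stand-in for a tuning fork: a pure 400 Hz tone, 512 samples at 16 kHz
fs = 16000
tone = [math.sin(2 * math.pi * 400 * i / fs) for i in range(512)]
print(acf_pitch(tone, fs))
```

Because the lag is an integer number of samples, the estimate is quantized; real speech additionally needs voicing decisions and smoothing across frames, which this sketch omits.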

  18. Pitch Perception
      • Age-related hearing loss
        • As people age, the audible frequency range becomes narrower.
      • Mosquito ringtone
        • Low to high, high to low
      • Applications
      • Frequencies vs. ages (chart with tones at 21k, 17.4k, 15k, 12k, and 8k Hz)

  19. Tones in Mandarin Chinese
      • 5401 characters, each associated with at least one base syllable and a tone
      • 411 base syllables, and most syllables have 4 tones, so we have 1501 tonal syllables
      • Tone is characterized by the pitch curves:
        • Tone 1: high-high
        • Tone 2: low-high
        • Tone 3: high-low-high
        • Tone 4: high-low
      • Some examples of tones:
        • 1234: 三民主義 ("Three Principles of the People"), 優柔寡斷 ("indecisive")
        • 3333: 勇猛果敢 ("brave and resolute") (tone sandhi)
        • ?????: 美麗大教堂 ("beautiful cathedral"), 滷蛋有夠鹹 ("the braised egg is really salty", Taiwanese)

  20. Timbre
      • Timbre is represented by
        • Waveform within a fundamental period
        • Frame-based energy distribution over frequencies
          • Power spectrum (over a single frame)
          • Spectrogram (over many frames)
        • Frame-based MFCC (mel-frequency cepstral coefficients)

  21. Timbre/Pitch Demo: Real-time Spectrogram
      • Simulink model for real-time display of spectrogram
        • dspstfft_audio (before MATLAB R2011a)
        • dspstfft_audioInput (R2012a or later)
      • (Figures: spectrum and spectrogram displays)
