
Characteristics of Speech



Presentation Transcript


  1. Characteristics of Speech
     • Long-term (sentence level, several seconds)
       • Drastic/irregular changes
     • Short-term (frame level, about 20 ms)
       • Regular periodic changes for voiced sounds
       • Noise-like for unvoiced sounds
       • Hard to recognize without context information

  2. Spectrum in Frequency-Domain
     • Three basic characteristics in a spectrum:
       • Timbre: Spectrum after smoothing
       • Pitch: Distance between harmonics
       • Intensity: Magnitude of spectrum
     • [Figure: spectrum annotated with pitch frequency, first formant F1, second formant F2, and intensity]

  3. Timbre Demo: Real-time Spectrogram
     • Simulink model for real-time display of spectrogram
       • dspstfft_audio (before MATLAB R2011a)
       • dspstfft_audioInput (R2012a or later)
     • [Figures: spectrum display and spectrogram display]

  4. Audio Feature Extraction & Recognition
     • Frame blocking
       • Frame duration of 20 ms
     • Feature extraction
       • Volume, pitch, MFCC, LPC, etc.
     • Endpoint detection
       • Based on volume & ZCR
     • Recognition
       • DTW, HMM
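The endpoint-detection step can be illustrated with a minimal sketch that uses volume alone. The function name `detect_endpoints` and the `ratio` threshold (a fraction of the loudest frame) are hypothetical choices for illustration, not taken from the slides; a fuller detector would also use ZCR to recover unvoiced onsets and offsets.

```python
def detect_endpoints(frames, ratio=0.1):
    """Return (first, last) indices of frames louder than the threshold.

    `ratio` is a hypothetical tuning parameter: the threshold is set to
    this fraction of the loudest frame's volume.
    """
    # Volume of each frame, computed as the sum of absolute sample values.
    volumes = [sum(abs(x) for x in frame) for frame in frames]
    threshold = max(volumes) * ratio
    active = [i for i, v in enumerate(volumes) if v > threshold]
    if not active:
        return None  # no frame exceeds the threshold (e.g. pure silence)
    return active[0], active[-1]


# Four tiny frames: silence, speech, speech, near-silence.
frames = [[0.0, 0.0], [0.5, -0.5], [0.4, 0.4], [0.0, 0.01]]
endpoints = detect_endpoints(frames)
```

Only the two middle frames exceed 10% of the peak volume here, so they mark the start and end of the detected segment.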

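  5. Example: Audio Feature Extraction
     • Each frame has 256 points, with 84 points of overlap between adjacent frames
     • At a sample rate of 11025 Hz: 11025/(256-84) ≈ 64 feature vectors per second
     • [Figure: waveform zoomed in to show individual frames and their overlap]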
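The frame arithmetic above can be checked with a short sketch. The helper name `frame_blocking` is hypothetical, and the 11025 Hz sample rate is inferred from the slide's division; incomplete frames at the end of the signal are simply dropped, which is one of several possible conventions.

```python
def frame_blocking(signal, frame_size=256, overlap=84):
    """Split a sequence into overlapping frames, dropping any partial tail."""
    hop = frame_size - overlap  # 172 samples between successive frame starts
    return [signal[start:start + frame_size]
            for start in range(0, len(signal) - frame_size + 1, hop)]


fs = 11025                      # sample rate assumed from the slide's arithmetic
one_second = [0.0] * fs         # placeholder signal: one second of silence
frames = frame_blocking(one_second)

# Long-run rate of feature vectors: one frame per hop.
frames_per_second = fs / (256 - 84)
```

The long-run rate is about 64.1 frames per second; a single isolated second yields slightly fewer complete frames because the last partial frame is discarded.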

  6. Three Basic Acoustic Features
     • Three basic speech features
       • Volume/Energy/Intensity: Vibration amplitude
       • Pitch: Fundamental frequency (the reciprocal of the fundamental period)
       • Timbre: The waveform within a fundamental period
     • These features are perceived subjectively by humans. However, we can use mathematics to "emulate" human perception and capture these features.

  7. Acoustic Feature: Energy
     • Energy is the sum of squared samples in a frame, also known as intensity or volume.
     • Characteristics:
       • Noise and fricatives usually have low energy.
       • Energy is strongly influenced by the microphone setup.
       • Taking 10 times the base-10 logarithm of the sum of squares gives energy in decibels (dB).
       • Energy is commonly used in endpoint detection.
       • In embedded system implementations, volume can be computed as the sum of absolute sample values in a frame to reduce computation.
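The three volume measures described above can be sketched as follows. The function names and the small `floor` value (used only to avoid taking the logarithm of zero on silent frames) are assumptions for illustration.

```python
import math


def energy(frame):
    """Sum of squared samples in the frame."""
    return sum(x * x for x in frame)


def energy_db(frame, floor=1e-12):
    """Energy in decibels: 10 * log10(sum of squares).

    `floor` is a hypothetical guard so that an all-zero frame does not
    cause a math domain error.
    """
    return 10.0 * math.log10(max(energy(frame), floor))


def abs_volume(frame):
    """Cheaper volume measure: sum of absolute sample values."""
    return sum(abs(x) for x in frame)


frame = [0.5, -0.5, 0.5, -0.5]
e = energy(frame)         # 4 * 0.25
e_db = energy_db(frame)   # 10 * log10(1.0)
v = abs_volume(frame)     # 4 * 0.5
```

The absolute-sum version avoids multiplications, which is why it suits embedded targets, but its values are not directly comparable to the squared-sum energy.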

  8. Acoustic Feature: Zero Crossing Rate
     • Zero crossing rate (ZCR)
       • The number of zero crossings in a frame.
     • Characteristics:
       • Noise and unvoiced sounds have high ZCR.
       • ZCR is commonly used in endpoint detection, especially for detecting the start and end of unvoiced sounds.
       • To distinguish noise/silence from unvoiced sounds, a bias is usually added to the signal before computing ZCR.
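A minimal sketch of ZCR with an optional bias is shown below. It counts strict sign changes between adjacent samples, so samples that are exactly zero are not counted as crossings; that convention and the bias value of 0.05 are assumptions chosen for illustration.

```python
def zero_crossing_rate(frame, bias=0.0):
    """Count sign changes between adjacent samples after adding `bias`."""
    shifted = [x + bias for x in frame]
    # A negative product means the two neighbours have opposite signs.
    return sum(1 for a, b in zip(shifted, shifted[1:]) if a * b < 0)


quiet_noise = [0.01, -0.01, 0.01, -0.01]   # low-level background noise
unvoiced = [0.5, -0.5, 0.5, -0.5]          # louder noise-like sound

zcr_noise_raw = zero_crossing_rate(quiet_noise)
zcr_noise_biased = zero_crossing_rate(quiet_noise, bias=0.05)
zcr_unvoiced_biased = zero_crossing_rate(unvoiced, bias=0.05)
```

Without the bias both frames have the same ZCR. With the bias, the low-level noise no longer crosses zero, while the louder unvoiced frame keeps its crossings.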

  9. Pitch
     • Computation
       • Pitch frequency is the reciprocal of the fundamental period.
       • Pitch in terms of semitones:
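The semitone formula itself does not appear in the transcript. One widely used convention, assumed here, is the MIDI scale, where 440 Hz maps to semitone 69 and each octave spans 12 semitones: semitone = 69 + 12·log2(f/440). The function names below are hypothetical.

```python
import math


def period_to_freq(period_samples, fs):
    """Pitch frequency in Hz from a fundamental period given in samples."""
    return fs / period_samples


def freq_to_semitone(freq_hz):
    """Semitone number on the MIDI scale (assumed convention: A4 = 440 Hz = 69)."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


freq = period_to_freq(100, fs=16000)    # a period of 100 samples at 16 kHz
semitone_a4 = freq_to_semitone(440.0)
semitone_a5 = freq_to_semitone(880.0)   # one octave above A4
```

Doubling the frequency raises the result by exactly 12 semitones, which is why the logarithmic scale matches perceived pitch intervals better than raw frequency.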

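  10. Production and Reception of Sound in General
     • Basic process
       • Vibration of the sound source
       • Wave propagation through the air
       • Vibration of the eardrum
       • Reception by the nerves of the inner ear
       • Recognition by the brain
     • Sound production mechanisms
       • Natural vibration frequency induced by striking (e.g., a tuning fork)
       • Resonant frequency induced by air friction (e.g., a flute)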

  11. Human Speech Production

  12. The Vocal Tract

  13. Glottal Volume Velocity & Resulting Sound Pressure (Voiced)

  14. Speech Production
     • Glottal pulses + vocal tract = speech signal
     • [Figure panels: (a) source spectrum, (b) filter function, (c) output energy spectrum]

  15. Acoustical Analysis (speech signal of "七", the Mandarin word for "seven")

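  16. Speech Production Modeling
     • [Block diagram: an impulse train generator (set by the pitch period) or a noise generator supplies the excitation u(n), which is multiplied by the gain G and passed through a time-varying digital filter (set by the vocal tract parameters) to produce s(n)]
     • Diagram labels: phonation, whispering, frication, compression, vibration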

  17. Parametric Representation
     • [Block diagram: u(n) × G → A(z) model → s(n)]
     • G = gain of excitation
     • u(n) = excitation source (quasi-periodic pulse train or random noise)
     • [Equations: Z-transform of the model, written in terms of A(z)]

  18. The Speech Model: A Summary
     • Voiced/unvoiced classification,
     • Pitch period for voiced sounds,
     • The gain parameter, and
     • The coefficients of the digital filter, {a_k}.
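The four parameters in this summary are enough to drive a toy synthesizer. The sketch below assumes the all-pole difference equation s(n) = Σ a_k·s(n−k) + G·u(n); the sign convention for the coefficients varies between textbooks and is an assumption here, as are the function names and the single-coefficient filter used in the example. For unvoiced sounds, the impulse train would be replaced by random noise.

```python
def impulse_train(length, period):
    """Voiced excitation: a unit impulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]


def synthesize(u, a, gain):
    """All-pole filter: s(n) = sum_k a[k-1] * s(n-k) + gain * u(n).

    The sign convention for the coefficients is an assumption; some
    texts write the recursion with negated coefficients.
    """
    s = []
    for n, u_n in enumerate(u):
        # Weighted sum of previous outputs (only those that already exist).
        past = sum(a_k * s[n - k]
                   for k, a_k in enumerate(a, start=1) if n - k >= 0)
        s.append(past + gain * u_n)
    return s


u = impulse_train(8, period=4)          # pitch period of 4 samples
s = synthesize(u, a=[0.5], gain=2.0)    # one-pole filter, illustrative values
```

Each impulse starts a decaying response scaled by the gain, and the response to the second impulse adds to the tail of the first.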

  19. Glossary of Terms (English: Chinese)
     • Cochlea: 耳蝸
     • Phoneme: 音素、音位
     • Phonics: 聲學;聲音基礎教學法(以聲音為基礎進而教拼字的教學法)
     • Phonetics: 語音學
     • Phonology: 音系學、語音體系
     • Prosody: 韻律學;作詩法
     • Syllable: 音節
     • Tone: 音調
     • Alveolar: 齒槽音
     • Silence: 靜音
     • Noise: 雜訊
     • Glottis: 聲門
     • Larynx: 喉頭
     • Pharynx: 咽頭
     • Pharyngeal: 咽部的,喉音的
     • Velum: 軟顎
     • Vocal cords: 聲帶
     • Esophagus: 食管
     • Diaphragm: 橫隔膜
     • Trachea: 氣管

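  20. Hints for Exercises
     • How to generate a sine wave signal:
       • Math formula: y(t) = 0.8·sin(2πft), with f = 440 Hz
       • MATLAB code:
           duration=3; f=440; fs=16000;   % seconds, Hz, samples per second
           time=(0:duration*fs-1)/fs;     % time axis in seconds
           y=0.8*sin(2*pi*f*time);        % sine wave with amplitude 0.8
           plot(time, y);
           sound(y, fs);                  % play back at the sample rate fs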
