Audio Signal Processing (Fundamentals)


eliza

Presentation Transcript


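  1. Audio Signal Processing (Fundamentals)

  2. 0 Aperitif. References: 1) the academic development of the field; 2) the technical development of the field

  3. References: the web

  4. Which qualities (abilities) matter? The R&D process of a project. What exists; English; "physical" concepts and ideas; what it is; mathematics; why; tools; how to do it

  5. 1 Getting started: raw material for experiments. WAV files. Example: keep friends with.wav

  6. Format section; data section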

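  7. WAV file header layout:

     | Offset | Bytes | Type | Content |
     |---|---|---|---|
     | 00H | 4 | char | "RIFF" tag |
     | 04H | 4 | long | File length field: file length - 8, i.e. data length + 0x24 (file length = data length + 0x2C) |
     | 08H | 4 | char | "WAVE" tag |
     | 0CH | 4 | char | "fmt " tag (with a trailing space) |
     | 10H | 4 | long | Size of the fmt chunk (sizeOfFmt in the struct on slide 9); varies, 10H for plain PCM |
     | 14H | 2 | int | Format tag (0001H for PCM audio data) |
     | 16H | 2 | int | Number of channels: 1 for mono, 2 for stereo |
     | 18H | 4 | long | Sample rate (samples per second) |
     | 1CH | 4 | long | Byte rate of the waveform data: channels × samples per second × bits per sample / 8. Playback software can use this value to estimate the buffer size. |

  8. WAV file header layout (continued):

     | Offset | Bytes | Type | Content |
     |---|---|---|---|
     | 20H | 2 | int | Block alignment in bytes: channels × bits per sample / 8. Playback software processes data in multiples of this size and uses it to align its buffers. |
     | 22H | 2 | int | Bits per sample, for each sample in each channel. With several channels, the sample size is the same for every channel. |
     | 24H | 4 | char | "data" tag |
     | 28H | 4 | long | Length of the audio data |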

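  9. typedef struct {
         char Riff[4];                    /* "RIFF" */
         unsigned long sizeOfFile;        /* file length - 8 */
         char WAVEfmt[8];                 /* "WAVEfmt " */
         unsigned long sizeOfFmt;         /* size of the fmt chunk */
         short int wFormatTag;            /* format tag */
         short int nChannels;             /* number of channels */
         unsigned long nSamplesPerSec;    /* sample rate */
         unsigned long navgBytesPerSec;   /* byte rate */
         short int nBlockAlign;           /* bytes per sample frame */
         unsigned short nBitPerSample;    /* bits per sample */
         char Cdata[4];                   /* "data" */
         unsigned long sizeOfData;        /* length of the sample data */
     } HeadOfWave;                        /* 44 bytes when long is 4 bytes, as in VC6.0 */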

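  10. Some notes. * File length vs. data length * Key quantities: sample rate / number of channels / quantization mode / quantization bits * How navgBytesPerSec and nBlockAlign are computed * Program example and explanation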
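The program example itself is not part of this transcript. As a stand-in, here is a minimal sketch in C of how the header could be decoded and how navgBytesPerSec and nBlockAlign follow from the other fields. It assumes the canonical 44-byte PCM layout from slides 7 and 8 (no extra chunks between "fmt " and "data"). `WavInfo`, `parse_wav_header`, `block_align` and `avg_bytes_per_sec` are hypothetical names introduced for illustration, not the author's code. Fixed-width types are used because the HeadOfWave struct only matches the file layout where `long` is 4 bytes.

```c
#include <stdint.h>
#include <string.h>

/* Header fields decoded from the canonical 44-byte layout.
   Field names follow HeadOfWave; WavInfo itself is a hypothetical type. */
typedef struct {
    uint32_t sizeOfFile;       /* value at 04H: file length - 8 */
    uint32_t sizeOfFmt;        /* value at 10H: 16 for plain PCM */
    uint16_t wFormatTag;       /* value at 14H: 1 for PCM */
    uint16_t nChannels;
    uint32_t nSamplesPerSec;
    uint32_t navgBytesPerSec;
    uint16_t nBlockAlign;
    uint16_t nBitPerSample;
    uint32_t sizeOfData;       /* value at 28H: bytes of sample data */
} WavInfo;

/* WAV stores integers little-endian; decoding byte by byte keeps the
   result independent of the host's endianness and of sizeof(long). */
static uint16_t read_u16le(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* nBlockAlign: bytes per sample frame (one sample for every channel). */
uint16_t block_align(uint16_t channels, uint16_t bits_per_sample)
{
    return (uint16_t)(channels * bits_per_sample / 8);
}

/* navgBytesPerSec: sample frames per second times bytes per frame. */
uint32_t avg_bytes_per_sec(uint32_t sample_rate, uint16_t channels,
                           uint16_t bits_per_sample)
{
    return sample_rate * block_align(channels, bits_per_sample);
}

/* Returns 0 on success, -1 if buf is not a canonical 44-byte header. */
int parse_wav_header(const unsigned char *buf, size_t len, WavInfo *out)
{
    if (len < 44 ||
        memcmp(buf, "RIFF", 4) != 0 ||
        memcmp(buf + 0x08, "WAVEfmt ", 8) != 0 ||
        memcmp(buf + 0x24, "data", 4) != 0) {
        return -1;
    }
    out->sizeOfFile      = read_u32le(buf + 0x04);
    out->sizeOfFmt       = read_u32le(buf + 0x10);
    out->wFormatTag      = read_u16le(buf + 0x14);
    out->nChannels       = read_u16le(buf + 0x16);
    out->nSamplesPerSec  = read_u32le(buf + 0x18);
    out->navgBytesPerSec = read_u32le(buf + 0x1C);
    out->nBlockAlign     = read_u16le(buf + 0x20);
    out->nBitPerSample   = read_u16le(buf + 0x22);
    out->sizeOfData      = read_u32le(buf + 0x28);
    return 0;
}
```

To use it, read the first 44 bytes of a file into a buffer and pass it to `parse_wav_header`. For a 16 kHz, mono, 16-bit file, `block_align` gives 2 and `avg_bytes_per_sec` gives 32000. Files with additional chunks before "data" are rejected by this sketch rather than handled.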

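  11. 2 Basic concepts: sample rate; quantization bits

  12. 2.1 Sample rate. 48k/44k/32k/22k/16k/11k/8kHz. Two families: 44k/22k/11k and 32k/16k/8k. Why these values?

  13. 2.2 Bandwidth of audio signals. File keep_friend_with.wav (sample rate 44kHz). [Figure: the axis represents frequency, where 32 corresponds to 22kHz; label: 7kHz]

  14. [Figure labels: 22kHz, 4kHz]

  15. File keep_friend_with_8k.wav (sample rate 8kHz). [Figure label: 4kHz]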

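  16. The file above is a special case: the recording environment was very good. The usual conventions are: * speech: 300-3400Hz, sample rate 8kHz * wide-band speech: bandwidth 7kHz (50Hz-7kHz), sample rate 16kHz * audio: bandwidth 20kHz (20Hz-20kHz), sample rate 44.1kHz or 48kHz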

  17. 2.2 Bandwidth of audio signals. Why do sample rates take those values? Nyquist Sampling Theorem. Why 44.1kHz? 20kHz -> (Nyquist) 40kHz -> (roll-off from passband to stopband) 44kHz -> 44.1kHz?
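The Nyquist step in that chain can be made concrete with a small sketch in C. It shows the minimum sample rate for a given bandwidth and where a tone lands if it is sampled too slowly without an anti-aliasing filter. Both function names are hypothetical, and frequencies are whole numbers of Hz with a sample rate greater than zero, purely to keep the arithmetic exact.

```c
/* Minimum sample rate for a signal band-limited to bandwidth_hz. */
unsigned long nyquist_rate(unsigned long bandwidth_hz)
{
    return 2UL * bandwidth_hz;
}

/* Frequency at which a pure tone of f_hz appears after sampling at fs_hz
   with no anti-aliasing filter (fs_hz must be non-zero). Tones at or
   below fs/2 are unchanged; higher tones fold back into 0..fs/2. */
unsigned long aliased_frequency(unsigned long f_hz, unsigned long fs_hz)
{
    unsigned long r = f_hz % fs_hz;          /* sampling cannot tell f from f + k*fs */
    return (r > fs_hz / 2) ? fs_hz - r : r;  /* mirror around fs/2 */
}
```

For example, a 7 kHz component is preserved at a 44.1 kHz sample rate, but at 8 kHz it would show up at 1 kHz. This is why content above 4 kHz has to be filtered out before resampling a recording to 8 kHz.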

  18. At the time the choice was made, the only recorders capable of storing such high rates were VCRs. NTSC: 490 lines/frame × 3 samples/line × 30 frames/s = 44100 samples/s. PAL: 588 lines/frame × 3 samples/line × 25 frames/s = 44100 samples/s. (Prof. Brian L. Evans, Dept. of Electrical and Computer Engineering, The University of Texas at Austin)

  19. Listen to the sounds… keep_friends_with(44k_mono).wav keep_friends_with(22k_mono).wav keep_friends_with(16k_mono).wav keep_friends_with(11k_mono).wav keep_friends_with(8k_mono).wav

  20. For speech, 8kHz and 11kHz sample rates sound about the same, and 16kHz and above sound about the same. So for speech it is enough to distinguish voice from wideband speech.

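  21. 2.3 Quantization bits. Linear vs. non-linear quantization. Quantization SNR: roughly 6b dB for b bits; more precisely 6.02b + 1.76 dB for a full-scale sine wave. Language-repeater (复读机) specification: when sound is copied from the tape to the chip and then played back from the chip over earphones, the level difference between the wanted signal and the noise must be ≥34dB according to the standard.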
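The "6b dB" rule of thumb and the "6.02b + 1.76" figure are the same result at different precision. A sketch of the standard derivation follows. It assumes a full-scale sine of amplitude A, a uniform b-bit quantizer with step Δ, and quantization error uniformly distributed over one step. These modelling assumptions are not stated on the slide.

```latex
\Delta = \frac{2A}{2^{b}}, \qquad
P_{\text{signal}} = \frac{A^{2}}{2}, \qquad
P_{\text{noise}} = \frac{\Delta^{2}}{12}

\mathrm{SNR} = \frac{P_{\text{signal}}}{P_{\text{noise}}}
             = \frac{A^{2}/2}{(2A)^{2} / (12 \cdot 2^{2b})}
             = \frac{3}{2} \cdot 2^{2b}

\mathrm{SNR}_{\mathrm{dB}} = 10\log_{10}\frac{3}{2} + 20\,b\,\log_{10} 2
                           \approx 1.76 + 6.02\,b
```

Under these assumptions, 8 bits gives about 49.9 dB and 16 bits about 98.1 dB. Reaching 34 dB from quantization noise alone would need b ≥ (34 − 1.76)/6.02 ≈ 5.4, i.e. at least 6 bits. Real speech rarely uses the full scale, so the SNR achieved in practice is lower than the formula suggests.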

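  22. Listen to the sounds… keep_friends_with(16k_mono).wav keep_friends_with(16k_mono)_8b.wav The file with 8-bit linear quantization carries clearly audible background noise. From experience, what would be an acceptable number of quantization bits?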
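How the 8-bit file was produced is not stated. One plausible way, sketched here in C, is to drop the low 8 bits of each 16-bit sample, which truncates rather than rounds. The sketch relies on the WAV convention that 8-bit PCM samples are unsigned, with 128 as the zero level. The function names are hypothetical.

```c
#include <stdint.h>

/* 16-bit signed PCM sample -> 8-bit WAV sample (unsigned, 128 = silence). */
uint8_t pcm16_to_pcm8(int16_t s)
{
    return (uint8_t)(((int)s + 32768) >> 8);   /* drop the low 8 bits */
}

/* 8-bit WAV sample -> 16-bit signed PCM, for comparing against the original. */
int16_t pcm8_to_pcm16(uint8_t u)
{
    return (int16_t)(((int)u - 128) * 256);
}
```

Converting a sample to 8 bits and back changes it by up to 255 on the 16-bit scale. That error is the quantization noise heard as background hiss in the 8-bit file.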

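  23. Getting started: raw material for experiments. Speech files at 16kHz or 8kHz sample rate; music files at 44.1kHz sample rate; 16-bit or 14-bit linear quantization.

  24. 3 The audio processing tools I use most: VC6.0 (using C); Matlab; CoolEdit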

  25. Matlab (Mathworks). Math environment. Signal Processing Toolbox: filter design, spectral analysis, waveform generation, linear prediction. Voicebox.

  26. Matlab (Mathworks). Pros: open, powerful, scripting, excellent plotting. Cons: poor speech community, standards, not designed for big files.

  27. Other speech analysis tools? • Goldwave (audio editor) • Esps Xwaves (routines + visual.) • Praat (speech analysis) • Wavesurfer (speech editor) • Transcriber (annotation tool) • OGI speech tools (routines + app. dev.) • …winpitch, pitchworks, phonedit…

  28. Goldwave. Self-described as a “top rated, professional digital audio editor”.

  29. Goldwave. Pros: editing (good memory management for big files), many FX, noise reduction, real-time spectrum and VU meters, various formats, batch conversion, chain effects, easy interface. Cons: nothing for speech (pitch, formant), Windows only, no scripting. Good for file editing, not for speech.

  30. Esps - Waves. Developed by Entropic + AT&T; now public. The comp.speech FAQ says: Esps is a comprehensive set of speech analysis/processing tools; Waves is a graphical front-end for speech processing (waveforms, spectrograms, pitch) and includes a signal labeling utility.

  31. Esps - Waves. Pros: powerful, designed for big files. Cons: UNIX only (free BSD), non-standard formats, requires programming skills, development has stopped.

  32. Praat. Developed by P. Boersma and D. Weenink at the Institute of Phonetic Sciences, University of Amsterdam. General-purpose speech tool: editing, segmentation and labeling, prosodic manipulation.

  33. Praat. Pros: designed for speech analysis (not only sound editing or spectrogram visualization), nice GUI, scripting, active development and community, prosodic manipulation. Cons: limited scripting language, native format of transcription and pitch files.

  34. WaveSurfer. Open-source tool for sound visualization and manipulation: speech/sound analysis and sound annotation/transcription. A platform for more advanced/specialized applications: extend WaveSurfer with new custom plug-ins or embed its visualization components in other applications. Requires the Snack Toolkit.

  35. Transcriber. Authors: C. Barras, E. Geoffrois. Relies on Snack (Tcl/Tk). Good for annotation. Nice, simple GUI. No speech analysis.

  36. OGI speech tools / CSLU Toolkit. Development started in 1992 in C on Unix, at the Center for Spoken Language Understanding (CSLU) at OGI. Includes: an X Windows display tool (LYRE) to display and edit speech signals, spectrograms, phoneme labels, and other information; a set of C library routines (LIBNSPEECH) with utilities for converting file formats, filtering, neural network training, and vector quantization; a database utility to automate speech-database-related enquiries; a set of Perl scripts used mainly to automate the use of the OGI Speech Tools; man pages. RAD (rapid application development) with three levels of entry: package (C), script (Tcl), GUI (Tk). Free for research use.

  37. Summary = yes but requires some dev.
