
Speech tools

Jean-Philippe Goldman, 03.03.2004



  1. Speech tools Jean-Philippe Goldman 03.03.2004

  2. Two questions • What kind of data? • Which task?

  3. What kind of data? • Speech content (noise, multiple voices, …) • Data file: sound / transcription / pitch curve • Sampling/Quantization (e.g. 16 kHz, 12 kHz, 8 kHz, 4 kHz; 8-bit) • Size: 16 kHz, 16-bit = 256 kbps → 1.9 MB/min → 115 MB/h • Format • Sound: wav, wma, mp3, ogg, aiff, aifc, au, vox, raw, sd, CSL, Ogg/Vorbis, NIST/Sphere • Transcription: HTK, TIMIT, TextGrid, Phondat • Number of files
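
The size figures on this slide follow directly from the sampling rate and the sample width. A minimal sketch of that arithmetic, assuming uncompressed mono PCM and decimal megabytes (1 MB = 10^6 bytes); the function name `pcm_size` is only illustrative:

```python
def pcm_size(sample_rate_hz, bits_per_sample, channels=1):
    """Return (kbps, MB per minute, MB per hour) for uncompressed PCM audio.

    Assumes decimal units: 1 kbps = 1000 bit/s, 1 MB = 10**6 bytes.
    """
    bits_per_second = sample_rate_hz * bits_per_sample * channels
    bytes_per_second = bits_per_second / 8
    kbps = bits_per_second / 1000
    mb_per_min = bytes_per_second * 60 / 1e6
    mb_per_hour = mb_per_min * 60
    return kbps, mb_per_min, mb_per_hour


# The slide's case: 16 kHz sampling, 16-bit samples, one channel.
kbps, mb_min, mb_hour = pcm_size(16000, 16)
print(f"{kbps:.0f} kbps, {mb_min:.2f} MB/min, {mb_hour:.1f} MB/h")
# prints "256 kbps, 1.92 MB/min, 115.2 MB/h"
```

Halving either the sampling rate or the sample width halves every figure, which is why the choice of sampling and quantization matters once a corpus runs to many hours.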

  4. Which task? • Visualization and editing: record, play, edit, mix, add effects • Analysis: spectral, pitch • Speech manipulation: filtering, mixing, adding effects, prosodic manipulation • Annotation: segmentation, labeling • Scripting: batch processing, communication with outside programs • Plotting

  5. Examples of tasks • build stimuli for an experiment (e.g. cross-splicing) • manage a speech database for a TTS engine • create a prosodic database • analyze a speech corpus from experiment recordings • verify/correct an automatic segmentation
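
As an illustration of the first task, cross-splicing can be sketched as joining the start of one recording to the remainder of another at chosen cut points. This is only a sketch under simple assumptions: signals are plain sequences of samples at the same sampling rate, the cut points are already known (for example from a segmentation), and no cross-fade is applied. The helper names are hypothetical, not taken from any of the tools discussed here.

```python
def seconds_to_samples(t_seconds, sample_rate_hz):
    """Convert a time in seconds to the nearest sample index."""
    return int(round(t_seconds * sample_rate_hz))


def cross_splice(a, b, cut_a, cut_b):
    """Return a[:cut_a] followed by b[cut_b:].

    a, b         -- sequences of samples with the same sampling rate
    cut_a, cut_b -- sample indices of the splice points in a and b
    """
    if not 0 <= cut_a <= len(a):
        raise ValueError("cut_a is outside signal a")
    if not 0 <= cut_b <= len(b):
        raise ValueError("cut_b is outside signal b")
    return list(a[:cut_a]) + list(b[cut_b:])


# Toy signals: the onset of `a` (first 2 samples) is joined to `b` from sample 3 on.
a = [1, 1, 1, 1]
b = [2, 2, 2, 2, 2, 2]
stimulus = cross_splice(a, b, cut_a=2, cut_b=3)
print(stimulus)  # prints [1, 1, 2, 2, 2]
```

In practice the cut points would usually be placed at zero crossings or smoothed with a short cross-fade to avoid audible clicks.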

  6. Two questions • What kind of data? • Which task? Two rules • no single tool does everything • there are plenty of ways to do any one thing

  7. Tool features • Visualization/Editing • Analysis • Speech manipulation • Annotation • Scripting • Plotting • Supported formats • Platform/installation • Evolution/community • Accessibility • Price

  8. Software • GoldWave (audio editor) • ESPS/xwaves (routines + visualization) • Praat (speech analysis) • WaveSurfer (speech editor) • Transcriber (annotation tool) • Matlab (general-purpose software) • OGI speech tools (routines + application development) • … WinPitch, PitchWorks, Phonedit, CoolEdit, …

  9. GoldWave • self-described as “top rated, professional digital audio editor”

  10. GoldWave • pros: editing (good memory management for big files), many effects, noise reduction, real-time spectrum and VU meters, various formats, batch conversion, effect chains, easy interface • cons: nothing for speech (pitch, formants), Windows only, no scripting • Good for file editing, not for speech analysis

  11. ESPS - Waves • Developed by Entropic + AT&T; now public • The comp.speech FAQ says: • ESPS: a comprehensive set of speech analysis/processing tools • Waves: a graphical front-end for speech processing (waveforms, spectrograms, pitch) that includes a signal labeling utility

  12. ESPS - Waves • pros: powerful, designed for big files • cons: UNIX only (free BSD), non-standard formats, requires programming skills, development has stopped

  13. Praat • Developed by P. Boersma and D. Weenink at the Institute of Phonetic Sciences, University of Amsterdam • general-purpose speech tool: editing, segmentation and labeling, prosodic manipulation

  14. Praat • pros: designed for speech analysis (not only sound editing or spectrogram visualization), nice GUI, scripting, active development and community, prosodic manipulation • cons: limited scripting language, its own native formats for transcription and pitch files

  15. WaveSurfer • Open-source tool for sound visualization and manipulation • speech/sound analysis and sound annotation/transcription • platform for more advanced/specialized applications: extend WaveSurfer with custom plug-ins, or embed WaveSurfer visualization components in other applications • Requires the Snack Sound Toolkit

  16. Transcriber • Authors: C. Barras, E. Geoffrois • Relies on Snack (Tcl/Tk) • Good for annotation • Nice, simple GUI • No speech analysis

  17. Matlab (Mathworks) • Mathematical computing environment • Signal Processing Toolbox: filter design, spectral analysis, waveform generation, linear prediction • voicebox (2002), mike.brookes@ic.ac.uk • pitch determination algorithm (2002), Xuejing Sun, sunxj@northwestern.edu • colea speech editor (1998), Philip Loizou, loizou@utdallas.edu, Univ. of Texas-Dallas

  18. Matlab (Mathworks) • pros: open, powerful, scripting, excellent plotting • cons: poor speech community, standards, not designed for big files

  19. OGI speech tools/CSLU Toolkit • Development started in 1992, in C on Unix, at the Center for Spoken Language Understanding (CSLU) at OGI • Includes: • an X Window display tool (LYRE) to display and edit speech signals, spectrograms, phoneme labels, and other information • a set of C library routines (LIBNSPEECH), utilities for converting file formats, filtering, neural network training, vector quantization, and a database utility to automate speech-database queries • a set of Perl scripts, used mainly to automate the OGI Speech Tools • man pages • RAD: rapid application development • Points of entry: package (C), script (Tcl), and GUI (Tk) levels • Free for research use

  20. Summary [comparison table not preserved in the transcript; surviving legend entry: "yes, but requires some development"]

  21. Expect to do conversions • Sound files: GoldWave (Windows), SoX (Unix) • Transcription files: scripts to convert text-formatted label files
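
A label-conversion script of the kind mentioned on this slide can be very short. The sketch below converts TIMIT-style label lines into HTK-style label lines, two of the transcription formats listed earlier. It assumes each input line reads `start_sample end_sample label`, that the audio is sampled at 16 kHz, and that the output uses start and end times in units of 100 ns; the function name `timit_to_htk` is hypothetical, and real corpora may need extra handling (headers, multi-level labels).

```python
def timit_to_htk(lines, sample_rate_hz=16000):
    """Convert 'start_sample end_sample label' lines to 'start end label'
    lines whose times are expressed in units of 100 ns."""
    units_per_second = 10_000_000  # number of 100 ns units in one second

    def to_units(sample_index):
        return round(int(sample_index) * units_per_second / sample_rate_hz)

    converted = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        start, end, label = line.split(maxsplit=2)
        converted.append(f"{to_units(start)} {to_units(end)} {label}")
    return converted


# Two illustrative label lines (sample indices at 16 kHz).
for out_line in timit_to_htk(["0 3050 h#", "3050 4559 sh"]):
    print(out_line)
# prints:
# 0 1906250 h#
# 1906250 2849375 sh
```

The same pattern (parse each line, convert the time unit, write the target layout) carries over to other text-formatted label files.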

  22. Links • www.goldwave.com • www.speech.kth.se/software/#esps • www.praat.org • www.speech.kth.se/software/#wavesurfer • www.cse.ogi.edu/toolkit • www.mathworks.com (Matlab) • www.lpl.univ-aix.fr/~sqlab/ (phonedit) • www.sciconrd.com/pworks.htm (PitchWorks) • www.winpitch.com (WinPitch) • www.adobe.com (CoolEdit > Audition)
