A quick walk through phonetic databases

A quick walk through phonetic databases • Read English: TIMIT, Boston University Radio News • Spontaneous English: Switchboard ICSI transcriptions, Buckeye Corpus (VIC)

TIMIT • Read, phonetically balanced sentences • Good coverage of different phonetic environments • Does not exhibit the more radical reductions and disfluencies seen in spontaneous speech • Transcribers started from forced alignments, then realigned them • Roughly 5 hours of speech • 630 speakers, 8 dialects, 10 sentences apiece • Uses ARPAbet symbols • Separate symbols for stop closure and release • Symbol for epenthetic stop • Cost: $100 for non-1993 LDC members
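
As a rough sketch of what the TIMIT transcriptions look like in practice: each utterance comes with a phone label file holding one segment per line (start sample, end sample, symbol), with audio sampled at 16 kHz. The snippet below parses lines in that layout and converts the boundaries to seconds. The sample lines, the `Segment` type, and the `parse_phn` helper are made up for illustration and are not taken from the corpus or its tools.

```python
from typing import Iterable, List, NamedTuple

SAMPLE_RATE = 16000  # TIMIT audio is sampled at 16 kHz


class Segment(NamedTuple):
    start: float  # segment start, in seconds
    end: float    # segment end, in seconds
    label: str    # ARPAbet-style symbol, e.g. "tcl" (closure) or "t" (release)


def parse_phn(lines: Iterable[str], sample_rate: int = SAMPLE_RATE) -> List[Segment]:
    """Parse 'start_sample end_sample label' lines into time-stamped segments."""
    segments = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        start, end, label = int(parts[0]), int(parts[1]), parts[2]
        segments.append(Segment(start / sample_rate, end / sample_rate, label))
    return segments


# Invented example: silence, then a /t/ split into closure and release, then a vowel.
example = [
    "0 2400 h#",
    "2400 3200 tcl",
    "3200 4000 t",
    "4000 6400 iy",
]
segments = parse_phn(example)
for seg in segments:
    print(f"{seg.label:4s} {seg.start:.3f}-{seg.end:.3f} s")
```

Note how the stop shows up as two segments (`tcl` followed by `t`), which is the separate closure/release marking mentioned on the slide.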

BU Radio Corpus • Radio announcers reading news • 4 male, 3 female; reading in both “non-studio” and “studio” voices • Originally intended for speech synthesis work • Marked with prosody in addition to phonetics • Marked with ARPAbet (similar to TIMIT) • More than 7 hours of speech • Cost: $400 for non-1996/1997 LDC members

Switchboard ICSI Transcriptions • Spontaneous speech, many dialect regions • Transcribed “segmented turns,” some of which may be cutoffs, from 2-party conversations • 4 hours of speech transcribed • 2 stages: • Hour 1: phonetically transcribed • Hours 2-4: phonetic markers plus syllable boundaries, back-aligned with the phonetic markers • Phone set similar to TIMIT • No separate closure/release symbols • Voiced hesitations (pn/pv) • Cost: possibly free, possibly $2k for non-1993/1997 LDC members
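
Because TIMIT marks stop closures and releases separately while the Switchboard transcriptions do not, comparing the two requires collapsing the TIMIT-style labels first. The sketch below shows one possible convention, assumed here for illustration only: a closure followed by its matching release becomes a single stop, and a closure with no release is kept as the stop. The `merge_closures` helper is hypothetical, and affricates get no special handling.

```python
from typing import Dict, List

# TIMIT closure symbols and the stop each one belongs to.
CLOSURE_TO_STOP: Dict[str, str] = {
    "bcl": "b", "dcl": "d", "gcl": "g",
    "pcl": "p", "tcl": "t", "kcl": "k",
}


def merge_closures(labels: List[str]) -> List[str]:
    """Collapse closure (+ optional matching release) into a single stop label."""
    merged = []
    i = 0
    while i < len(labels):
        stop = CLOSURE_TO_STOP.get(labels[i])
        if stop is None:
            # Not a closure: copy the label through unchanged.
            merged.append(labels[i])
            i += 1
            continue
        merged.append(stop)
        # If the matching release follows, consume it along with the closure.
        if i + 1 < len(labels) and labels[i + 1] == stop:
            i += 2
        else:
            i += 1
    return merged


# Invented label sequence: a released /t/ and a final unreleased /p/.
timit_style = ["s", "tcl", "t", "aa", "pcl"]
collapsed = merge_closures(timit_style)
print(collapsed)
```

Other foldings are equally defensible (for example, treating closures as silence), so the choice should match whatever the comparison system assumes.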

VIC (Buckeye) Corpus • Spontaneous interview speech • Balanced for age and gender • All speakers from Ohio • Currently in transcription • NIH grant involving Keith, me, and Mark Pitt • 10 hours completed, 30 hours total • Based on ARPAbet with a few additions • Nasalized vowels, glottal stop replacing /t/,… • Cost: free (to us); might need to work out licensing, but that shouldn’t be an issue.

Evaluating with Corpora • The clear first step is to start with TIMIT • Facilitates comparison with other work • However, we should really try to bring spontaneous data into the research ASAP • Maybe move to some combination of TIMIT/SWB/VIC? • So far we have only talked about (American) English • Other languages in year 4? • Chin has done some work in Mandarin? • CASS corpus: phonetically transcribed, but is it available?
