
From Sounds to Language

From Sounds to Language. Lecture 2 Spoken Language Processing Prof. Andrew Rosenberg. Linguistic sounds. How does a sound wave become language? Sounds are continuous wave forms. Linguistic units are categorical.


Presentation Transcript


  1. From Sounds to Language Lecture 2 Spoken Language Processing Prof. Andrew Rosenberg

  2. Linguistic sounds • How does a sound wave become language? • Sounds are continuous wave forms. • Linguistic units are categorical. • How is the human perceptual system able to categorize and combine linguistic sounds into language?

  3. Studying Speech • Who studies speech? • Linguists (phoneticians, phonologists, forensic linguists) • Speech Engineers • Speech recognition • Speech synthesis • etc. • Speech Pathologists • Language Instructors • Singers • Marketing experts

  4. Marketing experts?

  5. Studying speech • Major questions in studying speech. • What is the sound inventory of a language? • Which variations are linguistically relevant? • R/L in Asian languages • P/Pʰ in English • How are speech sounds produced? • What sounds are shared by two languages, and which are not? • How do sounds vary in context? • “Green banana” vs. “Greem banana”

  6. Representing speech sounds • Why are representations important? • Translation between sounds and words • ASR and TTS • Learning pronunciation • Having a shared vocabulary to discuss language • How should we represent speech sounds? • Orthography? • Special symbols? • Abstract classes based on sound and/or articulatory similarities

  7. Using orthography to represent sounds • A single orthographic letter is realized in many different ways (in English) • b: comb, tomb, bomb • c: court, center, chess • oo: food, good, blood • s: reason, sunrise, shy, collision

  8. Using orthography to represent sounds • A single sound can be written in many different ways (in English) • [i]: sea, see, scene, receive, thief • [s]: cereal, same, miss • [u]: true, few, choose, lieu, do • [ay]: lie, prime, pry, buy • How is orthography looking as a choice in English?
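The two slides above describe a many-to-many mapping between spellings and sounds. The following is a minimal Python sketch of that idea, using only the example words from slide 8. The segmentation of each word into "the letters that spell the sound" (for example `e_e` for the split spelling in "scene") is an illustrative annotation added here, not part of the lecture, and the function names are hypothetical.

```python
from collections import defaultdict

# Sound -> list of (word, letters that spell that sound in the word).
# Words are taken from the slide; the spelling annotations are an
# assumption made for this sketch.
SOUND_TO_SPELLINGS = {
    "i": [("sea", "ea"), ("see", "ee"), ("scene", "e_e"),
          ("receive", "ei"), ("thief", "ie")],
    "s": [("cereal", "c"), ("same", "s"), ("miss", "ss")],
    "u": [("true", "ue"), ("few", "ew"), ("choose", "oo"),
          ("lieu", "ieu"), ("do", "o")],
    "ay": [("lie", "ie"), ("prime", "i_e"), ("pry", "y"), ("buy", "uy")],
}


def spellings_per_sound(table):
    """Return, for each sound, the sorted set of spellings that realize it."""
    return {sound: sorted({spelling for _, spelling in pairs})
            for sound, pairs in table.items()}


def sounds_per_spelling(table):
    """Invert the table: for each spelling, which sounds can it stand for?"""
    inverted = defaultdict(set)
    for sound, pairs in table.items():
        for _, spelling in pairs:
            inverted[spelling].add(sound)
    return {spelling: sorted(sounds) for spelling, sounds in inverted.items()}


def ambiguous_spellings(table):
    """Spellings that map to more than one sound in this small table."""
    return {spelling: sounds
            for spelling, sounds in sounds_per_spelling(table).items()
            if len(sounds) > 1}
```

Even in this tiny table, one sound has several spellings and at least one spelling ("ie" in "thief" and "lie") stands for more than one sound, which is why orthography alone is a poor representation of English speech sounds.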

  9. Phonetic Symbol Sets • International Phonetic Alphabet (IPA) • Single (unique) character for each sound • Represents all sounds of the world’s languages, but is large and requires a special (non-ASCII) font. • ARPAbet, TIMIT, etc. • Multiple characters for each sound • Language specific. A new symbol set is required for each language.
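As a rough illustration of how an ASCII symbol set like ARPAbet is used in practice, here is a minimal sketch of a pronunciation lexicon lookup. The entries for "probably" and "sense" are the canonical forms given later in this lecture, and the vowels for "key", "coo", "cat" and "cot" follow the later vowel slides; the lexicon itself and the `transcribe` function are hypothetical and not taken from any real dictionary or toolkit.

```python
# A toy pronunciation lexicon: word -> space-separated ARPAbet phones.
# Each phone may use more than one ASCII character (e.g. "aa", "iy").
LEXICON = {
    "probably": "p r aa b ax b l iy",
    "sense": "s eh n s",
    "key": "k iy",
    "coo": "k uw",
    "cat": "k ae t",
    "cot": "k aa t",
}


def transcribe(text, lexicon=LEXICON):
    """Look up each word and return one flat list of ARPAbet phones.

    Raises KeyError for out-of-vocabulary words instead of guessing
    a pronunciation from the spelling.
    """
    phones = []
    for word in text.lower().split():
        if word not in lexicon:
            raise KeyError(f"no pronunciation for {word!r}")
        phones.extend(lexicon[word].split())
    return phones
```

For example, `transcribe("cat sense")` yields the phone list for both words in order. Because every symbol is plain ASCII, the output can be stored and compared without any special font, which is the practical appeal of ARPAbet over IPA noted on the slide.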

  10. Exercise: Write your full name in English orthography and in ARPAbet.

  11. Sound categories • Phone: Basic speech sound of a language • A minimal sound difference between two words • too vs. zoo • Not every sound made by a human speaker is phonetic • Sniffs, laughs, coughs, breaths… • Phoneme: Class of speech sounds • Phoneme may include several phones • /t/ in top, stop, little, butter, winter • Allophones: the set of phonetic variants that make up a phoneme • {[t], [ɾ], …}
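The phone/phoneme/allophone relationship on this slide can be sketched as a simple mapping from each phoneme to its set of allophones. The sketch below is illustrative only: the /t/ entry follows the slide's {[t], [ɾ], …}, the aspirated [tʰ] (as in "top") is added from the standard textbook analysis of American English, and vowels are written in ARPAbet purely for convenience. It also assumes each phone belongs to exactly one phoneme, which is a simplification.

```python
# Phoneme -> set of allophones (phonetic variants).
# [t] and the flap [ɾ] come from the slide; [tʰ] is an added assumption.
ALLOPHONES = {
    "t": {"t", "tʰ", "ɾ"},
}


def build_phone_to_phoneme(allophones):
    """Invert the allophone table into a phone -> phoneme lookup."""
    mapping = {}
    for phoneme, phones in allophones.items():
        for phone in phones:
            if phone in mapping and mapping[phone] != phoneme:
                # Simplifying assumption: one phoneme per phone.
                raise ValueError(f"{phone!r} is listed under two phonemes")
            mapping[phone] = phoneme
    return mapping


def to_phonemic(phones, phone_to_phoneme):
    """Collapse a phonetic transcription into a phonemic one.

    Phones without an entry are treated as their own phoneme.
    """
    return [phone_to_phoneme.get(p, p) for p in phones]


PHONE_TO_PHONEME = build_phone_to_phoneme(ALLOPHONES)

top = ["tʰ", "aa", "p"]          # aspirated [tʰ] word-initially
stop = ["s", "t", "aa", "p"]     # unaspirated [t] after [s]
butter = ["b", "ah", "ɾ", "er"]  # flap [ɾ] between vowels
```

Applying `to_phonemic` to `top`, `stop` and `butter` replaces three physically different sounds with the same category /t/, which is the sense in which linguistic units are categorical while the underlying sounds vary.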

  12. Speech Production • The articulatory organs • General Process: • Air is expelled from the lungs through the windpipe (trachea), leaving via the mouth (and nose) • Air passes from the trachea through the larynx, which contains the vocal folds; the space between them is the glottis. • When the vocal folds vibrate, voiced sounds are produced; otherwise, the sound is voiceless (e.g. voiced [v] vs. voiceless [f])

  13. Vocal Fold Vibration Slow motion video of normal vocal folds

  14. Articulators • “Why did Ken set the net on the soggy deck?” • Queens University ATR Labs X-ray Film Database http://psyc.queensu.ca/~munhallk/05_database.htm

  15. Vocal Organs

  16. Recording Articulatory Data • X-Ray Microbeam Database • Track motion of small gold pellets on the tongue, jaw, lips and soft palate • Electroglottography • Run a high-frequency current through the glottal area of a speaker. • There is lower resistance when the vocal folds are closed. • Electromagnetic articulography (EMMA) • 3 transmitters on a helmet allow for triangulation of 5-15 sensor positions

  17. Classes of Sounds • Consonants and Vowels • Consonants: • Restricted or blocked airflow (e.g. [s]) • Voiced or unvoiced • Vowels • Unrestricted airflow • voiced • Semi vowels (approximants): [w], [y]

  18. Consonants: Place of Articulation • What is the point of maximum air restriction? • Labial: bilabial [b], [p]; labiodental [v], [f] • Dental: [θ], [ð] thief vs. them • Alveolar: [t], [d], [s], [z] • Palatal: [ʃ], [tʃ] shrimp vs. chimp • Velar: [k], [g] • Glottal: [ʔ] glottal stop

  19. Consonants: Place of Articulation • What is the point of maximum air restriction? • Approximant: [w], [y] • 2 articulators come close but don’t restrict much • Somewhere between vowels and consonants • Lateral: [l] • Tap or flap: [ɾ] e.g. butter

  20. Places of Articulation (diagram labels: labial, dental, alveolar, post-alveolar/palatal, velar, uvular, pharyngeal, laryngeal/glottal) http://www.chass.utoronto.ca/~danhall/phonetics/sammy.html

  21. Consonants: Manner of articulation • How is the airflow restricted? • Stop (or plosive): [p], [t], [g], … • Airflow is completely blocked (closure) and released (release) • Glottal stop, e.g. before word-initial vowels in English after a pause. “three even” • Nasal: air is released through the nose [m], [ng] • Fricative: [s], [z], [f] air is forced through a narrow channel, leading to turbulent airflow • Affricates: [tʃ] begin as stops, but the release is fricative
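Place, manner and voicing together give each consonant a small feature description, which is what the "abstract classes" mentioned earlier amount to. Below is a minimal sketch of such a feature table for a handful of the consonants named on these slides. The place and manner labels follow the slides; the voicing values are standard phonetic facts rather than something the slides list for every symbol, and the `Consonant` type and helper functions are hypothetical names chosen for this example.

```python
from typing import NamedTuple


class Consonant(NamedTuple):
    place: str    # point of maximum restriction
    manner: str   # how the airflow is restricted
    voiced: bool  # do the vocal folds vibrate?


# A small, hand-written subset; "ng" is the ARPAbet-style velar nasal.
CONSONANTS = {
    "p": Consonant("bilabial", "stop", False),
    "b": Consonant("bilabial", "stop", True),
    "f": Consonant("labiodental", "fricative", False),
    "v": Consonant("labiodental", "fricative", True),
    "t": Consonant("alveolar", "stop", False),
    "d": Consonant("alveolar", "stop", True),
    "s": Consonant("alveolar", "fricative", False),
    "z": Consonant("alveolar", "fricative", True),
    "k": Consonant("velar", "stop", False),
    "g": Consonant("velar", "stop", True),
    "m": Consonant("bilabial", "nasal", True),
    "ng": Consonant("velar", "nasal", True),
}


def feature_differences(a, b, table=CONSONANTS):
    """Names of the features on which consonants a and b differ."""
    ca, cb = table[a], table[b]
    return [name for name in Consonant._fields
            if getattr(ca, name) != getattr(cb, name)]


def natural_class(table=CONSONANTS, **features):
    """All consonants in the table that match the given feature values."""
    return sorted(symbol for symbol, c in table.items()
                  if all(getattr(c, k) == v for k, v in features.items()))
```

With this table, `feature_differences("f", "v")` isolates voicing as the only difference, matching the [f] vs. [v] example from the speech production slide, while `natural_class(manner="nasal")` picks out the nasals as a group.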

  22. Articulation map (chart of consonants by manner of articulation and voicing)

  23. Vowels • All voiced • Vowel height • How high is the tongue? High or low? • Where is its highest point? Front or back? • How rounded are the lips? • Monophthong [eh] vs. diphthong [ey] • 1 vowel sound vs. two

  24. American English Vowel Space (chart with axes HIGH to LOW and FRONT to BACK; vowels shown: iy, uw, ix, ux, ih, uh, oy, ow, ey, ax, ao, aw, eh, ah, ay, ae, aa)

  25. Compare to vowel spaces in other languages • British English • Indian English • Swedish • Spanish • Mandarin Chinese • Japanese

  26. [iy] vs [uw] – “key” vs “coo” (From a lecture given by Rochelle Newman)

  27. [ae] vs. [aa] – “cat” vs. “cot” (From a lecture given by Rochelle Newman)

  28. Acoustic Landmarks • “Patricia and Patsy and Sally” • Phone labels on the figure: [p] [ix] [t] [ih] [sh] [ax] [p] [ae] [t] [s] [iy] [s] [ae] [l] [iy] [p] [ix] [t] [ih]

  29. Coarticulation • The same phone can be produced differently depending on phonetic context. • Articulations overlap as articulators move in different timing patterns to produce consecutive sounds • Eight vs. Eighth • Articulation moves forward • Met vs. Men • Vowel becomes nasalized • Green Banana • or “greem” banana?
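The "green banana" example can be approximated as a context-dependent rewrite rule: [n] takes on the bilabial place of a following [b], [p] or [m] and surfaces as [m]. The sketch below is one simple way to express that rule over ARPAbet phone lists. It is a deliberately crude, categorical approximation of what is really a gradient overlap of articulations; the ARPAbet transcription of "green banana" and the function name are assumptions made for this example.

```python
# Consonants produced at the lips; a preceding [n] assimilates to them.
BILABIALS = {"p", "b", "m"}


def assimilate_nasal_place(phones):
    """Rewrite [n] as [m] whenever the next phone is bilabial.

    Returns a new list; the input is left unchanged.
    """
    out = list(phones)
    for i in range(len(out) - 1):
        if out[i] == "n" and out[i + 1] in BILABIALS:
            out[i] = "m"
    return out


# Assumed ARPAbet transcription of "green banana".
green_banana = "g r iy n b ax n ae n ax".split()
```

Only the [n] at the end of "green" changes, because it is the only one followed by a bilabial; the two [n]s inside "banana" are followed by vowels and are left alone.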

  30. Articulator mistiming • “Probably” is canonically [p r aa b ax b l iy] • [p r aa b iy] • [p r aw l uh] • [p r ah b iy] • [p r aa l iy] • “Sense” is canonically [s eh n s] • [s eh n t s] • [s ih t s]
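One way to quantify how far each observed variant drifts from the canonical pronunciation is to count phone insertions, deletions and substitutions, that is, the Levenshtein edit distance over phone sequences. This measure is introduced here for illustration and is not something the lecture itself proposes; the pronunciations are copied from the slide.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phone sequences.

    Each insertion, deletion or substitution of one phone costs 1.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # match or substitute
            ))
        prev = curr
    return prev[-1]


CANONICAL = {
    "probably": "p r aa b ax b l iy".split(),
    "sense": "s eh n s".split(),
}

VARIANTS = {
    "probably": ["p r aa b iy", "p r aw l uh", "p r ah b iy", "p r aa l iy"],
    "sense": ["s eh n t s", "s ih t s"],
}


def variant_distances(word):
    """Map each observed variant of a word to its distance from canonical."""
    return {v: edit_distance(CANONICAL[word], v.split())
            for v in VARIANTS[word]}
```

Calling `variant_distances("probably")` shows that every reduced form drops at least three phones from the eight-phone canonical form, while the [s eh n t s] variant of "sense" differs only by the single inserted [t].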

  31. IPA Consonants

  32. IPA Vowels

  33. Representations for Sounds • With ways to represent sounds (IPA, ARPAbet, etc.) we can classify and manipulate these units. • Automatic Speech Recognition • Speech synthesis • Speech pathology • Language ID • Speaker ID • But… how do we recognize these different sounds automatically from sound data? • Acoustic analysis (digital signal processing)

  34. Next Class • Overview of Spoken Dialog Systems • Readings: J&M 24.1, 24.2
