
Parsing with Context-Free Grammars for ASR

Parsing with Context-Free Grammars for ASR. Julia Hirschberg, CS 4706. Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin. What is Syntax? Structure of language: how words are arranged together and related to one another.


Presentation Transcript


  1. Parsing with Context-Free Grammars for ASR Julia Hirschberg CS 4706 Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin

  2. What is Syntax? • Structure of language • How words are arranged together and related to one another • Goal of syntactic analysis: relate surface form (what someone says or writes) to underlying structure, to support semantic analysis (what the utterance or text means) • Syntactic representation: typically a tree structure

  3. Structure in Strings • A set of words, or, a lexicon: the a small nice big very boy girl sees likes • Some ‘good’ (grammatical) sentences: • the boy likes a girl • the small girl likes the big girl • a very small nice boy sees a very nice boy • Some ‘bad’ (ungrammatical) sentences: • *the boy the girl • *small boy likes nice girl • Can we find a way of distinguishing between the two kinds of sequences? • Can we identify similarities among grammatical subsequences?

  4. One Version of Constituent Structure • Lexicon: the a small nice big very boy girl sees likes • Grammatical sentences: • (the) boy (likes a girl) • (the small) girl (likes the big girl) • (a very small nice) boy (sees a very nice boy) • Ungrammatical sentences: • *(the) boy (the girl) • *(small) boy (likes the nice girl)

  5. Another Constituency Hypothesis • Lexicon: the a small nice big very boy girl sees likes • Grammatical sentences: • (the boy) likes (a girl) • (the small girl) likes (the big girl) • (a very small nice boy) sees (a very nice boy) • Ungrammatical sentences: • *(the boy)(the girl) • *(small boy) likes (the nice girl) • Better: fewer types of constituents (blue and red are of same type)

  6. Even More Structures • Lexicon: the a small nice big very boy girl sees likes • Grammatical sentences: • ((the) boy) likes ((a) girl) • ((the) (small) girl) likes ((the) (big) girl) • ((a) ((very) small) (nice) boy) sees ((a) ((very) nice) girl) • Ungrammatical sentences: • *((the) boy)((the) girl) • *((small) boy) likes ((the) (nice) girl)

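  7. From Substrings to Trees • Bracketing: (((the) boy) likes ((a) girl)) • Tree (figure): likes at the root, with children boy and girl; the is the child of boy, a is the child of girl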

  8. How do we Label the Nodes? • ( ((the) boy) likes ((a) girl) ) • Choose constituents so each one has one non-bracketed word: the head • Group words by distribution of constituents they head (POS) • Noun (N), verb (V), adjective (Adj), adverb (Adv), determiner (Det) • Category of constituent: XP, where X is POS • NP, S, AdjP, AdvP, DetP
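The bracketings on the preceding slides can be read mechanically into nested structures. The following is a minimal Python sketch, not part of the original slides: `parse_brackets` and `head` are hypothetical helper names, and the sketch assumes each constituent has exactly one non-bracketed word, as the slide on labeling nodes proposes.

```python
def parse_brackets(text):
    """Turn a bracketed string into nested lists of words and sub-constituents."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    stack = [[]]                      # stack[-1] is the constituent being built
    for tok in tokens:
        if tok == "(":
            stack.append([])          # open a new constituent
        elif tok == ")":
            done = stack.pop()        # close it and attach it to its parent
            if not stack:
                raise ValueError("unbalanced ')'")
            stack[-1].append(done)
        else:
            stack[-1].append(tok)     # a plain (non-bracketed) word
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]                   # list of top-level constituents


def head(constituent):
    """Return the single non-bracketed word of a constituent (its head)."""
    words = [item for item in constituent if isinstance(item, str)]
    if len(words) != 1:
        raise ValueError("expected exactly one non-bracketed word")
    return words[0]


tree = parse_brackets("(((the) boy) likes ((a) girl))")[0]
print(tree)
print(head(tree), head(tree[0]), head(tree[2]))
```

Under this bracketing the head of the whole string is the verb, and the heads of the two embedded constituents are the nouns, which is what motivates labeling them by the part of speech of the head.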

  9. Types of Nodes • Phrase-structure tree for (((the/Det) boy/N) likes/V ((a/Det) girl/N)): [S [NP [DetP the] boy] likes [NP [DetP a] girl]] • Nonterminal symbols = constituents (S, NP, DetP) • Terminal symbols = words (the, boy, likes, a, girl)

  10. Determining Part-of-Speech • A blue seat / a child seat: noun or adjective? • Syntax: • a blue seat / a child seat • a very blue seat / *a very child seat • this seat is blue / *this seat is child • Morphology: • bluer / *childer • blue and child are not the same POS • blue is Adj, child is Noun

  11. Determining Part-of-Speech • Preposition or particle? • A: he threw out the garbage • B: he threw the garbage out the door • A: he threw the garbage out • B: *he threw the garbage the door out • The two uses of out are not the same POS • In A, out is a particle; in B, it is a preposition

  12. Constituency • Some Noun phrases (NPs) • A red dog on a blue tree • A blue dog on a red tree • Some big dogs and some little dogs • A dog • I • Big dogs, little dogs, red dogs, blue dogs, yellow dogs, green dogs, black dogs, and white dogs • How do we know these form a constituent?

  13. NP Constituency • NPs can all appear before a verb: • Some big dogs and some little dogs are going around in cars… • Big dogs, little dogs, red dogs, blue dogs, yellow dogs, green dogs, black dogs, and white dogs are all at a dog party! • I do not • But individual words can’t always appear before verbs: • *little are going… • *blue are… • *and are • Must be able to state generalizations like: • Noun phrases occur before verbs

  14. PP Constituency • Preposing and postposing: • Under a tree is a yellow dog. • A yellow dog is under a tree. • But not: • *Under, is a yellow dog a tree. • *Under a is a yellow dog tree. • Prepositional phrases notable for ambiguity in attachment • I saw a man on a hill with a telescope.

  15. Context-Free Grammars • Defined in formal language theory • Terminals: e.g. cat • Non-terminal symbols: e.g. NP, VP • Start symbol: e.g. S • Rewriting rules: e.g. S → NP VP • Start with start symbol, rewrite using rules, done when only terminals left

  16. A Fragment of English • S → NP VP • VP → V PP • NP → DetP N • N → cat | mat • V → is • PP → Prep NP • Prep → on • DetP → the • Input: the cat is on the mat
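A grammar fragment like this one can be written down as plain data and checked against an input. The following is a minimal Python sketch, not from the slides: the dictionary layout and the function names `ends` and `recognizes` are assumptions made for illustration, and the simple top-down strategy only works because this fragment has no left-recursive rules.

```python
# Each nonterminal maps to a list of right-hand sides; anything that is not a
# key of the dictionary is treated as a terminal (a word).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "PP"]],
    "NP": [["DetP", "N"]],
    "PP": [["Prep", "NP"]],
    "N": [["cat"], ["mat"]],
    "V": [["is"]],
    "Prep": [["on"]],
    "DetP": [["the"]],
}


def ends(symbol, words, start):
    """Yield every position where `symbol` can end if it starts at `start`."""
    if symbol not in GRAMMAR:                  # terminal: must match the next word
        if start < len(words) and words[start] == symbol:
            yield start + 1
        return
    for rhs in GRAMMAR[symbol]:                # nonterminal: try each rule in turn
        positions = [start]
        for part in rhs:                       # thread positions through the RHS
            positions = [e for p in positions for e in ends(part, words, p)]
        yield from positions


def recognizes(sentence):
    """True if the start symbol S can span the whole word sequence."""
    words = sentence.split()
    return len(words) in ends("S", words, 0)


print(recognizes("the cat is on the mat"))
print(recognizes("the cat the mat"))
```

The sketch only answers yes or no; it does not build the tree. A left-recursive rule such as NP → NP PP would send it into infinite recursion, which is one reason practical parsers use chart-based methods instead.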

  17. Derivations in a CFG • Sentential form: S • Grammar: S → NP VP, VP → V PP, NP → DetP N, N → cat | mat, V → is, PP → Prep NP, Prep → on, DetP → the • Tree so far: S

  18. Derivations in a CFG • Sentential form: NP VP • Grammar: S → NP VP, VP → V PP, NP → DetP N, N → cat | mat, V → is, PP → Prep NP, Prep → on, DetP → the • Tree so far: [S NP VP]

  19. Derivations in a CFG • Sentential form: DetP N VP • Grammar: S → NP VP, VP → V PP, NP → DetP N, N → cat | mat, V → is, PP → Prep NP, Prep → on, DetP → the • Tree so far: [S [NP DetP N] VP]

  20. Derivations in a CFG • Sentential form: the cat VP • Grammar: S → NP VP, VP → V PP, NP → DetP N, N → cat | mat, V → is, PP → Prep NP, Prep → on, DetP → the • Tree so far: [S [NP [DetP the] [N cat]] VP]

  21. Derivations in a CFG • Sentential form: the cat V PP • Grammar: S → NP VP, VP → V PP, NP → DetP N, N → cat | mat, V → is, PP → Prep NP, Prep → on, DetP → the • Tree so far: [S [NP [DetP the] [N cat]] [VP V PP]]

  22. Derivations in a CFG • Sentential form: the cat is Prep NP • Grammar: S → NP VP, VP → V PP, NP → DetP N, N → cat | mat, V → is, PP → Prep NP, Prep → on, DetP → the • Tree so far: [S [NP [DetP the] [N cat]] [VP [V is] [PP Prep NP]]]

  23. Derivations in a CFG • Sentential form: the cat is on DetP N • Grammar: S → NP VP, VP → V PP, NP → DetP N, N → cat | mat, V → is, PP → Prep NP, Prep → on, DetP → the • Tree so far: [S [NP [DetP the] [N cat]] [VP [V is] [PP [Prep on] [NP DetP N]]]]

  24. Derivations in a CFG • Sentential form: the cat is on the mat • Grammar: S → NP VP, VP → V PP, NP → DetP N, N → cat | mat, V → is, PP → Prep NP, Prep → on, DetP → the • Complete tree: [S [NP [DetP the] [N cat]] [VP [V is] [PP [Prep on] [NP [DetP the] [N mat]]]]]
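The derivation walked through on these slides can be reproduced step by step. The following is a minimal Python sketch under stated assumptions: `leftmost_derivation` is a hypothetical helper, it always rewrites the leftmost nonterminal, and it uses the input sentence to choose between alternatives such as N → cat | mat. It applies one rule per step, so it lists more intermediate forms than the slides, which combine some lexical steps.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "PP"]],
    "NP": [["DetP", "N"]],
    "PP": [["Prep", "NP"]],
    "N": [["cat"], ["mat"]],
    "V": [["is"]],
    "Prep": [["on"]],
    "DetP": [["the"]],
}


def leftmost_derivation(form, words):
    """Return the list of sentential forms leading from `form` to `words`, or None."""
    # Locate the leftmost nonterminal in the current sentential form.
    i = next((k for k, sym in enumerate(form) if sym in GRAMMAR), None)
    if i is None:                               # only terminals left: done or dead end
        return [form] if form == words else None
    # Prune: words already derived must match the input, and since no rule
    # has an empty right-hand side, a form never gets shorter again.
    if form[:i] != words[:i] or len(form) > len(words):
        return None
    for rhs in GRAMMAR[form[i]]:                # try each rule for that nonterminal
        rest = leftmost_derivation(form[:i] + rhs + form[i + 1:], words)
        if rest is not None:
            return [form] + rest
    return None


steps = leftmost_derivation(["S"], "the cat is on the mat".split())
for form in steps:
    print(" ".join(form))
```

The printed sequence starts with S and ends with the input sentence; each line differs from the previous one by a single rule application.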

  25. A More Complicated Fragment of English • S → NP VP • S → VP • VP → V PP • VP → V NP • VP → V • NP → DetP NP • NP → N NP • NP → N • PP → Prep NP • N → cat | mat | food | bowl | Mary • V → is | likes | sits • Prep → on | in | under • DetP → the | a • Example inputs: Mary likes the cat bowl. The cat ate the tasty food. Hello. Nice talking to you.
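One way to explore this larger fragment is to test which of the example inputs it covers. The following is a minimal Python sketch, not from the slides: `ends` and `count_parses` are hypothetical names, the lexicon is lowercased and punctuation is stripped (both assumptions about how the input would be normalized), and the top-down strategy relies on the fragment having no left-recursive rules.

```python
import string

GRAMMAR = {
    "S": [["NP", "VP"], ["VP"]],
    "VP": [["V", "PP"], ["V", "NP"], ["V"]],
    "NP": [["DetP", "NP"], ["N", "NP"], ["N"]],
    "PP": [["Prep", "NP"]],
    "N": [["cat"], ["mat"], ["food"], ["bowl"], ["mary"]],   # lowercased lexicon
    "V": [["is"], ["likes"], ["sits"]],
    "Prep": [["on"], ["in"], ["under"]],
    "DetP": [["the"], ["a"]],
}


def ends(symbol, words, start):
    """Yield an end position for every way `symbol` can be derived from `start`."""
    if symbol not in GRAMMAR:                  # terminal: must match the next word
        if start < len(words) and words[start] == symbol:
            yield start + 1
        return
    for rhs in GRAMMAR[symbol]:
        positions = [start]
        for part in rhs:
            positions = [e for p in positions for e in ends(part, words, p)]
        yield from positions                   # duplicates are kept on purpose


def count_parses(sentence):
    """Count the distinct derivations of S that cover the whole sentence."""
    cleaned = sentence.lower().translate(str.maketrans("", "", string.punctuation))
    words = cleaned.split()
    return sum(1 for end in ends("S", words, 0) if end == len(words))


for s in ["Mary likes the cat bowl.", "The cat ate the tasty food.",
          "Hello.", "Nice talking to you."]:
    print(count_parses(s), s)
```

With this lexicon the sketch finds one parse for the first input and none for the other three, because words such as ate, tasty and hello are not in the grammar. This is the coverage problem that a hand-written grammar faces with unrestricted speech.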

  26. Pocket Sphinx Grammar Format • Variables go in angle brackets, e.g., <city> • Terminals must appear in your pronunciation dictionary (case sensitive) • X Y is concatenation, e.g., I WANT • (X | Y) means X or Y, e.g., (WANT|NEED) • Square brackets mean optional, e.g., [ON] FRIDAY • * means that the expansion may be spoken zero or more times, e.g., <digit>* • + means one or more times, e.g., <digit>+

  27. Example • <city> = BOSTON | NEWYORK | WASHINGTON | BALTIMORE; • <time> = MORNING | EVENING; • <day> = FRIDAY | MONDAY; • public <query> = (((WHAT TRAINS LEAVE) | (WHAT TIME CAN I TRAVEL) | (IS THERE A TRAIN)) (FROM|TO) <city> [(FROM|TO) <city>] ON <day> [<time>]); Hello. No. I want to go on Tuesday. When does the train leave?
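Because the example grammar has no recursion, the word sequences it accepts can be checked with an ordinary regular expression. The following is a minimal Python sketch that hand-translates the `<query>` rule; it is an illustration only and says nothing about how Pocket Sphinx itself compiles or applies grammars. The names `QUERY` and `in_grammar` are hypothetical, and uppercasing the input is an assumption made to match the uppercase terminals.

```python
import re

# Hand translation of the rules above: alternation stays "|",
# and square brackets (optional) become "(?: ... )?".
CITY = r"(?:BOSTON|NEWYORK|WASHINGTON|BALTIMORE)"
TIME = r"(?:MORNING|EVENING)"
DAY = r"(?:FRIDAY|MONDAY)"
QUERY = re.compile(
    r"(?:WHAT TRAINS LEAVE|WHAT TIME CAN I TRAVEL|IS THERE A TRAIN)"
    rf" (?:FROM|TO) {CITY}"
    rf"(?: (?:FROM|TO) {CITY})?"
    rf" ON {DAY}"
    rf"(?: {TIME})?"
)


def in_grammar(utterance):
    """True if the utterance, after normalization, is a sentence of <query>."""
    words = re.sub(r"[^\w\s]", "", utterance).upper().split()
    return QUERY.fullmatch(" ".join(words)) is not None


for u in ["What trains leave from Boston to Washington on Friday morning",
          "Is there a train to Baltimore on Monday",
          "Hello.", "No.", "I want to go on Tuesday.",
          "When does the train leave?"]:
    print(in_grammar(u), u)
```

The first two utterances are accepted. The four listed on the slide are rejected, since none of them follows the fixed question patterns and TUESDAY is not in `<day>`.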

  28. Next Class • Language modeling for large vocabulary applications: Ngrams
