
Sequence Classification: Chunking



  1. Sequence Classification: Chunking Shallow Processing Techniques for NLP, Ling570, November 28, 2011

  2. Chunking

  3. Roadmap • Chunking • Definition • Motivation • Challenges • Approach

  10. What is Chunking? • Form of partial (shallow) parsing • Extracts major syntactic units, but not full parse trees • Task: identify and classify • Flat, non-overlapping segments of a sentence • Basic non-recursive phrases • Correspond to major POS categories • May ignore some categories, e.g. base NP chunking • Create simple bracketing • [NP The morning flight] [PP from] [NP Denver] [VP has arrived] • Base NP chunking only: [NP The morning flight] from [NP Denver] has arrived

  16. Why Chunking? • Used when a full parse is unnecessary • Or infeasible or impossible (when?) • Extraction of subcategorization frames • Identify verb arguments • e.g. VP NP • VP NP NP • VP NP to NP • Information extraction: who did what to whom • Summarization: keep base information, remove modifiers • Information retrieval: restrict indexing to base NPs

  20. Processing Example • Tokenization: The morning flight from Denver has arrived • POS tagging: DT JJ N PREP NNP AUX V • Chunking: NP PP NP VP • Extraction: NP NP VP • etc

  22. Approaches • Finite-state Approaches • Grammatical rules in FSTs • Cascade to produce more complex structure • Machine Learning • Similar to POS tagging

  25. Finite-State Rule-Based Chunking • Hand-crafted rules model phrases • Typically application-specific • Left-to-right longest match (Abney 1996) • Start at beginning of sentence • Find longest matching rule • Greedy approach, not guaranteed optimal

  29. Finite-State Rule-Based Chunking • Chunk rules: • Cannot contain recursion • NP → Det Nominal: Okay • Nominal → Nominal PP: Not okay • Examples: • NP → (Det) Noun* Noun • NP → Proper-Noun • VP → Verb • VP → Aux Verb • Consider: Time flies like an arrow • Is this what we want?
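Rules like those above can be compiled into patterns over POS tags and applied with the left-to-right longest-match strategy described earlier. The following is a minimal sketch, using the lecture's simplified tag names (DT, JJ, N, NNP, PREP, AUX, V); the exact rule set is illustrative, not the lecture's grammar.

```python
import re

# Chunk rules as regular expressions over space-delimited POS tags.
# Non-recursive by construction: no rule refers to another chunk label.
RULES = [
    ("NP", re.compile(r"(DT )?(JJ )*((NNP|N) )+")),  # NP -> (Det) Adj* Noun+
    ("VP", re.compile(r"(AUX )?(V )+")),             # VP -> (Aux) Verb+
    ("PP", re.compile(r"PREP ")),                    # PP -> Preposition
]

def chunk(tags):
    """Greedily take the longest rule match at each position (not optimal)."""
    s = "".join(t + " " for t in tags)
    chunks, i = [], 0
    while i < len(s):
        best = None
        for label, pat in RULES:
            m = pat.match(s, i)
            if m and (best is None or m.end() > best[1].end()):
                best = (label, m)
        if best:
            chunks.append((best[0], s[i:best[1].end()].split()))
            i = best[1].end()
        else:  # no rule applies: leave this tag outside any chunk
            j = s.index(" ", i)
            chunks.append(("O", [s[i:j]]))
            i = j + 1
    return chunks

# The slides' example sentence, already POS-tagged:
print(chunk(["DT", "JJ", "N", "PREP", "NNP", "AUX", "V"]))
# → [('NP', ['DT', 'JJ', 'N']), ('PP', ['PREP']), ('NP', ['NNP']),
#    ('VP', ['AUX', 'V'])]
```

Because matching is greedy, a rule that consumes too much early on can block a better global analysis, which is exactly the "not guaranteed optimal" caveat above.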

  31. Cascading FSTs • Richer partial parsing • Pass output of FST to next FST • Approach: • First stage: Base phrase chunking • Next stage: Larger constituents (e.g. PPs, VPs) • Highest stage: Sentences
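A cascade can be sketched as successive pattern passes, each consuming the previous pass's bracketed output. The stages and tag names below are illustrative assumptions, not the lecture's actual FST rules.

```python
import re

def cascade(tagged):
    """Three illustrative stages; each re.sub runs on the prior stage's output."""
    # Stage 1: base NPs — (Det) Adj* Noun+
    s = re.sub(r"(?:DT )?(?:JJ )*(?:NNP|N)(?: (?:NNP|N))*",
               lambda m: "[NP " + m.group(0) + "]", tagged)
    # Stage 2: PP = preposition + a base NP produced by stage 1
    s = re.sub(r"PREP (\[NP [^]]*\])", r"[PP PREP \1]", s)
    # Stage 3: verb group
    s = re.sub(r"AUX V", r"[VP AUX V]", s)
    return s

print(cascade("DT JJ N PREP NNP AUX V"))
# → [NP DT JJ N] [PP PREP [NP NNP]] [VP AUX V]
```

The key point is that later stages see richer structure (brackets) than raw tags, which is how the cascade builds larger constituents without recursion inside any single stage.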

  32. Example

  38. Chunking by Classification • Model chunking as a task similar to POS tagging • Instance: tokens • Labels: • Simultaneously encode segmentation & identification • IOB (or BIO) tagging (also variants with E(nd)/S(ingle) tags, e.g. BIOES) • Segment: B(eginning), I(nternal), O(utside) • Identity: phrase category: NP, VP, PP, etc. • The morning flight from Denver has arrived • NP-B NP-I NP-I PP-B NP-B VP-B VP-I • Base NP only: NP-B NP-I NP-I O NP-B O O
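The slide's labeling can be decoded back into chunks with a short loop: "NP-B" starts an NP, "NP-I" continues it, and "O" closes any open chunk. A minimal sketch:

```python
def bio_to_chunks(words, labels):
    """Recover (category, text) chunks from per-token BIO labels like 'NP-B'."""
    chunks, cur = [], None
    for w, lab in zip(words, labels):
        if lab == "O":           # outside any chunk: close the current one
            if cur:
                chunks.append(cur)
                cur = None
            continue
        cat, pos = lab.split("-")
        if pos == "B" or cur is None or cur[0] != cat:
            if cur:              # a B tag (or category change) starts a chunk
                chunks.append(cur)
            cur = (cat, [w])
        else:                    # an I tag extends the current chunk
            cur[1].append(w)
    if cur:
        chunks.append(cur)
    return [(cat, " ".join(ws)) for cat, ws in chunks]

words = "The morning flight from Denver has arrived".split()
labels = ["NP-B", "NP-I", "NP-I", "PP-B", "NP-B", "VP-B", "VP-I"]
print(bio_to_chunks(words, labels))
# → [('NP', 'The morning flight'), ('PP', 'from'), ('NP', 'Denver'),
#    ('VP', 'has arrived')]
```

This is why the encoding works: segmentation (where chunks start and end) and identification (which category) are both recoverable from the token-level labels alone.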

  43. Features for Chunking • What are good features? • Preceding tags • for 2 preceding words • Words • for 2 preceding, current, 2 following • Parts of speech • for 2 preceding, current, 2 following • Vector includes those features + true label
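The feature set listed above can be sketched as a function of the token position: the two preceding (already predicted) chunk tags, plus words and POS tags in a five-token window. The "BOS"/"EOS" padding symbols and the feature names are my conventions, not the lecture's.

```python
def features(words, pos, prev_labels, i):
    """Feature dict for token i: 2 preceding tags, ±2 words, ±2 POS tags."""
    def get(seq, j, pad):
        return seq[j] if 0 <= j < len(seq) else pad
    f = {"tag-1": get(prev_labels, i - 1, "BOS"),   # chunk tag of word i-1
         "tag-2": get(prev_labels, i - 2, "BOS")}   # chunk tag of word i-2
    for off in (-2, -1, 0, 1, 2):
        pad = "BOS" if off < 0 else "EOS"
        f[f"w{off:+d}"] = get(words, i + off, pad)  # word window
        f[f"p{off:+d}"] = get(pos, i + off, pad)    # POS window
    return f

words = "The morning flight from Denver has arrived".split()
pos = ["DT", "JJ", "N", "PREP", "NNP", "AUX", "V"]
# Features for "flight" (i = 2), given the chunk tags predicted so far:
print(features(words, pos, ["NP-B", "NP-I"], 2))
```

At training time the true label is appended to each such vector; at decoding time the preceding-tag features come from the classifier's own earlier predictions.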

  44. Chunking as Classification • Example

  45. Evaluation • System: output of automatic tagging • Gold standard: true tags • Typically extracted from a parsed treebank • Precision: # correct chunks / # system chunks • Recall: # correct chunks / # gold chunks • F-measure: F1 = 2PR / (P + R) • F1 balances precision & recall
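These definitions can be computed directly over chunk sets, counting a system chunk as correct only if its label and span both match a gold chunk exactly. Representing chunks as (label, start, end) token spans is my convention here:

```python
def prf(system, gold):
    """Chunk-level precision, recall, and F1 over exact (label, span) matches."""
    correct = len(set(system) & set(gold))
    p = correct / len(system) if system else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = [("NP", 0, 3), ("PP", 3, 4), ("NP", 4, 5), ("VP", 5, 7)]
system = [("NP", 0, 3), ("PP", 3, 5), ("VP", 5, 7)]  # PP span is wrong
p, r, f1 = prf(system, gold)
print(p, r, f1)  # 2 of 3 system chunks correct; 2 of 4 gold chunks found
```

Note how a single boundary error costs both precision (a wrong system chunk) and recall (two missed gold chunks), which is why exact-match chunk F1 is a strict metric.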

  49. State-of-the-Art • Base NP chunking: F1 ≈ 0.96 • Complex phrases: learning-based: 0.92–0.94 • Most learners achieve similar results • Rule-based: 0.85–0.92 • Limiting factors: • POS tagging accuracy • Inconsistent labeling (parse tree extraction) • Conjunctions (does the modifier scope over both conjuncts?) • Late departures and arrivals are common in winter • Late departures and cancellations are common in winter
