1 / 95

Transformation-Based Learning

Transformation-Based Learning. Advanced Statistical Methods in NLP Ling 572 March 1, 2012. Roadmap. Transformation-based (Error-driven) learning Basic TBL framework Design decisions TBL Part-of-Speech Tagging Rule templates Effectiveness HW #9. Transformation-Based Learning: Overview.

deacon
Download Presentation

Transformation-Based Learning

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Transformation-BasedLearning Advanced Statistical Methods in NLP Ling 572 March 1, 2012

  2. Roadmap • Transformation-based (Error-driven) learning • Basic TBL framework • Design decisions • TBL Part-of-Speech Tagging • Rule templates • Effectiveness • HW #9

  3. Transformation-Based Learning:Overview • Developed by Eric Brill (~1992)

  4. Transformation-Based Learning:Overview • Developed by Eric Brill (~1992) • Basic approach: • Apply initial labeling • Perform cascade of transformations to reduce error

  5. Transformation-Based Learning:Overview • Developed by Eric Brill (~1992) • Basic approach: • Apply initial labeling • Perform cascade of transformations to reduce error • Applications: • Classification • Sequence labeling • Part-of-speech tagging, dialogue act tagging, chunking, handwriting segmentation, referring expression generation • Structure extraction: parsing

  6. TBL Architecture

  7. Key Process • Goal: Correct errors made by initial labeling

  8. Key Process • Goal: Correct errors made by initial labeling • Approach: transformations

  9. Key Process • Goal: Correct errors made by initial labeling • Approach: transformations • Two components:

  10. Key Process • Goal: Correct errors made by initial labeling • Approach: transformations • Two components: • Rewrite rule: • e.g., MD  NN

  11. Key Process • Goal: Correct errors made by initial labeling • Approach: transformations • Two components: • Rewrite rule: • e.g., MD  NN • Triggering environment: • e.g., prevT=DT

  12. Key Process • Goal: Correct errors made by initial labeling • Approach: transformations • Two components: • Rewrite rule: • e.g., MD  NN • Triggering environment: • e.g., prevT=DT • Example transformation: • MD  NN if prevT=DT • The/DT can/NN rusted/VB

  13. Key Process • Goal: Correct errors made by initial labeling • Approach: transformations • Two components: • Rewrite rule: • e.g., MD  NN • Triggering environment: • e.g., prevT=DT • Example transformation: • MD  NN if prevT=DT • The/DT can/MD rusted/VB  The/DT can/NN rusted/VB

  14. TBL: Training • Apply initial labeling procedure

  15. TBL: Training • Apply initial labeling procedure • Consider all possible transformations, select transformation with best score on training data

  16. TBL: Training • Apply initial labeling procedure • Consider all possible transformations, select transformation with best score on training data • Add rule to end of rule cascade

  17. TBL: Training • Apply initial labeling procedure • Consider all possible transformations, select transformation with best score on training data • Add rule to end of rule cascade • Iterate over steps 2,3 until no more improvement

  18. Error Reduction

  19. TBL: Testing • For each test instance:

  20. TBL: Testing • For each test instance: • Apply initial labeling

  21. TBL: Testing • For each test instance: • Apply initial labeling • Apply, in sequence, all matching transformations

  22. Core Design Decisions

  23. Core Design Decisions • Choose initial-state annotator • Many possibilities:

  24. Core Design Decisions • Choose initial-state annotator • Many possibilities: • Simple heuristics, complex learned classifiers

  25. Core Design Decisions • Choose initial-state annotator • Many possibilities: • Simple heuristics, complex learned classifiers • Specify legal transformations:

  26. Core Design Decisions • Choose initial-state annotator • Many possibilities: • Simple heuristics, complex learned classifiers • Specify legal transformations: • ‘Transformation templates’: instantiate based on data

  27. Core Design Decisions • Choose initial-state annotator • Many possibilities: • Simple heuristics, complex learned classifiers • Specify legal transformations: • ‘Transformation templates’: instantiate based on data • Can be simple and localor complex and long-distance

  28. Core Design Decisions • Choose initial-state annotator • Many possibilities: • Simple heuristics, complex learned classifiers • Specify legal transformations: • ‘Transformation templates’: instantiate based on data • Can be simple and localor complex and long-distance • Define evaluation function for selecting transform’n

  29. Core Design Decisions • Choose initial-state annotator • Many possibilities: • Simple heuristics, complex learned classifiers • Specify legal transformations: • ‘Transformation templates’: instantiate based on data • Can be simple and localor complex and long-distance • Define evaluation function for selecting transform’n • Evaluation measures:

  30. Core Design Decisions • Choose initial-state annotator • Many possibilities: • Simple heuristics, complex learned classifiers • Specify legal transformations: • ‘Transformation templates’: instantiate based on data • Can be simple and localor complex and long-distance • Define evaluation function for selecting transform’n • Evaluation measures: accuracy, f-measure, parse score • Stopping criterion:

  31. Core Design Decisions • Choose initial-state annotator • Many possibilities: • Simple heuristics, complex learned classifiers • Specify legal transformations: • ‘Transformation templates’: instantiate based on data • Can be simple and localor complex and long-distance • Define evaluation function for selecting transform’n • Evaluation measures: accuracy, f-measure, parse score • Stopping criterion: no improvement, threshold,..

  32. More Design Decisions • Issue: • Rule application can affect downstream application

  33. More Design Decisions • Issue: • Rule application can affect downstream application • Approach: • Constraints on transformation application:

  34. More Design Decisions • Issue: • Rule application can affect downstream application • Approach: • Constraints on transformation application: • Specify order of transformation application

  35. More Design Decisions • Issue: • Rule application can affect downstream application • Approach: • Constraints on transformation application: • Specify order of transformation application • Decide if transformations applied immediately or after whole corpus is checked for triggering environments

  36. Ordering Example • Example sequence: A A A A A

  37. Ordering Example • Example sequence: A A A A A • Transformation: if prevT==A, AB

  38. Ordering Example • Example sequence: A A A A A • Transformation: if prevT==A, AB • If check whole corpus for contexts before application:

  39. Ordering Example • Example sequence: A A A A A • Transformation: if prevT==A, AB • If check whole corpus for contexts before application: • A B B B B • If apply immediately, left-to-right:

  40. Ordering Example • Example sequence: A A A A A • Transformation: if prevT==A, AB • If check whole corpus for contexts before application: • A B B B B • If apply immediately, left-to-right: • A B A B A • If apply immediately, right-to-left:

  41. Ordering Example • Example sequence: A A A A A • Transformation: if prevT==A, AB • If check whole corpus for contexts before application: • A B B B B • If apply immediately, left-to-right: • A B A B A • If apply immediately, right-to-left: • A B B B B

  42. Comparisons • Decision trees: • Similar in

  43. Comparisons • Decision trees: • Similar in applying a cascade of tests for classification • Similar in interpretability • TBL:

  44. Comparisons • Decision trees: • Similar in applying a cascade of tests for classification • Similar in interpretability • TBL: • More general: Can convert any DT to TBL

  45. Comparisons • Decision trees: • Similar in applying a cascade of tests for classification • Similar in interpretability • TBL: • More general: Can convert any DT to TBL • Selection criterion

  46. Comparisons • Decision trees: • Similar in applying a cascade of tests for classification • Similar in interpretability • TBL: • More general: Can convert any DT to TBL • Selection criterion: more focused: accuracy vsInfoGain • More powerful

  47. Comparisons • Decision trees: • Similar in applying a cascade of tests for classification • Similar in interpretability • TBL: • More general: Can convert any DT to TBL • Selection criterion: more focused: accuracy vsInfoGain • More powerful: Can perform arbitrary output transform • Can refine arbitrary existing labeling • Multipass processing • More expensive:

  48. Comparisons • Decision trees: • Similar in applying a cascade of tests for classification • Similar in interpretability • TBL: • More general: Can convert any DT to TBL • Selection criterion: more focused: accuracy vsInfoGain • More powerful: Can perform arbitrary output transform • Can refine arbitrary existing labeling • Multipass processing • More expensive: • Transformations tested/applied to all training data

  49. Case Study: POS Tagging

  50. TBL Part-of-Speech Tagging • Design decisions:

More Related