
Final review

Final review. LING 572, Fei Xia, 03/07/06.

Presentation Transcript


  1. Final review LING 572 Fei Xia 03/07/06

  2. Misc • Parts 3 and 4 were due at 6am today. • Presentation: email me the slides by 6am on 3/9 • Final report: email me by 6am on 3/14. • Group meetings: 1:30-4:00pm on 3/16.

  3. Outline • Main topics • Applying to NLP tasks • Tricks

  4. Main topics

  5. Main topics • Supervised learning • Decision tree • Decision list • TBL • MaxEnt • Boosting • Semi-supervised learning • Self-training • Co-training • EM • Co-EM

  6. Main topics (cont) • Unsupervised learning • The EM algorithm • The EM algorithm for PM models • Forward-backward • Inside-outside • IBM models for MT • Others • Two dynamic models: FSA and HMM • Re-sampling: bootstrap • System combination • Bagging

  7. Main topics (cont) • Homework • Hw1: FSA and HMM • Hw2: DT, DL, CNF, DNF, and TBL • Hw3: Boosting • Project: • P1: Trigram (learn to use Carmel, relation between HMM and FSA) • P2: TBL • P3: MaxEnt • P4: Bagging, boosting, system combination, SSL

  8. Supervised learning

  9. A classification problem

  10. Classification and estimation problems • Given • x: input attributes • y: the goal • training data: a set of (x, y) • Predict y given a new x: • y is a discrete variable → classification problem • y is a continuous variable → estimation problem

  11. Five ML methods • Decision tree • Decision list • TBL • Boosting • MaxEnt

  12. Decision tree • Modeling: tree representation • Training: top-down induction, greedy algorithm • Decoding: find the path from root to a leaf node, where the tests along the path are satisfied.
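
As a rough illustration of the greedy step in top-down induction, the sketch below scores candidate attributes by information gain and picks the best one. The function names and the toy data are assumptions made for this example; they are not taken from the course material, and real learners such as ID3 or C4.5 add stopping criteria and pruning on top of this.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(examples, attribute):
    """Entropy reduction from splitting (x, y) examples on one attribute of x."""
    labels = [y for _, y in examples]
    partitions = {}
    for x, y in examples:
        partitions.setdefault(x[attribute], []).append(y)
    remainder = sum(len(p) / len(examples) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

def best_attribute(examples, attributes):
    """Greedy step: choose the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(examples, a))

# Hypothetical toy data: x is a dict of attribute values, y is the class.
data = [
    ({"outlook": "sunny", "windy": False}, "no"),
    ({"outlook": "sunny", "windy": True}, "no"),
    ({"outlook": "rainy", "windy": False}, "yes"),
    ({"outlook": "rainy", "windy": True}, "yes"),
]
```

Here `best_attribute(data, ["outlook", "windy"])` selects `"outlook"`, since splitting on it separates the two classes completely while `"windy"` leaves them evenly mixed.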

  13. Decision tree (cont) • Main algorithms: ID3, C4.5, CART • Strengths: • Ability to generate understandable rules • Ability to clearly indicate best attributes • Weaknesses: • Data splitting • Trouble with non-rectangular regions • The instability of top-down induction → bagging

  14. Decision list • Modeling: a list of decision rules • Training: greedy, iterative algorithm • Decoding: find the 1st rule that applies • Each decision is based on a single piece of evidence, in contrast to MaxEnt, boosting, TBL
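
Decoding with a decision list can be sketched in a few lines: scan the rules in order and return the label of the first one whose test fires. The rule representation (a test function paired with a label), the default label, and the word-sense example are all illustrative assumptions.

```python
def decode_decision_list(rules, default, x):
    """Return the label of the first rule whose test applies to x."""
    for test, label in rules:
        if test(x):
            return label
    # No rule applies: fall back to a default label.
    return default

# Hypothetical rules for disambiguating "bass" from a set of context words.
rules = [
    (lambda context: "fish" in context, "FISH"),
    (lambda context: "play" in context, "MUSIC"),
]
```

With these rules, a context containing both "fish" and "play" is labeled `"FISH"`, because only the first matching rule is used, which is the single-piece-of-evidence behavior noted on the slide.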

  15. TBL • Modeling: a list of transformations (similar to decision rules) • Training: • Greedy, iterative algorithm • The concept of current state • Decoding: apply every transformation to the data
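
The contrast with decision-list decoding can be seen in a small sketch: every transformation is applied, in learned order, to the current state of the data. The rule template used here (change tag A to B when the previous tag is Z) and the choice to trigger on the tagging as it stood before the rule started are assumptions for illustration; actual TBL systems support richer templates.

```python
def apply_transformations(tags, transformations):
    """Apply each (from_tag, to_tag, prev_tag) rule, in order, to a tag sequence."""
    current = list(tags)
    for from_tag, to_tag, prev_tag in transformations:
        # Each rule sees the output of all earlier rules (the "current state").
        snapshot = list(current)
        for i in range(1, len(current)):
            if snapshot[i] == from_tag and snapshot[i - 1] == prev_tag:
                current[i] = to_tag
    return current

# Hypothetical initial tagging for "the race to race" and one learned rule.
initial_tags = ["DT", "NN", "TO", "NN"]
learned_rules = [("NN", "VB", "TO")]  # NN -> VB when the previous tag is TO
```

Applying `learned_rules` to `initial_tags` changes only the last tag, giving `["DT", "NN", "TO", "VB"]`.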

  16. TBL (cont) • Strengths: • Minimizing error rate directly • Ability to handle non-classification problems • Dynamic problem: POS tagging • Non-classification problem: parsing • Weaknesses: • Transformations are hard to interpret as they interact with one another • Probabilistic TBL: TBL-DT

  17. Boosting [Diagram: the training sample and a sequence of weighted samples are each passed to the learner ML, producing f1, f2, …, fT, which are combined into f.]

  18. Boosting (cont) • Modeling: combining a set of weak classifiers to produce a powerful committee. • Training: learn one classifier at each iteration • Decoding: use the weighted majority vote of the weak classifiers
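
To make the three steps concrete, here is a minimal AdaBoost-style sketch with one-dimensional threshold stumps as the weak classifiers. AdaBoost is used as a representative boosting algorithm; the stump learner, the function names, and the toy data are assumptions for this example rather than the setup used in the course.

```python
import math

def train_stump(xs, ys, weights):
    """Pick the (error, threshold, polarity) stump with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= threshold else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, rounds):
    """Learn one weighted stump per round; labels must be +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Increase the weight of misclassified examples, then renormalize.
        preds = [polarity if x >= threshold else -polarity for x in xs]
        weights = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted majority vote of the weak classifiers."""
    score = sum(alpha * (pol if x >= thr else -pol) for alpha, thr, pol in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data that no single stump can classify perfectly.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
```

On this toy set a single stump misclassifies two of the six points, while three rounds of boosting are enough for the weighted vote to label every training point correctly.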

  19. Boosting (cont) • Strengths • It comes with a set of theoretical guarantees (e.g., on training error and test error). • It only needs to find weak classifiers. • Weaknesses: • It is susceptible to noise. • The actual performance depends on the data and the base learner.

  20. MaxEnt • The task: find p* s.t. [equation missing] where [equation missing] • If p* exists, it has the form [equation missing]

  21. MaxEnt (cont) • If p* exists, then [equation missing] where [equation missing]
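
For reference, the textbook statement of the MaxEnt problem can be written as follows. The notation (feature functions f_j, empirical distribution p̃, weights λ_j, log-likelihood L) is the standard one and is an assumption here; it is not recovered from the slides and may differ from the notation used in the lecture.

```latex
% Maximum-entropy problem: maximize entropy subject to feature constraints.
p^{*} = \arg\max_{p \in P} H(p), \qquad
P = \{\, p \mid E_{p} f_j = E_{\tilde{p}} f_j,\ j = 1, \dots, k \,\}

% Exponential (log-linear) form of the solution.
p^{*}(x) = \frac{1}{Z} \exp\Big( \sum_{j=1}^{k} \lambda_j f_j(x) \Big), \qquad
Z = \sum_{x} \exp\Big( \sum_{j=1}^{k} \lambda_j f_j(x) \Big)

% Equivalent maximum-likelihood view over the exponential family Q.
p^{*} = \arg\max_{q \in Q} L(q), \qquad
Q = \Big\{\, q \;\Big|\; q(x) = \frac{1}{Z} \exp\Big( \sum_{j=1}^{k} \lambda_j f_j(x) \Big) \Big\}, \qquad
L(q) = \sum_{x} \tilde{p}(x) \log q(x)
```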

  22. MaxEnt (cont) • Training: GIS, IIS • Feature selection: • Greedy algorithm • Select one (or more) at a time • In general, MaxEnt achieves good performance on many NLP tasks.

  23. Common issues • Objective function / Quality measure: • DT, DL: e.g., information gain • TBL, Boosting: minimize training errors • MaxEnt: maximize entropy while satisfying constraints

  24. Common issues (cont) • Avoiding overfitting • Use development data • Two strategies: • stop early • post-pruning

  25. Common issues (cont) • Missing attribute values: • Assume a “blank” value • Assign the most common value among all “similar” examples in the training data • (DL, DT): Assign a fraction of the example to each possible class. • Continuous-valued attributes • Choosing thresholds by checking the training data

  26. Common issues (cont) • Attributes with different costs • DT: Change the quality measure to include the costs • Continuous-valued goal attribute • DT, DL: each “leaf” node is marked with a real value or a linear function • TBL, MaxEnt, Boosting: ??

  27. Comparison of supervised learners

  28. Semi-supervised Learning

  29. Semi-supervised learning • Each learning method makes some assumptions about the problem. • SSL works when those assumptions are satisfied. • SSL could degrade the performance when mistakes reinforce themselves.

  30. SSL (cont) • We have covered four methods: self-training, co-training, EM, co-EM

  31. Co-training • The original paper: (Blum and Mitchell, 1998) • Two “independent” views: split the features into two sets. • Train a classifier on each view. • Each classifier labels data that can be used to train the other classifier. • Extension: • Relax the conditional independence assumptions • Instead of using two views, use two or more classifiers trained on the whole feature set.

  32. Unsupervised learning

  33. Unsupervised learning • EM is a method of estimating parameters in the MLE framework. • It finds a sequence of parameters that improve the likelihood of the training data.

  34. The EM algorithm • Start with an initial estimate, θ0 • Repeat until convergence • E-step: calculate [equation missing] • M-step: find [equation missing]
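
In the usual textbook notation, the two steps can be written as below. The symbols (observed data x, hidden variables z, current estimate θ^t, auxiliary function Q) are the standard ones and are an assumption here, since the slide's own formulas are not available.

```latex
% E-step: expected complete-data log-likelihood under the current parameters.
Q(\theta, \theta^{t}) = E_{z \mid x, \theta^{t}} \big[ \log P(x, z \mid \theta) \big]
                      = \sum_{z} P(z \mid x, \theta^{t}) \, \log P(x, z \mid \theta)

% M-step: re-estimate the parameters by maximizing Q.
\theta^{t+1} = \arg\max_{\theta} Q(\theta, \theta^{t})
```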

  35. The EM algorithm (cont) • The optimal solution for the M-step exists for many classes of problems. → A number of well-known methods are special cases of EM. • The EM algorithm for PM models • Forward-backward algorithm • Inside-outside algorithm • …

  36. Other topics

  37. FSA and HMM • Two types of HMMs: • State-emission and arc-emission HMMs • They are equivalent • We can convert an HMM into a WFA • Modeling: Markov assumption • Training: • Supervised: counting • Unsupervised: forward-backward algorithm • Decoding: Viterbi algorithm
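
As a small illustration of the decoding step, the sketch below runs the Viterbi algorithm on a state-emission HMM. The dictionary-based parameter layout and the toy model (states, observations, and probabilities) are assumptions for this example; a practical implementation would work in log space to avoid underflow on long sequences.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely state sequence."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][r][0] * trans_p[r][s] * emit_p[s][obs], r) for r in states
            )
        best.append(column)
    # Backtrace from the best final state.
    last = max(states, key=lambda s: best[-1][s][0])
    prob = best[-1][last][0]
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return prob, path

# Hypothetical two-state model.
states = ["Healthy", "Fever"]
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
emit_p = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}
prob, path = viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)
```

For this toy model the best path is `["Healthy", "Healthy", "Fever"]` with probability 0.01512.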

  38. Bootstrap [Diagram: each bootstrap sample is passed to the learner ML, producing f1, f2, …, fB, which are combined into f.]

  39. Bootstrap (cont) • A method of re-sampling: • One original sample → B bootstrap samples • It has a strong mathematical background. • It is a method for estimating standard errors, bias, and so on.
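
A minimal sketch of using the bootstrap to estimate a standard error: draw B samples with replacement from the original sample, compute the statistic on each, and take the standard deviation of those values. The function name, the seed parameter, and the data are assumptions for illustration.

```python
import random
import statistics

def bootstrap_standard_error(sample, statistic, num_resamples, seed=0):
    """Estimate the standard error of `statistic` from bootstrap resamples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(num_resamples):
        # One bootstrap sample: same size as the original, drawn with replacement.
        resample = [rng.choice(sample) for _ in sample]
        estimates.append(statistic(resample))
    return statistics.stdev(estimates)

# Hypothetical data; the statistic here is the sample mean.
scores = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 10.0, 12.0]
se = bootstrap_standard_error(scores, statistics.mean, num_resamples=1000)
```

Any statistic that maps a list to a number can be passed in place of `statistics.mean`, for example `statistics.median`.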

  40. System combination [Diagram: learners ML1, ML2, …, MLB produce f1, f2, …, fB, which are combined into f.]

  41. System combination (cont) • Hybridization: combine substructures to produce a new one. • Voting • Naïve Bayes • Switching: choose one of the fi(x) • Similarity switching • Naïve Bayes
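
Of the strategies listed, simple voting is the easiest to sketch: each system labels the input and the most frequent label wins. The function name and the toy classifiers are assumptions for this example; ties here go to the label seen first, which is one arbitrary choice among several.

```python
from collections import Counter

def majority_vote(classifiers, x):
    """Return the label predicted by the largest number of classifiers."""
    votes = Counter(f(x) for f in classifiers)
    return votes.most_common(1)[0][0]

# Hypothetical systems that tag a word as noun (NN) or verb (VB).
systems = [
    lambda word: "VB" if word.endswith("ize") else "NN",
    lambda word: "VB" if word in {"run", "realize"} else "NN",
    lambda word: "NN",
]
```

For the input `"realize"`, two of the three systems answer `"VB"`, so the combined output is `"VB"`.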

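  42. Bagging • Bagging = bootstrap + system combination [Diagram: each bootstrap sample is passed to the learner ML, producing f1, f2, …, fB, which are combined into f.]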

  43. Bagging (cont) • It is effective for unstable learning methods: • Decision tree • Regression tree • Neural network • It does not help stable learning methods • K-nearest neighbors
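
Putting the two ingredients together, a bagging sketch trains one model per bootstrap sample and combines them by majority vote. The toy base learner (a stump that splits at the mean of x), the function names, and the data are assumptions for illustration only.

```python
import random
from collections import Counter

def learn_stump(sample):
    """Toy base learner: split at the mean of x, label each side by majority."""
    threshold = sum(x for x, _ in sample) / len(sample)

    def majority(labels, fallback):
        return Counter(labels).most_common(1)[0][0] if labels else fallback

    overall = majority([y for _, y in sample], None)
    left = majority([y for x, y in sample if x < threshold], overall)
    right = majority([y for x, y in sample if x >= threshold], overall)
    return lambda x: left if x < threshold else right

def bagging(train, learn, num_bags, seed=0):
    """Train one model per bootstrap sample and combine them by voting."""
    rng = random.Random(seed)
    models = []
    for _ in range(num_bags):
        bootstrap_sample = [rng.choice(train) for _ in train]
        models.append(learn(bootstrap_sample))

    def combined(x):
        return Counter(m(x) for m in models).most_common(1)[0][0]

    return combined

# Hypothetical one-dimensional training data with two classes.
train = [(1, "a"), (2, "a"), (3, "a"), (7, "b"), (8, "b"), (9, "b")]
classifier = bagging(train, learn_stump, num_bags=25)
```

Individual stumps vary with the bootstrap sample they see, which is the kind of instability that voting over many bags is meant to smooth out.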

  44. Relations

  45. Relations • WFSA and HMM • DL, DT, TBL • EM, EM for PM

  46. WFSA and HMM • Add a “Start” state and a transition from “Start” to any state in the HMM. • Add a “Finish” state and a transition from any state in the HMM to “Finish”. [Diagram: Start → HMM → Finish]

  47. DL, CNF, DNF, DT, TBL [Diagram relating k-DL, k-CNF, k-DT, k-DNF, and k-TBL]

  48. The EM algorithm [Diagram relating the generalized EM, the EM algorithm, EM for PM models, inside-outside, forward-backward, IBM models, and Gaussian mixtures]

  49. Solving an NLP problem

  50. Issues • Modeling: represent the problem as a formula and decompose the formula into a function of parameters • Training: estimate model parameters • Decoding: find the best answer given the parameters • Other issues: • Preprocessing • Postprocessing • Evaluation • …
