# Course Summary


### Course Summary

LING 572

Fei Xia

03/06/07

Outline
• Problem description
• General approach
• ML algorithms
• Important concepts
• Assignments
• What’s next?

### Problem descriptions

Two types of problems
• Classification problem
• Sequence Labeling problem
• In both cases:
• A predefined set of labels: C = {c1, c2, …, cn}
• Training data: {(xi, yi)}, where yi ∈ C and yi is known (supervised) or unknown (unsupervised)
• Test data
• Classification problems:
• Document classification
• Spam detection
• Sentiment analysis
• Sequence labeling problems:
• POS tagging
• Word segmentation
• Sentence segmentation
• NE (named entity) detection
• Parsing
• IGT (interlinear glossed text) detection

### General approach

Step 1: Preprocessing
• Converting the NLP task to a classification or sequence labeling problem
• Creating the attribute-value table:
• Define feature templates
• Instantiate feature templates and select features
• Decide what kind of feature values to use (e.g., binarizing features or not)
• Converting a multi-class problem to a binary problem (optional)
Feature selection
• Dimensionality reduction
• Feature selection
• Wrapping methods
• Filtering methods:
• Mutual information, χ², information gain, … (see the sketch after this list)
• Feature extraction
• Term clustering:
• Latent semantic indexing (LSI)
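
As an illustration of the filtering methods above, here is a minimal sketch that scores binary features by mutual information with the class label and keeps the top k. The data layout (a list of (feature-set, label) pairs) and all function names are assumptions for illustration, not the course's actual format.

```python
import math
from collections import Counter

def mutual_information(instances, feature):
    """MI between a binary feature (present/absent) and the class label.
    instances: a list of (feature_set, label) pairs."""
    n = len(instances)
    joint = Counter((feature in feats, y) for feats, y in instances)
    p_x, p_y = Counter(), Counter()
    for (present, y), count in joint.items():
        p_x[present] += count
        p_y[y] += count
    return sum((c / n) * math.log((c / n) / ((p_x[pres] / n) * (p_y[y] / n)), 2)
               for (pres, y), c in joint.items())

def select_top_k(instances, candidate_features, k):
    """Filtering method: rank features by their MI score, keep the best k."""
    ranked = sorted(candidate_features,
                    key=lambda f: mutual_information(instances, f),
                    reverse=True)
    return ranked[:k]
```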
Multiclass → Binary
• One-vs-all (see the sketch after this list)
• All-pairs
• Error-correcting Output Codes (ECOC)
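
A minimal sketch of the one-vs-all reduction, assuming a hypothetical binary learner interface with fit(X, y) and a real-valued score(x); the class and method names are illustrative only.

```python
class OneVsAll:
    """Reduce a multiclass problem to n binary problems (one per class).
    Assumes a hypothetical binary learner with fit(X, y) and score(x),
    where a higher score means more confidence in the positive class."""

    def __init__(self, make_binary_learner):
        self.make_binary_learner = make_binary_learner
        self.models = {}

    def fit(self, X, y):
        for c in set(y):
            binary_y = [1 if label == c else -1 for label in y]  # c vs. rest
            model = self.make_binary_learner()
            model.fit(X, binary_y)
            self.models[c] = model

    def predict(self, x):
        # pick the class whose one-vs-rest classifier is most confident
        return max(self.models, key=lambda c: self.models[c].score(x))
```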
Step 2: Training and decoding
• Choose a ML learner
• Train and test on the development set with different settings of the non-model parameters
• Choose the setting that performs best on the development set
• Run the learner on the test data with that setting
Step 3: Post-processing
• Label sequence → the output we want
• System combination
• Voting: majority voting, weighted voting (sketched after this list)
• More sophisticated models
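
A sketch of voting-based system combination; weighted_vote is a hypothetical helper, and majority voting falls out as the equal-weights special case.

```python
from collections import defaultdict

def weighted_vote(predictions, weights=None):
    """Combine base classifiers' outputs by voting. predictions holds one
    label per system; weights holds one weight per system (None means
    majority voting, i.e., all weights equal)."""
    if weights is None:
        weights = [1.0] * len(predictions)
    tally = defaultdict(float)
    for label, w in zip(predictions, weights):
        tally[label] += w
    return max(tally, key=tally.get)

# majority voting: weighted_vote(['NN', 'NN', 'VB'])               -> 'NN'
# weighted voting: weighted_vote(['NN', 'VB'], weights=[0.4, 0.6]) -> 'VB'
```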

### Supervised algorithms

Main ideas
• kNN and Rocchio: finding the nearest neighbors / prototypes
• DT and DL: finding the right group
• NB, MaxEnt: calculating P(y | x)
• Bagging: Reducing the instability
• Boosting: Forming a committee
• TBL: Improving the current guess
ML learners
• Modeling
• Training
• Testing (a.k.a. decoding)
Modeling
• NB: assumes the features are conditionally independent given the class: P(y | x) ∝ P(y) ∏j P(fj | y)
• MaxEnt: P(y | x) = exp(∑j λj fj(x, y)) / Z(x), where the fj are feature functions, the λj their weights, and Z(x) the normalizer
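
To make the MaxEnt formula concrete, here is a minimal sketch that computes P(y | x) from feature functions and their weights; the interface (feature functions mapping (x, label) to a real value) is an assumption for illustration.

```python
import math

def maxent_prob(x, y, labels, feature_functions, weights):
    """P(y | x) under a log-linear model: exponentiate the weighted sum of
    feature functions, then normalize over all labels. Each feature
    function maps (x, label) -> a real value (hypothetical interface)."""
    def score(label):
        return math.exp(sum(w * f(x, label)
                            for f, w in zip(feature_functions, weights)))
    z = sum(score(label) for label in labels)   # Z(x), the normalizer
    return score(y) / z
```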
Training
• kNN: no training
• Rocchio: calculate prototypes
• DT: build a decision tree
• Choose a feature and then split data
• DL: build a decision list:
• Choose a decision rule and then split the data
• TBL: build a transformation list:
• Choose a transformation and then update the current labels
Training (cont)
• NB: calculate P(ci) and P(fj | ci) by simple counting.
• MaxEnt: calculate the weights of feature functions by iteration.
• Bagging: create bootstrap samples and learn base classifiers.
• Boosting: learn base classifiers and their weights.
Testing
• kNN: calculate distances between x and xi, find the closest neighbors.
• Rocchio: calculate distances between x and prototypes.
• DT: traverse the tree
• DL: find the first matched decision rule.
• TBL: apply transformations one by one.
Testing (cont)
• NB: calculate P(ci) ∏j P(fj | ci) for each class ci and pick the best class (see the sketch after this list).
• MaxEnt: calculate P(y | x) under the model and pick the best class.
• Bagging: run the base classifiers and choose the class with the highest votes.
• Boosting: run the base classifiers and calculate the weighted sum of their votes.
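
For instance, NB decoding reduces to an argmax over per-class scores, computed in log space to avoid underflow. The lookup tables prior and cond_prob stand in for the counts-based estimates from training; all names are hypothetical.

```python
import math

def nb_classify(features, classes, prior, cond_prob):
    """NB decoding: argmax_c P(c) * prod_j P(f_j | c), computed in log space
    to avoid underflow. prior[c] and cond_prob[(f, c)] are the counts-based
    estimates from training (hypothetical lookup tables)."""
    def log_score(c):
        return math.log(prior[c]) + sum(math.log(cond_prob[(f, c)])
                                        for f in features)
    return max(classes, key=log_score)
```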
Sequence labeling problems
• With classification algorithms:
• Having features that refer to previous tags
• Using beam search to find good sequences (see the sketch after this list)
• With sequence labeling algorithms:
• HMM
• TBL
• MEMM
• CRF
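
A minimal sketch of the classification-plus-beam-search strategy: the classifier's features may refer to the previous tags, and the beam keeps only the top-k partial tag sequences at each position. The interface prob(word, prev_tags), returning a label-to-probability dict, is an assumption.

```python
import math

def beam_search(words, prob, beam_size=3):
    """Tag a word sequence with a classifier, pruning to the top-k partial
    sequences at each position. prob(word, prev_tags) is a hypothetical
    classifier interface returning {label: P(label | features)}, where the
    features may refer to the previous tags."""
    beam = [((), 0.0)]                       # (tags so far, log prob)
    for word in words:
        candidates = [(tags + (label,), logp + math.log(p))
                      for tags, logp in beam
                      for label, p in prob(word, tags).items()]
        # prune: keep only the beam_size best partial sequences
        beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return max(beam, key=lambda c: c[1])[0]  # best complete tag sequence
```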
Semi-supervised algorithms
• Self-training
• Co-training

• Common idea: adding automatically labeled examples from the unlabeled data to the labeled data (a self-training sketch follows)
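
A sketch of the self-training loop under a hypothetical learner interface (fit trains on (x, y) pairs; predict returns a label with a confidence): train on the labeled set, label the pool, and move only confident predictions into the training data.

```python
def self_train(learner, labeled, unlabeled, threshold=0.9, rounds=5):
    """Self-training sketch. Hypothetical interface: learner.fit(pairs)
    trains on (x, y) pairs; learner.predict(x) returns (label, confidence)."""
    for _ in range(rounds):
        learner.fit(labeled)
        confident, rest = [], []
        for x in unlabeled:
            label, conf = learner.predict(x)
            (confident if conf >= threshold else rest).append((x, label))
        if not confident:
            break                        # nothing confident enough; stop early
        labeled = labeled + confident    # grow the labeled set
        unlabeled = [x for x, _ in rest]
    return learner
```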

Unsupervised algorithms
• MLE
• EM:
• General algorithm: E-step, M-step
• EM for PM (product of multinomials) models
• Forward-backward for HMM
• Inside-outside for PCFG
• IBM models for MT

### Important concepts

Concepts
• Attribute-value table
• Feature templates vs. features
• Weights:
• Feature weights
• Classifier weights
• Instance weights
• Feature values
Concepts (cont)
• Maximum entropy vs. Maximum likelihood
• Maximize likelihood vs. minimize training error
• Training time vs. test time
• Training error vs. test error
• Greedy algorithm vs. iterative approach
Concepts (cont)
• Local optima vs. global optima
• Beam search vs. Viterbi algorithm
• Sample vs. resample
• Model parameters vs. non-model parameters

### Assignments

Assignments
• NB: binary features?
• DT: difference between DT and C4.5
• MaxEnt: binary features?
• Write code:
• Info2Vectors
• BinVectors
• 2
• Complete two projects
Projects
• Steps:
• Preprocessing
• Training and testing
• Postprocessing
• Two projects:
• Project 1: Document classification
• Project 2: IGT detection
Project 1: Document classification
• A typical classification problem
• Feature template: words appearing in the document
• Feature value: word frequency
Project 2: IGT detection
• Can be framed as a sequence labeling problem
• Preprocessing: Define label set
• Postprocessing: Tag sequence → spans
• Sequence labeling problem → a classification algorithm with beam search
• To use a classification algorithm:
• Preprocessing:
• Define features
• Choose feature values
Project 2 (cont)
• Preprocessing:
• Define label set
• Define feature templates
• Decide on feature values
• Training and decoding
• Write beam search
• Postprocessing
• Convert label sequence → spans
Project 2 (cont)
• Presentation
• Final report
• A typical conference paper:
• Introduction
• Previous work
• Methodology
• Experiments
• Discussion
• Conclusion
Using Mallet
• Difficulties:
• Java
• A large package
• Benefits:
• Java
• A large package
• Many learning algorithms: you can compare the implementations with the “standard” algorithms
Bugs in Mallet?
• In Hw9, include a new section:
• Bugs
• Complaints
• Things you like about Mallet
Course summary
• 9 weeks: 18 sessions
• 2 kinds of problems
• 9 supervised algorithms
• 1 semi-supervised algorithm
• 1 unsupervised algorithm
• 4 related issues: feature selection, multiclass → binary, system combination, beam search
• 2 projects
• 1 well-known package
• 9 assignments, including 1 presentation and 1 final report
• N papers
What’s next?
• Learn new algorithms:
• SVM, CRF, regression algorithms, graphical models, …
• Work on new tasks:
• Parsing, spam filtering, reference resolution, …
Misc
• Hw7: due tomorrow 11pm
• Hw8: due Thursday 11pm
• Hw9: due 3/13 11pm
• Presentation: No more than 15+5 minutes
What must be included in the presentation?
• Label set
• Feature templates
• Effect of beam search
• 3+ ways to improve the system and results on dev data (test_data/)
• Best system: results on dev data and the setting
• Results on test data (more_test_data/)