## Course Summary


Outline

- Problem description
- General approach
- ML algorithms
- Important concepts
- Assignments
- What’s next?

Two types of problems

- Classification problem
- Sequence Labeling problem
- In both cases:
- A predefined set of labels: C = {c1, c2, …cn}
- Training data: { (xi, yi) }, where yi ∈ C; yi is known (supervised) or unknown (unsupervised)
- Test data
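The two settings differ in the shape of the training pairs; a toy sketch (the data here is made up for illustration):

```python
# Classification: one label y from C per instance x.
classification_data = [
    (["free", "money", "now"], "spam"),   # (xi, yi)
    (["meeting", "at", "noon"], "ham"),
]

# Sequence labeling: one label per token (e.g., POS tagging).
sequence_data = [
    (["John", "runs"], ["NNP", "VBZ"]),
]

C = {"spam", "ham"}  # the predefined label set
```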

NLP tasks

- Classification problems:
- Document classification
- Spam detection
- Sentiment analysis
- …
- Sequence labeling problems:
- POS tagging
- Word segmentation
- Sentence segmentation
- NE detection
- Parsing
- IGT detection
- …

Step 1: Preprocessing

- Converting the NLP task to a classification or sequence labeling problem
- Creating the attribute-value table:
- Define feature templates
- Instantiate feature templates and select features
- Decide what kind of feature values to use (e.g., binarizing features or not)
- Converting a multi-class problem to a binary problem (optional)
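A minimal sketch of the table-building steps above, assuming a single hypothetical `word=W` feature template and binary feature values:

```python
def instantiate(docs):
    """Instantiate the 'word=W' template over the training docs."""
    return sorted({f"word={w}" for tokens, _ in docs for w in tokens})

def binarize(tokens, features):
    """One row of the attribute-value table, with binary (0/1) values."""
    present = {f"word={w}" for w in tokens}
    return [1 if f in present else 0 for f in features]

docs = [(["good", "movie"], "pos"), (["bad", "movie"], "neg")]
features = instantiate(docs)    # ['word=bad', 'word=good', 'word=movie']
row = binarize(["good", "movie"], features)    # [0, 1, 1]
```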

Feature selection

- Dimensionality reduction
- Feature selection
- Wrapping methods
- Filtering methods:
- Mutual information, χ2, information gain, …
- Feature extraction
- Term clustering:
- Latent semantic indexing (LSI)
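One of the filtering scores above, mutual information between a binary word feature and the class label, can be sketched directly (toy data; the function name is mine):

```python
import math
from collections import Counter

def mutual_information(docs, word):
    """I(F; C) between the binary feature 'word present' and the class."""
    n = len(docs)
    joint = Counter((word in tokens, label) for tokens, label in docs)
    f_marg = Counter(word in tokens for tokens, _ in docs)
    c_marg = Counter(label for _, label in docs)
    mi = 0.0
    for (f, c), cnt in joint.items():
        p_fc = cnt / n
        mi += p_fc * math.log2(p_fc / (f_marg[f] / n * (c_marg[c] / n)))
    return mi

docs = [(["spam", "offer"], "spam"), (["hello", "friend"], "ham"),
        (["spam", "deal"], "spam"), (["lunch", "today"], "ham")]
# 'spam' perfectly predicts the class here, so its score is 1 bit;
# a filtering method keeps the top-k features by such a score.
```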

Multiclass → Binary

- One-vs-all
- All-pairs
- Error-correcting Output Codes (ECOC)
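One-vs-all fits in a few lines; the toy binary learner below is just a stand-in, not one of the course's algorithms:

```python
def one_vs_all_train(data, labels, train_binary):
    """Train one binary classifier per class: class c vs. the rest."""
    return {c: train_binary([(x, 1 if y == c else 0) for x, y in data])
            for c in labels}

def one_vs_all_predict(classifiers, x):
    """Pick the class whose binary classifier scores x highest."""
    return max(classifiers, key=lambda c: classifiers[c](x))

# Toy binary learner (a stand-in): score x by its word overlap with
# the positive training instances.
def train_binary(binary_data):
    pos_words = {w for x, y in binary_data if y == 1 for w in x}
    return lambda x: len(set(x) & pos_words)

data = [(["goal", "match"], "sports"), (["vote", "senate"], "politics")]
clfs = one_vs_all_train(data, {"sports", "politics"}, train_binary)
```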

Step 2: Training and decoding

- Choose a ML learner
- Train and test on development set, with different settings of non-model parameters
- Choose the best setting for the development set
- Run the learner on the test data with the best setting
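The dev-set tuning loop above, sketched with a made-up learner whose `k` is a non-model parameter:

```python
def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

def tune(learner, train, dev, settings):
    """Try each non-model parameter setting; keep the best on the dev set."""
    best_setting, best_acc = None, -1.0
    for s in settings:
        model = learner(train, **s)
        acc = accuracy(model, dev)
        if acc > best_acc:
            best_setting, best_acc = s, acc
    return best_setting

# Toy learner (made up): its non-model parameter k says how many training
# instances to look at; it predicts their majority label for everything.
def toy_learner(train, k):
    labels = [y for _, y in train[:k]]
    guess = max(set(labels), key=labels.count)
    return lambda x: guess

train = [(0, "a"), (1, "b"), (2, "b")]
dev = [(3, "b"), (4, "b")]
best = tune(toy_learner, train, dev, [{"k": 1}, {"k": 3}])
```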

Step 3: Post-processing

- Label sequence → the output we want
- System combination
- Voting: majority voting, weighted voting
- More sophisticated models
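The two voting schemes, as a sketch:

```python
from collections import Counter

def majority_vote(predictions):
    """Each classifier gets one vote; the most-voted label wins."""
    return Counter(predictions).most_common(1)[0][0]

def weighted_vote(predictions, weights):
    """Each classifier's vote counts with its weight."""
    scores = Counter()
    for label, w in zip(predictions, weights):
        scores[label] += w
    return max(scores, key=scores.get)
```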

Main ideas

- kNN and Rocchio: finding the nearest neighbors / prototypes
- DT and DL: finding the right group
- NB, MaxEnt: calculating P(y | x)
- Bagging: Reducing the instability
- Boosting: Forming a committee
- TBL: Improving the current guess

ML learners

- Modeling
- Training
- Testing (a.k.a. decoding)

Modeling

- NB: assuming features are conditionally independent.
- MaxEnt: modeling P(y | x) = exp(Σj λj fj(x, y)) / Z(x), the maximum-entropy distribution consistent with the feature constraints.
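The two modeling assumptions can be written out directly (toy parameters; Z normalizes over the label set):

```python
import math

# NB: P(c | x) is proportional to P(c) * prod_j P(f_j | c),
# by the conditional-independence assumption.
def nb_posterior(prior, cond, feats):
    scores = {c: prior[c] * math.prod(cond[c][f] for f in feats) for c in prior}
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

# MaxEnt: P(y | x) = exp(sum_j lambda_j * f_j(x, y)) / Z(x).
def maxent_posterior(lambdas, feat_fns, x, labels):
    scores = {y: math.exp(sum(l * f(x, y) for l, f in zip(lambdas, feat_fns)))
              for y in labels}
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()}

prior = {"spam": 0.5, "ham": 0.5}
cond = {"spam": {"win": 0.8}, "ham": {"win": 0.2}}
p_nb = nb_posterior(prior, cond, ["win"])   # spam: 0.8
```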

Training

- kNN: no training
- Rocchio: calculate prototypes
- DT: build a decision tree
- Choose a feature and then split data
- DL: build a decision list:
- Choose a decision rule and then split data
- TBL: build a transformation list by
- Choose a transformation and then update the current label field

Training (cont)

- NB: calculate P(ci) and P(fj | ci) by simple counting.
- MaxEnt: calculate the weights of feature functions by iteration.
- Bagging: create bootstrap samples and learn base classifiers.
- Boosting: learn base classifiers and their weights.
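NB training really is just counting; a sketch with add-one smoothing (the smoothing choice is mine, the slide only says "simple counting"):

```python
from collections import Counter, defaultdict

def train_nb(data, smoothing=1.0):
    """Estimate P(c) and P(f | c) by counting, with add-one smoothing."""
    class_counts = Counter(y for _, y in data)
    feat_counts = defaultdict(Counter)
    vocab = set()
    for feats, y in data:
        feat_counts[y].update(feats)
        vocab.update(feats)
    priors = {c: n / len(data) for c, n in class_counts.items()}
    def cond(f, c):
        total = sum(feat_counts[c].values())
        return (feat_counts[c][f] + smoothing) / (total + smoothing * len(vocab))
    return priors, cond

data = [(["win", "cash"], "spam"), (["see", "you"], "ham")]
priors, cond = train_nb(data)
```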

Testing

- kNN: calculate distances between x and xi, find the closest neighbors.
- Rocchio: calculate distances between x and prototypes.
- DT: traverse the tree
- DL: find the first matched decision rule.
- TBL: apply transformations one by one.
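The kNN test step, sketched with Euclidean distance and majority voting among the k nearest neighbors:

```python
import math
from collections import Counter

def knn_predict(train, x, k=3):
    """Find the k training vectors nearest to x and vote on the label."""
    nearest = sorted(train, key=lambda xy: math.dist(x, xy[0]))[:k]
    votes = Counter(y for _, y in nearest)
    return votes.most_common(1)[0][0]

train = [((0.0, 0.0), "a"), ((0.1, 0.1), "a"), ((5.0, 5.0), "b"),
         ((5.1, 5.0), "b"), ((4.9, 5.1), "b")]
```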

Testing (cont)

- NB: calc P(ci) Πj P(fj | ci) for each class and pick the best class.
- MaxEnt: calc P(y | x) from the trained weights and pick the best class.
- Bagging: run the base classifiers and choose the class with highest votes.
- Boosting: run the base classifiers and calc the weighted sum.

Sequence labeling problems

- With classification algorithms:
- Having features that refer to previous tags
- Using beam search to find good sequences
- With sequence labeling algorithms:
- HMM
- TBL
- MEMM
- CRF
- …
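The classification-plus-beam-search idea can be sketched as follows; `toy_score` stands in for a trained classifier's score, which in practice would use features over the previous tags:

```python
def beam_search(tokens, labels, score, beam_size=3):
    """Keep the beam_size best partial tag sequences at each position.
    score(tokens, i, prev_tags, tag) plays the role of the classifier's
    (log-)score for assigning tag at position i; it may look at the
    previous tags as features."""
    beam = [([], 0.0)]
    for i in range(len(tokens)):
        candidates = [(tags + [t], s + score(tokens, i, tags, t))
                      for tags, s in beam for t in labels]
        beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beam[0][0]

# Toy scorer (made up): reward a tag whose lowercase form matches the token.
def toy_score(tokens, i, prev_tags, tag):
    return 1.0 if tag.lower() == tokens[i] else 0.0
```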

Unsupervised algorithms

- MLE
- EM:
- General algorithm: E-step, M-step
- EM for PM models
- Forward-backward for HMM
- Inside-outside for PCFG
- IBM models for MT
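A toy instance of the general E-step/M-step recipe, for a mixture of two biased coins; forward-backward (HMM) and inside-outside (PCFG) apply the same pattern to sequences and trees. Each EM iteration is guaranteed not to decrease the log-likelihood:

```python
import math

def em_two_coins(flips, iters=20, p=(0.6, 0.4), w=(0.5, 0.5)):
    """EM for a mixture of two biased coins: each trial is (heads, tosses)."""
    p, w = list(p), list(w)
    for _ in range(iters):
        # E-step: responsibility of each coin for each trial.
        resp = []
        for h, n in flips:
            like = [w[k] * p[k] ** h * (1 - p[k]) ** (n - h) for k in (0, 1)]
            z = sum(like)
            resp.append([l / z for l in like])
        # M-step: re-estimate mixture weights and head probabilities.
        for k in (0, 1):
            rk = [r[k] for r in resp]
            w[k] = sum(rk) / len(flips)
            p[k] = (sum(r * h for r, (h, n) in zip(rk, flips))
                    / sum(r * n for r, (h, n) in zip(rk, flips)))
    return p, w

def log_likelihood(flips, p, w):
    return sum(math.log(sum(w[k] * p[k] ** h * (1 - p[k]) ** (n - h)
                            for k in (0, 1)))
               for h, n in flips)

flips = [(9, 10), (8, 10), (1, 10), (2, 10)]  # (heads, tosses) per trial
p, w = em_two_coins(flips)
```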

Concepts

- Attribute-value table
- Feature templates vs. features
- Weights:
- Feature weights
- Classifier weights
- Instance weights
- Feature values

Concepts (cont)

- Maximum entropy vs. Maximum likelihood
- Maximize likelihood vs. minimize training error
- Training time vs. test time
- Training error vs. test error
- Greedy algorithm vs. iterative approach

Concepts (cont)

- Local optima vs. global optima
- Beam search vs. Viterbi algorithm
- Sample vs. resample
- Model parameters vs. non-model parameters

Assignments

- Read code:
- NB: binary features?
- DT: difference between DT and C4.5
- Boosting: AdaBoost and AdaBoostM2
- MaxEnt: binary features?
- Write code:
- Info2Vectors
- BinVectors
- χ2
- Complete two projects

Projects

- Steps:
- Preprocessing
- Training and testing
- Postprocessing
- Two projects:
- Project 1: Document classification
- Project 2: IGT detection

Project 1: Document classification

- A typical classification problem
- Data are prepared already
- Feature template: words appearing in the document
- Feature value: word frequency
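Project 1's representation in a few lines (toy documents):

```python
from collections import Counter

def doc_to_vector(tokens, vocab):
    """Feature template: the words in the document; value: word frequency."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]

docs = [["the", "movie", "was", "great", "great"],
        ["the", "plot", "was", "thin"]]
vocab = sorted({w for d in docs for w in d})
vectors = [doc_to_vector(d, vocab) for d in docs]
```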

Project 2: IGT detection

- Can be framed as a sequence labeling problem
- Preprocessing: Define label set
- Postprocessing: Tag sequence → spans
- Solving the sequence labeling problem with a classification algorithm plus beam search
- To use classification algorithms:
- Preprocessing:
- Define features
- Choose feature values
- …

Project 2 (cont)

- Preprocessing:
- Define label set
- Define feature templates
- Decide on feature values
- Training and decoding
- Write beam search
- Postprocessing
- Convert the label sequence into spans

Project 2 (cont)

- Presentation
- Final report
- A typical conference paper:
- Introduction
- Previous work
- Methodology
- Experiments
- Discussion
- Conclusion

Using Mallet

- Difficulties:
- Java
- A large package
- Benefits:
- Java
- A large package
- Many learning algorithms: you can compare the implementations with the “standard” algorithms

Bugs in Mallet?

- In Hw9, include a new section:
- Bugs
- Complaints
- Things you like about Mallet

Course summary

- 9 weeks: 18 sessions
- 2 kinds of problems
- 9 supervised algorithms
- 1 semi-supervised algorithm
- 1 unsupervised algorithm
- 4 related issues: feature selection, multiclass → binary, system combination, beam search
- 2 projects
- 1 well-known package
- 9 assignments, including 1 presentation and 1 final report
- N papers

What’s next?

- Learn more about the algorithms covered in class.
- Learn new algorithms:
- SVM, CRF, regression algorithms, graphical models, …
- Try new tasks:
- Parsing, spam filtering, reference resolution, …

Misc

- Hw7: due tomorrow 11pm
- Hw8: due Thursday 11pm
- Hw9: due 3/13 11pm
- Presentation: No more than 15+5 minutes

What must be included in the presentation?

- Label set
- Feature templates
- Effect of beam search
- 3+ ways to improve the system and results on dev data (test_data/)
- Best system: results on dev data and the setting
- Results on test data (more_test_data/)

Grades, etc.

- 9 assignments + class participation
- Hw1-Hw6:
- Total: 740
- Max: 696.56
- Min: 346.52
- Ave: 548.74
- Median: 559.08
