
Course Summary

LING 572

Fei Xia

03/06/07

Outline
  • Problem description
  • General approach
  • ML algorithms
  • Important concepts
  • Assignments
  • What’s next?
Two types of problems
  • Classification problem
  • Sequence Labeling problem
  • In both cases:
    • A predefined set of labels: C = {c1, c2, …, cn}
    • Training data: {(xi, yi)}, where yi ∈ C; yi is known (supervised) or unknown (unsupervised)
    • Test data
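
For concreteness, the two data shapes can be illustrated as below (a toy example, not from the slides):

```python
# Toy illustration of the two problem types.
# Classification: one label per instance.
classification_data = [
    (["free", "win", "cash"], "spam"),
    (["meeting", "at", "noon"], "ham"),
]
# Sequence labeling: one label per token in the sequence.
sequence_data = [
    (["the", "dog", "barks"], ["D", "N", "V"]),
]
```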
NLP tasks
  • Classification problems:
    • Document classification
    • Spam detection
    • Sentiment analysis
  • Sequence labeling problems:
    • POS tagging
    • Word segmentation
    • Sentence segmentation
    • NE detection
    • Parsing
    • IGT detection
Step 1: Preprocessing
  • Converting the NLP task to a classification or sequence labeling problem
  • Creating the attribute-value table:
    • Define feature templates
    • Instantiate feature templates and select features
    • Decide what kind of feature values to use (e.g., binarizing features or not; see the sketch after this list)
    • Converting a multi-class problem to a binary problem (optional)
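
A minimal sketch of the template and binarization steps; the helper names are illustrative, not from the course code:

```python
# Minimal sketch: instantiate feature templates, then binarize values.

def instantiate_features(words, i):
    """Apply simple feature templates at position i of a sentence."""
    return {
        f"cur_word={words[i]}": 1,
        f"prev_word={words[i - 1] if i > 0 else '<BOS>'}": 1,
        f"suffix3={words[i][-3:]}": 1,
    }

def binarize(counts, threshold=1):
    """Turn count-valued features into binary present/absent features."""
    return {f: 1 for f, v in counts.items() if v >= threshold}

print(instantiate_features(["the", "dog", "barks"], 1))
print(binarize({"w=a": 3, "w=b": 0}))  # {'w=a': 1}
```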
Feature selection
  • Dimensionality reduction
    • Feature selection
      • Wrapping methods
      • Filtering methods:
        • Mutual info, 2, Information gain, ….
    • Feature extraction
      • Term clustering:
      • Latent semantic indexing (LSI)
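
As one concrete filtering method, here is a minimal χ2 scoring sketch for a binary feature against a binary class; the counts are made up for illustration:

```python
# Minimal sketch of chi-square feature scoring from a 2x2 contingency table.

def chi_square(a, b, c, d):
    """a = feature present & class positive, b = present & negative,
       c = absent & positive,                d = absent & negative."""
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den if den else 0.0

# Rank features by score and keep the top k.
scores = {"word=free": chi_square(90, 10, 60, 840),
          "word=the": chi_square(75, 425, 75, 425)}
print(sorted(scores, key=scores.get, reverse=True))  # 'word=free' first
```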
Multiclass → Binary
  • One-vs-all (sketch after this list)
  • All-pairs
  • Error-correcting Output Codes (ECOC)
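
As an illustration of the first reduction, a minimal one-vs-all sketch; the centroid scorer is a toy stand-in for whatever binary learner is actually used:

```python
# Minimal sketch of the one-vs-all reduction to binary classification.

class CentroidScorer:
    """Toy binary learner: score = closeness to the positive centroid."""
    def fit(self, X, y):
        pos = [x for x, lab in zip(X, y) if lab == 1]
        self.centroid = [sum(col) / len(pos) for col in zip(*pos)]
    def score(self, x):
        return -sum((a - b) ** 2 for a, b in zip(x, self.centroid))

class OneVsAll:
    def fit(self, X, y):
        self.models = {}
        for c in set(y):
            binary = [1 if lab == c else -1 for lab in y]  # relabel
            model = CentroidScorer()
            model.fit(X, binary)
            self.models[c] = model
    def predict(self, x):
        # Pick the class whose binary model is most confident.
        return max(self.models, key=lambda c: self.models[c].score(x))

ova = OneVsAll()
ova.fit([[0, 0], [0, 1], [5, 5]], ["a", "a", "b"])
print(ova.predict([4, 4]))  # -> 'b'
```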
Step 2: Training and decoding
  • Choose an ML learner
  • Train and test on the development set with different settings of non-model parameters
  • Choose the setting that works best on the development set
  • Run the learner on the test data with that setting
Step 3: Post-processing
  • Label sequence → the output we want
  • System combination
    • Voting: majority voting, weighted voting (sketch below)
    • More sophisticated models
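
The voting step itself is simple; a minimal sketch (the weights here are invented for illustration):

```python
# Minimal sketch of system combination by majority / weighted voting.
from collections import defaultdict

def weighted_vote(predictions, weights=None):
    """predictions: one label per system; equal weights = majority vote."""
    weights = weights or [1.0] * len(predictions)
    tally = defaultdict(float)
    for label, w in zip(predictions, weights):
        tally[label] += w
    return max(tally, key=tally.get)

print(weighted_vote(["N", "V", "N"]))                   # majority -> 'N'
print(weighted_vote(["N", "V", "N"], [0.2, 0.9, 0.3]))  # weighted -> 'V'
```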
Main ideas
  • kNN and Rocchio: finding the nearest neighbors / prototypes
  • DT and DL: finding the right group
  • NB, MaxEnt: calculating P(y | x)
  • Bagging: Reducing the instability
  • Boosting: Forming a committee
  • TBL: Improving the current guess
ML learners
  • Modeling
  • Training
  • Testing (a.k.a. decoding)
Modeling
  • NB: assuming features are conditionally independent given the class: P(x | c) = Πj P(fj | c)
  • MaxEnt: modeling P(y | x) = exp(Σj λj fj(x, y)) / Z(x), where the λj are feature weights and Z(x) normalizes over classes
Training
  • kNN: no training
  • Rocchio: calculate prototypes (sketch after this list)
  • DT: build a decision tree
    • Choose a feature and then split data
  • DL: build a decision list:
    • Choose a decision rule and then split data
  • TBL: build a transformation list by
    • Choose a transformation and then update the current label field
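
Rocchio's training step is especially compact; a minimal sketch:

```python
# Minimal sketch of Rocchio training: one prototype (centroid) per class.
from collections import defaultdict

def train_rocchio(X, y):
    sums, counts = {}, defaultdict(int)
    for x, c in zip(X, y):
        counts[c] += 1
        sums[c] = x if c not in sums else [a + b for a, b in zip(sums[c], x)]
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

print(train_rocchio([[1, 0], [3, 0], [0, 4]], ["x", "x", "y"]))
# {'x': [2.0, 0.0], 'y': [0.0, 4.0]}
```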
Training (cont)
  • NB: calculate P(ci) and P(fj | ci) by simple counting (sketch below).
  • MaxEnt: calculate the weights of feature functions by iteration.
  • Bagging: create bootstrap samples and learn base classifiers.
  • Boosting: learn base classifiers and their weights.
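
A minimal sketch of the NB counting step; the add-one smoothing is an assumption here, not something the slide specifies:

```python
# Minimal sketch of Naive Bayes training by counting (add-one smoothing).
from collections import Counter

def train_nb(docs, labels):
    class_counts = Counter(labels)
    feat_counts = {c: Counter() for c in class_counts}
    for feats, c in zip(docs, labels):
        feat_counts[c].update(feats)
    vocab = {f for feats in docs for f in feats}
    prior = {c: class_counts[c] / len(labels) for c in class_counts}
    cond = {c: {f: (feat_counts[c][f] + 1) /
                   (sum(feat_counts[c].values()) + len(vocab))
                for f in vocab}
            for c in class_counts}
    return prior, cond

prior, cond = train_nb([["free", "win"], ["hi", "mom"]], ["spam", "ham"])
print(prior["spam"], cond["spam"]["free"])  # 0.5, 1/3
```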
Testing
  • kNN: calculate distances between x and xi, find the closest neighbors (sketch after this list).
  • Rocchio: calculate distances between x and prototypes.
  • DT: traverse the tree
  • DL: find the first matched decision rule.
  • TBL: apply transformations one by one.
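
A minimal kNN decoding sketch; Euclidean distance is one choice among several:

```python
# Minimal sketch of kNN testing: no training, just distances at test time.
from collections import Counter

def knn_predict(x, train_X, train_y, k=3):
    dist = lambda a, b: sum((p - q) ** 2 for p, q in zip(a, b))
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: dist(x, pair[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

print(knn_predict([1, 1], [[0, 0], [1, 2], [9, 9]], ["a", "a", "b"]))  # 'a'
```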
Testing (cont)
  • NB: calc P(c) Πj P(fj | c) for each class and pick the highest.
  • MaxEnt: calc P(c | x) under the trained model and pick the highest.
  • Bagging: run the base classifiers and choose the class with highest votes.
  • Boosting: run the base classifiers and calc the weighted sum.
Sequence labeling problems
  • With classification algorithms:
    • Having features that refer to previous tags
    • Using beam search to find good sequences (see the sketch after this list)
  • With sequence labeling algorithms:
    • HMM
    • TBL
    • MEMM
    • CRF
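
A minimal beam search sketch; `score(word, prev_tag, tag)` is a hypothetical stand-in for a trained classifier's log-probability of a tag given the context features:

```python
# Minimal sketch of beam search for sequence labeling with a classifier.
import math

def beam_search(words, tags, score, beam_size=3):
    beams = [([], 0.0)]  # (partial tag sequence, cumulative log-prob)
    for w in words:
        candidates = []
        for seq, logp in beams:
            prev = seq[-1] if seq else "<BOS>"
            for t in tags:
                candidates.append((seq + [t], logp + score(w, prev, t)))
        # Keep only the beam_size best partial sequences.
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_size]
    return beams[0][0]

# Toy scorer: prefer tag N after tag D, otherwise indifferent.
toy = lambda w, prev, t: math.log(0.8 if (prev, t) == ("D", "N") else 0.2)
print(beam_search(["the", "dog"], ["D", "N"], toy, beam_size=2))  # ['D', 'N']
```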
Semi-supervised algorithms
  • Self-training
  • Co-training

 → Adding some unlabeled data to the labeled data
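
A minimal self-training sketch; the learner interface (fit, predict_with_confidence) and the confidence threshold are assumptions for illustration:

```python
# Minimal sketch of self-training: repeatedly add the most confidently
# auto-labeled examples to the labeled set and retrain.

def self_train(learner, labeled_X, labeled_y, unlabeled_X,
               threshold=0.9, rounds=5):
    X, y, pool = list(labeled_X), list(labeled_y), list(unlabeled_X)
    for _ in range(rounds):
        learner.fit(X, y)
        confident = []
        for x in pool:
            label, conf = learner.predict_with_confidence(x)
            if conf >= threshold:
                confident.append((x, label))
        if not confident:
            break
        for x, label in confident:  # move into the labeled set
            X.append(x)
            y.append(label)
            pool.remove(x)
    return learner
```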

Unsupervised algorithms
  • MLE
  • EM:
    • General algorithm: E-step, M-step (in symbols after this list)
    • EM for PM models
      • Forward-backward for HMM
      • Inside-outside for PCFG
      • IBM models for MT
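
In symbols, one EM iteration takes the standard generic form below (this is the textbook statement, not slide content):

```latex
% E-step: expected complete-data log-likelihood under current parameters.
% M-step: re-estimate the parameters by maximizing that expectation.
\begin{align*}
  Q(\theta \mid \theta^{(t)}) &= \sum_{z} P(z \mid x, \theta^{(t)})
      \log P(x, z \mid \theta) \\
  \theta^{(t+1)} &= \arg\max_{\theta} Q(\theta \mid \theta^{(t)})
\end{align*}
```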
Concepts
  • Attribute-value table
  • Feature templates vs. features
  • Weights:
    • Feature weights
    • Classifier weights
    • Instance weights
    • Feature values
Concepts (cont)
  • Maximum entropy vs. Maximum likelihood
  • Maximize likelihood vs. minimize training error
  • Training time vs. test time
  • Training error vs. test error
  • Greedy algorithm vs. iterative approach
Concepts (cont)
  • Local optima vs. global optima
  • Beam search vs. Viterbi algorithm
  • Sample vs. resample
  • Model parameters vs. non-model parameters
Assignments
  • Read code:
    • NB: binary features?
    • DT: difference between DT and C4.5
    • Boosting: AdaBoost and AdaBoostM2
    • MaxEnt: binary features?
  • Write code:
    • Info2Vectors
    • BinVectors
    • 2
  • Complete two projects
Projects
  • Steps:
    • Preprocessing
    • Training and testing
    • Postprocessing
  • Two projects:
    • Project 1: Document classification
    • Project 2: IGT detection
Project 1: Document classification
  • A typical classification problem
  • Data are prepared already
    • Feature template: words appearing in the doc
    • Feature value: word frequency
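
That setup is small enough to sketch directly; a minimal version of the word-frequency feature extraction described above:

```python
# Minimal sketch of Project 1's feature setup: one feature per word,
# valued by its frequency in the document.
from collections import Counter

def doc_to_features(text):
    return Counter(text.lower().split())

print(doc_to_features("the cat saw the dog"))
# Counter({'the': 2, 'cat': 1, 'saw': 1, 'dog': 1})
```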
Project 2: IGT detection
  • Can be framed as a sequence labeling problem
    • Preprocessing: Define label set
    • Postprocessing: Tag sequence → spans
  • Sequence labeling problem → using classification algorithms with beam search
  • To use classification algorithms:
    • Preprocessing:
      • Define features
      • Choose feature values
Project 2 (cont)
  • Preprocessing:
    • Define label set
    • Define feature templates
    • Decide on feature values
  • Training and decoding
    • Write beam search
  • Postprocessing
    • Convert label sequence → spans
Project 2 (cont)
  • Presentation
  • Final report
  • A typical conference paper:
    • Introduction
    • Previous work
    • Methodology
    • Experiments
    • Discussion
    • Conclusion
Using Mallet
  • Difficulties:
    • Java
    • A large package
  • Benefits:
    • Java
    • A large package
    • Many learning algorithms: comparing the implementation with “standard” algorithms
Bugs in Mallet?
  • In Hw9, include a new section:
    • Bugs
    • Complaints
    • Things you like about Mallet
Course summary
  • 9 weeks: 18 sessions
  • 2 kinds of problems
  • 9 supervised algorithms
  • 1 semi-supervised algorithm
  • 1 unsupervised algorithm
  • 4 related issues: feature selection, multiclass → binary, system combination, beam search
  • 2 projects
  • 1 well-known package
  • 9 assignments, including 1 presentation and 1 final report
  • N papers
What’s next?
  • Learn more about the algorithms covered in class.
  • Learn new algorithms:
    • SVM, CRF, regression algorithms, graphical models, …
  • Try new tasks:
    • Parsing, spam filtering, reference resolution, …
Misc
  • Hw7: due tomorrow 11pm
  • Hw8: due Thursday 11pm
  • Hw9: due 3/13 11pm
  • Presentation: no more than 15 minutes, plus 5 for questions
What must be included in the presentation?
  • Label set
  • Feature templates
  • Effect of beam search
  • 3+ ways to improve the system and results on dev data (test_data/)
  • Best system: results on dev data and the setting
  • Results on test data (more_test_data/)
Grades, etc.
  • 9 assignments + class participation
  • Hw1-Hw6:
    • Total: 740
    • Max: 696.56
    • Min: 346.52
    • Ave: 548.74
    • Median: 559.08