
Text Learning


Presentation Transcript


  1. Text Learning Tom M. Mitchell Aladdin Workshop Carnegie Mellon University January 2003

  2. 1. CoTraining: learning from labeled and unlabeled data

  3. Redundantly Sufficient Features (figure labels: "my advisor", Professor Faloutsos)

  4. Redundantly Sufficient Features (figure labels: "my advisor", Professor Faloutsos)

  5. Redundantly Sufficient Features

  6. Redundantly Sufficient Features (figure labels: "my advisor", Professor Faloutsos)

  7. CoTraining Setting • If x1, x2 are conditionally independent given y, and f is PAC learnable from noisy labeled data • Then f is PAC learnable from a weak initial classifier plus unlabeled data
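
The setting on this slide can be written out more explicitly for readers of the transcript. The notation below is a standard formalization of the redundant-sufficiency and conditional-independence assumptions, not text from the slide itself:

```latex
% Co-training setting (standard formalization; notation assumed, not from the slide).
% Each example has two views:
X = X_1 \times X_2, \qquad x = (x_1, x_2)
% Redundant sufficiency: each view alone suffices to predict the target function f
\exists\, g_1, g_2 \ \text{such that}\ g_1(x_1) = g_2(x_2) = f(x)\ \text{for all observed } x
% Conditional independence of the two views given the label
P(x_1, x_2 \mid y) = P(x_1 \mid y)\, P(x_2 \mid y)
```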

  8. Co-Training Rote Learner (figure: bipartite graph of pages and hyperlinks; the hyperlink "My advisor" and a few pages carry initial + and - labels)

  9. Co-Training Rote Learner (figure: labels begin to propagate across the pages/hyperlinks graph)

  10. Co-Training Rote Learner (figure: more pages and hyperlinks pick up + and - labels)

  11. Co-Training Rote Learner (figure: propagation continues across the two views)

  12. Co-Training Rote Learner (figure: most of the graph is now labeled)
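
The figure sequence above depicts a rote-learning bootstrap across the two views. Below is a minimal sketch of that loop, assuming each example is a (page-view, hyperlink-view) pair and that the "classifiers" are simple lookup tables; all names are illustrative, not from the talk:

```python
# Minimal sketch of the rote co-training bootstrap depicted in slides 8-12.
# Each unlabeled example is a pair (page_view, link_view); the two "classifiers"
# are lookup tables of view values already seen with a label (rote learning).

def rote_cotrain(labeled, unlabeled, max_rounds=10):
    """labeled: list of ((page_view, link_view), label); unlabeled: list of pairs."""
    page_table, link_table = {}, {}
    for (page, link), y in labeled:
        page_table[page] = y
        link_table[link] = y

    pool = list(unlabeled)
    for _ in range(max_rounds):
        still_unlabeled = []
        for page, link in pool:
            # If either view has been seen with a label, copy that label to the
            # example, which in turn labels the *other* view for future examples.
            y = page_table.get(page, link_table.get(link))
            if y is None:
                still_unlabeled.append((page, link))
            else:
                page_table[page] = y
                link_table[link] = y
        if len(still_unlabeled) == len(pool):
            break  # nothing new was labeled this round
        pool = still_unlabeled

    return page_table, link_table
```

Starting from a single labeled hyperlink such as "my advisor", labels spread to the pages it points to, then to the other hyperlinks on those pages, and so on, as the diagrams show.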

  13. What if CoTraining Assumption Not Perfectly Satisfied? (figure)

  14. What if CoTraining Assumption Not Perfectly Satisfied? (figure)

  15. What if CoTraining Assumption Not Perfectly Satisfied? • Idea: Want classifiers that produce a maximally consistent labeling of the data • If learning is an optimization problem, what function should we optimize?

  16. What Objective Function? • Error on labeled examples

  17. What Objective Function? • Error on labeled examples • Disagreement over unlabeled

  18. What Objective Function? • Error on labeled examples • Disagreement over unlabeled • Misfit to estimated class priors
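
Slides 16-18 name the terms of the objective but not their functional forms. One plausible instantiation, assuming squared-error forms and two per-view classifiers g1, g2 (the E1-E4 labels anticipate slide 20):

```latex
% Hedged sketch: the slides list the terms but not their forms. Assuming squared
% error, per-view classifiers g_1, g_2, labeled set L, unlabeled set U, and an
% estimated positive-class prior \hat{p}:
E = \underbrace{\sum_{(x,y)\in L} \big(g_1(x_1)-y\big)^2}_{E_1\ (\text{labeled error, view 1})}
  + \underbrace{\sum_{(x,y)\in L} \big(g_2(x_2)-y\big)^2}_{E_2\ (\text{labeled error, view 2})}
  + \underbrace{\sum_{x\in U} \big(g_1(x_1)-g_2(x_2)\big)^2}_{E_3\ (\text{disagreement on unlabeled})}
  + \underbrace{\Big(\hat{p}-\tfrac{1}{|U|}\sum_{x\in U} g_1(x_1)\Big)^{2}}_{E_4\ (\text{misfit to class priors})}
```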

  19. What Function Approximators?

  20. What Function Approximators? • Same fn form as Naïve Bayes, Max Entropy • Use gradient descent to simultaneously learn g1 and g2, directly minimizing E = E1 + E2 + E3 + E4 • No word independence assumption, use both labeled and unlabeled data
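
A sketch of what slide 20's gradient co-training could look like with two logistic-form classifiers trained jointly; the squared-error forms, the term weights w3/w4, and the folding of constant factors into the learning rate are assumptions, not details from the talk:

```python
import numpy as np

# Sketch of gradient co-training over two logistic-form classifiers g1, g2
# (same functional form as naive Bayes / max entropy), trained jointly by
# gradient descent on E = E1 + E2 + E3 + E4.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_cotrain(X1L, X2L, yL, X1U, X2U, prior=0.5,
                     lr=0.1, steps=1000, w3=1.0, w4=1.0):
    """X1*/X2* are feature matrices for the two views; yL holds 0/1 labels."""
    w1 = np.zeros(X1L.shape[1])
    w2 = np.zeros(X2L.shape[1])
    for _ in range(steps):
        g1L, g2L = sigmoid(X1L @ w1), sigmoid(X2L @ w2)
        g1U, g2U = sigmoid(X1U @ w1), sigmoid(X2U @ w2)

        # E1, E2: squared error on the labeled examples, one term per view
        grad1 = X1L.T @ ((g1L - yL) * g1L * (1 - g1L))
        grad2 = X2L.T @ ((g2L - yL) * g2L * (1 - g2L))

        # E3: disagreement between the two views on the unlabeled examples
        diff = g1U - g2U
        grad1 += w3 * (X1U.T @ (diff * g1U * (1 - g1U)))
        grad2 -= w3 * (X2U.T @ (diff * g2U * (1 - g2U)))

        # E4: misfit between each view's mean prediction and the class prior
        grad1 += w4 * (g1U.mean() - prior) * (X1U.T @ (g1U * (1 - g1U))) / len(g1U)
        grad2 += w4 * (g2U.mean() - prior) * (X2U.T @ (g2U * (1 - g2U))) / len(g2U)

        w1 -= lr * grad1
        w2 -= lr * grad2
    return w1, w2
```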

  21. Gradient CoTraining

  22. Classifying Jobs for FlipDog • X1: job title • X2: job description

  23. Gradient CoTraining: Classifying FlipDog job descriptions, SysAdmin vs. WebProgrammer • Final accuracy: labeled data alone 86%; CoTraining 96%

  24. Gradient CoTraining: Classifying Upper Case sequences as Person Names (final accuracy) • Using labeled data only: .76 (25 labeled + 5000 unlabeled) / .87 (2300 labeled + 5000 unlabeled) • Cotraining: .85* / .89* (same two settings) • Cotraining without fitting class priors (E4): .73* • * sensitive to weights of error terms E3 and E4

  25. CoTraining Summary • Key is getting the right objective function • The class-priors term is important; can min-cut algorithms accommodate this? • And minimizing it… gradient descent has local-minima problems; is graph partitioning possible?

  26. The Problem/Opportunity • Must train the classifier to be website-independent, but many sites exhibit website-specific regularities • Question: How can a program learn website-specific regularities for millions of sites, without human-labeled data?

  27. Learn Local Regularities for Page Classification

  28. Learn Local Regularities for Page Classification 1. Label site using global classifier

  29. Learn Local Regularities for Page Classification 1. Label site using global classifier (cont educ page)

  30. Learn Local Regularities for Page Classification 1. Label site using global classifier 2. Learn local classifiers

  31. Learn Local Regularities for Page Classification 1. Label site using global classifier 2. Learn local classifiers, CECourse(x) :- under(x, http://….CEd.html), linkto(x, http://…music.html), 1 < inDegree(x) < 4, globalConfidence(x) > 0.3 (figure labels: CEd.html, Music.html)
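
The learned rule on slide 31 is easy to mirror in code. In the sketch below, the site-graph methods (is_under, links_to, in_degree) and the classifier's confidence method are hypothetical names chosen to match the rule's predicates; the slide's elided URLs are left as parameters rather than filled in:

```python
# Sketch of how a learned site-specific rule like the one on slide 31 could be
# represented and applied.  The site-graph API and global_classifier.confidence
# are hypothetical, not from the talk.

def ce_course_rule(page, site, hub_url, music_url, global_classifier):
    """CECourse(x) :- under(x, hub_url), linkto(x, music_url),
                      1 < inDegree(x) < 4, globalConfidence(x) > 0.3"""
    return (site.is_under(page, hub_url)                       # page lives below the CEd hub page
            and site.links_to(page, music_url)                  # page links to the music page
            and 1 < site.in_degree(page) < 4                    # structural constraint on the site graph
            and global_classifier.confidence(page) > 0.3)       # weak evidence from the global classifier
```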

  32. Learn Local Regularities for Page Classification 1. Label site using global classifier 2. Learn local classifiers 3. Apply local classifier to modify global labels (figure labels: CEd.html, Music.html)

  33. Learn Local Regularities for Page Classification 1. Label site using global classifier 2. Learn local classifier 3. Apply local classifier to modify global labels (figure labels: CEd.html, Music.html)

  34. Results of Local Learning: Cont.Education Course Page • Learning global classifier only: • precision .81, recall .80 • Learning global classifier plus site-specific classifiers for 20 local sites: • precision .82, recall .90

  35. Learning Site-Specific Regularities: Example 2 • Extracting “Course-Title” from web pages

  36. Local/Global Learning Algorithm • Train global course title extractor (word based) • For each new university site: • Apply global title extractor • For each page containing extracted titles • Learn page-specific rules for extracting titles, based on page layout structure • Apply learned rules to refine initial labeling
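
A compact sketch of the per-site loop on slide 36; the extractor and rule-learner objects and their methods are assumptions chosen to mirror the listed steps:

```python
# Sketch of the per-site loop on slide 36.  The extractor / rule-learner objects
# and their methods are hypothetical names, not the talk's code.

def label_new_site(site_pages, global_extractor, layout_rule_learner):
    # 1. Apply the global, content-based course-title extractor to every page.
    labels = {page: global_extractor.extract_titles(page) for page in site_pages}

    # 2.-3. On each page where titles were found, learn page-specific layout rules
    # (a restricted hypothesis language keeps this feasible with sparse data),
    # then re-apply those rules to refine the initial labeling.
    for page, titles in labels.items():
        if titles:
            rules = layout_rule_learner.fit(page, titles)
            labels[page] = rules.extract_titles(page)

    return labels
```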

  37. (figure)

  38. Local/Global Learning Summary • Approach: • Learn global extractor/classifier using content features • Learn local extractor/classifier using layout features • Design restricted hypothesis language for local, to accommodate sparse training data • Algorithm to process a new site: • Apply global extractor/classifier to label site • Train local extractor/classifier on this data • Apply local extractor/classifier to refine labels

  39. Other Local Learning Approaches • Rule covering algorithms: each rule is a local model • But these require supervised labeled data for each locality • Shrinkage-based techniques, e.g., for learning hospital-independent and hospital-specific models for medical outcomes • Again, these require labeled data for each hospital • This is different – no labeled data for new sites

  40. When/Why does this work?? • Local and global models use independent, redundantly sufficient features • Local models learned within low-dimension hypothesis language • Related to co-training!

  41. Other Uses? + Global and website-specific information extractors + Global and program-specific TV segment classifiers? + Global and environment-specific robot perception? • Global and speaker-specific speech recognition? • Global and hospital-specific medical diagnosis?

  42. Summary • Cotraining: • Classifier learning as minimization problem • Graph partitioning algorithm possible? • Learning site-specific structure: • Important structure involves long-distance relationships • Strong local graph structure regularities are highly useful
