
Learning to Predict



Presentation Transcript


  1. Learning to Predict. Presenter: Russell Greiner

  2. Vision Statement. Helping the world understand data … and make informed decisions. • Single decision: determine the • class label of an instance • set of labels of a set of pixels, … • value of a property of an instance, …

  3. Motivation for Training a Predictor • Need to know the “label” of an instance, to determine the appropriate action • PredictorMed( patient#2 ) =? “treatX is Ok” • Unfortunately, Predictor( · ) is not known a priori • But many ⟨patient, treatX-response⟩ examples are available. [Diagram: patient → Predictor → Ok]

  4. Motivation for Training a Predictor (Learner) • Machine learning provides algorithms for mapping a set of ⟨patient, treatX⟩ examples to a Predictor(·) function.
Training examples:
Temp  Press  Sore Throat  …  Colour | treatX
 35     95       Y        …  Pale   |  No
 22    110       N        …  Clear  |  Ok
  :      :       :             :    |   :
 10     87       N        …  Pale   |  No
New instance ⟨Temp 32, Press 90, Sore Throat N, …, Colour Pale⟩ → Predictor: treatX Ok

  5. Motivation for Training a Predictor (Learner) • Need to learn the predictor (not program it in) when it is … • … not known • … not expressible • … changing • … user dependent.
Training examples:
Temp  Press  Sore Throat  …  Colour | treatX
 35     95       Y        …  Pale   |  No
 22    110       N        …  Clear  |  Ok
  :      :       :             :    |   :
 10     87       N        …  Pale   |  No
New instance ⟨Temp 32, Press 90, Sore Throat N, …, Colour Pale⟩ → Predictor: treatX No

  6. Personnel • PI synergy: • Greiner, Schuurmans, Holte, Sutton, Szepesvari, Goebel • 5 Postdocs • 16 Grad students (5 MSc, 11 PhD) • 5 Supporting technical staff + personnel for Bioinformatics thrust

  7. Partners/Collaborators • 4 UofA CS profs • 1 UofAlberta Math/Stat • Non-UofA collaborators: Google, Yahoo!, Electronic Arts, UofMontreal, UofWaterloo, UofNebraska, NICTA, NRC-IIT,… + Bioinformatics thrust collaborators

  8. Additional Resources • Grants • $225K CFI • $100K MITACS • $100K Google • Hardware • 68 processor, 2TB, Opteron Cluster • 54 processor, dual core, 1.5TB, Opteron Cluster + funds/data for Bioinformatics thrust

  9. Highlights • IJCAI 2005 – Distinguished Paper Prize • UM 2003 – Best Student Paper Prize • WebIC technology is the foundation for a start-up company • Significant advances in extending SVMs to use unsupervised/semi-supervised data, and for structured data • + Highlights from Bioinformatics thrust

  10. Learning to Predict: Challenges. Simplifying assumptions re: training data • IID / unstructured • Lots of instances • Low dimensions • Complete features • Completely labeled • Balanced data • is sufficient. [Diagram: training-data table (Temp, Press, Sore Throat, …, Colour → treatX) fed to Learner → Predictor, as on slides 4–5]

  11. Segmenting Brain Tumors. Learning to Predict: Challenges. Simplifying assumptions re: training data • IID / unstructured? • Lots of instances • Low dimensions • Complete features • Completely labeled • Balanced data • is sufficient. Extensions to Conditional Random Fields, …

  12. Learning to Predict: Challenges. Simplifying assumptions re: training data • IID / unstructured • Lots of instances? • Low dimensions? • Complete features • Completely labeled • Balanced data • is sufficient. N ≈ 10s, m ≈ 1000s

  13. Learning to Predict: Challenges. Simplifying assumptions re: training data • IID / unstructured • Lots of instances? • Low dimensions? • Complete features • Completely labeled • Balanced data • is sufficient. N ≈ 20,000, m ≈ 100 (Microarray, SNP Chips, …). Dimensionality Reduction: L2 Model (Component Discovery), BiCluster Coding
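The high-dimension/low-sample regime on this slide (many more features than instances) is the classic setting for projection methods. A minimal SVD-based dimensionality-reduction sketch on synthetic data, with shapes shrunk for speed; this illustrates the general idea only, not the slide's L2/BiCluster methods:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical microarray-style data: few samples, many features
# (real chips would be ~100 samples x ~20,000 features).
m, N = 100, 2000
X = rng.normal(size=(m, N))

# Centre the features, then take a truncated SVD: projecting onto the
# top-k right singular vectors gives a k-dimensional representation.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 10
Z = U[:, :k] * s[:k]          # (m, k) reduced representation

print(Z.shape)                # (100, 10)
```

Each of the k new coordinates is a linear combination of all N original features, so this trades interpretability for a drastic drop in dimension.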

  14. Learning to Predict: Challenges. Simplifying assumptions re: training data • IID / unstructured • Lots of instances • Low dimensions • Complete features? • Completely labeled • Balanced data • is sufficient. Budgeted Learning

  15. Learning to Predict: Challenges. Simplifying assumptions re: training data • IID / unstructured • Lots of instances • Low dimensions • Complete features • Completely labeled? • Balanced data • is sufficient. Semi-Supervised Learning; Active Learning

  16. Learning to Predict: Challenges. Simplifying assumptions re: training data • IID / unstructured • Lots of instances • Low dimensions • Complete features • Completely labeled • Balanced data? • is sufficient. Cost Curves (analysis)

  17. Learning to Predict: Challenges. Simplifying assumptions re: training data • IID / unstructured • Lots of instances • Low dimensions • Complete features • Completely labeled • Balanced data • is sufficient? Robust SVM; Mixture Using Variance; Large Margin Bayes Net; Coordinated Classifiers; …

  18. Projects and Status • Structured Prediction: Random Fields, Parsing, Unsupervised M3N • Dimensionality Reduction (L2 Model: Component Discovery) • Budgeted Learning • Semi-Supervised Learning: large-margin (SVM), probabilistic (CRF), graph-based transduction • Active Learning • Cost Curves • Robust SVM • Coordinated Classifiers • Mixture Using Variance • Large Margin Bayes Net. Assumptions addressed: IID / unstructured; lots of instances; low dimensions; complete features; completely labeled; balanced data; beyond simple learners. (Poster #26)

  19. Technical Details: Budgeted Learning

  20. Typical Supervised Learning. [Diagram: feature/response records for Person 1, Person 2, … → Learner → Predictor]

  21. Active Learning. [Diagram: feature/response records for Person 1, Person 2, … → Learner → Predictor] • User is able to PURCHASE labels, at some cost … for which instances?

  22. Budgeted Learning. [Diagram: feature/response records for Person 1, Person 2, … → Learner → Predictor] • User is able to PURCHASE values of features, at some cost … but which features for which instances?

  23. Budgeted Learning. [Diagram as on slide 22] • User is able to PURCHASE values of features, at some cost … but which features for which instances? • Significantly different from ACTIVE learning: correlations between feature values

  24. [Plot: Error vs. number of features purchased; 10 tests at $1/test, budget = $40, Beta(10,1) prior]

  25. Budgeted Learning… so far • Defined framework • Ability to purchase individual feature values • Fixed LEARNING / CLASSIFICATION budget • Theoretical results • NP-hard in general • Standard algorithms are not even approximations! • Empirical results show … • Avoid Round Robin • Try clever algorithms: • Biased Robin • Randomized Single Feature Lookahead. [Lizotte, Madani, Greiner: UAI'03], [Madani, Lizotte, Greiner: UAI'04], [Kapoor, Greiner: ECML'05]
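The Biased Robin policy named above can be contrasted with Round Robin in a few lines. This is a toy rendering under assumed semantics (each purchase of a feature either "helps" or not, with a hidden per-feature probability), not the implementations from the UAI papers:

```python
import random

random.seed(1)

# Hidden usefulness of 3 features: probability a purchase "helps".
p_true = [0.8, 0.3, 0.1]
budget = 30                   # total purchases allowed

def biased_robin(p_true, budget):
    """Stay on a feature while purchases help; advance on a failure."""
    counts = [0] * len(p_true)
    i = 0
    for _ in range(budget):
        counts[i] += 1
        if random.random() >= p_true[i]:   # purchase did not help
            i = (i + 1) % len(p_true)      # move to the next feature
    return counts

def round_robin(p_true, budget):
    """Spend the budget evenly, ignoring observed outcomes."""
    counts = [0] * len(p_true)
    for t in range(budget):
        counts[t % len(p_true)] += 1
    return counts

print(biased_robin(p_true, budget))  # concentrates on useful features
print(round_robin(p_true, budget))   # spends evenly: [10, 10, 10]
```

Round Robin wastes budget on uninformative features, which is the slide's point: adaptive policies such as Biased Robin tend to dominate it empirically.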

  26. Future Work #1. [Diagram: feature/response records for Person 1, Person 2, … → Learner → Classifier]

  27. Future Work #2 • Sample complexity of Budgeted Learning • How many (Ij, Xi) “probes” are required to PAC-learn? • Develop policies with guarantees on learning performance • Complex cost models … bundling tests, … • Allow the learner to perform more powerful probes • e.g., purchase X3 in an instance where X7 = 0 & Y = 1 • More complex classifiers?

  28. Future Work #3: Learning a Generative Model. [Diagram: response records for Person 1, Person 2, …] • Goal: Find θ* = argmaxθ P(D | θ)

  29. [Diagram: MTrain and MTest label matrices with BiCluster membership; Learner finds BiClusters → Classifier] Projects and Status • Structured Prediction (ongoing) • Dimensionality Reduction (ongoing; RoBiC: Poster #8) • Budgeted Learning (ongoing) • Semi-Supervised Learning (ongoing) • Active Learning (ongoing) • Cost Curves (complete; Poster #26)

  30. Technical Details: Using Variance Estimates to Combine Bayesian Classifiers

  31. Motivation. [Scatterplot: four classifiers C1–C4, each reliable in a different region of mixed + / o points] • Suppose we have many different classifiers … • For each instance, we want each classifier to … • “know what it knows” … • … and shout LOUDEST when it knows best … • “Loudness” ∝ 1 / Variance!
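The "loudness ∝ 1/Variance" idea amounts to inverse-variance weighting. A minimal sketch, assuming each classifier reports a probability estimate together with a variance for it; the function name and interface are illustrative, not taken from the MUV papers:

```python
import numpy as np

def mix_using_variance(estimates, variances):
    """Combine estimates, weighting each by 1/variance.

    A classifier with low variance "shouts loudest" and dominates the
    mixture exactly on the instances where it is most certain.
    """
    w = 1.0 / np.asarray(variances, dtype=float)
    return float(np.sum(w * np.asarray(estimates, dtype=float)) / np.sum(w))

# Three classifiers' estimates of P(+c | x), with their variances:
p_hat = [0.9, 0.4, 0.6]
var   = [0.01, 0.10, 0.05]
print(mix_using_variance(p_hat, var))   # pulled toward the low-variance 0.9
```

With equal variances this reduces to a plain average; as one classifier's variance shrinks toward zero, the mixture converges to that classifier's estimate.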

  32. Mixture Using Variance • Given a belief net classifier with • fixed (correct) structure • parameters θ estimated from a (random) data sample • The response to the query “P(+c | -e, +w)” is … • asymptotically normal, with … • (asymptotic) variance • The variance is easy to compute … • for simple structures (Naïve Bayes, TAN) … and • for complete queries

  33. Experiment #4b: MUV(kNB, AdaBoost, js) vs AdaBoost(NB) • MUV significantly out-performs AdaBoost • even when using the base-classifiers that AdaBoost generated! • MUV(kNB, AdaBoost, js) is better than AdaBoost[NB] with p < 0.023

  34. MUV Results • Sound statistical foundation • Very effective classifier … • …across many real datasets • MUV(NB) better than AdaBoost(NB)! C. Lee, S. Wang and R. Greiner; ICML’06

  35. Mixture Using Variance … next steps? • Other structures (beyond NB, TAN) • Beyond just tabular CP-tables for discrete variables • Noisy-or • Gaussians • Learn different base-classifiers from different subsets of features • Scaling up to many MANY features • overfitting characteristics?

  36. Confidence in Classifier • Confidence of a prediction? • Fit each μj, σj² to a Beta(aj, bj) distribution • Compute the area CDFBeta(aj, bj)(0.5)
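The Beta-based confidence score on this slide can be sketched as follows, under the assumption that the fit is by method of moments and the score is the Beta CDF evaluated at 0.5; pure-Python numerical integration stands in for a stats library:

```python
import math

def beta_params_from_moments(mu, var):
    """Method-of-moments fit of Beta(a, b) to a mean and variance.

    Assumes var < mu * (1 - mu), as required for a valid Beta fit.
    """
    common = mu * (1 - mu) / var - 1
    return mu * common, (1 - mu) * common

def beta_cdf(x, a, b, steps=50_000):
    """CDF of Beta(a, b) at x, via simple trapezoidal integration."""
    norm = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    ts = [i * x / steps for i in range(steps + 1)]
    ys = [t ** (a - 1) * (1 - t) ** (b - 1) if 0 < t < 1 else 0.0 for t in ts]
    area = sum((ys[i] + ys[i + 1]) / 2 * (x / steps) for i in range(steps))
    return area / norm

# A classifier response with mean 0.8 and variance 0.01 fits Beta(12, 3);
# the mass below 0.5 is then small, i.e. high confidence in the + class.
a, b = beta_params_from_moments(0.8, 0.01)
print(beta_cdf(0.5, a, b))
```

The CDF at 0.5 is the probability mass on the "wrong side" of the decision threshold, so 1 minus that value serves as the confidence in the predicted class.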

  37. Semi-Supervised Learning. [Diagram: labeled training data + unlabeled training data → Learner → Classifier (prediction: No)]

  38. Approaches • Ignore the unlabeled data • Great if we have LOTS of labeled data • Use the unlabeled data, as is … • “Semi-Supervised Learning” … based on • large margin (SVM) • graph • probabilistic model • Pay to get labels for SOME unlabeled data • “Active Learning”
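As a concrete illustration of "use the unlabeled data, as is", here is a self-training sketch with a nearest-centroid base learner; it is an illustrative stand-in, not the SVM, graph, or probabilistic methods on this slide:

```python
import numpy as np

rng = np.random.default_rng(0)

# One labeled point per class, plus two unlabeled clusters.
Xl = np.array([[0.0, 0.0], [1.0, 1.0]])
yl = np.array([0, 1])
Xu = np.vstack([rng.normal(0, 0.2, size=(20, 2)),    # near class 0
                rng.normal(1, 0.2, size=(20, 2))])   # near class 1

X, y = Xl.copy(), yl.copy()
unlabeled = list(range(len(Xu)))
for _ in range(5):
    # Fit centroids on the current (partly pseudo-)labeled set.
    centroids = np.array([X[y == c].mean(axis=0) for c in (0, 1)])
    d = np.linalg.norm(Xu[unlabeled, None, :] - centroids, axis=2)
    conf = np.abs(d[:, 0] - d[:, 1])       # margin between the two classes
    best = int(np.argmax(conf))            # most confident unlabeled point
    idx = unlabeled.pop(best)
    X = np.vstack([X, Xu[idx]])            # adopt it with its pseudo-label
    y = np.append(y, int(np.argmin(d[best])))

print(len(y))    # 2 labeled + 5 pseudo-labeled = 7
```

Adopting only the highest-margin point each round is what keeps self-training from amplifying its own early mistakes, at least on well-separated data like this.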

  39. Semi-supervised Multi-class SVM • Approach: find a labeling that would yield an optimal SVM classifier, on the resulting training data. • Hard, but • semi-definite relaxations can approximate this objective surprisingly well • training procedures are computationally intensive, but produce high quality generalization results. L. Xu, J. Neufeld, B. Larson, D. Schuurmans. Maximum margin clustering. NIPS-04. L. Xu and D. Schuurmans. Unsupervised and semi-supervised multi-class SVMs. AAAI-05.

  40. Probabilistic Approach to Semi-Supervised Learning • Probabilistic model: P(y|x) • Context: non-IID data • Language modelling • Segmenting Brain Tumors from MR Images • Use Unlabeled Data as a Regularizer • Future: other applications … C-H. Lee, S. Wang, F. Jiao, D. Schuurmans and R. Greiner. Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields. NIPS'06. F. Jiao, S. Wang, C. Lee, R. Greiner, and D. Schuurmans. Semi-supervised conditional random fields for improved sequence segmentation and labeling. COLING/ACL'06.

  41. Active Learning • Pay for the label of the query xi that … maximizes conditional mutual information about the unlabeled data • How to determine yi? • Take the EXPECTATION wrt Yi? • Use an OPTIMISTIC guess wrt Yi?
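The expectation-vs-optimism choice on this slide can be illustrated with toy scores. The per-label "benefit" numbers below are invented for illustration; the actual objective is the conditional mutual information described above:

```python
def expected_score(p, benefit):
    """Average the benefit of querying x_i over its label distribution p."""
    return sum(pi * bi for pi, bi in zip(p, benefit))

def optimistic_score(p, benefit):
    """Score x_i by its best-case label (p is deliberately ignored)."""
    return max(benefit)

p = [0.5, 0.5]            # a maximally uncertain query
benefit = [0.2, 0.9]      # label 1 would help far more, if it turns out true
print(expected_score(p, benefit))    # 0.55
print(optimistic_score(p, benefit))  # 0.9
```

The optimistic score ranks queries by their best-case payoff, which is why it needs the "on-line adjustment" mentioned on the next slide: when the hoped-for label fails to materialize, the optimism must be tempered.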

  42. Optimistic Active Learning using Mutual Information • Need optimism • Need “on-line adjustment” • Better than just MostUncertain, … [Plots: learning curves on the breast and pima datasets] • Y. Guo and R. Greiner. Optimistic active learning using mutual information. IJCAI'07

  43. Future Work on Active Learning • Understand WHY “optimism” works… + other applications of optimism • Extend framework to deal with • non-iid data • different qualities of labelers • …
