

Active Learning 02-750

Jaime Carbonell,

Language Technologies Institute

Carnegie Mellon University

www.cs.cmu.edu/~{jgc| pinard | jinruih | vamshi}

27 September 2010


Active Learning

  • Training data:

    • Special case:

  • Functional space:

  • Fitness Criterion:

    • a.k.a. loss function

  • Sampling Strategy:
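The four ingredients listed above (training data, function space, loss, sampling strategy) can be tied together in a small pool-based loop. The sketch below is only an illustration under stated assumptions, not the formulation from the slide: the function space is taken to be 1-D threshold classifiers, the loss is 0/1 training error, and the sampling strategy is uncertainty sampling (query the unlabeled point closest to the current threshold). The names `fit_threshold`, `most_uncertain`, `active_learn`, and the `oracle` callback are hypothetical.

```python
def fit_threshold(labeled):
    """Function space: f_t(x) = [x >= t]. Return the t with the lowest 0/1 loss."""
    xs = sorted(x for x, _ in labeled)
    # Candidate thresholds: below all points, midpoints, above all points.
    candidates = [xs[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    candidates.append(xs[-1] + 1.0)

    def loss(t):
        # Fitness criterion (loss function): number of misclassified labeled points.
        return sum((x >= t) != bool(y) for x, y in labeled)

    return min(candidates, key=loss)


def most_uncertain(pool, threshold):
    """Sampling strategy: pick the unlabeled point nearest the decision boundary."""
    return min(pool, key=lambda x: abs(x - threshold))


def active_learn(pool, oracle, seed, budget):
    """Start from a few labeled seed points, then query `budget` more labels."""
    labeled = [(x, oracle(x)) for x in seed]
    pool = [x for x in pool if x not in seed]
    for _ in range(budget):
        if not pool:
            break
        threshold = fit_threshold(labeled)
        query = most_uncertain(pool, threshold)
        pool.remove(query)
        labeled.append((query, oracle(query)))  # ask the oracle for one label
    return fit_threshold(labeled), labeled


# Toy usage: the true boundary (0.62) is an assumption made up for this example.
pool = [i / 20 for i in range(21)]
threshold, labeled = active_learn(pool, lambda x: int(x >= 0.62),
                                  seed=[0.0, 1.0], budget=5)
```

With only seven labels in total, the learned threshold lands between the two pool points that bracket the true boundary, which is the behavior a binary search would give; random sampling would typically need more labels to localize the boundary as tightly.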

Jaime G. Carbonell, Language Technologies Institute


Cost-Sensitive Active Learning (pp37-39 Settles)

  • Suppose not all instances cost the same to label

    • Cytoplasmic vs membrane proteins for structure prediction via X-ray crystallography

    • Books vs web pages for topic labels

    • Near-misses vs clear examples

  • Suppose labelers vary in costs

    • Crystallography vs MRI for protein structures

    • Linguists vs Turkers for Machine Translation

  • How to cope with cost-accuracy tradeoffs?

    • Proactive learning (coming later)
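One simple way to cope with unequal labeling costs is to rank candidate queries by estimated informativeness per unit cost and fill a budget greedily. The sketch below assumes that heuristic; it is not taken from the slides or from Settles. The `Candidate` type, the function `select_within_budget`, and all values and costs are invented for illustration.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    value: float  # estimated informativeness of labeling this instance
    cost: float   # estimated cost of obtaining the label


def select_within_budget(candidates, budget):
    """Greedily pick candidates by value/cost ratio until the budget runs out."""
    chosen = []
    remaining = budget
    ranked = sorted(candidates, key=lambda c: c.value / c.cost, reverse=True)
    for c in ranked:
        if c.cost <= remaining:
            chosen.append(c)
            remaining -= c.cost
    return chosen


# Hypothetical numbers: the membrane protein is informative but expensive.
candidates = [
    Candidate("membrane protein", value=0.9, cost=10.0),
    Candidate("cytoplasmic protein", value=0.6, cost=2.0),
    Candidate("near-miss example", value=0.8, cost=4.0),
    Candidate("clear example", value=0.1, cost=1.0),
]
picked = select_within_budget(candidates, budget=6.0)
```

The greedy ratio rule is a knapsack heuristic rather than an optimal solution; it also treats the value estimates as independent, which ignores redundancy between the chosen instances.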

Active Learning Beyond Instances

  • Active Class Selection (p33 Settles)

    • Given a class, query instances thereof

    • Typical vs boundary instances

  • Active Feature Selection

    • Query values of features across many instances

    • Enables meaningful “batch” experiments

    • Generalized to Instance-Feature matrix

  • Active Clustering (p33-34 Settles)

    • Semi-supervised: new classes can spawn

    • Subsampling for effective unsupervised clustering

Batch-Mode Active Learning

  • Why would we want Q-batch vs Q-1?

    • Amortize experimental setup

    • Keep human labeler efficiently busy

      • “Staleness” vs utilization (Ringger, 2010)

    • Crowdsourcing → parallelizable AL

  • How do we select batches? (pp 35-36 Settles)

    • Instance diversity in the batch as part of sampling (Brinker 2003, Donmez & Carbonell, 2008)

    • Modular and submodular functions (Hoi 2006)

    • Need a joint optimization criterion
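To see why a joint criterion matters, consider a greedy batch selector that scores each candidate by a weighted sum of its own uncertainty and its distance to the points already in the batch. This is a minimal sketch of the diversity idea only; it is not the specific formulation of Brinker (2003), Donmez & Carbonell (2008), or Hoi (2006). The function `select_batch`, the `trade_off` weight, and the 1-D data are assumptions made for the example.

```python
def select_batch(points, uncertainty, batch_size, trade_off=0.5):
    """Greedily build a batch that balances uncertainty against diversity.

    points      -- 1-D feature values of the unlabeled instances
    uncertainty -- per-instance uncertainty scores in [0, 1]
    trade_off   -- weight on uncertainty; (1 - trade_off) goes to diversity
    """
    selected = []
    remaining = list(range(len(points)))
    while remaining and len(selected) < batch_size:

        def score(i):
            if selected:
                # Diversity: distance to the closest instance already chosen.
                diversity = min(abs(points[i] - points[j]) for j in selected)
            else:
                diversity = 1.0  # nothing chosen yet, so no redundancy penalty
            return trade_off * uncertainty[i] + (1 - trade_off) * diversity

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Three near-duplicate uncertain points and one distant, less uncertain point.
points = [0.50, 0.51, 0.52, 0.90]
uncertainty = [0.90, 0.89, 0.88, 0.60]
batch = select_batch(points, uncertainty, batch_size=2)
```

Picking the top two instances by uncertainty alone would return indices 0 and 1, which are almost the same point; the joint score instead pairs index 0 with the distant index 3.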

Noisy Labelers or Experiments (pp37-39 Settles)

  • Labeling noise → version-space learning flawed

    • E.g. cannot apply SVM shrinking-margin

    • Underlying ML algorithm must be noise resistant

  • Reducing noisy labels if p(correct) > 0.5

    • Repeated labeling (if random noise)

    • Majority vote (if semi-independent labelers)

    • Tradeoffs in repeat vs new labels

    • Cost vs accuracy tradeoffs

  • What if the labeler accuracy is not known?

    • Learn/estimate labeler accuracy as part of AL

    • → Proactive Learning (later class)
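The condition p(correct) > 0.5 can be checked numerically. Assuming labelers are fully independent and share the same accuracy p (a stronger assumption than the "semi-independent" case on the slide), the probability that a majority of n labelers is correct is a binomial tail. The sketch below computes it and also shows one crude way to estimate unknown labeler accuracy, namely agreement with the majority label. The function names are hypothetical, and agreement-with-majority is a stand-in heuristic, not the proactive learning method covered later in the course.

```python
from collections import Counter
from math import comb


def majority_vote(labels):
    """Return the most frequent label among the given votes."""
    return Counter(labels).most_common(1)[0][0]


def majority_accuracy(p, n):
    """P(majority of n independent labelers is correct), each with accuracy p."""
    if n % 2 == 0:
        raise ValueError("use an odd number of labelers to avoid ties")
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))


def agreement_with_majority(votes):
    """Estimate each labeler's accuracy as agreement with the majority label.

    votes -- {labeler_name: [label for item 0, label for item 1, ...]}
    """
    labelers = list(votes)
    n_items = len(votes[labelers[0]])
    consensus = [majority_vote([votes[name][i] for name in labelers])
                 for i in range(n_items)]
    return {name: sum(a == c for a, c in zip(votes[name], consensus)) / n_items
            for name in labelers}


# Repeated labeling helps when p > 0.5 and hurts when p < 0.5.
helps = majority_accuracy(0.7, 3)   # about 0.784, better than 0.7
hurts = majority_accuracy(0.4, 3)   # about 0.352, worse than 0.4
```

Each extra vote on an old instance is a label not spent on a new one, which is the repeat-vs-new tradeoff listed above; the binomial tail gives one side of that comparison.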

Readings

  • Burr Settles – Comprehensive Survey of AL http://www.cs.cmu.edu/~bsettles/pub/settles.activelearning.pdf

  • Donmez, P., Carbonell, J., and Bennett, P., “Dual-Strategy Active Learning” http://www.cs.cmu.edu/~jgc/publication/Dual_Strategy_ECML_2007.pdf

  • Cohn, Ghahramani and Jordan, “Active Learning with Statistical Models” http://dspace.mit.edu/bitstream/handle/1721.1/7192/AIM-1522.pdf;jsessionid=13C2A9BF0DEC1567B9CA33F0C43BC3C3?sequence=2

THANK YOU!
