1 / 18

Training examples

The Supervised (Deep) Learning Paradigm. Training examples. Accurate digit classifier. Training labels. 2. Supervised Learner. Supervised Learning is cool. A supervised problem is solved if:. A human can tell you the right answer many times. (1000 penguins…).

tamas
Download Presentation

Training examples

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. The Supervised (Deep) Learning Paradigm Training examples Accurate digit classifier Training labels 2 Supervised Learner

  2. Supervised Learning is cool

  3. A supervised problem is solved if: A human can tell you the right answer many times. (1000 penguins…) a tool to teach machines known things

  4. How about news? Repeatedly: • Observe features of user+articles • Choose a news article. • Observe click-or-not Goal: Maximize fraction of clicks

  5. A standard pipeline • Learn • Act: • Deploy in A/B test for 2 weeks • A/B test fails  Why?

  6. Q: What goes wrong? Is Ukraine interesting to John ? A: Need Right Signal for Right Answer

  7. Q: What goes wrong? Model value over time A: The world changes!

  8. HOW?

  9. GOOD

  10. BAD How do you learn from Reward signal?

  11. Reinforcement Learning (selected news story) Action Policy ? (user history, news stories) Observation (click-or-not) Reward My papers: KL02, KKL03, SLWLL06, DaLM09, CKADaL15, CHRDaL16, KAL16, JKALS17, MLA17, DaLMS18…

  12. Examples:

  13. A simple problem breaking all common multistep RL algorithms So Cold! A boy’s fire starts to die down The boy is warm for awhile So he goes searching for wood But he gets cold far from the fire But then the fire goes out So he returns to the fire steps to solution samples needed

  14. Consider: One-step Reinforcement Learning Limit the scope of the problems solved

  15. One step reinforcement learning is used! LZ07, BL09, LWLS10, SLLK10, DuLL11, DuHKKLRZ11, LWLW11, BLLRS, ADuKLS12, DELL12, BLS14, AHKLLS14, ABCLLLMORSS16, AKADuL17… News Rec: [LCLS ‘10] Ad Choice: [BPQCCPRSS ‘12] Ad Format: [TRSA ‘13] Education: [MLLBP ‘14] Music Rec: [WWHW ‘14] Robotics: [PG ‘16] Wellness/Health: [ZKZ ’09, SLLSPM ’11, NSTWCSM ’14, PGCRRH ’14, NHS ’15, KHSBATM ‘15, HFKMTY ’16] Tutorial @http://hunch.net/~rwil

  16. Algorithms in Vowpal Wabbit: Email us (jcl@microsoft.com) if you want to play with a service…

  17. Reinforcement problem solved if: An action outcome value can be measuredmany times. can learn new things about the world

  18. Much still to do---Join us  Nan Jiang Sham Kakade Satyen Kale Nikos Karampatziakis Akshay Krishnamurthy Lihong Li Stephane Ross Robert Schapire Wen Sun Tong Zhang Alekh Agarwal Alina Beygelzimer Drew Bagnell Avrim Blum Kai-Wei Chang Christoph Dann Hal Daume Simon Du Miroslav Dudik Geoff Gordon Daniel Hsu Yes, we are hiring  http://hunch.net/?p=9828091

More Related