Download
probability n.
Skip this Video
Loading SlideShow in 5 Seconds..
Probability PowerPoint Presentation
Download Presentation
Probability

Probability

135 Views Download Presentation
Download Presentation

Probability

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

  1. Probability CSE 473 – Autumn 2003 Henry Kautz

  2. ExpectiMax

  3. Hungry Monkey: 2-Ply Game Tree shake jump 1/6 5/6 1/3 2/3 shake shake jump shake jump shake jump jump 2/3 5/6 1/3 5/6 1/6 1/3 5/6 1/6 1/6 2/3 1/3 2/3 1/3 1/3 2/3 2/3 0 0 0 0 1 0 0 1 1 1 0 1 2 0 0 1

  4. ExpectiMax 1 – Chance Nodes shake jump 1/6 5/6 1/3 2/3 shake shake jump shake jump shake jump jump 2/3 7/6 1 0 0 0 1/6 1/6 2/3 5/6 1/3 5/6 1/6 1/3 5/6 1/6 1/6 2/3 1/3 2/3 1/3 1/3 2/3 2/3 0 0 0 0 1 0 0 1 1 1 0 1 2 0 0 1

  5. ExpectiMax 2 – Max Nodes shake jump 1/6 5/6 1/3 2/3 1/6 2/3 7/6 1/6 shake shake jump shake jump shake jump jump 2/3 7/6 1 0 0 0 1/6 1/6 2/3 5/6 1/3 5/6 1/6 1/3 5/6 1/6 1/6 2/3 1/3 2/3 1/3 1/3 2/3 2/3 0 0 0 0 1 0 0 1 1 1 0 1 2 0 0 1

  6. ExpectiMax 3 – Chance Nodes shake jump 1/2 1/3 1/6 5/6 1/3 2/3 1/6 2/3 7/6 1/6 shake shake jump shake jump shake jump jump 2/3 7/6 1 0 0 0 1/6 1/6 2/3 5/6 1/3 5/6 1/6 1/3 5/6 1/6 1/6 2/3 1/3 2/3 1/3 1/3 2/3 2/3 0 0 0 0 1 0 0 1 1 1 0 1 2 0 0 1

  7. ExpectiMax 4 – Max Node 1/2 shake jump 1/2 1/3 1/6 5/6 1/3 2/3 1/6 2/3 7/6 1/6 shake shake jump shake jump shake jump jump 2/3 7/6 1 0 0 0 1/6 1/6 2/3 5/6 1/3 5/6 1/6 1/3 5/6 1/6 1/6 2/3 1/3 2/3 1/3 1/3 2/3 2/3 0 0 0 0 1 0 0 1 1 1 0 1 2 0 0 1

  8. Policies • The result of the ExpectiMax analysis is a conditional plan (also called a policy): • Optimal plan for 2 steps: jump; shake • Optimal plan for 3 steps:jump; if (ontable) {shake; shake} else {jump; shake} • Probabilistic planning can be generalized in many ways, including: • Action costs • Hidden state • The general problem is that of solving a Markov Decision Process (MDP)

  9. 2 Player Games of Chance

  10. Backgammon • Branching factor: • Chance node: 21 • Max node: about 20 on average • Size of tree: O(ckmk) • In practice: can search 3 plies • Neurogammon & TD-Gammon (Tesauro 1995) • Learned weights on static evaluation function by playing against itself • Use results of games to optimize weights: • “Punish” features that were on in losing games • “Reward” features that were on in winning games • A kind of reinforcement learning • Became world’s best backgammon player!