

  1. Review of Introduction to AI Lirong Xia Tuesday, May 6, 2014

  2. About the final exam • When: Tues, 5/13, 3-6pm • Where: Low 4050 • Same rules as the midterm • open book and lecture notes • simple calculators are allowed • cannot use smartphones/laptops/wifi • No office hours with Joe tomorrow • 5/9 in-class office hours • please bring your HW2

  3. Outline • Search • uninformed search • informed search • CSP • planning • minimax, alpha-beta pruning • expectimax • Probabilistic inference • Machine learning

  4. Search Problems • A search problem consists of: • A state space …… • A successor function (with actions, costs) • A start state and a goal test • A solution is a sequence of actions (a plan) which transforms the start state to a goal state

  5. Search algorithms • Uninformed search • BFS • DFS • UCS • Informed search • Best first (greedy) • A*

  6. State Graphs vs. Search Trees • State graph: a representation of the search problem • each node is an abstraction of a state of the world • Search tree: a tool that helps us find the solution • each node represents an entire path in the graph • tree nodes are constructed on demand, and we construct as little as possible

  7. Fixed BFS • Never expand a node whose state has been visited • Fringe can be maintained as a First-In-First-Out (FIFO) queue (class Queue in util.py) • Maintain a set of visited states • fringe := {node corresponding to initial state} • loop: • if fringe empty, declare failure • choose and remove the top node v from fringe • check if v’s state s is a goal state; if so, declare success • if v’s state has been visited before, skip • if not, expand v, insert resulting nodes whose states have not been visited into fringe
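
A minimal Python sketch of this loop, assuming a hypothetical problem interface: successors(state) yields (action, next_state) pairs and is_goal(state) is the goal test:

```python
from collections import deque

def bfs(start, successors, is_goal):
    """Graph-search BFS: never expand a node whose state has been visited."""
    fringe = deque([(start, [])])        # FIFO queue of (state, path) nodes
    visited = set()                      # set of visited states
    while fringe:                        # if fringe empty, declare failure
        state, path = fringe.popleft()   # choose and remove the top node
        if is_goal(state):
            return path                  # declare success: a plan of actions
        if state in visited:
            continue                     # state visited before: skip
        visited.add(state)
        for action, nxt in successors(state):
            if nxt not in visited:       # insert unvisited successors
                fringe.append((nxt, path + [action]))
    return None
```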

  8. A*: Combining UCS and Greedy • Uniform-cost orders by path cost, or backward cost g(n) • Greedy orders by goal proximity, or forward cost h(n) • A* search orders by the sum: f(n) = g(n) + h(n)
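
A short A* sketch under the same hypothetical interface as the BFS example, here with successors also returning step costs and h as the heuristic; the tie counter only keeps the heap from ever comparing states:

```python
import heapq
import itertools

def a_star(start, successors, is_goal, h):
    """A* search: pop nodes in order of f(n) = g(n) + h(n).

    successors(state) -> iterable of (action, next_state, step_cost).
    h(state) -> heuristic estimate of the cost to the nearest goal.
    """
    tie = itertools.count()
    fringe = [(h(start), 0, next(tie), start, [])]   # (f, g, tie, state, path)
    best_g = {}
    while fringe:
        f, g, _, state, path = heapq.heappop(fringe)
        if is_goal(state):
            return path
        if state in best_g and best_g[state] <= g:
            continue                     # already expanded more cheaply
        best_g[state] = g
        for action, nxt, cost in successors(state):
            g2 = g + cost
            heapq.heappush(fringe,
                           (g2 + h(nxt), g2, next(tie), nxt, path + [action]))
    return None                          # fringe empty: failure
```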

  9. Admissible Heuristics • A heuristic h is admissible (optimistic) if: 0 ≤ h(n) ≤ h*(n) • where h*(n) is the true cost to a nearest goal • Coming up with admissible heuristics is most of what’s involved in using A* in practice

  10. Consistency of Heuristics • Stronger than admissibility • Definition: the cost implied by the heuristic never exceeds the real cost of each arc: h(A) − h(C) ≤ cost(A to C) • Consequences: • The f value along a path never decreases

  11. Constraint Satisfaction Problems • Standard search problems: • State is a “black box”: arbitrary data structure • Goal test: any function over states • Successor function can be anything • Constraint satisfaction problems (CSPs): • A special subset of search problems • State is defined by variables Xi with values from a domain D (sometimes D depends on i) • Goal test is a set of constraints specifying allowable combinations of values for subsets of variables

  12. Constraint Graphs • Binary CSP: each constraint relates (at most) two variables • Binary constraint graph: nodes are variables, arcs show constraints • General-purpose CSP algorithms use the graph structure to speed up search. E.g., Tasmania is an independent subproblem!

  13. CSP algorithms • A special search problem • constraints represented by a graph • Backtracking search • DFS with a fixed variable order, choosing one value at every step • Improvements of backtracking search

  14. Arc Consistency of a CSP • A simple form of propagation makes sure all arcs are consistent • Delete from the tail: remove values of the tail variable with no consistent value at the head • If V loses a value, neighbors of V need to be rechecked! • Arc consistency detects failure earlier than forward checking • Can be run as a preprocessor or after each assignment • Might be time-consuming
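
A sketch of this propagation in the style of AC-3, assuming domains is a dict from variable to set of values, neighbors maps each variable to its adjacent variables, and consistent(xi, x, xj, y) is a hypothetical check of the constraint between xi and xj:

```python
from collections import deque

def ac3(domains, neighbors, consistent):
    """Enforce arc consistency; returns False if some domain becomes empty."""
    queue = deque((xi, xj) for xi in domains for xj in neighbors[xi])
    while queue:
        xi, xj = queue.popleft()
        # Delete from the tail: drop values of xi with no support in xj
        removed = {x for x in domains[xi]
                   if not any(consistent(xi, x, xj, y) for y in domains[xj])}
        if removed:
            domains[xi] -= removed
            if not domains[xi]:
                return False             # failure detected early
            # xi lost a value, so arcs into xi must be rechecked
            queue.extend((xk, xi) for xk in neighbors[xi] if xk != xj)
    return True
```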

  15. Improving Backtracking • General-purpose ideas give huge gains in speed • Ordering: • Minimum remaining values (MRV) for variables • Least constraining value (LCV) for values • Filtering: can we detect inevitable failure early? • forward checking • Structure of the problem • e.g., when the constraint graph is a tree

  16. Planning problems • STRIPS language • state of the world: conjunction of positive, ground, function-free literals • Action • Preconditions: a set of literals that must hold for the action to be applicable • Effects: literals that the action makes true or false

  17. Blocks world • Start: On(B, A), On(A, Table), On(D, C), On(C, Table), Clear(B), Clear(D) • Move(x,y,z) • Preconditions: On(x,y), Clear(x), Clear(z) • Effects: On(x,z), Clear(y), NOT(On(x,y)), NOT(Clear(z)) • MoveToTable(x,y) • Preconditions: On(x,y), Clear(x) • Effects: On(x,Table), Clear(y), NOT(On(x,y))

  18. Blocks world example • Goal: On(A,B) AND Clear(A) AND On(C,D) AND Clear(C) • A plan: MoveToTable(B, A), MoveToTable(D, C), Move(C, Table, D), Move(A, Table, B)

  19. Adversarial Games • Deterministic, zero-sum games: • Tic-tac-toe, chess, checkers • The MAX player maximizes result • The MIN player minimizes result • Minimax search: • A search tree • Players alternate turns • Each node has a minimax value: best achievable utility against a rational adversary

  20. Alpha-Beta Pruning • General configuration • We’re computing the MIN-VALUE at some node n • We’re looping over n’s children • n’s value estimate is dropping • α is the best value that MAX can get at any choice point along the current path • If n becomes worse than α, MAX will avoid it, so we can stop considering n’s other children • Define β similarly for MIN • α is usually smaller than β • Once α ≥ β, prune: return to the upper layer
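
A compact sketch of minimax with alpha-beta pruning; successors and evaluate are hypothetical hooks for a concrete game, and the root call uses alpha = -inf, beta = +inf:

```python
def alpha_beta(state, depth, alpha, beta, is_max, successors, evaluate):
    """Minimax value of state with alpha-beta pruning.

    alpha: best value MAX can guarantee along the current path.
    beta:  best value MIN can guarantee along the current path.
    """
    children = successors(state)
    if depth == 0 or not children:
        return evaluate(state)
    if is_max:
        v = float('-inf')
        for c in children:
            v = max(v, alpha_beta(c, depth - 1, alpha, beta, False,
                                  successors, evaluate))
            alpha = max(alpha, v)
            if alpha >= beta:
                break                    # MIN will avoid this node: prune
        return v
    else:
        v = float('inf')
        for c in children:
            v = min(v, alpha_beta(c, depth - 1, alpha, beta, True,
                                  successors, evaluate))
            beta = min(beta, v)
            if alpha >= beta:
                break                    # MAX will avoid this node: prune
        return v
```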

  21. Expectimax Search Trees • Expectimax search • Max nodes (we) as in minimax search • Chance nodes • Need to compute chance node values as expected utilities

  22. Outline • Search • Probabilistic inference • Bayesian networks • probability representation • conditional independence (d-separation) • inference (variable elimination) • Markov decision process • value iteration • policy iteration • Hidden Markov models • filtering • Machine learning

  23. Bayesian networks • Definition of Bayesian network (Bayes’ net or BN) • A set of nodes, one per variable X • A directed, acyclic graph • A conditional distribution for each node • A collection of distributions over X, one for each combination of parents’ values p(X|a1,…,an) • CPT: conditional probability table A Bayesian network = Topology (graph) + Local Conditional Probabilities

  24. Probabilities in BNs • Bayesian networks implicitly encode joint distributions • As a product of local conditional distributions: p(x1,…,xn) = Πi p(xi | parents(Xi)) • This lets us reconstruct any entry of the full joint • Not every BN can represent every joint distribution • The topology enforces certain conditional independencies

  25. Reachability (D-Separation) • Question: are X and Y conditionally independent given evidence variables {Z}? • Yes, if X and Y are “separated” by Z • Look for active paths from X to Y • No active paths = independence! • A path is active if each triple along it is active: • Causal chain A → B → C where B is unobserved (either direction) • Common cause A ← B → C where B is unobserved • Common effect A → B ← C where B or one of its descendants is observed • All it takes to block a path is a single inactive segment

  26. Variable elimination • The Bayes’ net (read off the CPTs): Rained (R) and Sprinklers were on (S) are parents of Grass wet (G); R is the parent of Neighbor walked dog (N); N and G are parents of Dog wet (D) • p(+R) = .2, p(+S) = .6 • p(+G|+R,+S) = .9, p(+G|+R,-S) = .7, p(+G|-R,+S) = .8, p(+G|-R,-S) = .2 • p(+N|+R) = .3, p(+N|-R) = .4 • p(+D|+N,+G) = .9, p(+D|+N,-G) = .4, p(+D|-N,+G) = .5, p(+D|-N,-G) = .3 • From the factor Σn p(n|+R) p(+D|n,g), we sum out n to obtain a factor depending only on g • [Σn p(n|+R) p(+D|n,+G)] = p(+N|+R) p(+D|+N,+G) + p(-N|+R) p(+D|-N,+G) = .3*.9 + .7*.5 = .62 • [Σn p(n|+R) p(+D|n,-G)] = p(+N|+R) p(+D|+N,-G) + p(-N|+R) p(+D|-N,-G) = .3*.4 + .7*.3 = .33 • Continuing to the left, g will be summed out next, etc. (continued on board)
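
The two sums above can be checked in a few lines of Python; the dictionaries are just hypothetical bookkeeping for the CPT entries listed on the slide:

```python
p_n_given_pR = {True: .3, False: .7}     # p(+N|+R) = .3, so p(-N|+R) = .7
p_pD_given_ng = {(True, True): .9, (True, False): .4,    # p(+D|n,g)
                 (False, True): .5, (False, False): .3}

# Sum out n for each value of g, leaving a factor over g alone
factor_g = {g: sum(p_n_given_pR[n] * p_pD_given_ng[(n, g)]
                   for n in (True, False))
            for g in (True, False)}
print(factor_g)   # approximately {True: 0.62, False: 0.33}, as on the slide
```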

  27. Markov Decision Processes • An MDP is defined by: • A set of states s ∈ S • A set of actions a ∈ A • A transition function T(s,a,s’) • Prob that a from s leads to s’ • i.e., p(s’|s,a) • sometimes called the model • A reward function R(s,a,s’) • Sometimes just R(s) or R(s’) • A start state (or distribution) • Maybe a terminal state • MDPs are a family of nondeterministic search problems • Reinforcement learning (next class): MDPs where we don’t know the transition or reward functions

  28. Defining MDPs • Markov decision processes: • States S • Start state s0 • Actions A • Transitions p(s’|s,a) (or T(s,a,s’)) • Rewards R(s,a,s’) (and discount γ) • MDP quantities so far: • Policy = choice of action for each (MAX) state • Utility (or return) = sum of discounted rewards

  29. The Bellman Equations • Definition of “optimal utility” leads to a simple one-step lookahead relationship among optimal utility values: optimal rewards = maximize over the first action and then follow the optimal policy • Formally: V*(s) = max_a Σ_s’ T(s,a,s’) [R(s,a,s’) + γ V*(s’)]

  30. Value Iteration • Idea: • Start with V1(s) = 0 • Given Vi, calculate the values for all states for depth i+1: Vi+1(s) = max_a Σ_s’ T(s,a,s’) [R(s,a,s’) + γ Vi(s’)] • Repeat until convergence • Use Vi as the evaluation function when computing Vi+1
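
A small value-iteration sketch under a hypothetical interface: T(s, a) returns a list of (next_state, probability) pairs, R(s, a, s2) the reward, and actions(s) the actions available in s:

```python
def value_iteration(states, actions, T, R, gamma=0.9, eps=1e-6):
    """Iterate V until the largest per-state change drops below eps."""
    V = {s: 0.0 for s in states}                 # start with V1(s) = 0
    while True:
        V2 = {}
        for s in states:
            qs = [sum(p * (R(s, a, s2) + gamma * V[s2]) for s2, p in T(s, a))
                  for a in actions(s)]
            V2[s] = max(qs) if qs else 0.0       # terminal state: no actions
        if max(abs(V2[s] - V[s]) for s in states) < eps:
            return V2                            # converged
        V = V2                                   # Vi becomes the evaluation
```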

  31. Policy Iteration • Alternative approach: • Step 1: policy evaluation: calculate utilities for some fixed policy (not optimal utilities!) • Step 2: policy improvement: update policy using one-step look-ahead with resulting converged (but not optimal!) utilities as future values • Repeat steps until policy converges

  32. Markov Models • A Markov model is a chain-structured BN • Conditional probabilities are the same at every time step (stationarity) • The value of X at a given time is called the state • Parameters: the initial distribution p(X1) and the transition probabilities (or dynamics) p(Xt|Xt-1), which specify how the state evolves over time

  33. Hidden Markov Models • Markov chains not so useful for most agents • Eventually you don’t know anything anymore • Need observations to update your beliefs • Hidden Markov models (HMMs) • Underlying Markov chain over state X • You observe outputs (effects) at each time step • As a Bayes’ net:

  34. HMM weather example: Filtering • Hidden states: s (sun), c (cloudy), r (rain); the slide’s transition diagram is omitted here (the stray numbers .6, .1, .3, … labeled its arcs) • Emission probabilities: p(w|s) = .1, p(w|c) = .3, p(w|r) = .8 • You have been stuck in the lab for three days (!) • On those days, your labmate was dry, wet, wet, respectively • What is the probability that it is now raining outside? • p(X3 = r | E1 = d, E2 = w, E3 = w)

  35. Formal algorithm for filtering • The forward algorithm • Elapse of time • compute p(Xt+1|e1:t) from p(Xt|e1:t) • Observe • compute p(Xt+1|e1:t+1) from p(Xt+1|e1:t) • Renormalize
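
A sketch of these three steps, assuming the prior is the belief one time step before the first observation, transition[x][x2] = p(x2|x), and emission(x, e) = p(e|x) (all hypothetical names):

```python
def forward(prior, transition, emission, evidence):
    """Return p(X_t | e_{1:t}) after filtering all the evidence."""
    B = dict(prior)
    for e in evidence:
        # Elapse of time: B'(X) = sum over x of p(X|x) B(x)
        Bp = {x2: sum(transition[x][x2] * B[x] for x in B) for x2 in B}
        # Observe: B(X) is proportional to p(e|X) B'(X)
        B = {x: emission(x, e) * Bp[x] for x in Bp}
        # Renormalize so the beliefs sum to 1
        z = sum(B.values())
        B = {x: v / z for x, v in B.items()}
    return B
```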

  36. Forward algorithm vs. particle filtering • Forward algorithm • Elapse of time: B’(Xt) = Σxt-1 p(Xt|xt-1) B(xt-1) • Observe: B(Xt) ∝ p(et|Xt) B’(Xt) • Renormalize: make B(Xt) sum up to 1 • Particle filtering • Elapse of time: move each particle x to a successor x’ sampled from p(X’|x) • Observe: weight each particle by w(x’) = p(et|x’) • Resample: draw N new particles in proportion to their weights

  37. Outline • Search • Probabilistic inference • Machine learning • supervised learning • Parametric • generative: Naïve Bayes • discriminative method: perceptrons and MIRA • Non-parametric: K-NN • unsupervised learning • k-means • reinforcement learning • Q-learning

  38. Important Concepts • Data: labeled instances, e.g. emails marked spam/ham • Training set • Held-out set (we will give examples today) • Test set • Features: attribute-value pairs that characterize each x • Experimentation cycle • Learn parameters (e.g. model probabilities) on the training set • (Tune hyperparameters on the held-out set) • Compute accuracy on the test set • Very important: never “peek” at the test set! • Evaluation • Accuracy: fraction of instances predicted correctly • Overfitting and generalization • We want a classifier which does well on test data • Overfitting: fitting the training data very closely, but not generalizing well

  39. General Naive Bayes • A general naive Bayes model: p(Y, F1, …, Fn) = p(Y) Πi p(Fi|Y) • We only specify how each feature depends on the class • Total number of parameters is linear in n

  40. Estimation: Laplace Smoothing • Laplace’s estimate (extended): pretend you saw every outcome k extra times: P_LAP,k(x) = (c(x) + k) / (N + k|X|) • What’s Laplace with k = 0? • k is the strength of the prior • Laplace for conditionals: smooth each condition independently: P_LAP,k(x|y) = (c(x,y) + k) / (c(y) + k|X|)
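
As a sanity check, the extended estimate is a one-liner; counts is a hypothetical dict holding the observed count of every outcome, including zeros for unseen ones:

```python
def laplace_estimate(counts, k=1):
    """P_LAP,k(x) = (c(x) + k) / (N + k|X|); k = 0 recovers the MLE."""
    n = sum(counts.values())             # N: total observations
    return {x: (c + k) / (n + k * len(counts)) for x, c in counts.items()}

print(laplace_estimate({'r': 2, 'b': 1}, k=1))   # {'r': 0.6, 'b': 0.4}
```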

  41. Generative vs. Discriminative • Generative classifiers: • E.g. naive Bayes • A causal model with evidence variables • Query model for causes given evidence • Discriminative classifiers: • No causal model, no Bayes rule, often no probabilities at all! • Try to predict the label Y directly from X • Robust, accurate with varied features • Loosely: mistake driven rather than model driven

  42. Linear Classifiers (perceptrons) • Inputs are feature values • Each feature has a weight • The sum is the activation: activation_w(x) = Σi wi fi(x) = w · f(x) • If the activation is: • Positive: output +1 • Negative: output -1

  43. Learning: Multiclass Perceptron • Start with all weights = 0 • Pick up training examples one by one • Predict with current weights: y = argmax_y w_y · f(x) • If correct, no change! • If wrong: lower the score of the wrong answer (w_y’ ← w_y’ − f(x)), raise the score of the right answer (w_y ← w_y + f(x))
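
One such step as a sketch, with weight vectors and features stored as plain dicts (hypothetical stand-ins for the course's util.py Counter):

```python
def perceptron_step(weights, f, y_true):
    """weights: dict label -> dict of feature weights; f: feature dict."""
    # Predict with current weights: argmax over labels of w_y . f
    scores = {y: sum(w.get(i, 0.0) * v for i, v in f.items())
              for y, w in weights.items()}
    y_pred = max(scores, key=scores.get)
    if y_pred != y_true:                 # if correct, no change
        for i, v in f.items():
            weights[y_true][i] = weights[y_true].get(i, 0.0) + v  # raise right
            weights[y_pred][i] = weights[y_pred].get(i, 0.0) - v  # lower wrong
    return weights
```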

  44. MIRA • Idea: adjust the size of the weight update to mitigate the perceptron’s problems (e.g., overly aggressive updates) • MIRA*: choose an update size that fixes the current mistake *Margin Infused Relaxed Algorithm

  45. Parametric / Non-parametric • Parametric models: • Fixed set of parameters • More data means better settings • Non-parametric models: • Complexity of the classifier increases with data • Better in the limit, often worse in the non-limit • (K)NN is non-parametric

  46. K-Means • An iterative clustering algorithm • Pick K random points as cluster centers (means) • Alternate: • Assign data instances to closest mean • Assign each mean to the average of its assigned points • Stop when no points’ assignments change
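
A self-contained sketch over points given as coordinate tuples; the stopping rule is the one above (no point's assignment changes):

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, k, iters=100):
    means = random.sample(points, k)     # pick K random points as centers
    assign = None
    for _ in range(iters):
        # Assign each data instance to its closest mean
        new = [min(range(k), key=lambda j: dist2(p, means[j])) for p in points]
        if new == assign:
            break                        # no assignments changed: stop
        assign = new
        # Move each mean to the average of its assigned points
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                means[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return means, assign
```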

  47. K-Means as Optimization • Consider the total distance of the points to their assigned means: φ(a, c) = Σi dist(xi, c_a(i)), where the xi are the points, a gives the assignments, and the c are the means • Each iteration reduces φ • Two stages each iteration: • Update assignments: fix means c, change assignments a • Update means: fix assignments a, change means c

  48. Reinforcement learning • Similar to MDP • Don’t know T and/or R, but can observe R • Learn by doing • can have multiple episodes (trials)

  49. MDPs vs. RL • Things we know how to do: • If we know the MDP • Compute V*, Q*, π* exactly • Evaluate a fixed policy π • If we don’t know the MDP • We can estimate the MDP, then solve it • We can estimate V for a fixed policy π • We can estimate Q*(s,a) for the optimal policy while executing an exploration policy • Techniques: • Computation: value and policy iteration; policy evaluation • Model-based RL: sampling • Model-free RL: Q-learning

  50. Q-Learning • Q-Learning: sample-based Q-value iteration • Learn Q*(s,a) values • Receive a sample (s,a,s’,R) • Consider your old estimate: Q(s,a) • Consider your new sample estimate: sample = R(s,a,s’) + γ max_a’ Q(s’,a’) • Incorporate the new estimate into a running average: Q(s,a) ← (1−α) Q(s,a) + α · sample
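
The whole update as a one-step sketch (Q stored as a dict keyed by (state, action); actions(s2) is a hypothetical hook returning the actions available in s2):

```python
def q_update(Q, s, a, s2, r, actions, alpha=0.1, gamma=0.9):
    """One Q-learning update from the sample (s, a, s2, r)."""
    # New sample estimate: r + gamma * max over a' of Q(s2, a')
    sample = r + gamma * max((Q.get((s2, a2), 0.0) for a2 in actions(s2)),
                             default=0.0)
    # Running average of the old estimate and the new sample
    Q[(s, a)] = (1 - alpha) * Q.get((s, a), 0.0) + alpha * sample
    return Q
```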
