Machine learning and review

Reading: Ch. 18

Bayesian Approach

  • Each observed training example can incrementally decrease or increase the probability of a hypothesis, instead of eliminating the hypothesis outright

  • Prior knowledge can be combined with observed data to determine the probability of a hypothesis

  • Bayesian methods can accommodate hypotheses that make probabilistic predictions

  • New instances can be classified by combining the predictions of multiple hypotheses, weighted by their probabilities


Applying Bayes Theorem

  • Best hypothesis = most probable hypothesis

    • Maximum a posteriori (MAP) hypothesis

  • Variables

    • h = hypothesis

    • D = data

  • Prior probabilities

    • of h: P(h)

    • of observing the training data: P(D)

  • P(D|h) = probability of observing data D given a world in which hypothesis h holds

  • Bayes theorem:

    • $P(h \mid D) = \frac{P(D \mid h)\,P(h)}{P(D)}$
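As a small illustration of Bayes theorem, the sketch below computes P(h|D) for every hypothesis in a finite hypothesis space. It assumes the hypotheses are mutually exclusive and exhaustive, so that P(D) can be obtained as the sum of P(D|h)P(h); the function name `posteriors` and all numbers are hypothetical.

```python
def posteriors(priors, likelihoods):
    """Return {h: P(h|D)} given priors {h: P(h)} and likelihoods {h: P(D|h)}.

    Assumes the hypotheses are mutually exclusive and exhaustive, so
    P(D) = sum over h of P(D|h) * P(h) (law of total probability).
    """
    p_data = sum(likelihoods[h] * priors[h] for h in priors)  # P(D)
    return {h: likelihoods[h] * priors[h] / p_data for h in priors}


# Hypothetical numbers, for illustration only.
priors = {"h1": 0.5, "h2": 0.3, "h3": 0.2}
likelihoods = {"h1": 0.1, "h2": 0.4, "h3": 0.5}
print(posteriors(priors, likelihoods))
```

Here h1 has the highest prior, but h2 ends up with the highest posterior because the observed data are four times as likely under h2 as under h1.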


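Defining the MAP hypothesis

  • $h_{MAP} = \arg\max_{h \in H} P(h \mid D)$

  • $h_{MAP} = \arg\max_{h \in H} \frac{P(D \mid h)\,P(h)}{P(D)}$ (using Bayes theorem)

  • $h_{MAP} = \arg\max_{h \in H} P(D \mid h)\,P(h)$ (P(D) is a constant independent of h)

  • $h_{MAP} = \arg\max_{h \in H} P(D \mid h)$ (when we can assume that every hypothesis h is equally probable a priori; this is the maximum likelihood hypothesis, $h_{ML}$)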
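A minimal sketch of the last two steps: because P(D) does not depend on h, choosing h_MAP only needs P(D|h)P(h), and with equal priors the choice reduces to the maximum likelihood hypothesis. The function names and numbers are hypothetical.

```python
def h_map(priors, likelihoods):
    # argmax over h of P(D|h) * P(h); P(D) is dropped because it is the same for every h
    return max(priors, key=lambda h: likelihoods[h] * priors[h])


def h_ml(likelihoods):
    # argmax over h of P(D|h); equals h_map when every prior P(h) is the same
    return max(likelihoods, key=likelihoods.get)


likelihoods = {"h1": 0.3, "h2": 0.6}
print(h_map({"h1": 0.8, "h2": 0.2}, likelihoods))  # h1 (0.24 beats 0.12)
print(h_map({"h1": 0.5, "h2": 0.5}, likelihoods))  # h2
print(h_ml(likelihoods))                           # h2
```

With a strong prior on h1, the MAP and ML hypotheses disagree; with uniform priors they coincide.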


Bayes Optimal Classifier

  • The most probable classification of a new instance is obtained by combining the predictions of all hypotheses, weighted by their posterior probabilities

    • Possible classifications: $v_j \in V$

    • $\arg\max_{v_j \in V} \sum_{h_i \in H} P(v_j \mid h_i)\,P(h_i \mid D)$


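Example

  • $V = \{p, n\}$

  • $P(h_1 \mid D) = 0.4$, $P(p \mid h_1) = 0$, $P(n \mid h_1) = 1$

  • $P(h_2 \mid D) = 0.3$, $P(p \mid h_2) = 1$, $P(n \mid h_2) = 0$

  • $P(h_3 \mid D) = 0.3$, $P(p \mid h_3) = 1$, $P(n \mid h_3) = 0$

    • $\sum_{h_i \in H} P(n \mid h_i)\,P(h_i \mid D) = 0.4$

    • $\sum_{h_i \in H} P(p \mid h_i)\,P(h_i \mid D) = 0.6$

    • $\arg\max_{v_j \in \{p, n\}} \sum_{h_i \in H} P(v_j \mid h_i)\,P(h_i \mid D) = p$

  • Note that the MAP hypothesis, h1, would predict n on its own; combining all hypotheses yields p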
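The example can be checked with a short sketch of the Bayes optimal classifier. The dictionaries encode the slide's P(h_i|D) and P(v_j|h_i) values; the function name `bayes_optimal` is hypothetical.

```python
def bayes_optimal(values, posterior, predict):
    """Return the class v maximizing sum over h of P(v|h) * P(h|D), plus all the sums.

    posterior[h] = P(h|D); predict[h][v] = P(v|h).
    """
    scores = {v: sum(predict[h][v] * posterior[h] for h in posterior) for v in values}
    return max(scores, key=scores.get), scores


posterior = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
predict = {
    "h1": {"p": 0.0, "n": 1.0},
    "h2": {"p": 1.0, "n": 0.0},
    "h3": {"p": 1.0, "n": 0.0},
}
best, scores = bayes_optimal(["p", "n"], posterior, predict)
h_map = max(posterior, key=posterior.get)   # the single most probable hypothesis
print(best)            # p
print(predict[h_map])  # h1 alone would put all its probability on n
```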


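Properties of Bayesian Approach

  • Bayesian learning is optimal: on average, no other method using the same hypothesis space and prior knowledge outperforms the Bayes optimal classifier

  • P(h) is easy to estimate by counting in the training data

  • Estimating P(D|h) directly is not feasible

  • Why?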



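Naïve Bayes

  • Assume the attributes are conditionally independent given the classification

    • $D = \langle a_1, a_2, \ldots, a_n \rangle$

    • $P(a_1, a_2, \ldots, a_n \mid v_j) = \prod_i P(a_i \mid v_j)$

  • Substitute into the MAP classification formula, $v_{MAP} = \arg\max_{v_j \in V} P(a_1, a_2, \ldots, a_n \mid v_j)\,P(v_j)$

    • $v_{NB} = \arg\max_{v_j \in V} P(v_j) \prod_i P(a_i \mid v_j)$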




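Estimating Probabilities

  • What happens when the number of data elements is small?

  • Suppose the true P(S-length = high | virginica) = 0.05

  • There are only 2 instances with C = virginica

  • We estimate the probability as $n_c / n$: the number of virginica instances with S-length = high, divided by the number of virginica instances

  • With only 2 instances, $n_c$ will most likely be 0

  • Then, instead of 0.05, we use an estimated probability of 0

  • Two problems

    • The estimate is a biased underestimate of the true probability

    • This zero term will dominate: it makes the whole naïve Bayes product 0, whatever the other attribute values are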
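The arithmetic behind this slide can be checked directly. Assuming the 2 virginica instances are independent draws, the chance that n_c = 0 is (1 - 0.05)^2, and a single zero estimate then zeroes the whole naïve Bayes product (the other factors below are made-up values).

```python
import math

p_true, n = 0.05, 2
p_nc_zero = (1 - p_true) ** n  # probability that neither instance has S-length = high
print(round(p_nc_zero, 4))     # 0.9025

# Naive Bayes multiplies its estimates, so one zero wipes out every other factor.
factors = [0.9, 0.8, 0 / n]    # last factor: the estimate n_c / n with n_c = 0
print(math.prod(factors))      # 0.0
```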


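Instead

  • Use priors as well (the m-estimate):

  • $\frac{n_c + m p}{n + m}$

    • where p is a prior estimate of the probability

      • Typical method: assume a uniform prior, i.e. p = 1/k for an attribute with k possible values

    • m is a constant called the equivalent sample size

      • Determines how heavily to weight p relative to the observed data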
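A sketch of the m-estimate as a plain function (the name `m_estimate` is hypothetical). With a uniform prior p = 1/k and m = k it reduces to the familiar add-one (Laplace) estimate. The 2-instance virginica case below assumes k = 3 possible S-length values, which the slides do not state.

```python
def m_estimate(n_c, n, p, m):
    """(n_c + m*p) / (n + m): the observed frequency n_c/n pulled toward the prior p.

    m is the equivalent sample size: the prior counts as m extra "virtual" examples.
    """
    return (n_c + m * p) / (n + m)


k = 3                                # assumed number of S-length values (hypothetical)
print(m_estimate(0, 2, 1 / k, m=k))  # 0.2 instead of 0
print(m_estimate(0, 2, 1 / k, m=0))  # 0.0: with m = 0 this is just n_c / n
```

Larger m trusts the prior more; as n grows, the observed frequency dominates regardless of m.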


Benefits of Naïve Bayes

  • Practical: training only requires counting frequencies in the training data

  • As effective as other machine learners, and in some cases more so


Review for Midterm

  • Concepts you should know

  • Search algorithms

    • Depth-first, breadth-first, iterative deepening, A*, greedy, hill-climbing, beam

  • Constraint propagation

  • Game playing

  • Bayesian nets

  • A little on machine learning


Midterm format

  • Multiple choice

  • Short answer questions

  • Problem solving

  • Essay

  • An example midterm will be posted under links


Concepts

  • Any words highlighted in yellow, light blue, or pink on the slides


Uninformed Search

  • Depth-first

  • Breadth-first

  • Iterative deepening


Formulating Problems as Search

Given an initial state and a goal, find the sequence of actions leading through a sequence of states to the goal state.

Terms:

  • Successor function: given a state, returns the set of {action, successor} pairs

  • State space: the set of all states reachable from the initial state

  • Path: a sequence of states connected by actions

  • Goal test: is a given state a goal state?

  • Path cost: a function assigning a numeric cost to each path

  • Solution: a path from the initial state to a goal state
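The terms above map naturally onto a small data structure. The sketch below is a hypothetical route-finding problem (the class name, city names, and road costs are made up) in which states are city names and each road has a step cost.

```python
from dataclasses import dataclass, field


@dataclass
class RouteProblem:
    initial: str
    goal: str
    roads: dict = field(default_factory=dict)  # roads[a][b] = step cost of the action "go b" from a

    def successors(self, state):
        # Successor function: the set of (action, successor) pairs for this state
        return [(f"go {nxt}", nxt) for nxt in self.roads.get(state, {})]

    def goal_test(self, state):
        return state == self.goal

    def path_cost(self, path):
        # A path is a sequence of states; its cost is the sum of the step costs along it
        return sum(self.roads[a][b] for a, b in zip(path, path[1:]))


problem = RouteProblem("A", "C", {"A": {"B": 2, "C": 5}, "B": {"C": 1}})
print(problem.successors("A"))             # [('go B', 'B'), ('go C', 'C')]
print(problem.path_cost(["A", "B", "C"]))  # 3: cheaper than the direct road A -> C
```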


Breadth-first

  • OPEN = start node; CLOSED = empty

  • While OPEN is not empty do

    • Remove leftmost state from OPEN, call it X

    • If X = goal state, return success

    • Put X on CLOSED

    • SUCCESSORS = Successor function (X)

    • Remove any successors already on OPEN or CLOSED

    • Put remaining successors on right end of OPEN

  • End while

  • Return failure (OPEN is empty and no goal was found)
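A minimal Python sketch of the breadth-first pseudocode, assuming `successors(x)` returns a list of states. OPEN is a FIFO queue (remove on the left, add on the right). A `parent` dictionary, which is not on the slide, records every state ever put on OPEN, so it doubles as the "already on OPEN or CLOSED" check and lets the function return the path instead of just success. The graph is made up.

```python
from collections import deque


def breadth_first(start, goal, successors):
    open_list = deque([start])       # OPEN
    parent = {start: None}           # every state ever placed on OPEN (so also all of CLOSED)
    while open_list:
        x = open_list.popleft()      # remove the leftmost state
        if x == goal:
            path = []
            while x is not None:     # walk the parent links back to the start
                path.append(x)
                x = parent[x]
            return path[::-1]
        for s in successors(x):
            if s not in parent:      # skip successors already on OPEN or CLOSED
                parent[s] = x
                open_list.append(s)  # right end of OPEN
    return None                      # OPEN is empty: failure


graph = {"A": ["B", "C"], "B": ["D"], "C": ["G"], "D": ["G"], "G": []}
print(breadth_first("A", "G", lambda x: graph[x]))  # ['A', 'C', 'G']
```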


Depth-first

  • OPEN = start node; CLOSED = empty

  • While OPEN is not empty do

    • Remove leftmost state from OPEN, call it X

    • If X = goal state, return success

    • Put X on CLOSED

    • SUCCESSORS = Successor function (X)

    • Remove any successors already on OPEN or CLOSED

    • Put remaining successors on left end of OPEN

  • End while

  • Return failure (OPEN is empty and no goal was found)
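A minimal sketch of the depth-first pseudocode, assuming `successors(x)` returns a list of states. Successors go on the left end of OPEN, so OPEN behaves like a stack. A `parent` dictionary (not on the slide) records every state placed on OPEN, serving as the OPEN/CLOSED membership test and for rebuilding the path. The graph is made up; on it, depth-first search returns A, B, D, G even though A, C, G is shorter.

```python
from collections import deque


def depth_first(start, goal, successors):
    open_list = deque([start])       # OPEN
    parent = {start: None}           # every state ever placed on OPEN (so also all of CLOSED)
    while open_list:
        x = open_list.popleft()      # remove the leftmost state
        if x == goal:
            path = []
            while x is not None:
                path.append(x)
                x = parent[x]
            return path[::-1]
        new = [s for s in successors(x) if s not in parent]  # drop states on OPEN or CLOSED
        for s in new:
            parent[s] = x
        open_list.extendleft(reversed(new))  # left end of OPEN, keeping successor order
    return None                      # OPEN is empty: failure


graph = {"A": ["B", "C"], "B": ["D"], "C": ["G"], "D": ["G"], "G": []}
print(depth_first("A", "G", lambda x: graph[x]))  # ['A', 'B', 'D', 'G']
```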


Can we combine benefits of both?

  • Depth limited

    • Select some depth limit and explore the problem using DFS only down to that depth

    • How do we select the limit?

  • Iterative deepening

    • DFS with depth limit 1

    • Then DFS with depth limit 2, 3, and so on, up to depth d
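A recursive sketch of depth-limited search and iterative deepening, assuming `successors(x)` returns a list of states; the graph is made up. It starts at depth limit 0 (which only tests the start state) rather than 1, and `max_depth` is a hypothetical safety cap. Like breadth-first search, it finds the shallowest goal, while keeping only the current path in memory.

```python
def depth_limited(state, goal, successors, limit, path):
    """DFS that never goes more than `limit` steps below `state`; returns a path or None."""
    if state == goal:
        return path
    if limit == 0:
        return None                # depth limit reached: stop expanding here
    for s in successors(state):
        if s not in path:          # avoid cycles along the current path
            found = depth_limited(s, goal, successors, limit - 1, path + [s])
            if found is not None:
                return found
    return None


def iterative_deepening(start, goal, successors, max_depth=50):
    # Depth limit 0, 1, 2, ...; max_depth is a hypothetical cap so the loop always ends
    for limit in range(max_depth + 1):
        found = depth_limited(start, goal, successors, limit, [start])
        if found is not None:
            return found
    return None


graph = {"A": ["B", "C"], "B": ["D"], "C": ["G"], "D": ["G"], "G": []}
print(iterative_deepening("A", "G", lambda x: graph[x]))  # ['A', 'C', 'G']
```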


Complexity Analysis

  • Completeness: is the algorithm guaranteed to find a solution when there is one?

  • Optimality: does the strategy find the optimal solution?

  • Time: how long does it take to find a solution?

  • Space: how much memory is needed to perform the search?

Is this notion of completeness the same as completeness in logic?


Cost variables

  • Time: number of nodes generated

  • Space: maximum number of nodes stored in memory

  • Branching factor: b

    • Maximum number of successors of any node

  • Depth: d

    • Depth of the shallowest goal node

  • Path length: m

    • Maximum length of any path in the state space
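As a worked example of these cost variables, assume b = 10 and d = 5 (hypothetical values). The sketch counts nodes generated at depths 1 through d: breadth-first search generates each level once, while iterative deepening regenerates level i once for each of the (d + 1 - i) depth limits that reach it. A breadth-first search that tests for the goal only when a node is removed from OPEN can also generate much of level d + 1, which this count leaves out.

```python
def nodes_bfs(b, d):
    # Each level generated once: b + b^2 + ... + b^d
    return sum(b ** i for i in range(1, d + 1))


def nodes_ids(b, d):
    # Level i is regenerated by every depth limit from i up to d: (d + 1 - i) times
    return sum((d + 1 - i) * b ** i for i in range(1, d + 1))


print(nodes_bfs(10, 5), nodes_ids(10, 5))  # 111110 123450
```

The overhead of iterative deepening is only about 11% here, because most nodes sit at the deepest level.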


Informed Search

  • Best-first

  • A*

  • Greedy

  • Hill climbing

  • Variants

    • Randomness, simulated annealing, local beam search

  • Online search will not be on the midterm


Greedy Search

  • OPEN = start node; CLOSED = empty

  • While OPEN is not empty do

    • Remove leftmost state from OPEN, call it X

    • If X = goal state, return success

    • Put X on CLOSED

    • SUCCESSORS = Successor function (X)

    • Remove any successors already on OPEN or CLOSED

    • Compute the heuristic function for each remaining successor

    • Put remaining successors on either end of OPEN

    • Sort nodes on OPEN by value of the heuristic function (lowest estimated cost to the goal first)

  • End while

  • Return failure (OPEN is empty and no goal was found)
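A minimal sketch of greedy best-first search, assuming `successors(x)` returns a list of states and `h(x)` returns the heuristic estimate of the cost to the goal. Instead of re-sorting OPEN on every iteration, it keeps OPEN as a binary heap ordered by h, which has the same effect. A `parent` dictionary (not on the slide) doubles as the OPEN/CLOSED check. The graph and h values are made up.

```python
import heapq


def greedy_search(start, goal, successors, h):
    open_heap = [(h(start), start)]      # OPEN, kept ordered by heuristic value
    parent = {start: None}               # every state ever placed on OPEN (so also all of CLOSED)
    while open_heap:
        _, x = heapq.heappop(open_heap)  # state with the lowest h
        if x == goal:
            path = []
            while x is not None:
                path.append(x)
                x = parent[x]
            return path[::-1]
        for s in successors(x):
            if s not in parent:          # skip successors already on OPEN or CLOSED
                parent[s] = x
                heapq.heappush(open_heap, (h(s), s))
    return None                          # OPEN is empty: failure


graph = {"S": ["A", "B"], "A": ["G"], "B": ["C"], "C": ["G"], "G": []}
h_values = {"S": 3, "A": 2, "B": 1, "C": 1, "G": 0}  # made-up estimates
print(greedy_search("S", "G", lambda x: graph[x], h_values.get))  # ['S', 'B', 'C', 'G']
```

With unit step costs, S, A, G would be one step shorter: greedy search follows the heuristic and ignores the cost already paid, so it is not optimal.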


A* Search

  • Try to expand the node that is on the least-cost path to the goal

  • Evaluation function = f(n)

    • f(n) = g(n) + h(n)

    • h(n) is the heuristic function: estimated cost of the cheapest path from n to the goal

    • g(n) is the path cost from the initial state to n

    • f(n) is the estimated cost of the cheapest solution that passes through n

  • If h(n) never overestimates the true cost to the goal (for graph search that discards repeated states, h(n) must also be consistent)

    • A* is complete

    • A* is optimal

    • A* is optimally efficient: no other optimal algorithm using the same h(n) is guaranteed to expand fewer nodes
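A minimal sketch of A*, assuming `neighbors(n)` yields (successor, step_cost) pairs and `h(n)` is admissible. OPEN is a heap ordered by f(n) = g(n) + h(n); when a cheaper path to a state is found, a new entry is pushed and the outdated one is skipped when popped. The graph and heuristic values are made up (h never exceeds the true remaining cost).

```python
import heapq


def a_star(start, goal, neighbors, h):
    """Return (path, cost) of a cheapest path from start to goal, or None."""
    open_heap = [(h(start), 0, start)]      # OPEN entries: (f, g, state)
    best_g = {start: 0}                     # cheapest g(n) found so far
    parent = {start: None}
    while open_heap:
        _, g, n = heapq.heappop(open_heap)  # node with the smallest f(n) = g(n) + h(n)
        if g > best_g[n]:
            continue                        # outdated entry: a cheaper path to n was found later
        if n == goal:
            path = []
            while n is not None:
                path.append(n)
                n = parent[n]
            return path[::-1], g
        for s, cost in neighbors(n):
            g2 = g + cost
            if g2 < best_g.get(s, float("inf")):
                best_g[s] = g2
                parent[s] = n
                heapq.heappush(open_heap, (g2 + h(s), g2, s))
    return None


edges = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 12)], "B": [("G", 3)], "G": []}
h_values = {"S": 6, "A": 5, "B": 3, "G": 0}  # made up, but never above the true remaining cost
print(a_star("S", "G", lambda n: edges[n], h_values.get))  # (['S', 'A', 'B', 'G'], 6)
```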


Admissible heuristics

  • A heuristic that never overestimates the cost to the goal

  • h1 and h2 are admissible heuristics

  • Consistency: the estimated cost of reaching the goal from n is no greater than the step cost of getting from n to a successor n' plus the estimated cost of reaching the goal from n'

    • $h(n) \le c(n, a, n') + h(n')$
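Both properties can be checked mechanically on a small explicit graph. In this sketch the function names, graph, and h values are hypothetical, and `true_cost` holds the exact cheapest cost to the goal, worked out by hand for this graph. The second heuristic shows that a heuristic can be admissible without being consistent.

```python
def is_admissible(h, true_cost):
    # h(n) <= h*(n): never overestimates the true cheapest cost to the goal
    return all(h[n] <= true_cost[n] for n in h)


def is_consistent(h, edges):
    # h(n) <= c(n, a, n') + h(n') for every edge n -> n' with step cost c
    return all(h[n] <= cost + h[s] for n, succ in edges.items() for s, cost in succ)


edges = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 12)], "B": [("G", 3)], "G": []}
true_cost = {"S": 6, "A": 5, "B": 3, "G": 0}  # cheapest remaining cost, worked out by hand
h_good = {"S": 6, "A": 5, "B": 3, "G": 0}
h_odd = {"S": 6, "A": 1, "B": 3, "G": 0}

print(is_admissible(h_good, true_cost), is_consistent(h_good, edges))  # True True
print(is_admissible(h_odd, true_cost), is_consistent(h_odd, edges))    # True False
```

For h_odd, the edge S -> A violates consistency: h(S) = 6 exceeds c(S, A) + h(A) = 1 + 1 = 2.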


Local Search Algorithms

  • Operate using a single current state

  • Move only to neighbors of the state

  • Paths followed by search are not retained

  • Iterative improvement

    • Keep a single current state and try to improve it
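A minimal sketch of steepest-ascent hill climbing, assuming `neighbors(s)` returns a list of states and `value(s)` is an objective to maximize. It keeps only the current state and never remembers the path. The 1-D landscape is made up and has a local maximum at index 1 and the global maximum at index 4, so the result depends on the starting state.

```python
def hill_climbing(state, neighbors, value):
    while True:
        best = max(neighbors(state), key=value, default=None)  # best neighboring state
        if best is None or value(best) <= value(state):
            return state  # no uphill move left: a local (maybe global) maximum
        state = best      # move; the path taken is not remembered


heights = [0, 2, 1, 3, 5, 4]  # made-up landscape: value of state i is heights[i]


def neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(heights)]


print(hill_climbing(0, neighbors, heights.__getitem__))  # 1: stuck on a local maximum
print(hill_climbing(3, neighbors, heights.__getitem__))  # 4: the global maximum
```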



Problems for hill climbing

When higher values of the function are better (objective functions), we look for maxima; when lower values are better (cost functions), we look for minima.

  • Local maxima: a local maximum is a peak that is higher than each of its neighboring states, but lower than the global maximum

  • Ridges: a sequence of local maxima that is very difficult for greedy algorithms to navigate

  • Plateaux: an area of the state-space landscape where the evaluation function is flat


Some solutions

  • Stochastic hill climbing

    • Choose at random from among the uphill moves

  • First-choice hill climbing

    • Generate successors randomly until one is generated that is better than the current state

  • Random-restart hill climbing

    • Keep restarting from randomly generated initial states, stopping when a goal is found

  • Simulated annealing

    • Generate a random move. Accept it if it is an improvement. Otherwise, accept it with probability $e^{\Delta E / T}$, which decreases as the move gets worse (ΔE < 0) and as the temperature T is lowered over time.

  • Local beam search

    • Keep track of k states rather than just 1
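A minimal simulated annealing sketch for maximizing `value`, using the acceptance rule above: always take an improving move, otherwise accept with probability exp(ΔE / T). The starting temperature `t0`, the geometric `cooling` factor, the step count, and the seed are hypothetical choices, not values from the slides; the function returns the best state seen, and the 1-D landscape is made up.

```python
import math
import random


def simulated_annealing(state, neighbors, value, t0=10.0, cooling=0.95, steps=500, seed=0):
    """Maximize value(state); returns the best state seen. Parameter defaults are arbitrary."""
    rng = random.Random(seed)
    best, t = state, t0
    for _ in range(steps):
        candidate = rng.choice(neighbors(state))  # generate a random move
        delta = value(candidate) - value(state)   # delta E
        # accept improvements; accept worse moves with probability exp(delta / T)
        if delta > 0 or rng.random() < math.exp(delta / t):
            state = candidate
        if value(state) > value(best):
            best = state
        t = max(t * cooling, 1e-6)                # geometric cooling, kept above zero
    return best


heights = [0, 2, 1, 3, 5, 4]  # made-up landscape: value of state i is heights[i]


def neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(heights)]


print(simulated_annealing(0, neighbors, heights.__getitem__))
```

Early on, the high temperature lets the search walk downhill out of the local maximum at index 1; as T shrinks, it behaves more and more like plain hill climbing.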




CSP algorithm

Depth-first search is often used

  • Initial state: the empty assignment {}; all variables are unassigned

  • Successor function: assign a value to any unassigned variable, provided it does not conflict with the constraints

    • All CSP search algorithms generate successors by considering possible assignments for only a single variable at each node in the search tree

  • Goal test: the current assignment is complete

  • Path cost: a constant cost for every step
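A minimal sketch of this formulation as recursive backtracking (depth-first search over partial assignments), on the Australia map-coloring problem, where neighboring regions must get different colors. Variables are chosen in a fixed order here; the function name and data layout are assumptions.

```python
def backtrack(variables, domains, neighbors, assignment=None):
    if assignment is None:
        assignment = {}                     # initial state: the empty assignment
    if len(assignment) == len(variables):
        return assignment                   # goal test: the assignment is complete
    var = next(v for v in variables if v not in assignment)  # one variable per search node
    for value in domains[var]:
        # successor: assign var = value only if it conflicts with no assigned neighbor
        if all(assignment.get(n) != value for n in neighbors[var]):
            assignment[var] = value
            result = backtrack(variables, domains, neighbors, assignment)
            if result is not None:
                return result
            del assignment[var]             # undo, then try the next value
    return None                             # no value works: backtrack


neighbors = {
    "WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"], "SA": ["WA", "NT", "Q", "NSW", "V"],
    "Q": ["NT", "SA", "NSW"], "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"], "T": [],
}
variables = list(neighbors)
domains = {v: ["red", "green", "blue"] for v in variables}
print(backtrack(variables, domains, neighbors))
```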


Local search

  • Complete-state formulation

    • Every state is a complete assignment that might or might not satisfy the constraints

  • Hill-climbing methods are appropriate


General-purpose methods for efficient implementation

  • Which variable should be assigned next?

  • In what order should its values be tried?

  • Can we detect inevitable failure early?

  • Can we take advantage of problem structure?


Order

  • Choose the most constrained variable first

    • The variable with the fewest remaining legal values

    • Minimum Remaining Values (MRV) heuristic

  • What if more than one variable ties?

    • Tie-breaker: most constraining variable (degree heuristic)

    • Choose the variable involved in the most constraints on the remaining unassigned variables
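A sketch of variable ordering with MRV, using the degree heuristic as the tie-breaker, for a map-coloring CSP where neighbors must differ. The Australia map serves as a test case; the function name and data layout are hypothetical.

```python
def select_variable(variables, domains, neighbors, assignment):
    """MRV, with the degree heuristic as tie-breaker (constraint: neighbors must differ)."""
    def legal_count(var):
        # number of values still consistent with the assigned neighbors
        return sum(1 for x in domains[var]
                   if all(assignment.get(n) != x for n in neighbors[var]))

    def degree(var):
        # number of constraints with other unassigned variables
        return sum(1 for n in neighbors[var] if n not in assignment)

    unassigned = [v for v in variables if v not in assignment]
    return min(unassigned, key=lambda v: (legal_count(v), -degree(v)))


neighbors = {
    "WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"], "SA": ["WA", "NT", "Q", "NSW", "V"],
    "Q": ["NT", "SA", "NSW"], "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"], "T": [],
}
variables = list(neighbors)
domains = {v: ["red", "green", "blue"] for v in variables}
print(select_variable(variables, domains, neighbors, {}))             # SA: MRV ties, highest degree
print(select_variable(variables, domains, neighbors, {"WA": "red"}))  # SA
print(select_variable(variables, domains, neighbors,
                      {"WA": "red", "NT": "green", "SA": "blue"}))   # Q: only red is left
```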


Order on value choice

  • Given a variable, choose the least constraining value

    • The value that rules out the fewest values for the remaining variables
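A sketch of least-constraining-value ordering, assuming `domains` holds each variable's current remaining values (for example, after forward checking) and that the constraint is "neighbors must differ". Variable names and domains are made up.

```python
def order_values(var, domains, neighbors, assignment):
    """Least-constraining value first (constraint: neighbors must differ)."""
    def ruled_out(value):
        # how many unassigned neighbors would lose `value` from their remaining values
        return sum(1 for n in neighbors[var] if n not in assignment and value in domains[n])
    return sorted(domains[var], key=ruled_out)


domains = {"X": [1, 2], "Y": [1], "Z": [1, 3]}  # made-up remaining values
neighbors = {"X": ["Y", "Z"], "Y": ["X"], "Z": ["X"]}
print(order_values("X", domains, neighbors, {}))  # [2, 1]
```

Value 1 would remove the only value left for Y, so value 2 is tried first.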


Forward Checking

  • Keep track of remaining legal values for unassigned variables

  • Terminate search along the current path (i.e., backtrack) when any variable has no legal values
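A minimal forward-checking step for a CSP whose constraints say neighbors must differ (for example, map coloring): after an assignment, delete that value from each unassigned neighbor's remaining legal values, and report failure as soon as some domain becomes empty. The function name and data layout are hypothetical; the input domains are copied, not modified.

```python
def forward_check(var, value, domains, neighbors, assignment):
    """Return pruned copies of the domains after var = value, or None if some domain empties."""
    new_domains = {v: list(vals) for v, vals in domains.items()}
    new_domains[var] = [value]
    for n in neighbors[var]:
        if n not in assignment and value in new_domains[n]:
            new_domains[n].remove(value)
            if not new_domains[n]:
                return None  # some variable has no legal values: fail early
    return new_domains


nbrs = {"WA": ["NT", "SA"], "NT": ["WA", "SA"], "SA": ["WA", "NT"]}
doms = {v: ["red", "green", "blue"] for v in nbrs}
print(forward_check("WA", "red", doms, nbrs, {}))
```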



Game Playing

  • Minimax

  • Alpha-beta pruning

  • Evaluation function (what is the difference between a cost function, a utility function, a heuristic function, and an evaluation function?)
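A minimal sketch of minimax and alpha-beta pruning on a small made-up game tree, written as nested lists whose leaves are utilities for MAX. In a real game, the search would stop at a depth limit and apply an evaluation function there; this sketch assumes the whole tree fits in memory. Alpha-beta returns the same value as minimax while skipping branches that cannot change the decision.

```python
import math


def minimax(node, maximizing):
    # Leaves are numbers (utilities); internal nodes are lists of children
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    if not isinstance(node, list):
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # MIN above will never let play reach this branch
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:
            break      # MAX above already has a better option elsewhere
    return value


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]  # MAX to move at the root
print(minimax(tree, True), alphabeta(tree, True))  # 3 3
```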


Bayesian nets

  • Example problem

