

  1. Ch. 11: Optimization and Search. Stephen Marsland, Machine Learning: An Algorithmic Perspective, CRC 2009. Some slides from Stephen Marsland, some images from Wikipedia. Longin Jan Latecki, Temple University, latecki@temple.edu

  2. Gradient Descent • We have already used it in the perceptron learning. • Our goal is to minimize a function f(x), where x = (x1, …, xn). • Starting with some initial point x0, we try to find a sequence of points xk that moves downhill to the closest local minimum. • A general strategy is xk+1 = xk + ηk pk, where pk is a search direction and ηk > 0 is a step size.

  3. Steepest Gradient Descent • A key question is: what is pk? • We can make greedy choices and always go downhill as fast as possible. This implies that pk = −∇f(xk). • Thus, we iterate xk+1 = xk − ηk ∇f(xk) • until ∇f(xk) = 0, which practically means until ‖∇f(xk)‖ < ε
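The iteration above can be sketched in Python as follows; the fixed step size η and the quadratic test function are illustrative choices, not part of the slides:

```python
import numpy as np

def steepest_descent(grad, x0, eta=0.1, eps=1e-6, max_iter=10000):
    """Iterate x_{k+1} = x_k - eta * grad(x_k) until ||grad f(x_k)|| < eps."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < eps:   # practical stopping test from the slide
            break
        x = x - eta * g               # move in the direction p_k = -grad f(x_k)
    return x

# Illustrative quadratic f(x, y) = (x - 1)^2 + 4*(y + 2)^2 with minimum at (1, -2)
grad_f = lambda x: np.array([2 * (x[0] - 1), 8 * (x[1] + 2)])
x_min = steepest_descent(grad_f, [0.0, 0.0])
```

In practice the step size ηk is often chosen by a line search rather than held fixed; too large a fixed η makes the iteration diverge.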

  4. The gradient of the function f(x, y) = −(cos²x + cos²y)² depicted as a vector field on the bottom plane

  5. For example, the gradient of the function shown on the slide is: [equation image not captured in the transcript]

  6. Recall the Gradient Descent Learning Rule of the Perceptron • Consider a linear perceptron without a threshold and with continuous output (not just −1, 1): • y = w0 + w1 x1 + … + wn xn • Train the wi's so that they minimize the squared error E[w1, …, wn] = ½ Σd∈D (td − yd)², where D is the set of training examples. Then wk+1 = wk − ηk ∇f(wk) = wk − ηk ∇E(wk). We wrote wk+1 = wk + Δwk, thus Δwk = −ηk ∇E(wk)

  7. Gradient Descent • Gradient: ∇E[w] = [∂E/∂w0, …, ∂E/∂wn] • A step moves from (w1, w2) to (w1 + Δw1, w2 + Δw2): Δw = −η ∇E[w], i.e., Δwi = −η ∂E/∂wi • ∂E/∂wi = ∂/∂wi ½ Σd (td − yd)² = ∂/∂wi ½ Σd (td − Σi wi xi)² = Σd (td − yd)(−xi)

  8. Gradient Descent on the Error surface: Δwi = −η ∂E/∂wi [figure from Stephen Marsland]
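The delta rule derived above can be sketched as batch gradient descent in Python; the one-dimensional toy data, the learning rate, and the epoch count are hypothetical:

```python
import numpy as np

def train_linear_unit(X, t, eta=0.05, epochs=500):
    """Batch gradient descent on E(w) = 1/2 * sum_d (t_d - y_d)^2
    for y = w0 + w1*x1 + ... + wn*xn, with the bias folded in as x0 = 1."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend a column of ones
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        y = Xb @ w
        # dE/dw_i = sum_d (t_d - y_d) * (-x_{i,d}), so delta_w = eta * X^T (t - y)
        w += eta * Xb.T @ (t - y)
    return w

# Toy targets generated by t = 3 + 2*x (made-up data for illustration)
X = np.array([[0.0], [1.0], [2.0], [3.0]])
t = 3 + 2 * X[:, 0]
w = train_linear_unit(X, t)
```

With noise-free linear targets the learned weights should recover the generating coefficients (w0 ≈ 3, w1 ≈ 2).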

  9. Newton Direction • Taylor Expansion: f(x + p) ≈ f(x) + ∇f(x)ᵀp + ½ pᵀ∇²f(x) p • If f(x) is a scalar function, i.e., f: Rn → R, where x = (x1, …, xn), then ∇f(x) = J(x) and ∇²f(x) = H(x), where J is the Jacobian, a vector, and H is an n×n Hessian matrix defined as

  10. Jacobian vector and Hessian matrix: J(x) = [∂f/∂x1, …, ∂f/∂xn]ᵀ and Hij(x) = ∂²f/∂xi ∂xj

  11. Newton Direction • Since setting the gradient of the Taylor expansion to zero gives ∇f(xk) + H(xk) pk = 0, we obtain pk = −H(xk)⁻¹ ∇f(xk) • In xk+1 = xk + ηk pk the step size is always ηk = 1.
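A minimal sketch of the Newton iteration; the quadratic test function is a made-up example, chosen because for a quadratic a single Newton step with ηk = 1 lands exactly on the minimum:

```python
import numpy as np

def newton(grad, hess, x0, eps=1e-8, max_iter=50):
    """Newton's method: x_{k+1} = x_k - H(x_k)^{-1} grad(x_k), step size 1."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < eps:
            break
        # solve H p = -g rather than forming the inverse explicitly
        x = x - np.linalg.solve(hess(x), g)
    return x

# Illustrative quadratic f(x) = 1/2 x^T A x - b^T x with minimum at A^{-1} b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_min = newton(lambda x: A @ x - b, lambda x: A, [5.0, -7.0])
```

For non-quadratic functions the Hessian must be recomputed at every xk, and far from the minimum a damped step (ηk < 1) is often safer.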

  12. Search Algorithms • Example problem: Traveling Salesman Problem (TSP), which is introduced on the next slides. • Then we will explore various search strategies and illustrate them on the TSP: • Exhaustive Search • Greedy Search • Hill Climbing • Simulated Annealing

  13. The Traveling Salesman Problem • The traveling salesman problem is one of the classical problems in computer science. • A traveling salesman wants to visit a number of cities and then return to his starting point. Of course he wants to save time and energy, so he wants to determine the shortest cycle for his trip. • We can represent the cities and the distances between them by a weighted, complete, undirected graph. • The problem then is to find the shortest cycle (of minimum total weight) that visits each vertex exactly once. • Finding the shortest cycle is different from Dijkstra's shortest path. It is much harder too: no polynomial-time algorithm is known!

  14. The Traveling Salesman Problem • Importance: • A variety of scheduling applications can be solved as a traveling salesman problem. • Examples: • Ordering drill positions on a drill press. • School bus routing. • The problem has theoretical importance because it represents a class of difficult problems known as NP-hard problems.

  15. THE FEDERAL EMERGENCY MANAGEMENT AGENCY • A visit must be made to four local offices of FEMA, going out from and returning to the same main office in Northridge, Southern California.

  16. FEMA Traveling Salesman: Network Representation

  17. [Network diagram: the Home office and offices 1–4 with pairwise distances 25, 30, 35, 40, 40, 45, 50, 50, 65, 80]

  18. FEMA - Traveling Salesman • Solution approaches • Enumeration of all possible cycles. • This results in (m-1)! cycles to enumerate for a graph with m nodes. • Only small problems can be solved with this approach.

  19. Exhaustive Search by Full Enumeration. Possible cycles and their total costs:
  1. H-O1-O2-O3-O4-H: 210
  2. H-O1-O2-O4-O3-H: 195 (minimum)
  3. H-O1-O3-O2-O4-H: 240
  4. H-O1-O3-O4-O2-H: 200
  5. H-O1-O4-O2-O3-H: 225
  6. H-O1-O4-O3-O2-H: 200
  7. H-O2-O3-O1-O4-H: 265
  8. H-O2-O1-O3-O4-H: 235
  9. H-O2-O4-O1-O3-H: 250
  10. H-O2-O1-O4-O3-H: 220
  11. H-O3-O1-O2-O4-H: 260
  12. H-O3-O2-O1-O4-H: 260
  For this problem we have (5−1)!/2 = 12 cycles. For symmetric problems we need to enumerate only (m−1)!/2 cycles.
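Exhaustive search can be sketched as follows; the 5-city distance matrix is made up for illustration (it is not the FEMA data), and for simplicity all (m−1)! directed cycles are enumerated rather than the (m−1)!/2 undirected ones:

```python
from itertools import permutations

def exhaustive_tsp(dist):
    """Enumerate every cycle that starts and ends at city 0; return the best."""
    m = len(dist)
    best_tour, best_cost = None, float('inf')
    for perm in permutations(range(1, m)):
        tour = (0,) + perm + (0,)
        cost = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        if cost < best_cost:
            best_tour, best_cost = tour, cost
    return best_tour, best_cost

# Hypothetical symmetric 5-city distance matrix (city 0 is 'home')
D = [[0, 12, 10, 19, 8],
     [12, 0, 3, 7, 2],
     [10, 3, 0, 6, 20],
     [19, 7, 6, 0, 4],
     [8, 2, 20, 4, 0]]
tour, cost = exhaustive_tsp(D)
```

The factorial growth of the loop is exactly why this approach only works for very small m.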

  20. FEMA – optimal solution: [network diagram highlighting the minimum-cost cycle H-O1-O2-O4-O3-H with total cost 195]

  21. The Traveling Salesman Problem • Unfortunately, no algorithm solving the traveling salesman problem with polynomial worst-case time complexity has been devised yet. • This means that for large numbers of vertices, solving the traveling salesman problem is impractical. • In these cases, we can use efficient approximation algorithms that determine a path whose length may be slightly larger than the traveling salesman’s path.

  22. Greedy Search TSP Solution • Choose the first city arbitrarily, and then repeatedly pick the city that is closest to the current city and that has not yet been visited. • Stop when all cities have been visited.
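A sketch of this nearest-neighbour rule (the distance matrix is the same hypothetical example, not the FEMA data):

```python
def greedy_tsp(dist, start=0):
    """Nearest-neighbour heuristic: repeatedly visit the closest unvisited city."""
    tour, current = [start], start
    unvisited = set(range(len(dist))) - {start}
    while unvisited:
        nearest = min(unvisited, key=lambda c: dist[current][c])
        tour.append(nearest)
        unvisited.discard(nearest)
        current = nearest
    tour.append(start)  # return home
    return tour, sum(dist[a][b] for a, b in zip(tour, tour[1:]))

# Hypothetical symmetric 5-city distance matrix (city 0 is 'home')
D = [[0, 12, 10, 19, 8],
     [12, 0, 3, 7, 2],
     [10, 3, 0, 6, 20],
     [19, 7, 6, 0, 4],
     [8, 2, 20, 4, 0]]
tour, cost = greedy_tsp(D)
```

Greedy search is fast, but there is no guarantee on solution quality: a cheap early edge can force an expensive final leg back home.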

  23. Hill Climbing TSP Solution • Choose an initial tour randomly. • Then keep swapping pairs of cities if the total length of the tour decreases, i.e., if the new distance traveled < the previous distance traveled. • Stop after a predefined number of swaps or when no swap has improved the solution for some time. • As with greedy search, there is no way to predict how good the solution will be.
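The pair-swap hill climber can be sketched as below; the stopping rule (a fixed number of consecutive failed swaps), the seed, and the distance matrix are illustrative assumptions:

```python
import random

def hill_climb_tsp(dist, max_stale=2000, seed=0):
    """Hill climbing: swap random city pairs, keeping a swap only when it
    shortens the tour; stop after max_stale failed swaps in a row."""
    rng = random.Random(seed)
    m = len(dist)
    inner = list(range(1, m))
    rng.shuffle(inner)                 # random initial tour
    tour = [0] + inner + [0]

    def length(t):
        return sum(dist[a][b] for a, b in zip(t, t[1:]))

    cost, stale = length(tour), 0
    while stale < max_stale:
        i, j = rng.sample(range(1, m), 2)   # never move the home city
        tour[i], tour[j] = tour[j], tour[i]
        new = length(tour)
        if new < cost:
            cost, stale = new, 0
        else:
            tour[i], tour[j] = tour[j], tour[i]  # undo the swap
            stale += 1
    return tour, cost

# Hypothetical symmetric 5-city distance matrix (city 0 is 'home')
D = [[0, 12, 10, 19, 8],
     [12, 0, 3, 7, 2],
     [10, 3, 0, 6, 20],
     [19, 7, 6, 0, 4],
     [8, 2, 20, 4, 0]]
tour, cost = hill_climb_tsp(D)
```

On larger instances hill climbing routinely sticks in a local minimum; this tiny example happens to have none under pair swaps, but that is the exception, not the rule.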

  24. Exploration and Exploitation • Exploration of the search space is like exhaustive search (always trying out new solutions) • Exploitation of the current best solution is like hill climbing (trying local variants of the current best solution) • Ideally we would like to have a combination of those two.

  25. Simulated Annealing TSP Solution • As in hill climbing, keep swapping pairs of cities if new dist. traveled < previous dist. traveled, or if (previous dist. traveled − new dist. traveled) > T·log(p) for a random number p drawn uniformly from (0, 1). • After each step set T = c·T, where 0 < c < 1 (usually 0.8 < c < 1). • Thus, we accept a 'bad' solution if exp(−(new − previous)/T) > p for some random number p.
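A sketch combining the acceptance rule and the cooling schedule; the initial temperature, cooling rate, proposals-per-temperature count, and distance matrix are all illustrative choices:

```python
import math
import random

def anneal_tsp(dist, T=100.0, c=0.9, sweeps=200, seed=0):
    """Simulated annealing: always accept improving swaps; accept a worsening
    swap with probability exp(-(new - old)/T); cool with T <- c*T."""
    rng = random.Random(seed)
    m = len(dist)
    inner = list(range(1, m))
    rng.shuffle(inner)
    tour = [0] + inner + [0]

    def length(t):
        return sum(dist[a][b] for a, b in zip(t, t[1:]))

    cost = length(tour)
    best, best_cost = tour[:], cost
    for _ in range(sweeps):
        for _ in range(20):            # a few proposed swaps per temperature
            i, j = rng.sample(range(1, m), 2)
            tour[i], tour[j] = tour[j], tour[i]
            new = length(tour)
            if new < cost or rng.random() < math.exp((cost - new) / T):
                cost = new
                if cost < best_cost:   # remember the best tour seen so far
                    best, best_cost = tour[:], cost
            else:
                tour[i], tour[j] = tour[j], tour[i]  # reject and undo
        T *= c
    return best, best_cost

# Hypothetical symmetric 5-city distance matrix (city 0 is 'home')
D = [[0, 12, 10, 19, 8],
     [12, 0, 3, 7, 2],
     [10, 3, 0, 6, 20],
     [19, 7, 6, 0, 4],
     [8, 2, 20, 4, 0]]
tour, cost = anneal_tsp(D)
```

At high T this behaves like random exploration; as T cools it degenerates into hill climbing, which is exactly the exploration/exploitation trade-off described on the previous slide.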

  26. Search Algorithms Covered • Exhaustive Search • Greedy Search • Hill Climbing • Simulated Annealing
