
Dynamic Optimization and Learning for Renewal Systems






Presentation Transcript


  1. Dynamic Optimization and Learning for Renewal Systems
  [Title slide figure: tasks 1, 2, 3 processed over renewal frames T[0], T[1], T[2] by a network coordinator with transmit/receive (T/R) nodes.]
  Michael J. Neely, University of Southern California
  Asilomar Conference on Signals, Systems, and Computers, Nov. 2010
  PDF of paper at: http://ee.usc.edu/stochastic-nets/docs/renewal-systems-asilomar2010.pdf
  Sponsored in part by the NSF Career grant CCF-0747525 and the ARL Network Science Collaborative Technology Alliance.

  2. A General Renewal System
  [Figure: timeline of renewal frames of duration T[0], T[1], T[2], with penalty vectors y[0], y[1], y[2].]
  • Renewal frames r in {0, 1, 2, …}.
  • π[r] = policy chosen on frame r.
  • P = abstract policy space (π[r] in P for all r).
  • The policy π[r] affects the frame size and penalty vector on frame r:
  • y[r] = [y0(π[r]), y1(π[r]), …, yL(π[r])]
  • T[r] = T(π[r]) = frame duration

  3.–6. A General Renewal System (continued, over several animation builds)
  • y[r] and T[r] are random functions of π[r] (their distributions depend on π[r]).
  • Example realizations shown across the builds:
  • y[r] = [1.2, 1.8, …, 0.4], T[r] = 8.1
  • y[r] = [0.0, 3.8, …, -2.0], T[r] = 12.3
  • y[r] = [1.7, 2.2, …, 0.9], T[r] = 5.6

  7. Example 1: Opportunistic Scheduling
  [Figure: three-channel system with channel state vector S[r] = (S1[r], S2[r], S3[r]).]
  • All frames = 1 slot.
  • S[r] = (S1[r], S2[r], S3[r]) = channel states for slot r.
  • Policy π[r]: on frame r, first observe S[r], then choose a channel to serve (i.e., one of {1, 2, 3}).
  • Example objectives: throughput, energy, fairness, etc.
  A minimal per-slot sketch follows below.
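To make the per-slot decision concrete, here is a minimal sketch of one scheduling slot. The Bernoulli ON/OFF channel model, the rate values, and the greedy "serve the best available channel" rule are illustrative assumptions, not the policy from the talk:

```python
import random

def observe_channels(p_on=(0.5, 0.6, 0.7)):
    """Sample S[r] = (S1[r], S2[r], S3[r]) as ON/OFF states (illustrative model)."""
    return tuple(1 if random.random() < p else 0 for p in p_on)

def greedy_policy(S, rates=(1.0, 2.0, 3.0)):
    """Serve the ON channel with the highest rate (a simple illustrative rule)."""
    candidates = [i for i in range(len(S)) if S[i] == 1]
    if not candidates:
        return None  # no channel available this slot
    return max(candidates, key=lambda i: rates[i])

# One frame (= one slot in this example): first observe S[r], then decide.
S = observe_channels()
print(f"S[r] = {S}, serve channel: {greedy_policy(S)}")
```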

  8. Example 2: Markov Decision Problems
  [Figure: recurrent Markov chain over states 1, 2, 3, 4.]
  • M(t) = recurrent Markov chain (continuous or discrete time).
  • Renewals are defined as returns to state 1.
  • T[r] = random inter-renewal frame size (frame r).
  • y[r] = penalties incurred over frame r.
  • π[r] = policy that affects transition probabilities over frame r.
  • Objective: minimize the time average of one penalty subject to time average constraints on the others.
  A small simulation of this renewal structure follows below.
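As a quick illustration of how renewal frames arise here, the sketch below simulates a small discrete-time Markov chain and records the inter-return times to state 1. The 4-state transition matrix is an arbitrary assumption chosen for illustration:

```python
import random

# Illustrative 4-state chain; each row is a transition distribution (assumed values).
P = [
    [0.1, 0.3, 0.3, 0.3],
    [0.5, 0.2, 0.2, 0.1],
    [0.4, 0.3, 0.1, 0.2],
    [0.6, 0.1, 0.1, 0.2],
]

def step(state):
    return random.choices(range(4), weights=P[state])[0]

# Record T[r] = time between successive returns to state 1 (index 0).
frames, state, t = [], 0, 0
while len(frames) < 1000:
    state, t = step(state), t + 1
    if state == 0:          # renewal: the chain returns to state 1
        frames.append(t)
        t = 0

print("average frame size E{T} ≈", sum(frames) / len(frames))
```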

  9. Example 3: Task Processing over Networks
  [Figure: network coordinator dispatching tasks 1, 2, 3 to transmit/receive (T/R) nodes.]
  • Infinite sequence of tasks, e.g., query sensors and/or perform computations.
  • Renewal frame r = processing time for frame r.
  • Policy types:
  • Low level: {specify transmission decisions over the network}
  • High level: {Backpressure1, Backpressure2, Shortest Path}
  • Example objective: maximize quality of information per unit time subject to per-node power constraints.

  10. Quick Review of Renewal-Reward Theory (Pop Quiz Next Slide!)
  Define the frame average of y0[r]:
  lim_{R→∞} (1/R) Σ_{r=0}^{R-1} y0[r]
  The time average of y0[r] is then:
  lim_{R→∞} ( Σ_{r=0}^{R-1} y0[r] ) / ( Σ_{r=0}^{R-1} T[r] )
  *If behavior is i.i.d. over frames, by the LLN this is the same as E{y0}/E{T}.
  A simulation sketch of this distinction follows below.
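The distinction between the frame average and the time average is easy to see numerically. In this sketch the joint distribution of y0 and T is an arbitrary assumption chosen so that they are correlated; the time average matches E{y0}/E{T}, while the mean of the per-frame ratios y0/T does not:

```python
import random

random.seed(1)
ys, Ts = [], []
for _ in range(100_000):
    T = random.choice([1.0, 10.0])        # illustrative frame sizes
    y0 = 2.0 * T + random.random()        # energy correlated with frame size
    ys.append(y0)
    Ts.append(T)

time_avg = sum(ys) / sum(Ts)                                 # (total energy)/(total time)
ratio_of_means = (sum(ys) / len(ys)) / (sum(Ts) / len(Ts))   # E{y0}/E{T}
mean_of_ratios = sum(y / T for y, T in zip(ys, Ts)) / len(ys)  # E{y0/T}

print(time_avg, ratio_of_means, mean_of_ratios)
# time_avg ≈ ratio_of_means, while mean_of_ratios differs.
```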

  11.–12. Pop Quiz: (10 points)
  • Let y0[r] = energy expended on frame r.
  • Time avg. power = (Total Energy Used)/(Total Time).
  • Suppose (for simplicity) behavior is i.i.d. over frames.
  • To minimize time average power, which one should we minimize?
  (a) E{y0[r]}   (b) E{y0[r]}/E{T[r]}
  • Answer (revealed on the second quiz slide): (b). By the renewal-reward review above, time average power equals E{y0}/E{T}; minimizing the per-frame energy E{y0} alone ignores the frame lengths.

  13. Two General Problem Types:
  1) Minimize a time average subject to time average constraints:
     Minimize: lim_{R→∞} ( Σ_r y0[r] ) / ( Σ_r T[r] )
     Subject to: lim_{R→∞} ( Σ_r yl[r] ) / ( Σ_r T[r] ) ≤ cl for all l in {1, …, L}, with π[r] in P for all r.
  2) Maximize a concave function φ(x1, …, xL) of the time averages:
     Maximize: φ(x̄1, …, x̄L), where x̄l is the time average of yl[r], subject to the same type of constraints.

  14. Solving the Problem (Type 1):
  Define a "virtual queue" Zl[r] for each inequality constraint:
  [Figure: queue Zl[r] with arrivals yl[r] and service clT[r] per frame.]
  Zl[r+1] = max[Zl[r] – clT[r] + yl[r], 0]
  If every Zl[r] grows sublinearly in r, the corresponding time average constraint is satisfied (see the simulation sketch below).
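The queue update can be simulated directly. This sketch uses illustrative distributions for yl[r] and T[r] chosen so the constraint is strictly feasible; Z[r]/r staying small is the sign that the time average constraint is being met:

```python
import random

random.seed(0)
c, Z = 0.5, 0.0
total_y = total_T = 0.0
for r in range(10_000):
    T = random.uniform(1.0, 3.0)          # illustrative frame size
    y = random.uniform(0.0, 0.9) * T      # illustrative penalty (avg 0.45*T < c*T)
    Z = max(Z - c * T + y, 0.0)           # virtual queue update from the slide
    total_y += y
    total_T += T

print("Z[r]/r =", Z / 10_000)                 # small => queue stable
print("time average =", total_y / total_T)    # compare against c = 0.5
```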

  15. Lyapunov Function and "Drift-Plus-Penalty Ratio":
  [Figure: virtual queue backlogs Z1[r] and Z2[r].]
  • Scalar measure of queue sizes: L[r] = Z1[r]^2 + Z2[r]^2 + … + ZL[r]^2
  • Δ(Z[r]) = E{L[r+1] – L[r] | Z[r]} = "frame-based Lyapunov drift"
  • Algorithm technique: every frame r, observe Z1[r], …, ZL[r]. Then choose a policy π[r] in P to minimize:
  "Drift-Plus-Penalty Ratio" = ( Δ(Z[r]) + V·E{y0[r] | Z[r]} ) / E{T[r] | Z[r]}

  16. The Algorithm Becomes:
  • Observe Z[r] = (Z1[r], …, ZL[r]). Choose π[r] in P to minimize:
  ( Δ(Z[r]) + V·E{y0[r] | Z[r]} ) / E{T[r] | Z[r]}
  • Then update the virtual queues: Zl[r+1] = max[Zl[r] – clT[r] + yl[r], 0]
  A minimal sketch of this loop appears below.
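Here is a minimal sketch of the loop, assuming a small finite policy space in which E{y0}, E{y1}, and E{T} are known for each policy (all numbers are illustrative assumptions). The drift term is expanded in the standard way, bounding Δ(Z[r]) by Σl Zl·E{yl – cl·T} plus a constant that does not affect the minimization:

```python
import random

random.seed(0)
# Illustrative policies: (E{y0}, E{y1}, E{T}) per policy (assumed numbers).
policies = {
    "A": (1.0, 0.8, 2.0),
    "B": (0.5, 1.5, 1.5),
    "C": (0.8, 1.0, 2.5),
}
c1, V, Z1 = 0.5, 10.0, 0.0

def dpp_ratio(p, Z1):
    Ey0, Ey1, ET = policies[p]
    # (V * penalty + queue-weighted drift term) / expected frame size
    return (V * Ey0 + Z1 * (Ey1 - c1 * ET)) / ET

for r in range(10_000):
    pi = min(policies, key=lambda p: dpp_ratio(p, Z1))  # policy selection
    Ey0, Ey1, ET = policies[pi]
    T = random.expovariate(1.0 / ET)       # illustrative random frame outcomes
    y1 = random.expovariate(1.0 / Ey1)
    Z1 = max(Z1 - c1 * T + y1, 0.0)        # virtual queue update

print("Z1[r]/r =", Z1 / 10_000)            # small => constraint being met
```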

  17. Drift-Plus-Penalty Ratio Theorem: Assume the constraints are feasible. Then under this algorithm, for all frames r in {1, 2, 3, …}:
  (a) All time average constraints are satisfied.
  (b) The time average of y0 is within O(1/V) of its optimal value, at the cost of virtual queue sizes that grow with V.

  18. Solving the Problem (Type 2):
  • We reduce it to a problem with the structure of Type 1 via:
  • Auxiliary variables γ[r] = (γ1[r], …, γL[r]).
  • The following variation on Jensen's inequality: for any concave function φ(x1, …, xL) and any (arbitrarily correlated) vector of random variables (X1, X2, …, XL, T), where T > 0, we have:
  E{T·φ(X1, …, XL)} / E{T} ≤ φ( E{T·X1}/E{T}, …, E{T·XL}/E{T} )
  A quick numeric spot-check of this inequality follows below.
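The inequality is easy to spot-check numerically. This sketch takes L = 1, φ(x) = √x, and an arbitrary correlated positive pair (X, T), all chosen purely for illustration:

```python
import math
import random

random.seed(2)
pairs = []
for _ in range(100_000):
    T = random.uniform(0.5, 2.0)
    X = T + random.uniform(0.0, 2.0)   # X correlated with T (illustrative)
    pairs.append((X, T))

ET = sum(T for _, T in pairs) / len(pairs)
lhs = sum(T * math.sqrt(X) for X, T in pairs) / len(pairs) / ET
rhs = math.sqrt(sum(T * X for X, T in pairs) / len(pairs) / ET)
print(f"{lhs:.4f} <= {rhs:.4f}")   # E{T*sqrt(X)}/E{T} <= sqrt(E{T*X}/E{T})
```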

  19. The Algorithm (Type 2) Becomes:
  • On frame r, observe the virtual queues Z[r] = (Z1[r], …, ZL[r]) and G[r] = (G1[r], …, GL[r]).
  • (Auxiliary variables) Choose γ1[r], …, γL[r] to maximize a deterministic problem of the form V·φ(γ1, …, γL) – Σl Gl[r]·γl over a bounded range of γ.
  • (Policy selection) Choose π[r] in P to minimize a drift-plus-penalty ratio, as in the Type 1 algorithm.
  • Then update the virtual queues:
  Zl[r+1] = max[Zl[r] – clT[r] + yl[r], 0],  Gl[r+1] = max[Gl[r] + γl[r]T[r] – yl[r], 0]
  A closed-form sketch of the auxiliary-variable step (for a separable φ) follows below.
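For a separable concave φ the auxiliary-variable step decouples per coordinate and has a closed form. This sketch assumes φ(γ) = Σl log(1 + γl) with each γl in [0, γmax] and the V·φ(γ) – Σl Gl·γl subproblem shape given above; both the choice of φ and the box constraint are illustrative assumptions:

```python
def aux_variable_step(G, V=10.0, gamma_max=5.0):
    """Maximize V*log(1+g) - Gl*g over g in [0, gamma_max], per coordinate.

    First-order condition: V/(1+g) = Gl  =>  g = V/Gl - 1 (then clip to the box).
    """
    gammas = []
    for Gl in G:
        g = gamma_max if Gl <= 0 else V / Gl - 1.0
        gammas.append(min(max(g, 0.0), gamma_max))
    return gammas

print(aux_variable_step([1.0, 4.0, 20.0]))  # -> [5.0, 1.5, 0.0]
```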

  20. Example Problem – Task Processing:
  [Figure: network coordinator with five transmit/receive (T/R) nodes; each frame r consists of a setup phase, a transmission, and an idle period I[r].]
  • Every task reveals random task parameters η[r]:
  η[r] = [(qual1[r], T1[r]), (qual2[r], T2[r]), …, (qual5[r], T5[r])]
  • Choose π[r] = [which node to transmit, how much idle time], in {1, 2, 3, 4, 5} × [0, Imax].
  • Transmissions incur power.
  • We use a quality distribution that tends to be better for higher-numbered nodes.
  • Maximize quality/time subject to pav ≤ 0.25 for all nodes.

  21. Minimizing the Drift-Plus-Penalty Ratio:
  • Minimizing a pure expectation, rather than a ratio, is typically easier (see Bertsekas & Tsitsiklis, Neuro-Dynamic Programming).
  • Define: f(θ) = inf over π in P of E{numerator(π) – θ·T(π)}, where numerator(π) denotes the drift-plus-penalty expression above.
  • "Bisection Lemma": f(θ) is non-increasing in θ, and the optimal ratio θ* satisfies f(θ*) = 0; hence θ* can be found by bisection, where each step minimizes only a pure expectation.
  A bisection sketch follows below.
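Here is a sketch of the bisection, assuming a finite policy set with known expectations (the numbers are illustrative). Each evaluation of f(θ) is a pure-expectation minimization, and the root of f is the minimum ratio E{y0}/E{T} over the policies:

```python
# Illustrative policies: (E{y0}, E{T}) per policy (assumed values).
policies = [(3.0, 2.0), (5.0, 4.0), (2.0, 1.0)]

def f(theta):
    """f(theta) = min over policies of E{y0} - theta * E{T}; non-increasing in theta."""
    return min(Ey0 - theta * ET for Ey0, ET in policies)

lo, hi = 0.0, 10.0          # bracket containing the root of f
for _ in range(50):
    mid = (lo + hi) / 2
    if f(mid) > 0:
        lo = mid            # theta below the optimal ratio: keep searching upward
    else:
        hi = mid
print("theta* ≈", hi)       # min E{y0}/E{T} over the policies = 1.25
```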

  22. Learning via Sampling from the Past:
  • Suppose the frame randomness is characterized by a random event η with unknown distribution, and we have past samples {η1, η2, …, ηW}.
  • Want to compute (over the unknown random distribution of η) expectations of the form E{g(π, η)}, such as those in the drift-plus-penalty ratio.
  • Approximate this via the W samples from the past:
  E{g(π, η)} ≈ (1/W) Σ_{w=1}^{W} g(π, ηw)
  A minimal sketch of this estimator follows below.
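A minimal sketch of the sampling approximation: the true expectation over η is replaced by an empirical average over the W stored samples. The function g, the sample distribution, and the candidate policy set are all illustrative assumptions:

```python
import random

random.seed(3)
W = 10
past_samples = [random.gauss(1.0, 0.3) for _ in range(W)]   # stored {eta_1, ..., eta_W}

def g(pi, eta):
    """Illustrative per-frame quantity whose expectation over eta is needed."""
    return pi * eta - 0.5 * pi * pi

def estimate(pi):
    # Approximate E{g(pi, eta)} by the empirical average over the W past samples.
    return sum(g(pi, eta) for eta in past_samples) / W

best_pi = max([0.0, 0.5, 1.0, 1.5, 2.0], key=estimate)
print("policy chosen from the sampled estimate:", best_pi)
```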

  23. Simulation:
  [Plot: Quality of Information per unit time versus sample size W, comparing the drift-plus-penalty ratio algorithm with bisection against an alternative algorithm with time averaging.]

  24. Concluding Simulations (values for W = 10):
  [Table of concluding simulation values not preserved in the transcript.]
  • Quick advertisement – new book:
  • M. J. Neely, Stochastic Network Optimization with Application to Communication and Queueing Systems, Morgan & Claypool, 2010.
  • http://www.morganclaypool.com/doi/abs/10.2200/S00271ED1V01Y201006CNT007
  • PDF also available from the "Synthesis Lecture Series" (on the digital library).
  • Covers Lyapunov optimization theory (including these renewal system problems), with detailed examples and problem set questions.
