
Low-Cost Task Scheduling for Distributed-Memory Machines



Presentation Transcript


  1. Low-Cost Task Scheduling for Distributed-Memory Machines Andrei Radulescu and Arjan J.C. Van Gemund Presented by Bahadır Kaan Özütam

  2. Outline • Introduction • List Scheduling • Preliminaries • General Framework for LSSP • Complexity Analysis • Case Study • Extensions for LSDP • Conclusion

  3. Introduction • Task scheduling • Scheduling heuristics • Shared-memory vs. distributed-memory • Bounded vs. unbounded number of processors • Multistep vs. single-step methods • Duplicating vs. non-duplicating methods • Static vs. dynamic priorities

  4. List Scheduling • LSDP and LSSP algorithms • LSSP (List Scheduling with Static Priorities): tasks are scheduled in the order of their previously computed priorities on each task's "best" processor. • The best processor is ... • the processor enabling the earliest start time, if performance is the main concern; • the processor becoming idle the earliest, if speed is the main concern. • LSDP (List Scheduling with Dynamic Priorities): priorities are computed for task-processor pairs, which is more complex.

  5. List Scheduling • Reducing LSSP time complexity: O(V log(V) + (E + V)P) ⇒ O(V log(P) + E), where V = number of tasks, E = number of dependencies, P = number of processors. • 1. Considering only two candidate processors per task • 2. Maintaining a partially-sorted task priority queue with a limited number of sorted tasks

  6. [Figure: example task graph, with nodes labeled V (tasks) and edges labeled E (dependencies)] Preliminaries • Parallel programs modeled as a DAG G = (V, E) • Computation cost Tw(t) • Communication cost Tc(t, t′) • Communication-to-computation ratio (CCR) • The task graph width W

  7. Preliminaries • Entry and exit tasks • The bottom level Tb of a task • Ready = all parents scheduled • Start time Ts(t) • Finish time Tf(t) • Partial schedule • Processor ready time: Tr(p) = max{ Tf(t) : t ∈ V, Pr(t) = p } • Processor becoming idle the earliest (pr): Tr(pr) = min{ Tr(p) : p ∈ P }
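The two processor-time definitions above can be sketched directly. This is a minimal illustration with hypothetical data structures (dicts mapping tasks to finish times and processor assignments), not the paper's implementation:

```python
# Sketch of Tr(p) and pr from the slide (hypothetical names/data layout).
def processor_ready_time(finish, assignment, p):
    # Tr(p) = max{ Tf(t) : Pr(t) = p }, or 0 if nothing is scheduled on p yet
    return max((tf for t, tf in finish.items() if assignment[t] == p), default=0)

def earliest_idle_processor(processors, finish, assignment):
    # pr = the processor minimizing Tr(p)
    return min(processors, key=lambda p: processor_ready_time(finish, assignment, p))
```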

  8. Preliminaries • The last message arrival time: Tm(t) = max{ Tf(t′) + Tc(t′, t) : (t′, t) ∈ E } • The enabling processor pe(t): the processor from which the last message arrives • Effective message arrival time: Te(t, p) = max{ Tf(t′) + Tc(t′, t) : (t′, t) ∈ E, Pr(t′) ≠ p } • The start time of a ready task, once scheduled: Ts(t, p) = max{ Te(t, p), Tr(p) }
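These three formulas translate almost one-to-one into code. A sketch under assumed data structures (predecessor lists, a finish-time dict, and an edge-keyed communication-cost dict; all names are hypothetical):

```python
# Tm(t), Te(t, p), and Ts(t, p) as defined on the slide.
def last_arrival(task, preds, finish, comm):
    # Tm(t) = max{ Tf(t') + Tc(t', t) } over all edges (t', t)
    return max((finish[u] + comm[(u, task)] for u in preds.get(task, [])), default=0)

def effective_arrival(task, p, preds, finish, comm, assignment):
    # Te(t, p): like Tm(t), but messages from tasks already on p cost nothing
    return max((finish[u] + comm[(u, task)]
                for u in preds.get(task, []) if assignment[u] != p), default=0)

def start_time(task, p, preds, finish, comm, assignment, ready):
    # Ts(t, p) = max{ Te(t, p), Tr(p) }; ready maps p -> Tr(p)
    return max(effective_arrival(task, p, preds, finish, comm, assignment), ready[p])
```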

  9. General Framework for LSSP • General LSSP algorithm • Task priority computation: O(E + V) • Task selection: O(V log W) • Processor selection: O((E + V) P)

  10. General Framework for LSSP • Processor selection: choosing between 1. the enabling processor and 2. the processor becoming idle first • Ts(t, p) = max{ Te(t, p), Tr(p) }

  11. General Framework for LSSP • Lemma 1. ∀ p ≠ pe(t) : Te(t, p) = Tm(t) • Theorem 1. For a ready task t, one of the processors p ∈ { pe(t), pr } satisfies Ts(t, p) = min{ Ts(t, px) : px ∈ P } • O((E + V) P) ⇒ O(V log(P) + E) • O(E + V) to traverse the task graph • O(V log P) to maintain the processors sorted
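Theorem 1 is what makes the two-candidate processor selection work: only pe(t) and pr need to be compared. A sketch of that shortcut, using Lemma 1 to avoid recomputing Te for the non-enabling candidate (names and data layout are assumptions):

```python
# Select the best processor for a ready task by checking only pe(t) and pr.
def select_processor(enabling, te_enabling, tm, ready_times):
    # ready_times: p -> Tr(p); te_enabling = Te(t, pe(t)); tm = Tm(t)
    pr = min(ready_times, key=ready_times.get)  # processor idle the earliest

    def ts(p):
        # Lemma 1: Te(t, p) = Tm(t) for every p != pe(t)
        te = te_enabling if p == enabling else tm
        return max(te, ready_times[p])

    # Theorem 1: the minimum over all of P is attained on one of these two
    return min({enabling, pr}, key=ts)
```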

  12. General Framework for LSSP • Task selection: O(V log W) can be reduced by sorting only some of the tasks. • Task priority queue: 1. a sorted list of size H, 2. a FIFO list (O(1) access) • Complexity decreases to O(V log H) • H needs to be adjusted; H = P is optimal, giving O(V log P)
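The two-part queue above can be sketched as follows. The design details (how overflow migrates from the FIFO to the sorted head) are assumptions for illustration, not the paper's code:

```python
import bisect
from collections import deque

# Partially-sorted priority queue: a sorted head of size H plus a FIFO tail.
class PartialQueue:
    def __init__(self, H):
        self.H = H
        self.head = []        # ascending (priority, task); pop from the end
        self.tail = deque()   # unsorted overflow, FIFO order

    def push(self, priority, task):
        if len(self.head) < self.H:
            bisect.insort(self.head, (priority, task))  # O(log H) position + O(H) shift
        else:
            self.tail.append((priority, task))          # O(1)

    def pop(self):
        item = self.head.pop()  # highest-priority task among the sorted head
        if self.tail:
            bisect.insort(self.head, self.tail.popleft())
        return item
```

Pops are only approximately priority-ordered (tasks buried in the FIFO can be overtaken), which is the accepted trade-off; per the slide, H = P keeps performance comparable to a fully sorted queue.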

  13. Complexity Analysis • Computing task priorities: O(E + V) • Task selection: O(V log W), reduced to O(V log H) for a partially sorted priority queue, i.e. O(V log P) for a queue of size P • Processor selection: O(E + V) + O(V log P) • Total complexity: O(V (log(W) + log(P)) + E) fully sorted; O(V log(P) + E) partially sorted

  14. [Figure: example task graph with tasks t0…t7, computation costs 2–3 and edge communication costs 1–4] Case Study • MCP (Modified Critical Path): the task having the highest bottom level has the highest priority • FCP (Fast Critical Path) • 3 processors • Partially sorted priority queue of size 2 • 7 tasks
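MCP's static priority is the bottom level Tb(t): the longest path from t to an exit task, counting both computation and communication costs. A minimal sketch on a hypothetical graph (the example figure's exact edges are not recoverable here):

```python
# Bottom levels via memoized recursion over successors (hypothetical graph data).
def bottom_levels(weight, succs, comm):
    memo = {}

    def tb(t):
        # Tb(t) = Tw(t) + max{ Tc(t, t') + Tb(t') } over children t' (0 for exits)
        if t not in memo:
            memo[t] = weight[t] + max(
                (comm[(t, c)] + tb(c) for c in succs.get(t, [])), default=0)
        return memo[t]

    return {t: tb(t) for t in weight}
```

MCP then schedules ready tasks in decreasing Tb order, so critical-path tasks go first.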

  15. [Figure: resulting schedule for the example task graph] Case Study

  16. Extensions for LSDP • Extend the approach to dynamic priorities: • ETF: the ready task that starts the earliest • ERT: the ready task that finishes the earliest • DLS: the task-processor pair having the highest dynamic level • General formula: λ(t, p) = λ(t) + max{ Te(t, p), Tr(p) } • ETF: λ(t) = 0 • ERT: λ(t) = Tw(t) • DLS: λ(t) = −Tb(t)
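The point of the general formula is that all three heuristics differ only in the task-dependent term λ(t). A sketch (names are assumptions; te and tr are precomputed lookups):

```python
# Unified LSDP priority: lam(t, p) = lam(t) + max{ Te(t, p), Tr(p) }.
def lam(task, kind, tw, tb):
    if kind == "ETF":
        return 0            # minimizing gives the earliest start time
    if kind == "ERT":
        return tw[task]     # finish = start + Tw(t), so this gives earliest finish
    if kind == "DLS":
        return -tb[task]    # minimizing -Tb(t) + start maximizes the dynamic level
    raise ValueError(kind)

def dynamic_priority(task, p, kind, te, tr, tw, tb):
    return lam(task, kind, tw, tb) + max(te[(task, p)], tr[p])
```

Minimizing this value over ready task-processor pairs reproduces each heuristic's selection rule.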

  17. Extensions for LSDP • EP case: on each processor, the tasks are sorted, and the processors are sorted • Non-EP case: use the processor becoming idle first; if that processor is the enabling processor, this reduces to the EP case

  18. Extensions for LSDP • 3 tries; 1 for EP case, 1 for non-EP case • Task priority queues maintained; P for EP case, 2 for non-EP case • Each task is added to 3 queues; 1 for EP case, 2 for non-EP case • Processor queues; 1 for EP case, 1 for non-EP case

  19. Complexity • Originally O(W (E + V) P), now O(V (log(W) + log(P)) + E); this can be further reduced using a partially sorted priority queue. A queue size of P is required to maintain comparable performance: O(V log(P) + E)

  20. Conclusion • LSSP can be performed at a significantly lower cost • Processor selection considers only two processors: the enabling processor and the processor becoming idle first • Task selection: only a limited number of tasks are sorted • Using the extension of this method, LSDP complexity can also be reduced • For large program and processor dimensions, a superior cost-performance trade-off

  21. Thank You Questions?
