Buffer Capacity Computation for Throughput Constrained Streaming Applications with Data-Dependent Inter-Task Communication

This research paper discusses the computation of buffer capacity for streaming applications with data-dependent inter-task communication in multi-processor architectures. It explores the challenges and proposes a model to guarantee throughput while considering input-data dependent behavior and run-time arbitration.

Presentation Transcript


  1. Buffer Capacity Computation for Throughput Constrained Streaming Applications with Data-Dependent Inter-Task Communication • Maarten Wiggers, PhD student, University of Twente, NL • Co-author and supervisor: Marco Bekooij, NXP Semiconductors Research • Gerard Smit, University of Twente

  2. Outline • Context • Streaming applications • Programming multiprocessor architectures • Problem • Problem statement • Related work • Variable Rate Dataflow • Chain topology • Arbitrary graph topology • Experiment • Conclusion [Wiggers – DATE 2008, Wiggers – RTAS 2008] Maarten Wiggers -- University of Twente

  4. Multi-stream car-entertainment system

  5. Application model • Jobs process streams of data • Jobs are composed of tasks • Simultaneously running jobs together form use-cases • Jobs often have real-time requirements • Firm (FRT) if deadline misses are highly undesirable (steep quality degradation) • [Figure: a use-case with an FRT video job (input data stream, tasks, output stream to display) and an FRT audio job (input data stream, tasks, output stream to speakers)]

  6. Task graphs • Jobs are implemented as task graphs • Tasks communicate fixed-sized containers over fixed-sized FIFO buffers • Container is a place-holder for data • Task has random access in container • Task only starts an execution on sufficient • Full containers in input buffers • Empty containers in output buffers (back-pressure) • Back-pressure robustly prevents buffer overflow • Required quanta of containers can be • Known at design-time • Dependent on the actual processed stream
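
The enabling condition on this slide can be sketched in a few lines of Python. This is only an illustration: the names `FifoBuffer`, `can_start` and `execute` are hypothetical and do not come from the presentation, and buffers are reduced to container counts.

```python
from dataclasses import dataclass


@dataclass
class FifoBuffer:
    """Fixed-capacity FIFO that counts containers instead of storing data."""
    capacity: int   # total number of containers in the buffer
    full: int = 0   # containers that currently hold data

    @property
    def empty(self) -> int:
        return self.capacity - self.full


def can_start(inputs, outputs):
    """Enabling condition for one execution of a task.

    inputs:  list of (buffer, quantum) pairs the task reads from
    outputs: list of (buffer, quantum) pairs the task writes to
    """
    enough_data = all(buf.full >= q for buf, q in inputs)
    enough_space = all(buf.empty >= q for buf, q in outputs)  # back-pressure
    return enough_data and enough_space


def execute(inputs, outputs):
    """Run one execution: free the consumed containers, fill the produced ones."""
    if not can_start(inputs, outputs):
        raise RuntimeError("task is not enabled")
    for buf, q in inputs:
        buf.full -= q
    for buf, q in outputs:
        buf.full += q
```

Because a task never starts without enough empty containers on its outputs, `full` can never exceed `capacity`, which is the overflow protection the slide attributes to back-pressure.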

  7. Example job – MP3 playback • MP3 decoding task consumes a variable number of bytes per frame • Every execution a different number of bytes consumed • BR task executes aperiodically • No static-order schedule for BR and MP3 → run-time arbitration • Throughput constraint: sink needs to execute strictly periodically • All tasks are pushing data towards the sink • For sufficiently large buffers, sink can execute strictly periodically • [Figure: task graph with variable quantum n=[0,960]]

  8. Example job – H.263 video decoder • Variable length decoder (VLD) consumes a variable number of bytes per frame • VLD produces a variable number of blocks per frame • DQ and IDCT process blocks • Motion compensator assembles a frame from blocks • Throughput constraint: sink needs to execute strictly periodically • [Figure: task graph with variable quanta m=[0,6536] and n=[0,2376]]

  9. Application trend • Behaviour of applications is increasingly input-data dependent, e.g. • Entropy encoding • Adaptation to channel conditions by digital radios • Reflected in • Input-data dependent execution times • Conditional execution of code • Mode changes • Input-data dependent execution rates • Input-data dependent execution rates require run-time arbitration

  10. Trend → challenge • Required properties • Functionally deterministic behaviour: output values completely determined by input values • Deadlock free • Throughput constraint satisfied • Research challenge is to define models • For which required properties are decidable • That can model applications with input-data dependent behaviour • That include effects of run-time arbitration • E.g. Variable-Rate Dataflow

  11. Multi-processor architecture template • Multi-processor system required for performance and power reasons • [Figure labels: External SDRAM, P, DSP, I/O, $, ctrl, mem, Arb, CA, NI, Network-on-Chip] [Hansson – TODAES 2008]

  12. Compute settings • Inputs: multiprocessor instance, (cyclic) task graph, WCET, throughput and latency constraint • Dataflow synthesis • Outputs: scheduler settings and buffer capacities

  13. Compute settings • Guarantees on end-to-end throughput require guarantees on deadlock-freedom • Models that provide end-to-end throughput guarantees are not Turing complete • Poses restrictions on • Applications: e.g. inter-task synchronisation behaviour • Architectures: e.g. applicable run-time arbitration schemes • Goal: define a model that can guarantee throughput for H.263

  14. Example • Every execution, task B can choose to consume either 2 or 3 • Required buffer capacity for deadlock freedom?

  15. Example (cont.) • Attempt: assume maximum consumption quantum in every execution • Requires buffer capacity of 3 for deadlock freedom

  16. Example (cont.) • However, when consuming the minimum quantum • Buffer capacity of 3 is insufficient!

  17. Example (cont.)

  18. Example (cont.)

  19. Example (cont.) • Deadlock!
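
The deadlock in this example can be reproduced with a small state-space search. The sketch below rests on an assumption that the surviving slide text does not state: producer A writes 3 containers per execution (the figure that showed the quanta is lost). The function name `can_deadlock` is hypothetical. Task B's choice of quantum is treated as adversarial, since it is decided by the stream and B then waits for exactly that amount.

```python
def can_deadlock(capacity, produce, consume_choices):
    """Return True if some sequence of consumption quanta deadlocks the pair.

    The state is the number of full containers in the single buffer.
    Producer A needs `produce` empty containers (assumed value, see lead-in).
    Consumer B commits to one quantum from `consume_choices` and waits for it.
    """
    seen, todo = set(), [0]              # the buffer starts empty
    while todo:
        full = todo.pop()
        if full in seen:
            continue
        seen.add(full)
        a_enabled = capacity - full >= produce
        # B may have committed to a quantum that is not available.
        b_can_block = any(full < q for q in consume_choices)
        if not a_enabled and b_can_block:
            return True                  # neither task can ever proceed
        if a_enabled:
            todo.append(full + produce)
        todo.extend(full - q for q in consume_choices if full >= q)
    return False
```

Under the assumed production quantum of 3, `can_deadlock(3, 3, (3,))` is false, matching the attempt that sizes the buffer for the maximum quantum only, while `can_deadlock(3, 3, (2, 3))` is true: after A fills the buffer and B consumes 2, one full container and two empty ones remain, so neither task is enabled.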

  21. Problem • Compute buffer capacities • Guarantee satisfaction of throughput constraint • Tasks can require data-dependent quantum of data and space per execution

  22. Problem • Compute buffer capacities • Guarantee satisfaction of throughput constraint • Tasks can require data-dependent quantum of data and space per execution • Assumptions • Run-time arbitration on shared resources • Upper and lower bounds on transferred quanta • Upper bound on execution time • Throughput constraint: sink or source that executes strictly periodically

  23. Related work • Quasi static-order scheduling • Transfer quanta change only after (sub) graph iterations • For every iteration a static-order schedule is computed • Bounded memory is decidable • Models are amenable to code synthesis • Examples • Heterochronous Dataflow [Girault – TCAD 1999] • Parameterised Dataflow [Bhattacharya – TSP 2001] • Requirement on changes only after graph iterations is a global requirement • Iteration is a graph property • VLD parses stream and decides next quantum locally • Static-order scheduling excludes overlapped schedules of graphs with different transfer quanta

  24. Requirements on quanta change

  25. Requirements on quanta change

  26. Requirements on quanta change • Quasi static-order scheduling: 2*A and 3*B before change

  27. Requirements on quanta change • Variable-Rate Dataflow: can change every firing

  28. Related work • Variable token sizes instead of variable number of transferred tokens • [Sen – ASSP 2005] • Experiment will show that this results in larger buffers • Variable consumption quantum by VLD depends on processed stream • BR task is unaware of the semantics of the stream → cannot know quantum

  30. Related work • Run-time arbitration • Not required to compute schedules at design-time • Only need to show that for all transfer quanta a schedule exists • State-of-the-art • Real-time calculus (group of Thiele at ETH Zurich) • Symta/S (group of Ernst at TU Braunschweig) • These approaches have • Difficulties with cyclic dependencies that influence the temporal behaviour • No means to reason about bounded memory or deadlock properties • E.g. no concept similar to consistency

  32. Phase 1 • Next slides discuss buffer capacity computation in case of chain topology

  33. Phase 1 and 2 • Next slides discuss buffer capacity computation in case of chain topology • Subsequent slides discuss extension to graphs

  34. Variable Rate Dataflow (by example) • Implementation = Task graph • Model = Dataflow graph

  35. Variable Rate Dataflow • Task graph (tasks, buffers) • Tasks have a bounded response time • Tasks consume and produce data between start and finish • Buffers have a finite and fixed capacity • Dataflow graph (actors, queues) • Actors have a fixed response time • Actors consume tokens atomically at the start • Actors produce tokens atomically at the finish • Queues have infinite depth

  36. Execution time → response time • [Figure labels: time-slice, period]

  37. Execution time → response time • [Figure labels: time-slice, period] • Explained in detail in [Wiggers – RTAS 2007] • Generalisation that includes all starvation-free schedulers in [Wiggers – SCOPES 2007]
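
As a rough illustration of turning an execution time into a response time, the sketch below uses a commonly cited conservative bound for time-division multiplexing. The slide only names a time-slice and a period, so the choice of TDM, the formula, and the function name `tdm_response_time` are all assumptions; the bound actually derived in the cited papers may differ.

```python
def tdm_response_time(exec_time, time_slice, period):
    """Conservative response-time bound under an assumed TDM arbiter.

    Assumption: the task owns one slice of `time_slice` time units in every
    `period`. In the worst case it waits `period - time_slice` before each
    slice it needs. All arguments are integers in the same time unit.
    """
    if not 0 < time_slice <= period:
        raise ValueError("need 0 < time_slice <= period")
    slices_needed = -(-exec_time // time_slice)   # ceiling division
    return exec_time + slices_needed * (period - time_slice)
```

For example, an execution time of 3 with a slice of 2 in a period of 5 needs two slices and is bounded by 3 + 2 * 3 = 9 time units. When the slice equals the period, the task owns the processor and the bound collapses to the execution time.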

  38. Variable Rate Dataflow • Task graph (tasks, buffers): input specification • Tasks have a bounded response time • Tasks consume and produce data between start and finish • Buffers have a finite and fixed capacity • Dataflow graph (actors, queues): analysis vehicle • Actors have a fixed response time • Actors consume tokens atomically at the start • Actors produce tokens atomically at the finish • Queues have infinite depth

  39. Approach • Model task graph on architecture by Variable-Rate Dataflow graph • Let actor vτ model the throughput constraining task • Compute sufficient number of tokens to enable actor vτ to execute strictly periodically • Computed number of tokens equals required buffer capacity • One-to-one correspondence • Containers in task graph – tokens in dataflow graph • Enabling condition task – firing rule actor • Containers consumed and produced – tokens consumed and produced • Execution times of actors are upper bounds on execution times of tasks • Self-timed execution of Variable-Rate Dataflow is temporally monotonic

  40. Monotonic temporal behaviour • VRDF actors have sequential firing rules [Lee – 1995] • The number of tokens that is required to be present on inputs is completely determined by already consumed tokens • VRDF actors are functional • The produced tokens are a function of the consumed tokens • Given self-timed execution, if a token arrives earlier on an input, then • This can only lead to an earlier satisfaction of the firing rule, and • This can only lead to an earlier production of the same tokens • E.g. a smaller response time of a VRDF actor cannot lead to any later token arrival time • Because of scheduling anomalies this is not true for the task graph! • A smaller response time can lead to later container arrival times • Token arrival times conservatively bound container arrival times

  41. Approach – computation of sufficient tokens • Find valuation of token transfer parameters that leads to maximum required token transfer rates • On each edge, take maximum required rate as the slope of • A linear upper bound on token production times, and • A linear lower bound on token consumption times • Derive offset of linear bounds such that for all sequences of transfer quanta there exists a schedule for which bounds are conservative • Offset is relative to start of first firing of actor • Use linear bounds to compute sufficient number of initial tokens • This number of tokens is also sufficient for smaller transfer rates

  43. Approach – step 1 • Determine on each edge the maximum required transfer and firing rates • Sink has to fire strictly periodically • Maximum required transfer rate on edge for • Maximum consumption quantum • Maximum required firing rates of A for • Minimum production quantum
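
Step 1 amounts to simple arithmetic for a producer A that feeds a strictly periodic sink. The sketch below is one reading of the slide rather than the authors' algorithm; the function and parameter names are hypothetical, and a minimum production quantum of zero is rejected because the required firing rate would then be unbounded.

```python
from fractions import Fraction


def max_required_rates(sink_period, max_consumption, min_production):
    """Maximum required rates on an edge from producer A to a periodic sink.

    sink_period:     time between two firings of the sink
    max_consumption: largest quantum the sink may consume per firing
    min_production:  smallest quantum A may produce per firing (must be > 0)
    Returns (tokens per time unit on the edge, firings of A per time unit).
    """
    if sink_period <= 0 or min_production <= 0:
        raise ValueError("sink_period and min_production must be positive")
    # The sink needs the most tokens when it always takes its maximum quantum.
    transfer_rate = Fraction(max_consumption) / sink_period
    # A must fire most often when every firing yields its minimum quantum.
    firing_rate = transfer_rate / min_production
    return transfer_rate, firing_rate
```

With a sink period of 10, a maximum consumption quantum of 3 and a minimum production quantum of 2, the edge must sustain 3/10 tokens per time unit, so A must be able to fire 3/20 times per time unit.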

  46. Approach – step 2 • Given linear bounds on production and consumption times • Find difference between bounds that allows existence of schedule for all sequences of quanta • Actor starts at t=0 • Consumes tokens at start • Produces tokens at finish • Finish – start = response time

  47. Approach – step 2 • Given linear bounds on production and consumption times • Find difference between bounds that allows existence of schedule for all sequences of quanta • Actor starts at t=0 • Consumes tokens at start • Produces tokens at finish • Finish – start = response time • Larger quantum → larger difference between bounds

  48. Approach – step 2 • Given linear bounds on production and consumption times • Find difference between bounds that allows existence of schedule for all sequences of quanta • Actor starts at t=0 • Consumes tokens at start • Produces tokens at finish • Finish – start = response time • Larger quantum → larger delay of next start time • If the largest quantum fits between the bounds, then every sequence fits between the bounds

  50. Approach – step 3 • Difference between linear bounds is buffer capacity • Buffer capacity is maximum difference between tokens consumed and produced
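
The geometric idea behind step 3 can be illustrated for two bounds with the same slope: lines that rise at `rate` tokens per time unit and are `time_offset` apart horizontally are `rate * time_offset` tokens apart vertically. The sketch below shows only this geometry; the function name is hypothetical, and the offsets actually derived in the presentation depend on response times and quanta that are not reproduced in the slide text.

```python
import math
from fractions import Fraction


def tokens_between_parallel_bounds(rate, time_offset):
    """Vertical distance, in whole tokens, between two parallel linear bounds.

    rate:        common slope of both bounds, in tokens per time unit
    time_offset: horizontal distance between the bounds, in time units
    The result is rounded up because a buffer holds whole containers.
    """
    return math.ceil(Fraction(rate) * Fraction(time_offset))
```

For instance, with a slope of 3/10 tokens per time unit and bounds 12 time units apart, the distance is 3.6 tokens, which rounds up to 4.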
