Embedded Computer Architecture 5KK73 MPSoC

jovita

Presentation Transcript


  1. Embedded Computer Architecture 5KK73 MPSoC • Controlling the Parallel Resources

  2. Flexibility versus efficiency • DSP • Programmable CPU • Programmable DSP • Application-specific instruction set processor (ASIP) • Application-specific processor

  3. Contents • GPUs revisited • PicoChip • Real-Time Scheduling basics • Resource Management

  4. GPU basics • Synthetic objects are represented with a bunch of triangles (3D), plus texture, in a language/library like OpenGL or DirectX • Triangles are represented with 3 vertices • A vertex is represented with 4 coordinates with floating-point precision • Objects are transformed between coordinate representations • Transformations are matrix-vector multiplications
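As a minimal sketch of the last two bullets, the following Python applies one 4x4 transformation to the three vertices of a triangle. It assumes the four coordinates are homogeneous coordinates (x, y, z, w), which the slide does not state explicitly; the helper names `mat_vec`, `translation`, and `transform_triangle` are illustrative only.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vertex."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def translation(tx, ty, tz):
    """Build a 4x4 translation matrix (assumes homogeneous coordinates)."""
    return [
        [1.0, 0.0, 0.0, tx],
        [0.0, 1.0, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform_triangle(m, triangle):
    """Apply the same matrix to each of the triangle's three vertices."""
    return [mat_vec(m, vertex) for vertex in triangle]


# One triangle: 3 vertices, 4 floating-point coordinates each (w = 1).
triangle = [
    [0.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
]
moved = transform_triangle(translation(2.0, 3.0, 4.0), triangle)
```

Every vertex goes through the same small matrix-vector product independently of the others, which is why this workload maps well onto many parallel SIMD processors.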

  5. GPU DirectX 10 pipeline

  6. NVIDIA GeForce 6800 3D Pipeline

  7. GeForce 8800 GPU • 330 Gflops, 128 processors with 4-way SIMD

  8. GPU: Why more general-purpose programmable? • All transformations are shading • Shading is all matrix-vector multiplications • Computational load varies heavily between different sorts of shading • Programmable shaders allow dynamic resource allocation between shaders • Result: Modern GPUs are serious competitors to general-purpose processors!

  9. Pico Chip

  10. Pico Chip

  11. Pico Chip

  12. Fault-Tolerance

  13. Pico Chip

  14. Real-time systems (Reinder Bril) • Correct result at the right time: timeliness • Many products contain embedded computers, e.g. cars, planes, medical and consumer electronics equipment, industrial control. • In such systems, it’s important to deliver correct functionality on time. • Example: inflation of an air bag

  15. Example: Multimedia Consumer Terminals (by courtesy of Maria Gabrani) • Figure labels: cable modem, DVB tuner, IEEE 1394 interface, RF tuner, CVBS interface, YC interface, VGA, DVD, CDx front end

  16. Example: High quality video & real time • TV companies invest heavily in video enhancement, e.g. temporal up-scaling • Input stream: 24 Hz (movie), original • Rendered stream: 60 Hz (TV screen), up-scaled

  17. Example: High quality video & real time • TV companies invest heavily in video enhancement, e.g. temporal up-scaling • Input stream: 24 Hz (movie), original • Figure rows: original, up-scaled, displayed • Deadline miss leads to “wrong” picture. • Deadline misses tend to come in bursts (heavy load). • Valuable work may be lost.
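To make the 24 Hz to 60 Hz example concrete, here is a small sketch of the frame arithmetic only. It assumes output frame k is interpolated between the two input frames surrounding display time k/60 s; the slides do not say which interpolation scheme is used, and the names `source_position` and `FRAME_PERIOD_MS` are illustrative.

```python
from fractions import Fraction

IN_HZ, OUT_HZ = 24, 60

# Each rendered frame has to be ready within one output frame period.
FRAME_PERIOD_MS = Fraction(1000, OUT_HZ)


def source_position(k):
    """Return (input frame index, interpolation phase in [0, 1)) for output frame k."""
    pos = Fraction(k * IN_HZ, OUT_HZ)          # position measured in input frames
    frame = pos.numerator // pos.denominator   # input frame at or before that position
    return frame, pos - frame


positions = [source_position(k) for k in range(6)]
```

With these rates, five output frames are produced for every two input frames, and a new output frame is due every 1000/60 ms, which is the timing constraint whose violation the slide calls a deadline miss.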

  18. Real-time systems • Guaranteeing timeliness requirements: • real-time tasks with timing constraints • scheduling of tasks • Fixed-priority scheduling (FPS) is the de-facto standard for scheduling in real-time systems. • FPS: supported by • commercially available RTOS; • analytic and synthetic methods.

  19. Recap of FPS • Fixed Priority Pre-emptive Scheduling (FPPS) • A basic scheduling model • Analysis • Example • Optimality of RMS and DMS

  20. FPPS: A basic scheduling model • Single processor • Set of n independent, periodic tasks $\tau_1, \ldots, \tau_n$ • Tasks are assigned fixed priorities, and can be pre-empted instantaneously. • Scheduling: At any moment in time, the processor is used to execute the highest priority task that has work pending.

  21. FPPS: A basic scheduling model • Task characteristics: • period $T$, • (worst-case) computation time $C$, • (relative) deadline $D$. • Assumptions: • non-idling; • context switching and scheduling overhead is ignored; • execution of releases in order of arrival; • deadlines are hard, and $D \le T$; • $\tau_1$ has highest and $\tau_n$ has lowest priority; • no data-dependencies between tasks.
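The model above can be sketched as a unit-time simulation: in every time step the processor runs the highest-priority task that has work pending. This is a hedged illustration, not the lecturer's tooling. The task parameters (T, C) = (10, 3), (19, 11), (56, 5) are hypothetical values that are not given in the transcript, all tasks are released together at time 0, and the code assumes every job finishes before its next release (consistent with hard deadlines and D ≤ T).

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    period: int  # T
    wcet: int    # C, worst-case computation time


def simulate_fpps(tasks, horizon):
    """Simulate FPPS in unit time steps; tasks are listed highest priority first.

    Returns the largest response time observed per task within the horizon.
    Assumes each job completes before the next release of the same task.
    """
    remaining = [0] * len(tasks)  # pending work of the current job
    release = [0] * len(tasks)    # release time of the current job
    worst = [0] * len(tasks)
    for t in range(horizon):
        for i, task in enumerate(tasks):
            if t % task.period == 0:  # periodic release
                remaining[i] += task.wcet
                release[i] = t
        for i in range(len(tasks)):   # highest-priority pending task runs
            if remaining[i] > 0:
                remaining[i] -= 1
                if remaining[i] == 0:
                    worst[i] = max(worst[i], t + 1 - release[i])
                break
    return worst


# Hypothetical task set (assumption, not taken from the slides).
tasks = [Task("t1", 10, 3), Task("t2", 19, 11), Task("t3", 56, 5)]
observed = simulate_fpps(tasks, 60)
```

Pre-emption is implicit here: because the choice of which task runs is re-made in every time step, a newly released higher-priority job takes over the processor immediately.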

  22. FPPS: Example • Worst-case response time WR for task $\tau_3$: first point in time at which $\tau_1$, $\tau_2$, and $\tau_3$ are all finished • Time line figure (time axis 0 to 60; rows Task 1, Task 2, Task 3): $WR_1 = 3$, $WR_2 = 17$, $WR_3 = 56$

  23. FPPS: Analysis • Schedulable iff: $WR_i \le D_i$ for $1 \le i \le n$ • Necessary condition: $U = \sum_{i=1}^{n} C_i / T_i \le 1$ • Sufficient condition for RMS: $U \le LL(n) = n(2^{1/n} - 1)$, where RMS means $r_i > r_j$ iff $T_i < T_j$, and $D_i = T_i$.
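The two utilization tests on this slide can be sketched in a few lines. The function names and the three-way verdict strings are illustrative choices, tasks are given as hypothetical (C, T) pairs, and rate-monotonic priorities with D = T are assumed.

```python
def utilization(tasks):
    """Total utilization U: sum of C/T over all (C, T) pairs."""
    return sum(c / t for c, t in tasks)


def ll_bound(n):
    """Liu and Layland bound LL(n) = n * (2^(1/n) - 1)."""
    return n * (2 ** (1 / n) - 1)


def classify(tasks):
    """Apply the necessary test (U <= 1), then the sufficient test (U <= LL(n))."""
    u = utilization(tasks)
    if u > 1:
        return "infeasible"              # necessary condition violated
    if u <= ll_bound(len(tasks)):
        return "schedulable under RMS"   # sufficient condition holds
    return "inconclusive"                # an exact test is needed


light = classify([(1, 4), (1, 5)])                # U = 0.45
heavy = classify([(3, 10), (11, 19), (5, 56)])    # U is about 0.97
overload = classify([(3, 4), (2, 5)])             # U = 1.15
```

The "inconclusive" outcome is the gap between the two tests: utilization alone cannot decide schedulability there.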

  24. FPPS: Analysis • Otherwise, i.e. • $U \le 1$ and not RMS, or • $n(2^{1/n} - 1) < U < 1$ and RMS, • an exact condition is needed: • Critical instant: simultaneous release of $\tau_i$ with all higher priority tasks • $WR_i$ is the smallest positive solution of $x = C_i + \sum_{j < i} \lceil x / T_j \rceil C_j$
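The exact test is usually evaluated by fixed-point iteration on the standard response-time recurrence x = C_i + Σ_{j<i} ⌈x/T_j⌉ C_j, starting from x = C_i. The sketch below assumes that recurrence and uses a hypothetical task set of (C, T) pairs listed highest priority first; the values are not taken from the transcript.

```python
def worst_case_response_time(tasks, i, deadline):
    """Smallest positive fixed point of x = C_i + sum_j ceil(x / T_j) * C_j.

    tasks: list of (C, T) integer pairs, highest priority first.
    Returns None if the iteration exceeds the deadline (task i not schedulable).
    """
    c_i, _ = tasks[i]
    x = c_i
    while x <= deadline:
        # -(-x // t_j) is integer ceiling division; it avoids float rounding.
        interference = sum(-(-x // t_j) * c_j for c_j, t_j in tasks[:i])
        nxt = c_i + interference
        if nxt == x:
            return x
        x = nxt
    return None


# Hypothetical task set (assumption): (C, T) with D = T.
task_set = [(3, 10), (11, 19), (5, 56)]
response_times = [
    worst_case_response_time(task_set, i, deadline=task_set[i][1])
    for i in range(len(task_set))
]
```

Because the right-hand side never decreases as x grows, iterating upward from C_i reaches the smallest solution first, which is the worst-case response time under a critical instant.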

  25. FPPS: Example • Task set $\Gamma$ consisting of 3 tasks: [table not captured in the transcript] • Notes: • RM priority assignment and $D_i = T_i$ (RMS); • $U_1 + U_2 + U_3 = 0.97 \le 1$, hence $\Gamma$ could be schedulable; • Utilization bound: $U(n) \le LL(n) = n(2^{1/n} - 1)$: • $U_1 + U_2 = 0.88 > LL(2) \approx 0.83$, • therefore $U(3) > LL(3)$, hence another test required.
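The step from the two-task check to the three-task conclusion can be spelled out as follows. It uses only the utilization figures quoted on the slide, plus two facts: LL(n) decreases as n grows, and adding a task can only increase the total utilization.

```latex
\begin{align*}
LL(2) &= 2\,(2^{1/2} - 1) \approx 0.828, \\
LL(3) &= 3\,(2^{1/3} - 1) \approx 0.780, \\
U(3) &= U_1 + U_2 + U_3 = 0.97 \;\ge\; U_1 + U_2 = 0.88 \;>\; LL(2) \;>\; LL(3).
\end{align*}
```

So the sufficient test fails for both the two-task prefix and the full set, while the necessary test still passes, which is why an exact response-time test is required.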

  26. FPPS: Example • Time line (time axis 0 to 60; rows Task 1, Task 2, Task 3): $WR_1 = 3$, $WR_2 = 17$, $WR_3 = 56$

  27. FPPS: Optimality of RMS and DMS • Priority assignment policies: • Rate Monotonic (RM): $r_i > r_j$ iff $T_i < T_j$ • Deadline Monotonic (DM): $r_i > r_j$ iff $D_i < D_j$ • Under arbitrary phasing: • RMS is optimal among FPS when $D_i = T_i$; • DMS is optimal among FPS when $D_i \le T_i$, • where optimal means: if an FPS algorithm can schedule the task set, so can RMS/DMS.
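Both policies amount to sorting the task set by one attribute. The sketch below shows a hypothetical two-task set, with made-up periods and deadlines, where the two orders differ because one task has a deadline much shorter than its period.

```python
def rate_monotonic(tasks):
    """Highest priority first: shorter period T means higher priority."""
    return sorted(tasks, key=lambda task: task["T"])


def deadline_monotonic(tasks):
    """Highest priority first: shorter relative deadline D means higher priority."""
    return sorted(tasks, key=lambda task: task["D"])


# Hypothetical tasks (assumption): "b" has the longer period but the tighter deadline.
example = [
    {"name": "a", "T": 10, "D": 9},
    {"name": "b", "T": 12, "D": 5},
]
rm_order = [task["name"] for task in rate_monotonic(example)]
dm_order = [task["name"] for task in deadline_monotonic(example)]
```

When D = T for every task the two sort keys are identical, so RM and DM produce the same priority order.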

  28. FPPS not suitable for multimedia multiprocessor!! Assumptions: • context switching and scheduling overhead is ignored: no longer true • deadlines are hard, and $D \le T$: no longer true • $\tau_1$ has highest and $\tau_n$ has lowest priority: no priorities • No data-dependencies between tasks: not true • Single processor: not true

  29. Task Non-Preemptive Systems (Akash Kumar) • State-space needed is smaller • Lower implementation cost • Less overhead at run-time • Cache pollution, memory size

  30. Why FPS doesn’t work for “future” high-performance platforms • Heavy-duty DSPs: preemption not supported • If it were: context-switching overhead is significant • Data-dependencies not taken into account • Multi-processor

  31. Related Research – Feasibility Analysis • Preemptive: [Liu, Layland, 1973] • Non-Preemptive: [Jeffay, 1991] • Homogeneous MPSoC: [Baruah, 2006] • Heterogeneous MPSoC: [ , 2020??] • Figure labels: tasks A, B, C, D; processors P1 to P6

  32. Unpredictability – Variation in Execution Time • Figure: applications A and B on processors P1, P2, P3, with execution times labelled 49 and 50

  33. Problem • No good techniques exist to analyze and schedule applications on non-preemptive heterogeneous systems • Resource Manager proposed to schedule applications such that they meet their performance requirements on non-preemptive heterogeneous systems

  34. Our Assumptions • Heterogeneous MPSoC • Applications modeled as SDF • Non-preemptive system – tasks cannot be stopped • Jobs can be suspended • Lot of dynamism in the system • Jobs arriving and leaving at run-time • Variation in execution time • Very simple arbiter at cores • Figure labels: Task, Job; A2, B2, C2, D2

  35. Resource Manager • Application QoS Manager: application level, few seconds • Resource Manager: reconfigure to meet above quality, milliseconds • Local Processor Arbiter: task level, microseconds • Figure labels: applications A, B; Core

  36. Architecture Description • Computation resources available are described • Each processor can have a different arbiter • In this model a First Come First Serve mechanism is used • Resource manager can configure/control the local arbiters to regulate the progress of an application if needed • Figure labels: Resource Manager, Local Arbiter, P1, P2, P3

  37. Resource Manager • Responsible for two main things • Admission control • Incoming application specifies throughput requirement • Execution-time and mapping of each actor • Repetition vector used to compute expected utilization • RM checks if enough resources present • Allocates resources to applications if admitted
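A minimal sketch of such an admission check follows. The slides do not give the exact utilization formula, so this assumes that an actor's demand on its processor is throughput × repetition count × execution time, and that each processor has a capacity of 1.0. All class, field, and application names are hypothetical.

```python
def demand(app):
    """Per-processor utilization requested by one application (assumed formula)."""
    load = {}
    for actor in app["actors"]:
        busy = app["throughput"] * actor["repetitions"] * actor["exec_time"]
        load[actor["processor"]] = load.get(actor["processor"], 0.0) + busy
    return load


class ResourceManager:
    def __init__(self, processors):
        self.used = {p: 0.0 for p in processors}  # utilization already allocated

    def admit(self, app):
        """Admit the application only if every processor it uses stays within 1.0."""
        need = demand(app)
        for proc, extra in need.items():
            if proc not in self.used or self.used[proc] + extra > 1.0:
                return False
        for proc, extra in need.items():  # allocate resources on admission
            self.used[proc] += extra
        return True


# Hypothetical applications; throughput is in iterations per time unit.
video = {"throughput": 0.01, "actors": [
    {"processor": "P1", "repetitions": 2, "exec_time": 20},
    {"processor": "P2", "repetitions": 1, "exec_time": 50},
]}
mp3 = {"throughput": 0.01, "actors": [
    {"processor": "P1", "repetitions": 1, "exec_time": 30},
]}
sms = {"throughput": 0.01, "actors": [
    {"processor": "P1", "repetitions": 1, "exec_time": 40},
]}

rm = ResourceManager(["P1", "P2", "P3"])
decisions = [rm.admit(video), rm.admit(mp3), rm.admit(sms)]
```

In this example the third application is rejected because its demand would push P1 above full utilization, while the first two fit together.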

  38. Admission Control • Figure: applications Video Conf, Play MP3, and Typing SMS on processors P1, P2, P3 • Resource requirement exceeded!

  39. Resource Manager • Admission control • Budget enforcement • When running, each application signals RM when it completes an iteration • RM keeps track of each application’s progress • Operation modes • ‘Polling’ mode • ‘Interrupt’ mode • Suspends application if needed
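The 'polling' mode can be sketched as follows. The policy shown, suspending an application that is ahead of its required throughput and resuming it once it no longer is, is an assumption about how the budget is enforced; the slides only say that the RM tracks progress and suspends applications if needed. The class and method names are hypothetical.

```python
class BudgetEnforcer:
    def __init__(self, required_rate):
        # required_rate: application name -> required iterations per time unit
        self.required_rate = dict(required_rate)
        self.done = {app: 0 for app in required_rate}
        self.suspended = set()

    def iteration_completed(self, app):
        """Called when an application signals that it completed one iteration."""
        self.done[app] += 1

    def poll(self, now):
        """At each polling instant, compare progress with the required progress."""
        for app, rate in self.required_rate.items():
            if self.done[app] > rate * now:
                self.suspended.add(app)      # better than required: hold it back
            else:
                self.suspended.discard(app)  # at or below target: let it run
        return set(self.suspended)


enforcer = BudgetEnforcer({"A": 0.5, "B": 0.5})
for _ in range(8):
    enforcer.iteration_completed("A")   # A runs ahead of its requirement
for _ in range(3):
    enforcer.iteration_completed("B")   # B lags behind
first_poll = enforcer.poll(now=10)

for _ in range(6):
    enforcer.iteration_completed("B")   # B catches up while A is held back
second_poll = enforcer.poll(now=20)
```

The polling interval sets the trade-off in this scheme: a short interval corrects deviations quickly but invokes the manager more often, while an 'interrupt' mode would react to each completion signal instead.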

  40. Budget Enforcement (Polling) • Figure annotations: New job enters! • Performance goes down! • Resource Manager: job suspended! • Better than required! • job resumed!

  41. Experiments • A high-level simulation model developed • POOSL – a parallel simulation language used • A protocol for communication defined • System verified with a number of application SDF models • Case study done with H263 and JPEG application models • Impact of varying ‘polling’ interval studied

  42. Performance without Resource Manager

  43. Performance with RM – I (2.5m cycles)

  44. Performance with RM – II (500k cycles)
