
Scenario-Oriented Design for Single Chip Heterogeneous Multiprocessors

Scenario-Oriented Design for Single Chip Heterogeneous Multiprocessors. JoAnn M. Paul Electrical and Computer Engineering Department Carnegie Mellon University Pittsburgh, PA Presented by: Mohammad Farsakh. What is this paper about.


Presentation Transcript


  1. Scenario-Oriented Design for Single Chip Heterogeneous Multiprocessors • JoAnn M. Paul, Electrical and Computer Engineering Department, Carnegie Mellon University, Pittsburgh, PA • Presented by: Mohammad Farsakh

  2. What is this paper about • Challenges of selection, programming, and coordination for future single chip computer designs • Consisting of processing elements (PEs) • Of heterogeneous types • Outlines the differences between • Next generation single chip system designs • Traditional designs • Focuses on the Scenario-Oriented Design (SOD) strategy • Applications, schedulers, and hardware viewed as a system • Leveraging one against the other • Reducing the modeling detail of each design domain within a system in high level simulation

  3. Introduction • The design process of digital computation has three categories • Models • Tools • Strategies • Existing models, tools, and strategies fail to let designers of single chip computers efficiently capture the design space at hand. • Tools do not capture software as part of the system model. • Instruction Set Simulators (ISSs) are too detailed and too slow to capture systems with many processors. • Designers are left to their own devices, which limits the effective realization of many potentially significant designs. • Models, tools, and strategies for both the software and the hardware of single chip heterogeneous multiprocessors are required.

  4. Introduction (cont) • Future individual, programmable processors will be like registers in a larger framework. • Processor blocks will be differentiated by • The capabilities of the hardware in the processors, • The way they are programmed, • Their manner of interconnection. • Should maximize the ratio DQ/DE for a given design to initiate the next design level • DQ = Design Quality • DE = Development Effort • A common basis for design at the modeling level is required to manipulate the design decisions that have the most impact.
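The DQ/DE criterion on this slide can be illustrated with a minimal sketch. The slides give no units or values for DQ and DE, so the class name, the option names, and all numbers below are hypothetical and serve only to show how the ratio ranks alternatives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignOption:
    """One candidate modeling approach (all values are illustrative assumptions)."""
    name: str
    design_quality: float      # DQ, arbitrary units (assumed)
    development_effort: float  # DE, e.g. person-months (assumed unit)

    @property
    def ratio(self) -> float:
        """Return DQ/DE; DE must be positive for the ratio to be meaningful."""
        if self.development_effort <= 0:
            raise ValueError("development effort must be positive")
        return self.design_quality / self.development_effort


def best_option(options):
    """Pick the option with the highest DQ/DE ratio."""
    return max(options, key=lambda option: option.ratio)


options = [
    DesignOption("detailed ISS model", design_quality=9.0, development_effort=12.0),
    DesignOption("high-level model", design_quality=7.0, development_effort=3.0),
]
chosen = best_option(options)
print(chosen.name, round(chosen.ratio, 2))
```

With these made-up numbers the high-level model wins despite its lower absolute quality, which is the trade-off the slide points at.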

  5. Programmable Heterogeneous Multiprocessors (PHMs) • Collections of PEs must be considered programmable • The chip is a programmable collection of processors grouped dynamically. • Heterogeneous multiprocessors on a single chip pose different design challenges for three primary reasons: • A single chip is a finite resource, unlike wide-area networks. • The design will be semi-custom. • The underlying hardware is more customized to the application space than in a traditional programmable system. • Traditional heterogeneous multiprocessors provide transaction-like services on a diverse collection of resources. • Single chip devices such as SoCs are customized to meet fixed latency requirements as a reactive system. • PHMs will be semi-custom and have aspects of both design styles. • Coordination of system resources is required. • The large differential between on-chip and off-chip communications will force efficient utilization and management of on-chip system resources, including processing elements, memory, communications bandwidth, and chip I/O.

  6. Design Environment of Single Chip PHM • H: the single chip heterogeneous multiprocessor. • Data inputs • DP: time-stamped system inputs that are conceptually presented to the system hardware on I/O pins. • DM: data values that reside in some external memory, analogous to jobs, packets, or other requests in a queue waiting to be processed by H. • Programs • BC: clocked benchmark programs with fixed latency requirements, where the required latency is specified against a fixed time reference. They are designed to meet the worst-case demands that DP presents to the system, and they have fixed performance requirements. • BI: programmatic-input benchmarks, for which performance is calculated from the internal timing of the design's processing capabilities. These run over many PEs. • BX: scheduler programs, which act as a means of resolving the other benchmarks to the architecture.

  7. Design Output • A single output Q holds the quality metric of the design, including the performance for the two classes of behaviors (BC and BI) • General form of such an environment: E = {D, B, H, Q} • In the case of E = {BC, DP, Q} • Pass/fail quality metric • The design is fully specified by DP and BC; there is no separately performance-evaluated architecture. • Hardware Description Language (HDL) • In the case of E = {BC, BX, DP, H, Q}, where H is a single processor • Pass/fail quality metric • The kind of analytical modeling typical of research in real-time operating systems (RTOSs). • RTOS • In the case of E = {BI, DM, H, Q} • H is a single processor executing at the instruction set simulator level or below • Typical of simulators such as SimpleScalar, used to model a microarchitecture or ISA. • Simple ISS • Complexity of the application space • The current ISS approach cannot permit effective exploration of the design space • A complete level of detail is required in the model • It takes a long time to generate any single value of Q.
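The three special cases of E = {D, B, H, Q} listed on this slide can be written down as a small lookup. This is only a sketch of the classification as the slides state it: the function name, the label strings, and the reading of the full set as the PHM environment of slide 6 are assumptions made for illustration.

```python
# Every element the slides name as part of a design environment (Q is the output).
ALL_INPUTS = frozenset({"BC", "BI", "BX", "DP", "DM", "H"})

# The three special cases from the slide, keyed by their inputs (Q omitted).
KNOWN_STYLES = {
    frozenset({"BC", "DP"}): "HDL-style design (pass/fail Q)",
    frozenset({"BC", "BX", "DP", "H"}): "RTOS-style analysis (pass/fail Q)",
    frozenset({"BI", "DM", "H"}): "ISS-style simulation",
}


def classify(elements):
    """Map the elements of an environment E to the design style the slides name.

    `elements` is any iterable of the symbols used in the slides; "Q" is
    ignored because every environment produces it.
    """
    inputs = frozenset(elements) - {"Q"}
    unknown = inputs - ALL_INPUTS
    if unknown:
        raise ValueError(f"unknown elements: {sorted(unknown)}")
    if inputs == ALL_INPUTS:
        # Assumption: the full set corresponds to the single chip PHM environment.
        return "single chip PHM environment"
    return KNOWN_STYLES.get(inputs, "unclassified")


print(classify({"BC", "DP", "Q"}))
print(classify({"BI", "DM", "H", "Q"}))
```

Any combination the slides do not discuss is reported as "unclassified" rather than guessed at.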

  8. Scenario-Oriented Design • A novel design strategy • Orients heterogeneous multiprocessor single chip design according to a blend of performance requirements, • Implemented in new chip-wide programmer’s views. • Leverages increased heterogeneity in the future application space • Results in greater efficiency in the design process and in resource utilization.

  9. Fixed performance (FP) • To meet fixed performance requirements, a current system with DP and BC must be overdesigned, for two reasons: • System resource capacity is wasted, along with the time spent matching functionality to available processing power, to make sure that worst-case (WC) behavior is met. • Irregular loading and data-dependent processing times leave processing resources underutilized except in peak loading situations at WC.

  10. Throughput performance (TP) • BI is designed to be a broadly representative set of program types used to evaluate and optimize a programmable device’s throughput performance (TP). • Optimizes a common case (CC) instead of ensuring that WC behaviors are met. • Like network switches dropping packets that are presumed to be resent. • Applies to caches, branch predictors, and OS scheduling strategies
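The contrast between worst-case (FP) and common-case (TP) provisioning can be made concrete with a toy calculation. The load trace, the capacities, and the function name below are hypothetical; the slides do not give numbers, and real designs would measure load very differently.

```python
def utilization(loads, capacity):
    """Average fraction of capacity in use; work above capacity is not served."""
    served = [min(load, capacity) for load in loads]
    return sum(served) / (capacity * len(loads))


# Hypothetical work units arriving in each period; one period is a rare peak.
loads = [2, 3, 2, 10, 3, 2, 3, 2]

# Worst-case design: size the resource for the peak so nothing is ever missed.
wc_capacity = max(loads)
wc_util = utilization(loads, wc_capacity)

# Common-case design: size for the typical load and accept unserved work,
# much like a switch dropping packets that are presumed to be resent.
cc_capacity = 3
cc_util = utilization(loads, cc_capacity)
unserved = sum(max(load - cc_capacity, 0) for load in loads)

print(f"WC capacity {wc_capacity}: utilization {wc_util:.2%}")
print(f"CC capacity {cc_capacity}: utilization {cc_util:.2%}, unserved work {unserved}")
```

In this trace the worst-case design sits mostly idle, while the common-case design is well used but misses work during the peak, which is acceptable only for TP behaviors.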

  11. Future vs. Current Designs • The two design strategies are worlds apart: • Worst case (WC) with fixed performance (FP) • Common case (CC) with throughput performance (TP) • Future single chip designs • Execute a mix of BC and BI to handle a mix of DP and DM • FP behaviors are met • CC behaviors are optimized. • Currently, systems with FP-oriented and TP-oriented design are • Separated into different devices • General purpose programming resides on the general purpose processor, • Other processors use individual RTOSs to ensure WC behaviors are met, or WC behaviors are ensured by implementation in custom hardware.

  12. Layered, SOD approach to SoC Design • SOD can satisfy performance for FP functionality and provide a basis for a TP-optimized remainder architecture. • The hardware architecture and a remainder architecture are co-designed. • Map the FP functionality across the entire chip, consuming part of the proposed architecture • Leverages the presence of both classes • Optimizes design time • Optimizes design quality • Measuring exact execution times for FP is not required at the start of design

  13. SoC Hardware View • Different Processing Elements (PE) • Different functionality • Common communication channel

  14. SoC With Remainder Architecture • Software partitioning • Each PE is divided into two parts • PE = {F-i, R-i} • COMM = {R-COMM, F-COMM} • The functional overlay, {F-i}, maps BC to processing resources • The remainder architecture carries BI: R = {R-i, R-COMM}
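The split PE = {F-i, R-i} can be sketched as simple bookkeeping: whatever share of each PE the functional overlay consumes for BC, the rest is the remainder available to BI. The function name, the PE names, and the fractional capacities are hypothetical, and the sketch ignores the COMM partition entirely.

```python
def remainder_architecture(pe_capacity, fp_load):
    """Return the remainder R-i per PE after the functional overlay F-i is mapped.

    pe_capacity: total capacity of each PE (assumed normalized units).
    fp_load: capacity consumed on each PE by fixed-performance (BC) work, i.e. F-i.
    """
    unknown = set(fp_load) - set(pe_capacity)
    if unknown:
        raise ValueError(f"FP load mapped to unknown PEs: {sorted(unknown)}")
    remainder = {}
    for pe, capacity in pe_capacity.items():
        f_i = fp_load.get(pe, 0.0)
        if f_i > capacity:
            raise ValueError(f"{pe}: FP load {f_i} exceeds capacity {capacity}")
        remainder[pe] = capacity - f_i  # R-i: what is left for BI programs
    return remainder


pe_capacity = {"PE1": 1.0, "PE2": 1.0, "PE3": 1.0}
fp_load = {"PE1": 0.5, "PE3": 0.25}  # the overlay touches only part of the chip
remainder = remainder_architecture(pe_capacity, fp_load)
print(remainder)
```

A PE with no FP work mapped to it, such as PE2 here, belongs entirely to the remainder architecture.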

  15. Layered, SOD approach to SoC Design • New layer between R-i and F-i • Enlarges the boundary between performance group partitions • Reduces design time • The FP mapping to the chip need not be known beforehand • Optimizes TP • SOD partitioning produces a chip-wide, horizontal view • Hardware resources in the bottom layer • Schedulers in the middle layer ( permit the co-operation of …. ) • General software at the top layer • The last two layers could have multiple internal layers • The layering concept leverages schedulers as a basis for a soft partitioning of a hardware design.

  16. Simulation Foundation: MESH • The Modeling Environment for Software and Hardware (MESH) is the simulator used • Provides a layered modeling basis above ISS models • Uses schedulers to model concurrent, high level software running on high level models of processor resources. • Resolves the timing through design layers, where unrestricted software executes on hardware models without relying upon an ISS

  17. Modeling Environment for Software and Hardware (MESH) • ThLij — One of j logical threads (software) that will execute on processor i. • ThPi — A model of the ith physical resource in the system, such as a processor. • UPi — A scheduler that selects logical threads intended to execute on resource ThPi. • ULi — A logical scheduler that can schedule M threads to N resources. e.g., a pthread scheduler

  18. How MESH Works • Dynamic number of logical threads • Execution is scheduled onto a single resource • Scheduling decisions are based on the state of the threads and other system state. • Resolves the logical events of the software threads to physical timing • Schedulers serve two roles: • Modeling scheduling decisions, • Resolving logical computation to physical time. • Complex systems have many resources (ThPi) • Two dimensions of scheduling: • Based on physical time • Based on logical state. • M threads may be dynamically mapped to N resources • 2.5 times faster than an internal ISS level simulator
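The two scheduler roles named above (making scheduling decisions and resolving logical computation to physical time) can be sketched for a single resource. This is not MESH's actual API: the class names, the round-robin policy, the "operations per time unit" rate, and the workload numbers are all assumptions chosen to keep the example small.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class LogicalThread:
    """A software thread, described only by the work left in each region."""
    name: str
    regions: deque  # remaining work per region, in abstract operations


@dataclass
class PhysicalResource:
    """A high-level processor model: just a rate and a physical clock."""
    name: str
    ops_per_time_unit: float
    now: float = 0.0


def run_round_robin(resource, threads):
    """Schedule threads onto one resource and return each thread's finish time.

    Role 1 (scheduling decision): pick the next ready thread in round-robin order.
    Role 2 (timing resolution): convert the region's logical work into physical
    time using the resource's rate, and advance the resource clock.
    """
    ready = deque(threads)
    finish = {}
    while ready:
        thread = ready.popleft()
        if thread.regions:
            work = thread.regions.popleft()
            resource.now += work / resource.ops_per_time_unit
        if thread.regions:
            ready.append(thread)  # more regions left, so requeue the thread
        else:
            finish[thread.name] = resource.now
    return finish


cpu = PhysicalResource("ThP0", ops_per_time_unit=2.0)
threads = [
    LogicalThread("A", deque([4, 2])),
    LogicalThread("B", deque([6])),
]
finish = run_round_robin(cpu, threads)
print(finish)
```

No instruction-level detail appears anywhere in the model, which is the point of resolving timing at the scheduler layer rather than in an ISS.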

  19. Conclusion • Challenges for future designs • Performance • Power • Chip size • Future computer designs should be evaluated as a system • SOD: a strategy that results from considering applications, schedulers, and hardware as they interact to form a system • Leveraging each against the other • Reducing modeling detail

  20. Thanks
