
Scalar Operand Networks: On-Chip Interconnect for ILP in Partitioned Architectures


Presentation Transcript


  1. Scalar Operand Networks: On-Chip Interconnect for ILP in Partitioned Architectures Michael Bedford Taylor, Walter Lee, Saman Amarasinghe, Anant Agarwal Presented By: Sarah Lynn Bird

  2. Scalar Operand Networks • “A set of mechanisms that joins the dynamic operands and operations of a program in space to enact the computation specified by a program graph” • Physical Interconnection Network • Operation-operand matching system

  3. Example Scalar Operand Networks • Register File • Raw Microprocessor

  4. Design Issues • Delay Scalability • Intra-component delay • Inter-component delay • Managing latency • Bandwidth Scalability • Deadlock and Starvation • Efficient Operation-Operand Matching • Handling Exceptional Events

  5. Operation-Operand Matching • 5-Tuple of Costs <SO, SL, NHL, RL, RO> • SO: Send Occupancy • The number of cycles the ALU wastes in sending an operand • SL: Send Latency • The number of cycles of delay a message incurs on the send side of the network • NHL: Network Hop Latency • The number of cycles of delay per network hop • RL: Receive Latency • The number of cycles of delay between the arrival of the final input and its consumption by the instruction • RO: Receive Occupancy • The number of cycles the ALU wastes in employing a remote value
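To make the cost tuple concrete, here is a minimal Python sketch (all names hypothetical) that treats the five components as simply additive over a message's path. This is an illustration of what the tuple measures, not the paper's exact timing model:

```python
from dataclasses import dataclass

@dataclass
class SonCosts:
    """One <SO, SL, NHL, RL, RO> cost tuple."""
    so: int   # send occupancy: cycles the sending ALU is busy
    sl: int   # send latency: delay on the send side of the network
    nhl: int  # network hop latency: delay per hop
    rl: int   # receive latency: final-input arrival to consumption
    ro: int   # receive occupancy: cycles the ALU spends employing the value

def transport_cycles(c: SonCosts, hops: int) -> int:
    """End-to-end cycles to move one operand across `hops` network hops,
    assuming the five components add linearly (an illustrative model)."""
    return c.so + c.sl + hops * c.nhl + c.rl + c.ro

# The slide-9 experiments vary occupancy around a <0, 1, 1, 1, 0> baseline:
baseline = SonCosts(so=0, sl=1, nhl=1, rl=1, ro=0)
print(transport_cycles(baseline, hops=3))  # 0 + 1 + 3*1 + 1 + 0 = 5 cycles
```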

  6. Raw Design • 2 Static Networks • Instructions from a 64KB cache • Point-to-point operand transport • 2 Dynamic Networks • Memory traffic, interrupts, user-level messages • 8-stage in-order single-issue pipeline • 4-stage pipelined FPU • 32KB data cache • 32KB instruction cache • 16 Tiles on a Chip
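Because Raw's 16 tiles form a 4x4 mesh and operands travel point to point, the network-hop term of the cost model grows with the Manhattan distance between the producing and consuming tiles. A small self-contained sketch (hypothetical helper, and dimension-ordered routing is an assumption for illustration, not a statement of Raw's router design):

```python
def mesh_hops(src: int, dst: int, width: int = 4) -> int:
    """Manhattan distance between tiles src and dst on a width x width
    mesh, numbering tiles row-major; assumes dimension-ordered routing."""
    sx, sy = src % width, src // width
    dx, dy = dst % width, dst // width
    return abs(sx - dx) + abs(sy - dy)

# Worst case on a 4x4 array: opposite corners, tile 0 to tile 15.
print(mesh_hops(0, 15))  # 6 hops, i.e. 6 cycles of NHL at 1 cycle per hop
```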

  7. Experiments • Beetle: a cycle-accurate simulator • Actual scalar operand network • Parameterized scalar operand network without contention • Data cache misses modeled accurately • Instruction cache misses assumed not to occur • Memory Model • Compiler maps memory to tiles • Each location has one home site (sketched below) • Benchmarks • From SPEC92, SPEC95, and the Raw benchmark suite • Dense matrix codes and one secure hash algorithm
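The "one home site" point means every memory location is statically assigned to a single tile, so all accesses to it are routed to the same place. The interleaving below is a deliberately naive stand-in for the compiler's actual placement analysis, just to illustrate the invariant:

```python
NUM_TILES = 16
BLOCK_BYTES = 32  # hypothetical interleaving granularity

def home_tile(addr: int) -> int:
    """Hypothetical home-site map: interleave fixed-size blocks across
    tiles so each memory location has exactly one home tile."""
    return (addr // BLOCK_BYTES) % NUM_TILES

# Every access to the same block resolves to the same home tile:
assert home_tile(0x1000) == home_tile(0x101F)
```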

  8. Benchmark Scaling • Speedup of each benchmark on many tiles relative to its performance on a single tile

  9. Effect of Send & Receive Occupancy • 64 tiles • Parameterized network without contention • <n, 1, 1, 1, 0> and <0, 1, 1, 1, n> (sweep sketched below)
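A sketch of the sweep these tuples describe, using the same illustrative additive model as the slide-5 example (restated here so the block is self-contained):

```python
def transport_cycles(so, sl, nhl, rl, ro, hops=3):
    """Additive cost of one operand transfer for a <SO,SL,NHL,RL,RO>
    tuple (same illustrative model as the earlier sketch)."""
    return so + sl + hops * nhl + rl + ro

for n in range(5):
    send_cfg = transport_cycles(n, 1, 1, 1, 0)  # <n, 1, 1, 1, 0>
    recv_cfg = transport_cycles(0, 1, 1, 1, n)  # <0, 1, 1, 1, n>
    print(f"n={n}: send-occupancy config {send_cfg} cycles, "
          f"receive-occupancy config {recv_cfg} cycles")
```

Note that in this flat additive model the two configurations cost the same per operand; the distinct sensitivities the experiments measure come from how occupancy ties up the sending versus the receiving ALU, which a single per-operand sum cannot capture.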

  10. Effect of Send or Receive Latencies • Applications with coarser-grained parallelism are less sensitive to send/receive latencies • Overall, applications are less sensitive to send/receive latencies than to send/receive occupancies

  11. Other Experiments • Increasing Hop Latency • Removing Contention • Comparing with Other Networks

  12. Conclusions • Many difficult issues with designing scalar operand networks • Send and receive occupancies have the biggest impact on performance • Network contention, multicast, and send/receive latencies have a smaller impact
