
Application Domains for Fixed-Length Block Structured Architectures

ACSAC-2001, Gold Coast, January 30, 2001.


Presentation Transcript


  1. Application Domains for Fixed-Length Block Structured Architectures ACSAC-2001 Gold Coast, January 30, 2001

  2. Outline • Introduction • Block Structured Architecture • Methodology • Results • Conclusions

  3. Introduction • Out-of-order architecture • dynamically schedules independent instructions • Higher ILP through • more powerful processor core • fast instruction delivery • But … this increases the hardware complexity significantly!

  4. Hardware complexity • processor core • instruction window • O(n²) bypass logic • long wires [Palacharla et al. 1996] • register file: many ports [Farkas et al. 1995] • fetching • fetch bandwidth • multiple branches • cache access

  5. Solutions • processor core • decentralization: • trace processor [Rotenberg et al. '97] • multiscalar architecture [Sohi et al. '95] • clusters (Alpha 21264) • fetching • bigger units of work: • trace in trace processors • task in multiscalar architecture • block in block-structured ISA [Melvin and Patt '95; Hao et al. '96]

  6. Basic idea of BSA • Fixed-Length Block Structured Architecture (BSA) addresses • the processor core problem • the fetching problem • by appropriate microarchitectural and implementation design decisions • BSA is a feasible architectural paradigm for future processors

  7. Block Structured Architecture: overcoming the fetch problem • [Figure: a BSA-block built from basic blocks guarded by predicates (p1), (~p1), (p2), (~p2)] • BSA-block is atomic unit of work • no control flow • predication • static register renaming • data-flow execution • fixed-length • Advantages: • predication: elimination of unbiased branches • intra-block communication: fewer register file ports required • fixed-length BSA-blocks: easier fetching • Disadvantages: • BSA-block not always filled • higher memory bandwidths • bigger instruction caches • BSA-block compression

  8. Block Structured Architecture: overcoming the processor core problem • [Figure: instruction cache, fetch unit, branch predictor, four block engines, FU1, FU2, data cache, register file] • fixed-length BSA-block • speculative execution • fast intra-block communication • slow inter-block communication • instruction window

  9. Decentralization (1) • out-of-order architectures with higher levels of ILP: • complex design • wiring delay will dominate in future technologies • scaling out-of-order architectures to higher levels of ILP for future technologies is infeasible • decentralization: small, and thus very fast, block engines communicating through longer, and thus slower, interconnects

  10. Decentralization (2) • lower IPC • slower interconnections (1 cycle latency) • bad virtual instruction window utilization due to higher granularity • higher clock frequency F • decentralization • performance = IPC × F • higher performance for large virtual window sizes
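
The trade-off on this slide (lower IPC against a higher clock frequency F) can be made concrete with a small calculation. The sketch below is only an illustration: the IPC and frequency values are hypothetical and do not come from the presentation, and the function names are invented for this example.

```python
def performance(ipc: float, freq_ghz: float) -> float:
    """Performance = IPC x F, here in instructions per nanosecond (F in GHz)."""
    return ipc * freq_ghz


def breakeven_frequency_gain(ipc_central: float, ipc_decentral: float) -> float:
    """Clock-frequency ratio a decentralized design needs to match a centralized one."""
    return ipc_central / ipc_decentral


# Hypothetical numbers, chosen only to illustrate the formula.
central = performance(ipc=4.0, freq_ghz=1.0)    # centralized out-of-order core
decentral = performance(ipc=3.2, freq_ghz=1.5)  # decentralized block engines

print(f"centralized:   {central:.2f} instr/ns")
print(f"decentralized: {decentral:.2f} instr/ns")
print(f"break-even frequency gain: {breakeven_frequency_gain(4.0, 3.2):.2f}x")
```

With these assumed values, the decentralized design comes out ahead whenever its clock is more than 1.25 times faster, which is the sense in which a lower IPC can still yield higher performance.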

  11. Outline • Introduction • Block Structured Architecture • Methodology • Results • Conclusions

  12. Statistical Modeling • [Figure: six-step flow] benchmark trace (e.g. SPECint) → statistical profiler (extraction of distributions) → statistical profile (distributions) → synthetic trace generator → synthetic trace → trace-driven simulator → IPC • inputs: microarchitectural parameters, BSA-block size b

  13. Synthetic BSA-trace Generation • generate control flow • determine basic block size • add basic block to most likely execution path • until b instructions in BSA-block • generate data flow • instruction type • number of operands • age of register operands • determine actually executed control flow path • [Figure: a BSA-block of basic blocks numbered 1 to 5, with path probabilities 0.65 and 0.35 at the first level, 0.25, 0.40, 0.20 and 0.15 at the second, and 0.20, 0.05, 0.20 and 0.20 at the third; the basic blocks actually executed are marked]
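
The control-flow step on this slide can be read as a best-first expansion: repeatedly add the basic block lying on the currently most likely path until the BSA-block holds b instructions. The sketch below follows that reading, which is an interpretation of the slide and not the authors' generator. The function `form_bsa_block`, its parameters, and the size and branch-probability samplers are hypothetical stand-ins for the distributions in the statistical profile.

```python
import heapq
import random


def form_bsa_block(b, sample_size, sample_taken_prob, rng):
    """Fill one BSA-block of b instructions, most likely path first.

    Returns a list of (basic_block_size, path_probability) in the order
    the basic blocks were added.
    """
    frontier = [(-1.0, 0)]  # max-heap on path probability; second field breaks ties
    counter = 1
    block = []
    used = 0
    while frontier and used < b:
        neg_prob, _ = heapq.heappop(frontier)
        prob = -neg_prob
        # Truncate the last basic block so the BSA-block has exactly b instructions.
        size = min(sample_size(rng), b - used)
        block.append((size, prob))
        used += size
        # The branch ending this basic block opens two candidate successor paths.
        p_taken = sample_taken_prob(rng)
        for child_prob in (prob * p_taken, prob * (1.0 - p_taken)):
            heapq.heappush(frontier, (-child_prob, counter))
            counter += 1
    return block


# Hypothetical samplers; a real profile would supply measured distributions.
rng = random.Random(0)
example = form_bsa_block(
    b=16,
    sample_size=lambda r: r.randint(3, 8),
    sample_taken_prob=lambda r: r.random(),
    rng=rng,
)
for size, prob in example:
    print(f"basic block of {size} instructions, path probability {prob:.2f}")
```

Fed the branch probabilities shown in the slide's figure, this ordering adds basic blocks with path probabilities 1.0, 0.65, 0.40, 0.35 and 0.25, matching the numbering 1 to 5 in the figure.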

  14. Benchmarks • SPECint95: integer • SPECfp95: floating-point • MediaBench: signal and multimedia processing • MPEG-4 like algorithms • measuring program characteristics through instrumentation (ATOM) on Alpha architecture

  15. Outline • Introduction • Block Structured Architecture • Methodology • Results • Conclusions

  16. Instruction Mix • Load/store instructions • SPECint95: 40.6% • SPECfp95: 37.7% • multimedia: 29.2% • Branch instructions • SPECint95: 14.0% • SPECfp95: 3.6% • multimedia: 8.5% • Some multimedia applications have floating-point instructions

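  17. Control-intensity • Good measure: “Number of instructions between 2 mispredicted branches” = (number of instructions between 2 branches) / (branch misprediction rate) • SPECint95: 80.1 instructions between mispredicted branches, 7.3 instructions between branches, 9.1% misprediction rate • SPECfp95: 415.3 instructions between mispredicted branches, 25.0 instructions between branches, 6.0% misprediction rate • multimedia: 156.9 instructions between mispredicted branches, 14.3 instructions between branches, 9.1% misprediction rate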
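
As a quick check of the measure, the sketch below recomputes it from the slide's numbers, assuming the three values per benchmark suite are, in order, instructions between mispredicted branches, instructions between branches, and branch misprediction rate. The function name is made up for this example, and small differences from the slide's values are expected because the inputs are rounded.

```python
def instructions_between_mispredictions(instr_between_branches, mispredict_rate):
    """Average number of instructions between two mispredicted branches."""
    return instr_between_branches / mispredict_rate


# (instructions between branches, misprediction rate, value shown on the slide)
reported = {
    "SPECint95": (7.3, 0.091, 80.1),
    "SPECfp95": (25.0, 0.060, 415.3),
    "multimedia": (14.3, 0.091, 156.9),
}

for name, (between_branches, rate, slide_value) in reported.items():
    computed = instructions_between_mispredictions(between_branches, rate)
    print(f"{name}: computed {computed:.1f}, slide {slide_value}")
```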

  18. BSA-block formation: number of useful instructions • [Figure: fraction of useful instructions (50% to 100%) versus BSA-block size (16, 32, 64, 128) for avg media, avg SPECint95 and avg SPECfp95]

  19. BSA-block formation: predictability of multi-way branch • [Figure: multi-way branch predictability (0% to 100%) for multimedia, integer and floating-point benchmarks with 16-, 32- and 64-instruction blocks] • 16-instruction block: 90% in most cases • 32-instruction block: low for several integer applications • 64-instruction block: only for floating-point applications

  20. Conclusions • Multimedia applications are less control-intensive than integer applications due to larger basic block size under comparable branch predictability • Multimedia applications are more control-intensive than floating-point applications due to smaller basic block size and lower branch predictability • 16 instructions per BSA-block is appropriate: larger blocks result in higher (multi-way) branch misprediction rates
