
MacSim Architecture Studies






Presentation Transcript


  1. MacSim Architecture Studies MacSim Tutorial (In ISCA-39, 2012)

  2. Architecture Studies Using MacSim
     Front-end
     • Thread fetch policies
     • Branch predictor
     Memory System
     • Software and Hardware prefetcher
     • Cache studies (sharing, inclusion)
     • DRAM scheduling
     • Interconnection studies
     Misc.
     • Power model

  3. Prefetcher Study
     [Figure: MacSim block diagram with Trace Generator (PIN, GPUOcelot), Frontend, and Memory System]
     Software prefetch instructions
     • PTX: prefetch, prefetchu
     • x86: prefetcht0, prefetcht1, prefetchnta
     Hardware prefetch requests
     • Hardware Prefetcher: Stream, stride, GHB, …
     • Many-thread Aware Prefetching Mechanism [Lee et al. MICRO-43, 2010]
     • When prefetching works, when it doesn’t, and why [Lee et al. ACM TACO, 2012]
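
A stride prefetcher, one of the hardware prefetcher types listed on this slide, can be sketched roughly as follows. This is an illustrative model only, not MacSim code: the class name `StridePrefetcher`, the `degree` parameter, and the two-matching-strides confidence rule are assumptions made for the example.

```python
class StridePrefetcher:
    """Hypothetical per-PC stride prefetcher (illustrative, not MacSim's)."""

    def __init__(self, degree=2):
        self.degree = degree  # number of strides ahead to prefetch (assumed knob)
        self.table = {}       # pc -> (last address, last observed stride)

    def access(self, pc, addr):
        """Record a demand access; return the addresses to prefetch, if any."""
        entry = self.table.get(pc)
        if entry is None:
            # First access from this PC: nothing to compare against yet.
            self.table[pc] = (addr, 0)
            return []
        last_addr, last_stride = entry
        stride = addr - last_addr
        self.table[pc] = (addr, stride)
        # Assumed confidence rule: the same non-zero stride seen twice in a row.
        if stride == 0 or stride != last_stride:
            return []
        return [addr + stride * i for i in range(1, self.degree + 1)]


pf = StridePrefetcher(degree=2)
for address in (100, 164, 228):
    requests = pf.access(pc=0x400, addr=address)
print(requests)  # addresses one and two strides past the last access
```

The first two accesses only train the table; the third confirms the 64-byte stride and triggers the prefetch requests.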

  4. Cache and NoC Studies
     [Figure: private caches ($) connected through an interconnection to a shared cache]
     • TLP-Aware Cache Management Policy [Lee and Kim, HPCA-18, 2012]
     • Cache studies: sharing, inclusion property
     • On-chip interconnection studies

  5. Heterogeneity Aware NoC
     • Heterogeneous link configuration
     Different topologies
     [Figure: ring networks connecting CPU cores (C0 to C2), GPU cores (G0 to G2), memory controllers (M0, M1), and L3 tiles in different placements]
     • On-chip Interconnection for CPU-GPU Heterogeneous Architecture [Lee et al. under review]

  6. Instruction Fetch and DRAM Scheduling
     [Figure: Trace Generator (GPUOcelot), Frontend, Execution, DRAM]
     Frontend fetch policies: RR, ICOUNT, FAIR, LRF, …
     DRAM scheduling policies: FCFS, FRFCFS, FAIR, …
     • Effect of Instruction Fetch and Memory Scheduling on GPU Performance [Lakshminarayana and Kim, LCA-GPGPU, 2010]
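
The difference between two of the DRAM policies named on this slide, FCFS and FRFCFS, can be sketched as below. FCFS services the oldest request; FRFCFS (first-ready, first-come-first-served) prefers requests that hit the currently open row and breaks ties by age. The `Request` tuple, the `open_rows` map, and the function names are hypothetical and are not taken from MacSim.

```python
from collections import namedtuple

# Hypothetical request record: arrival time, target bank, target row.
Request = namedtuple("Request", ["arrival", "bank", "row"])


def pick_fcfs(queue, open_rows):
    """FCFS: service the oldest request, ignoring row-buffer state."""
    return min(queue, key=lambda r: r.arrival)


def pick_frfcfs(queue, open_rows):
    """FRFCFS: prefer row-buffer hits, then the oldest request."""
    def priority(r):
        is_miss = open_rows.get(r.bank) != r.row
        return (is_miss, r.arrival)  # False sorts first, so hits win
    return min(queue, key=priority)


queue = [Request(0, 0, 5), Request(1, 0, 7), Request(2, 1, 3)]
open_rows = {0: 7, 1: 9}  # bank -> row currently held in the row buffer

print(pick_fcfs(queue, open_rows))    # the oldest request
print(pick_frfcfs(queue, open_rows))  # the request that hits open row 7
```

Here FCFS picks the request that arrived at time 0, while FRFCFS picks the younger request to bank 0, row 7, because it avoids a precharge and activate.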

  7. DRAM Scheduling in GPGPUs
     [Figure: a DRAM controller in front of a DRAM bank, holding request queues W0 to W3 for Core-0 and for Core-1; each queued request is marked as a row hit (RH) or a row miss (RM)]
     Potential of requests from Core-0 = |W0|^α + |W1|^α + |W2|^α + |W3|^α = 4^α + 3^α + 5^α (α < 1)
     Reduction in potential if:
     • a row hit from a queue of length L is serviced next: L^α − (L − 1)^α
     • a row miss from a queue of length L is serviced next: L^α − (L − 1/m)^α
     where m = (cost of servicing a row miss) / (cost of servicing a row hit)
     Tolerance(Core-0) < Tolerance(Core-1) ⇒ select Core-0
     Servicing a row hit from W1 (of Core-0) results in the greatest reduction in potential, so service row hits from W1 next.
     • DRAM Scheduling Policy for GPGPU Architectures Based on a Potential Function [Lakshminarayana et al. IEEE CAL, 2011]
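
The potential-function arithmetic on this slide can be worked through with a small sketch. It assumes the second reduction expression, the one containing 1/m, applies to a row miss, as the definition of m suggests. The values α = 0.5 and m = 3, the row-hit/row-miss state of each queue head, and all function names are assumptions chosen for illustration; they do not come from the slide or from MacSim.

```python
def potential(queue_lengths, alpha):
    """Sum of |W_i|^alpha over the non-empty request queues of one core."""
    return sum(n ** alpha for n in queue_lengths if n > 0)


def reduction(length, alpha, row_hit, m):
    """Drop in potential from servicing the head of a queue of this length.

    A row hit removes one request's worth of work; a row miss costs m times
    as much, so it counts as only 1/m of a request (assumed reading).
    """
    served = 1.0 if row_hit else 1.0 / m
    return length ** alpha - (length - served) ** alpha


def best_queue(queues, alpha, m):
    """Index of the queue whose head yields the largest reduction."""
    return max(
        range(len(queues)),
        key=lambda i: reduction(queues[i][0], alpha, queues[i][1], m),
    )


# (queue length, head-is-row-hit); lengths 4, 3, 5 follow the slide,
# the hit/miss flags are assumed for the example.
queues = [(4, False), (3, True), (5, False)]
alpha, m = 0.5, 3.0

print(potential([n for n, _ in queues], alpha))
print(best_queue(queues, alpha, m))  # index of the queue to service next
```

Because α < 1 makes the potential concave, the same amount of service reduces the potential more in a short queue than in a long one, so the policy favors queues that are close to draining.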

  8. Power Research & Validation
     • Verifying the simulator against the GTX580
     • Modeling x86 CPU power
     • Modeling GPU power
     • Research is still ongoing

  9. MacSim’s Roadmap (2012 ~ 2013)
     • OpenGL Program
     • ARM Architecture
     • Mobile Platform
     • Power/Energy Model
