
No Free Lunch, No Hidden Cost



No Free Lunch, No Hidden Cost: How Can Co-Design Help? X. Sharon Hu, Dept. of Computer Science and Engineering, University of Notre Dame. The Salishan Conference on High-Speed Computing. Theme: Exposing Hidden Execution Costs.


Presentation Transcript



No Free Lunch, No Hidden Cost

X. Sharon Hu

Dept. Computer Science and Engineering

University of Notre Dame

How Can Co-Design Help?

The Salishan Conference on High-Speed Computing




Theme: Exposing Hidden Execution Costs

  • Cost of execution: performance and power

    • Computation

    • Communication

    • Data motion

    • Synchronization

  • How can we strike a balance between the extremes?

    • Hide as much as possible?

    • Explicitly manage “all” costs?

  • My “position”:

    • Expose widely and choose wisely

    • Focus on power


Why Take This Position?

  • Expose widely

    • Better understanding of each component’s contribution

    • Allowing application-specific tradeoffs

    • Providing opportunities for powerful co-design tools

  • Choose wisely

    • Requiring sophisticated co-design tools

    • Exploring more algorithm/software options



But Easier Said Than Done!

  • Heterogeneity

    • Compute nodes: (multi-core) CPU, GP-GPU, FPGA, …

    • Memory components: on-chip, on-board, disks, …

    • Communication infrastructure: bus, NoC, networks, …

  • Parallelism (“non-determinism”)

    • Data access: movement, coherence, …

    • Resource contention

    • Synchronization



Outline

  • Why expose widely?

  • How to benefit from exposing widely?

  • How to choose wisely?

  • Going forward



Why Expose Widely? (1)

  • Different programs have different power distributions

[Figure: GPU power distribution (NVIDIA GTX 280). Legend: GPU Cores, ConstCache, Memory, ConstSM, TextCache]

Hong and Kim, ISCA 2010



Why Expose Widely? (2)

  • Data movement impacts different algorithms differently

[Figure: Energy consumption of three sorting algorithms (Pentium 4 + GeForce 570)]



Why Expose Widely? (3)

  • The impact of resource contention is application dependent

Performance degradation due to memory bus contention

Masaaki Kondo et al., SIGARCH 2007



Outline

  • Why expose widely?

  • How to benefit from exposing widely?

  • How to choose wisely?

  • Going forward



How to Benefit from “Exposing Widely”?

  • Co-design is the key

  • Expose all factors impacting the “execution model”

    • Computation: processing resource

    • Data motion: memory components and hierarchy

    • Communication: bus and network

    • Resource contention, synchronization…

    • Some examples

      • Software macromodeling

      • Hardware module-based modeling

  • Optimize through power management

    • Keep in mind Amdahl’s law
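The Amdahl's law reminder can be made concrete with a small sketch. It assumes a component accounts for a fixed fraction of total power and that power management reduces only that component; the function name and the numbers are illustrative, not taken from the talk.

```python
def overall_power_reduction(fraction: float, local_factor: float) -> float:
    """Amdahl-style bound on the overall power reduction factor.

    fraction: share of total power drawn by the managed component (0..1).
    local_factor: factor by which that component's power is reduced (> 0).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    if local_factor <= 0.0:
        raise ValueError("local_factor must be positive")
    # The unmanaged share (1 - fraction) is unchanged; only the managed share shrinks.
    return 1.0 / ((1.0 - fraction) + fraction / local_factor)


if __name__ == "__main__":
    # Hypothetical case: memory draws 20% of total power and is made 2x more efficient.
    print(round(overall_power_reduction(0.20, 2.0), 3))
```

Halving the power of a component that draws only a fifth of the total yields roughly an 11% overall improvement, which is why exposing the per-component breakdown matters before choosing what to optimize.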



Macromodeling: Algorithm Complexity Based

  • Relate power/energy of a program with its complexity

    • Example: $E = C_1 S + C_2 S^2 + C_3 S^3$ (Tan et al., DAC’01), where $S$ is the size of the array for a sorting algorithm

    • Example: $E_{comm} = C_0 + C_1 S$ (Loghi et al., ACM TECS’07), where $S$ is the size of the exchanged messages

  • More sophisticated models to account for both computing and communication

  • How to handle resource contention?
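As a minimal sketch of how such macromodels might be used, the code below evaluates the two example forms and fits the linear communication model to measurements by ordinary least squares. The function names and the sample data are hypothetical; the cited papers may characterize their coefficients differently.

```python
def sort_energy(s, c1, c2, c3):
    """Polynomial macromodel: E = C1*S + C2*S^2 + C3*S^3 for array size S."""
    return c1 * s + c2 * s**2 + c3 * s**3


def comm_energy(s, c0, c1):
    """Linear macromodel: E_comm = C0 + C1*S for message size S."""
    return c0 + c1 * s


def fit_comm_model(sizes, energies):
    """Least-squares fit of (C0, C1) from measured (size, energy) pairs."""
    n = len(sizes)
    if n < 2 or n != len(energies):
        raise ValueError("need at least two matching samples")
    mean_s = sum(sizes) / n
    mean_e = sum(energies) / n
    sxx = sum((s - mean_s) ** 2 for s in sizes)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((s - mean_s) * (e - mean_e) for s, e in zip(sizes, energies))
    c1 = sxy / sxx
    c0 = mean_e - c1 * mean_s
    return c0, c1


if __name__ == "__main__":
    # Hypothetical measurements (arbitrary energy units) that follow E = 5 + 2*S.
    sizes = [1, 2, 3, 4]
    energies = [7, 9, 11, 13]
    c0, c1 = fit_comm_model(sizes, energies)
    print(c0, c1, comm_energy(10, c0, c1))
```

Neither form has a term for resource contention, which is the gap the last bullet points at.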


Power Modeling of Bus Contention

  • Penolazzi, Sander, and Hemani, DATE’11

  • Characterization step

    • $C\%_{N,1}$: percentage difference in cycle count between the N-processor case and the 1-processor case

    • Can be done by IP providers on chosen benchmarks

  • Prediction step
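The slide gives no detail for the prediction step, so the following is only one plausible reading, not the method of the DATE’11 paper. It computes the characterization quantity as a percentage cycle difference and then, as an assumption, applies that percentage to the single-processor cycle count of a new program. All names and numbers are hypothetical.

```python
def cycle_penalty_percent(cycles_n: float, cycles_1: float) -> float:
    """Characterization: percentage cycle difference, N-processor vs. 1-processor run."""
    if cycles_1 <= 0:
        raise ValueError("cycles_1 must be positive")
    return 100.0 * (cycles_n - cycles_1) / cycles_1


def predict_cycles(cycles_1_new: float, penalty_percent: float) -> float:
    """Prediction (assumed form): scale a new program's 1-processor cycle count."""
    return cycles_1_new * (1.0 + penalty_percent / 100.0)


if __name__ == "__main__":
    # Hypothetical benchmark: 1000 cycles alone, 1200 cycles with N processors on the bus.
    penalty = cycle_penalty_percent(1200, 1000)
    print(penalty, predict_cycles(5000, penalty))
```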



Hierarchical Module-Based Power Modeling

  • Accumulate energy/power of modules

  • CPU+GPU example

  • Access rate: software dependent

  • Data movement contributes to memory power

  • Resource contention modifies access rate

Adapted from Isci and Martonosi, MICRO’03
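The accumulation idea can be sketched as follows. This is a simplified illustration, not the model from the cited work: it assumes each module's power is a static term plus an access-rate-weighted dynamic term, and that contention acts as a single scaling factor on access rates. The module names and wattages are made up.

```python
from dataclasses import dataclass


@dataclass
class Module:
    name: str
    max_dynamic_w: float  # dynamic power at full activity (assumed known)
    static_w: float       # power drawn regardless of activity
    access_rate: float    # software-dependent activity in [0, 1]


def module_power(m: Module, contention_scale: float = 1.0) -> float:
    """Static power plus dynamic power weighted by the (scaled) access rate."""
    rate = min(1.0, max(0.0, m.access_rate * contention_scale))
    return m.static_w + rate * m.max_dynamic_w


def system_power(modules, contention_scale: float = 1.0) -> float:
    """Accumulate per-module power into a system total."""
    return sum(module_power(m, contention_scale) for m in modules)


if __name__ == "__main__":
    # Hypothetical CPU+GPU system; all values are illustrative.
    system = [
        Module("cpu", max_dynamic_w=40.0, static_w=10.0, access_rate=0.5),
        Module("gpu", max_dynamic_w=100.0, static_w=20.0, access_rate=0.25),
        Module("memory", max_dynamic_w=20.0, static_w=5.0, access_rate=0.5),
    ]
    print(system_power(system))
    # Contention that halves every module's effective access rate.
    print(system_power(system, contention_scale=0.5))
```

Because access rates come from the software and contention changes them, the same hardware description yields different power totals for different programs.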



Outline

  • Why expose widely?

  • How to benefit from exposing widely?

  • How to choose wisely?

  • Going forward



Managing Bus Contention to Reduce Energy

  • M. Kondo, H. Sasaki and H. Nakamura, 2006

  • Counter for memory requests

  • Register for PU identification

  • Thresholds for selecting which PU uses which Vdd value
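A threshold-based selection of this kind might look like the sketch below. The policy direction (a PU issuing more memory requests gets a lower Vdd, on the assumption that it is memory-bound and gains little from running fast) is an assumption made for illustration, as are the threshold values and voltage levels; the slide does not specify them.

```python
def select_vdd(mem_requests, thresholds, vdd_levels):
    """Map each PU's memory-request count to a Vdd level.

    mem_requests: dict of PU id -> request count from its counter.
    thresholds: ascending request-count thresholds.
    vdd_levels: one more entry than thresholds; index 0 is used below the
                first threshold, the last index at or above the last one.
    """
    if len(vdd_levels) != len(thresholds) + 1:
        raise ValueError("need len(thresholds) + 1 Vdd levels")
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be ascending")
    assignment = {}
    for pu, count in mem_requests.items():
        # Number of thresholds reached selects the level index.
        level = sum(1 for t in thresholds if count >= t)
        assignment[pu] = vdd_levels[level]
    return assignment


if __name__ == "__main__":
    # Hypothetical counter readings over one sampling interval.
    counters = {"PU0": 10, "PU1": 500, "PU2": 5000}
    print(select_vdd(counters, thresholds=[100, 1000], vdd_levels=[1.2, 1.0, 0.8]))
```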



Application Mapping to Reduce Energy (1)

  • Application mapping for heterogeneous systems

[Figure: Jobs J1 to J4 mapped onto PE 1 to PE 4 with a shared Memory; each job Ji is annotated with ([minRi, maxRi], Di)]

R. Racu, R. Ernst, A. Hamann, B. Mochocki and X. Hu, “Methods for power optimization in distributed embedded systems with real-time requirements,” CASES’06.



Application Mapping to Reduce Energy (2)

  • Optimization:

    • Minimize power/energy dissipation

    • While satisfying timing properties (e.g., average path latency, average lateness)

  • Search Space:

    • Scheduling parameter, traffic shaping, …

    • Task-level DVFS, i.e., task speed assignment

    • Resource-level DVFS, i.e., resource speed assignment
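To illustrate task speed assignment as a search problem, the sketch below exhaustively tries discrete speed levels for tasks that run back to back on one resource and keeps the lowest-energy assignment that meets a deadline. Several simplifications are assumptions made here: a single end-to-end deadline stands in for the timing properties, energy is taken as cycles times speed squared in normalized units, and exhaustive search replaces whatever search technique the cited work uses.

```python
from itertools import product


def best_speed_assignment(task_cycles, speeds, deadline):
    """Return (energy, speeds_per_task) with minimum energy, or None if infeasible.

    task_cycles: work per task in cycles (at normalized speed 1.0).
    speeds: allowed normalized speed levels, shared by all tasks.
    deadline: bound on total latency of the sequential task chain.
    """
    best = None
    for combo in product(speeds, repeat=len(task_cycles)):
        latency = sum(c / s for c, s in zip(task_cycles, combo))
        if latency > deadline:
            continue  # violates the timing constraint
        # Assumed energy model: per-task energy proportional to cycles * speed^2.
        energy = sum(c * s * s for c, s in zip(task_cycles, combo))
        if best is None or energy < best[0]:
            best = (energy, combo)
    return best


if __name__ == "__main__":
    # Hypothetical workload: two tasks, two speed levels, deadline of 400 time units.
    print(best_speed_assignment([100, 200], speeds=[0.5, 1.0], deadline=400))
```

Exhaustive search grows exponentially with the number of tasks, which is one reason a heuristic search over the design space becomes attractive for realistic systems.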



Application Mapping (3): Sensitivity Analysis

R. Racu, R. Ernst, A. Hamann, B. Mochocki and X. Hu, “Methods for power optimization in distributed embedded systems with real-time requirements,” CASES’06.



Application Mapping (4): GA-Based Approach

[Figure: GA-based flow with a Power Analyzer stage. Labels: 2’. Scheduling Trace; 3’. Power Dissipation. Note: power model needed]



A Sample Result



Outline

  • Why expose widely?

  • How to benefit from exposing widely?

  • How to choose wisely?

  • Going forward



Going Forward: Systematic Co-design Effort

  • Expose more

    • More hardware counters / registers

    • More efficient/accurate high-level power models

    • Better models for resource contention and synchronization

  • Choose better

    • Handling parallelism

      • Algorithm, OS, hardware

      • Resource contention

      • Synchronization

    • Handling non-determinism

      • Worst case bounds

      • Statistical analysis

      • Interval-based techniques


ES Design vs. HPCS Design

  • Differences (maybe)

    • Application-specific workloads vs. domain-specific workloads

    • Constraints, objectives, desirables?

      • latency, throughput, energy, cost, reliability, fault tolerance, IP protection/privacy, ToM, …

    • Other issues: homogeneous vs. heterogeneous, levels of complexity, user expertise, …

  • Similarities

    • Ever increasing hardware capability: multi-core, multi-thread, complex communication fabrics, memory hierarchy, …

    • Productivity gap

    • Common concerns: latency, throughput, energy, cost, reliability, fault tolerance, …



Leverage Co-Design for HPC

  • Systematic performance estimation

    • Formal methods: scenario-based, statistical analysis

    • Hybrid approaches: analytical+simulation

    • Seamless migration from one abstraction level to the next

  • Efficient design space exploration

    • Efficient search techniques

    • Multiple-level abstraction models

    • Multiple-attribute optimization

    • Others: memory and communication analysis and design



Thank you!

