
NoC Symposium’07 Panel: Proliferating the Use and Acceptance of NoC Benchmark Standards

Timothy M. Pinkston

National Science Foundation (NSF)

tpinksto@nsf.gov

University of Southern California (USC)

tpink@usc.edu


Driving Forces

[Figure: driving-forces diagram relating Applications (App’s: workloads, algorithms & SW), Architecture (Arch), and Implementation Technology (Tech):
  • Applications define what system functions should be supported (demand for system functions)
  • Architecture defines how system functions are supported (demand for hardware functional blocks; performance of system functions)
  • Implementation (Circuit) Technology defines the extent to which desired system functions can be implemented in hardware (hardware functional blocks)]

“Trends Towards On-chip Networked Microsystems”, T. Pinkston and J. Shin, IJHPCN. (http://ceng.usc.edu/smart/publications/archives/CENG-2004-17.pdf)

Is There a Need for a NoC Benchmark Suite?

  • A sampling of benchmark suites already out there:
    • Gen-Purpose/PC: SPEC CPU2006, SPLASH-2, Netperf, Dhry-/Whetstone, BAPCo SYSmark, BYTEmark, LMBench, LLCbench, DMABench
    • Embedded/SoC: EEMBC, MiBench, MediaBench, ALPBench, GraalBench, NPCryptBench, CommBench, BioBench
    • Sci-Eng/HPC: STREAM, HPL, LINPACK, LAPACK, ScaLAPACK, NPB (NAS PB), LFK (Livermore), SparseBench
  • Do we really need yet another benchmark suite?
December 2006 NSF OCIN Workshop Recommendations (www.ece.ucdavis.edu/~ocin06)
  • A set of standard workloads/benchmarks and evaluation methods are needed to enable realistic evaluation and uniform (fair) comparison between various approaches
  • Need for cooperation (agreement) between academia and industry
  • Need for “qualified” performance metrics: latency and bandwidth under power, energy, thermal, reliability, area, etc., constraints
  • Need for standardization of metrics: clear definition of what is being represented by metrics (e.g., network latency, throughput,...)
  • Need for effective alternatives to time consuming full-system execution-driven simulation, including use of microbenchmarks, parameterized synthetic traffic/workloads, traces, etc.
  • Need for accurate characterization and modelling of system traffic behavior across various domains: general-purpose & embedded
  • Need for analytical methods (complementary to simulation) to explore and quantitatively narrow down the large design space

“Challenges in Computer Architecture Evaluation,” K. Skadron, M. Martonosi, D. August, M. Hill, D. Lilja, V. Pai, in IEEE Computer, pp. 30-36, August 2003.

Meaning of Latency and Throughput
  • Latency: fabric only, endnode-to-endnode, average, no-load, saturation?
  • Throughput: peak, sustained, saturation, best-case, worst-case?

Simulation: 3-D torus, 4,096 nodes (16 × 16 × 16), uniform traffic load, virtual cut-through switching, three-phase arbitration, 2 and 4 virtual channels. Bubble flow control is used in dimension order on one virtual channel; the other virtual channel(s) are used either in dimension order (deterministic routing) or along any shortest path to the destination (adaptive routing).

Simple (Analytical) Latency and Throughput Models

(cut-through switching)

Latency, a lower bound (contention delay not included):

Latency = Sending latency + T_LinkProp × (d + 1) + (T_r + T_a + T_s) × d + (Packet + d × Header) / Bandwidth + Receiving latency

Effective bandwidth, an upper bound (contention delay not fully included); the middle term is BW_Network:

Effective bandwidth = min( N × BW_LinkInjection , ρ × BW_Bisection / γ , σ × N × BW_LinkReception )

  • H&P Int.Net. chapter: ceng.usc.edu/smart/slides/appendixE.html
    • Network traffic pattern/load determines σ & γ, the traffic-dependent parameters
    • Topology and switch microarchitecture determine d, T_r, T_a, T_s, BW_Bisection
    • Routing, switching, flow control, microarchitecture, etc., influence the network efficiency factor, ρ
      • internal switch speedup & reduction of contention within switches
      • buffer organizations to mitigate HOL blocking in and across switches
      • balancing load across network links & maximally utilizing link bandwidth
      • ρ = ρ_L × ρ_R × ρ_A × ρ_S × ρ_μArch × …, the architecture-dependent parameters
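The latency and effective-bandwidth models above can be sketched as a small calculator. This is a sketch; the function and parameter names are illustrative, not from the slides, and units must simply be kept consistent by the caller.

```python
def latency(sending, receiving, d, t_linkprop, t_r, t_a, t_s,
            packet_bits, header_bits, bandwidth):
    """Lower-bound packet latency under cut-through switching
    (contention delay not included). d is the hop count."""
    return (sending
            + t_linkprop * (d + 1)              # link propagation on d+1 links
            + (t_r + t_a + t_s) * d             # routing, arbitration, switching per hop
            + (packet_bits + d * header_bits) / bandwidth  # serialization
            + receiving)

def effective_bandwidth(n, bw_inject, bw_recept, bw_bisection,
                        rho, sigma, gamma):
    """Upper-bound effective bandwidth (contention not fully included):
    min of aggregate injection, bisection-limited network bandwidth
    (rho * BW_Bisection / gamma), and aggregate reception."""
    return min(n * bw_inject,
               rho * bw_bisection / gamma,
               sigma * n * bw_recept)
```

With the Cell BE EIB numbers from the next slide (N = 12, 25.6 GB/s links, BW_Bisection = 204.8 GB/s, ρ = 0.38, σ = γ = 1), `effective_bandwidth` returns the bisection-limited middle term, about 78 GB/s.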
Modeling Throughput of Cell BE EIB (Worst-Case)

BW_Network = ρ × BW_Bisection / γ = ρ × 204.8/1 GB/s = 78 GB/s (measured) ⇒ ρ = 38%

[Figure: Cell BE Element Interconnect Bus (EIB), 12 nodes connected by 4 rings, each ring with 12 links:
  • Injection bandwidth: 25.6 GB/s per element; aggregate network injection: 307.2 GB/s (12 nodes)
  • Reception bandwidth: 25.6 GB/s per element; aggregate network reception: 307.2 GB/s (12 nodes)
  • Command bus bandwidth: 204.8 GB/s
  • BW_Bisection = 8 links = 204.8 GB/s
  • Aggregate link bandwidth: 1,228.8 GB/s (4 rings, each with 12 links)
  • Peak BW_Network = 25.6 GB/s × 3 × 4 = 307.2 GB/s (3 transfers per ring)]

The traffic pattern determines σ & γ; here σ = 1 and γ = 1. ρ is limited, at best, to only 50% due to ring interference.
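Plugging the EIB figures above into the effective-bandwidth expression gives a quick numeric check. A sketch; the variable names are illustrative, and ρ is back-solved from the measured worst-case throughput.

```python
# Cell BE EIB figures from the slide (all bandwidths in GB/s)
BW_LINK = 25.6                 # per-element injection/reception bandwidth
N = 12                         # elements on the EIB
BW_BISECTION = 8 * BW_LINK     # 8 links cross the bisection = 204.8 GB/s
SIGMA = GAMMA = 1.0            # traffic-pattern factors for this case

# Peak network bandwidth: 3 concurrent transfers per ring, 4 rings
peak_bw_network = BW_LINK * 3 * 4          # 307.2 GB/s

# Measured worst-case throughput back-solves the efficiency factor rho
measured = 78.0
rho = measured * GAMMA / BW_BISECTION      # ~0.38

bw_network = rho * BW_BISECTION / GAMMA    # recovers the measured ~78 GB/s
print(f"rho = {rho:.0%}, BW_Network = {bw_network:.1f} GB/s")
# prints: rho = 38%, BW_Network = 78.0 GB/s
```

Note that the achieved 78 GB/s is well under the 307.2 GB/s peak, consistent with the slide's point that ring interference caps ρ at 50% at best.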

[Figure: integer-program and floating-point-program benchmark charts. Ref: Hennessy & Patterson, "Computer Architecture: A Quantitative Approach," 4th Ed.]

In Conclusion: Answers to Panel Questions
  • What are the hallmarks of successful benchmark suites?
    • Fairness: represent the proper workload behavior/characteristics
    • Portability: open, free access, not architecture/vendor-specific
    • Transparency: yield reproducible performance results (reporting)
    • Evolutionary: adaptable over time in composition and reporting
  • How can industry and academia facilitate use?
    • Establish the need for, and importance of, common evaluation "best practices"
    • Make it a cross-cutting effort: architects, circuit designers, CAD researchers
    • Place high value on developing and using evaluation standards
  • What are the main obstacles to establishing a de facto NoC standard benchmark suite, and how can they be addressed?
    • Capturing the diversity of NoC applications & computing domains
    • Avoiding red herrings: converge on performance evaluation standards and agree on characteristic traffic loads and/or microbenchmarks
    • Ultimately, system-level performance is what matters, not component-level performance