Workshop on Commodity-Based Visualization Clusters



Learning From the Stanford/DOE Visualization Cluster

Mike Houston, Greg Humphreys, Randall Frank, Pat Hanrahan


Outline

  • Stanford’s current cluster

    • Design decisions

    • Performance evaluation

    • Bottleneck evaluation

  • Cluster “Landscape”

    • General classification

    • Bottleneck evaluation

  • Stanford’s next cluster

    • Design goals

    • Research directions



Cluster Configuration (Jan. 2000)

  • Cluster: 32 graphics nodes + 4 server nodes

  • Computer: Compaq SP750

    • 2 processors (800 MHz PIII Xeon, 133MHz FSB)

    • i840 core logic (big issue for vis-clusters)

      • Simultaneous fast graphics and networking

      • Network: 64-bit, 66 MHz PCI

      • Graphics: AGP-4x

    • 256 MB memory

    • 18GB SCSI 160 disk (+ 3*36GB on servers)

  • Graphics (Sept. 2002)

    • 16 NVIDIA GeForce3 w/ DVI (64 MB)

    • 16 NVIDIA GeForce4 Ti 4200 w/ DVI (128 MB)

  • Network

    • Myrinet 64-bit, 66 MHz (LANai 7)


Graphics Evaluation

  • NVIDIA GeForce3

    • 25 MTri/s triangle rate observed

    • 680 MPix/s fill rate observed

  • NVIDIA GeForce4

    • 60 MTri/s triangle rate observed

    • 800 MPix/s fill rate observed

  • Read Pixels performance

    • 35 MPix/s (140 MB/s) RGBA

    • 22 MPix/s (87 MB/s) Depth

  • Draw Pixels performance

    • 45 MPix/s (180 MB/s) RGBA

    • 21 MPix/s (85 MB/s) Depth
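
The MB/s figures in parentheses follow from the pixel rates if each pixel occupies 4 bytes. A minimal sketch of that conversion, assuming 4 bytes per pixel for both RGBA and depth transfers (the slide does not state the depth format, and the function name is illustrative):

```python
def pixel_rate_to_bandwidth(mpix_per_s, bytes_per_pixel=4):
    """Convert a pixel rate in MPix/s to a byte rate in MB/s.

    bytes_per_pixel=4 is an assumption: 8-bit RGBA, or a depth value
    stored in a 32-bit word.
    """
    return mpix_per_s * bytes_per_pixel


read_rgba = pixel_rate_to_bandwidth(35)   # Read Pixels, RGBA
draw_rgba = pixel_rate_to_bandwidth(45)   # Draw Pixels, RGBA
read_depth = pixel_rate_to_bandwidth(22)  # Read Pixels, depth
```

Under this assumption the RGBA values reproduce the slide exactly, and the depth values land within about 1 MB/s of the reported 87 and 85 MB/s, which is consistent with rounding of the measured pixel rates.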


Network Evaluation

  • Myrinet LANai 7 PCI64A boards

    • Theoretical Limit: 160 MB/s

    • 142 MB/s observed peak under Linux

    • ~100 MB/s observed sustained under Linux

  • ServerNet not chosen

    • Driver support

    • Large switching infrastructure required

  • Gigabit Ethernet

    • Performance and scalability concerns


Myrinet Issues

  • Fairness: Clients starved of network resources

    • Implemented credit scheme to minimize congestion

  • Lack of buffering in switching fabric

    • Causes poor performance in high load conditions

    • Open issue

[Figure: partitioned cluster vs. unpartitioned cluster]
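
The slides do not describe the credit scheme in detail. As a generic illustration of credit-based flow control (not the authors' implementation), the sketch below lets a sender transmit only while it holds credits, and the receiver hands a credit back each time it frees a buffer. The class and method names are hypothetical.

```python
from collections import deque


class CreditedLink:
    """One sender-to-receiver link with a fixed credit budget (illustrative)."""

    def __init__(self, credits):
        self.credits = credits      # receive buffers the sender may still use
        self.in_flight = deque()    # packets sent but not yet consumed

    def try_send(self, packet):
        """Send only if a credit is available; otherwise the sender must wait."""
        if self.credits == 0:
            return False
        self.credits -= 1
        self.in_flight.append(packet)
        return True

    def receiver_consume(self):
        """Receiver frees one buffer and returns the credit to the sender."""
        packet = self.in_flight.popleft()
        self.credits += 1
        return packet


link = CreditedLink(credits=2)
sent = [link.try_send(p) for p in ("a", "b", "c")]  # third send is refused
link.receiver_consume()                              # frees one buffer
retry = link.try_send("c")                           # now succeeds
```

Because a sender can never have more packets outstanding than the receiver has buffers, no single client can flood the receiver, which is the fairness property the slide is after.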


i840 Chipset Evaluation

  • 66 MHz, 64-bit PCI does not reach full speed:

    • 210 MB/s PCI read (40% of theoretical peak)

    • 288 MB/s PCI write (54% of theoretical peak)

    • Combined read/write ~121 MB/s

  • AGP

    • Fast Writes / Side Band Addressing unstable under Linux
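
The percentages above can be checked against the nominal peak of a 64-bit, 66 MHz PCI bus. A short sketch, assuming one 8-byte transfer per clock and a 66.66 MHz clock (names are illustrative):

```python
PCI_BUS_WIDTH_BYTES = 8      # 64-bit bus
PCI_CLOCK_HZ = 66.66e6       # nominal "66 MHz" PCI clock (assumption)


def pci_peak_mb_per_s():
    """Theoretical peak: one bus-width transfer per clock, in MB/s."""
    return PCI_BUS_WIDTH_BYTES * PCI_CLOCK_HZ / 1e6


def efficiency_percent(observed_mb_per_s):
    """Observed bandwidth as a percentage of the theoretical peak."""
    return 100.0 * observed_mb_per_s / pci_peak_mb_per_s()


peak = pci_peak_mb_per_s()             # about 533 MB/s
read_eff = efficiency_percent(210)     # PCI read
write_eff = efficiency_percent(288)    # PCI write
```

With these assumptions the read and write efficiencies come out within a percentage point of the 40% and 54% quoted on the slide.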


Sort-First Performance

  • Configuration

    • Application runs on client

    • Primitives distributed to servers

  • Tiled Display

    • 4x3 @ 1024x768

    • Total resolution: 4096x2304 (9 megapixels)

  • Quake 3

    • 50 fps

  • Atlantis

    • 450 fps
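
In a sort-first configuration, each primitive is sent only to the servers whose tiles it overlaps. The sketch below illustrates that bucketing step for the 4x3 wall of 1024x768 tiles. It is a simplified illustration with hypothetical names, not the WireGL or Chromium code, and it assumes integer screen-space bounding boxes with inclusive corners.

```python
TILE_W, TILE_H = 1024, 768   # resolution of one tile
COLS, ROWS = 4, 3            # 4x3 tiled display


def wall_resolution():
    """Total resolution of the tiled display."""
    return COLS * TILE_W, ROWS * TILE_H


def tiles_for_bbox(x0, y0, x1, y1):
    """Return (col, row) for every tile overlapped by an inclusive pixel bbox."""
    c0 = max(0, x0 // TILE_W)
    c1 = min(COLS - 1, x1 // TILE_W)
    r0 = max(0, y0 // TILE_H)
    r1 = min(ROWS - 1, y1 // TILE_H)
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]


total = wall_resolution()
small = tiles_for_bbox(10, 10, 20, 20)             # inside one tile
straddling = tiles_for_bbox(1000, 700, 1100, 800)  # crosses a tile corner
```

A primitive that straddles a tile corner is sent to four servers, so network traffic grows with the amount of overlap as well as with scene complexity.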


Sort-Last Performance

  • Configuration

    • Parallel rendering on multiple nodes

    • Composite to final display node

  • Volume Rendering on 16 nodes

    • 1.57 GVox/s [Humphreys 02]

    • 1.82 GVox/s (tuned) 9/02

    • 256x256x1024 volume¹ rendered twice

¹ Data courtesy of G. A. Johnson, G. P. Cofer, S. L. Gewalt, and L. W. Hedlund from the Duke Center for In Vivo Microscopy (an NIH/NCRR National Resource)
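
The voxel rates can be turned into approximate frame rates. The sketch below assumes that "rendered twice" means two full passes over the 256x256x1024 volume per frame, and that 1 GVox/s is 10^9 voxels per second. Both are interpretations, not statements from the slides.

```python
def voxels_per_frame(nx, ny, nz, passes=1):
    """Voxels processed per frame for an nx*ny*nz volume rendered `passes` times."""
    return nx * ny * nz * passes


def frames_per_second(gvox_per_s, voxels):
    """Frame rate implied by a voxel throughput given in GVox/s."""
    return gvox_per_s * 1e9 / voxels


work = voxels_per_frame(256, 256, 1024, passes=2)
fps_published = frames_per_second(1.57, work)   # [Humphreys 02] figure
fps_tuned = frames_per_second(1.82, work)       # tuned figure, 9/02
```

Under these assumptions the two throughput figures correspond to roughly 12 and 14 frames per second.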


Cluster Accomplishments

  • Development Platform

    • WireGL

    • Chromium

  • Cluster configuration replicated

  • Interactive Performance

    • 256x512x1024 volume @ 15fps

    • 9 Megapixel Quake3 @ 50fps


Sources of Bottlenecks

  • Sort-First

    • Packing speed (processor)

    • Primitive distribution (network and bus)

    • Rendering (processor and graphics chip)

  • Sort-Last

    • Rendering (graphics chip)

    • Composite (network, bus, and read/draw pixels)


Bottleneck Evaluation – Stanford

  • Sort-First: Processor and Network

  • Sort-Last: Network and Read/Draw
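
A rough estimate shows why sort-last compositing is bound by the network and by Read/Draw Pixels on this cluster. The sketch uses the rates reported earlier in the talk (35 MPix/s RGBA readback, ~100 MB/s sustained Myrinet, 45 MPix/s RGBA draw) and assumes one uncompressed 1024x768 RGBA frame moves through each stage, with 1 MB taken as 10^6 bytes.

```python
PIXELS = 1024 * 768        # one full tile (assumed frame size)
BYTES_PER_PIXEL = 4        # RGBA, assumed 8 bits per channel


def stage_times_ms():
    """Time per frame, in milliseconds, for each compositing stage."""
    return {
        "readback": PIXELS / 35e6 * 1e3,                     # Read Pixels
        "network": PIXELS * BYTES_PER_PIXEL / 100e6 * 1e3,   # Myrinet sustained
        "draw": PIXELS / 45e6 * 1e3,                         # Draw Pixels
    }


times = stage_times_ms()
slowest = max(times, key=times.get)   # stage with the largest per-frame time
```

All three stages cost tens of milliseconds per frame, with the network transfer the largest, so even a perfectly pipelined compositor would be held to a few tens of frames per second by these stages alone.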


The Landscape of Graphics Clusters

  • Many Options

    • Low End <$2500/node

    • Mid End ~$5000/node

    • High End >$7500/node

  • Tradeoffs

    • Different bottlenecks

    • Price/Performance

    • Scalability

    • Usage

  • Evaluation

    • Based on published benchmarks and specs


Cluster Interconnect Options

  • Many choices

    • GigE

      • ~100 MB/s

    • Myrinet 2000 (http://www.myrinet.com)

      • 245 MB/s

    • SCI/Dolphin (http://www.dolphinics.com)

      • 326 MB/s

    • Quadrics (http://www.quadrics.com)

      • 340 MB/s

  • Future options

    • 10 GigE

    • Infiniband

    • HyperTransport
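
To put the interconnect bandwidths in graphics terms, the sketch below computes an upper bound on how many uncompressed 1024x768 RGBA tiles per second a single link could carry. The tile size, 4 bytes per pixel, and 1 MB = 10^6 bytes are assumptions; protocol overhead and contention are ignored.

```python
TILE_BYTES = 1024 * 768 * 4   # one uncompressed RGBA tile (assumed)

# Bandwidths in MB/s as listed on the slide.
INTERCONNECT_MB_PER_S = {
    "GigE": 100,
    "Myrinet 2000": 245,
    "SCI/Dolphin": 326,
    "Quadrics": 340,
}


def tiles_per_second(mb_per_s):
    """Upper bound on full tiles per second over one link."""
    return mb_per_s * 1e6 / TILE_BYTES


limits = {name: tiles_per_second(bw) for name, bw in INTERCONNECT_MB_PER_S.items()}
```

By this estimate GigE tops out at about 32 full tiles per second per link, while the faster interconnects allow roughly 78 to 108.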


Low End

  • General Definition

    • Single CPU

    • Consumer Mainboard

    • Integrated Graphics

    • High Speed commodity network

  • Example Node Configuration

    • Nvidia NForce2

    • AMD Athlon 2400+

    • 512 MB DDR

    • GigE and 10/100

    • 1U rack chassis

    • Estimated Price: $1500



Mid End

  • General Definition

    • Dual Processor

    • “Workstation” mainboard

    • High performance bus

      • 64-bit PCI or PCI-X

    • High Speed Commodity / Low end cluster interconnect

    • High-End consumer graphics board

  • Example Node Configuration

    • Intel i860

    • Dual Intel P4 Xeon 2.4GHz

    • 2GB RDRAM

    • ATI Radeon 9700

    • GigE onboard + Myrinet 2000

    • 2U rack chassis

    • Estimated Price: $4000


Bottleneck Evaluation – Mid End

  • Sort-First: Network limited

  • Sort-Last: Read/Draw and Network limited


High End

  • General Definition

    • Dual or Quad processor

    • Cutting edge bus

      • PCI-X, HyperTransport, PCI Enhanced

    • High Speed Commodity/ High end cluster interconnect

    • “Professional” graphics board

    • RAID system

  • Example Node Configuration

    • ServerWorks GC-WS

    • Dual P4 Xeon 2.6GHz

    • Nvidia Quadro4 900XGL

    • 4GB DDR

    • GigE onboard + Infiniband

    • Estimated Price: $7500


Bottleneck Evaluation – High End

  • Sort-First: Well balanced

  • Sort-Last: Read/Draw limited


Balanced System is Key

  • Only as fast as slowest component

    • Spend money where it matters!


Goals for Next Cluster

  • Performance

    • Sort-Last

      • 5 GVox/s

      • 1 GTri/s

    • Sort-First at 4096x2304

      • Quake3 @ >100fps

  • Research

    • Remote visualization

    • Time-varying datasets

    • Compositing


What we plan to build

  • 16 Node cluster, 1U nodes

  • Mainboard chipsets

    • Intel Placer

    • ServerWorks GC-WS

    • AMD Hammer

  • Memory

    • 2-4GB

  • Graphics Chip

    • Nvidia NV30

    • ATI R300/350

  • Interconnect

    • Infiniband, Quadrics

  • Disk

    • IDE RAID or SCSI


Continuing Chipset Issues

  • Why do chipsets perform so poorly?

    • “Workstation”

      • Intel i860

        • 215 MB/s read (40% of theoretical)

        • 300 MB/s write (56% of theoretical)

      • AMD 760MPX

        • 300 MB/s read (56% of theoretical)

        • 312 MB/s write (59% of theoretical)

    • “Server”

      • ServerWorks ServerSet III LE

        • 423 MB/s read (79% of theoretical)

        • 486 MB/s write (91% of theoretical)

  • Why can’t a “server” have an AGP slot?

Performance numbers from http://www.conservativecomputer.com


Ongoing Bottlenecks

  • Readback performance

    • Will be fixed “soon”

    • Hardware compositing?

  • Chipset Performance

    • Chipsets achieve only a fraction of theoretical bandwidth

    • Need faster busses in commodity chipsets

  • Network Performance

    • Scalability

    • Fast is VERY expensive


Conclusions

  • What we still need

    • More vendors

    • More chipsets

    • More performance

  • Graphics Clusters are getting better

    • Chipsets

    • Interconnects

    • Form factor

    • Processing

    • Graphics Chips

  • Things are really starting to get interesting!

