
Dell Research In HPC

GridPP7

1st July 2003

Steve Smith

HPC Business Manager

Dell EMEA

[email protected]


The Changing Models of High Performance Computing

[Diagram: three generations of HPC architecture shown side by side.]

Traditional HPC architecture: proprietary RISC, vector and custom systems.

Current HPC architecture: standards-based hardware; clusters, SMP and blades.

Future HPC architecture: a grid of shared resources running distributed applications and rich clients, built as a layered stack of applications, middleware, OS and hardware.

© Copyright 2002-2003 Intel Corporation


HPCC Building Blocks

Benchmark: parallel benchmarks (NAS, HINT, Linpack…) and parallel applications

Middleware: MPI/Pro, MPICH, MVICH, PVM

OS: Windows, Linux

Protocol: Elan, TCP, VIA, GM

Interconnect: Fast Ethernet, Gigabit Ethernet, Myrinet, Quadrics

Platform: PowerEdge & Precision (IA32 & IA64)
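These layers are what a simple micro-benchmark exercises end to end: an MPI code built against one of the middleware options above (MPICH is assumed here), running on Linux or Windows over TCP, VIA, GM or Elan on the chosen interconnect. A minimal ping-pong sketch of that kind, with an illustrative message size and repetition count, might look like this:

/* pingpong.c: minimal MPI ping-pong to baseline a middleware/interconnect combination.
 * Illustrative sketch, assuming an MPICH-style toolchain:
 *   mpicc -O2 pingpong.c -o pingpong
 *   mpirun -np 2 ./pingpong
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define MSG_BYTES (1 << 20)   /* 1 MiB message (illustrative) */
#define REPS      100

int main(int argc, char **argv)
{
    int rank, size, i;
    char *buf = malloc(MSG_BYTES);
    MPI_Status status;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < 2) {
        if (rank == 0) fprintf(stderr, "run with at least 2 processes\n");
        MPI_Finalize();
        return 1;
    }

    t0 = MPI_Wtime();
    for (i = 0; i < REPS; i++) {
        if (rank == 0) {          /* rank 0 sends, then waits for the echo */
            MPI_Send(buf, MSG_BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
        } else if (rank == 1) {   /* rank 1 echoes the message back */
            MPI_Recv(buf, MSG_BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
            MPI_Send(buf, MSG_BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("sustained bandwidth: %.1f MB/s\n",
               2.0 * REPS * MSG_BYTES / (t1 - t0) / 1e6);

    free(buf);
    MPI_Finalize();
    return 0;
}

Swapping the interconnect or protocol layer while rerunning the same binary gives a first-order view of how much of the stack's cost sits below the MPI layer.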


HPCC Components and Research Topics

[Diagram: research topics grouped across cluster hardware, cluster software, and monitoring & management.]

Application Benchmark: custom application benchmarks; standard benchmarks; performance studies. Vertical solutions (application prototyping / sizing): Energy/Petroleum, Life Science, Automotive (manufacturing and design).

Cluster File System: reliable PVFS; GFS, GPFS…; storage cluster solutions.

Job Scheduler: resource monitoring / management; dynamic resource allocation; checkpoint restart and job redistribution; cluster monitoring; load analysis and balancing; remote access; web-based GUI.

Node Monitoring & Management: cluster monitoring; distributed system performance monitoring; workload analysis and balancing; remote access; web-based GUI.

Development Tools: compilers and math libraries; performance tools (MPI analyzer / profiler, debugger, performance analyzer and optimizer).

Middleware / API: MPI 2.0 / fault-tolerant MPI; MPICH, MPICH-GM, MPI/LAM, PVM.

Operating Systems.

Interconnects (technologies, protocols and hardware): FE, GbE, 10GE… (RDMA); Myrinet, Quadrics, Scali; InfiniBand.

Cluster Installation / Management Hardware: remote installation / configuration; PXE support; System Imager; LinuxBIOS.

Platform Hardware: IA-32 and IA-64 processor / platform comparison; standard rack-mounted, blade and brick servers / workstations.



HPCC Technology Roadmap

[Roadmap chart spanning Q3 FY03 through Q3 FY05, with TOP500 submissions in Nov 2002, June 2003, Nov 2003 and June 2004. Tracks: Vertical Solutions, Data Grid, Grid Computing, Cluster Monitoring, File Systems, Middleware, Interconnects, Platform Baselining. Items, grouped approximately:]

Vertical Solutions: Financial: MATLAB; Manufacturing: Fluent, LS-DYNA, Nastran; Life Science: BLASTs; Energy: Eclipse, LandMark VIP.

Grid Computing / Data Grid: Globus Toolkit, Grid Engine (GE), Condor-G, Platform Computing, MPICH-G2, cycle stealing.

Cluster Monitoring: Ganglia, Clumon (NCSA).

File Systems / Storage: NFS, PVFS2, Lustre File System 1.0 and 2.0, Global File System, ADIC, Qluster, iSCSI.

Interconnects: Myrinet 2000, Myrinet hybrid switch, Quadrics, Scali, 10GbE, InfiniBand prototyping.

Platform Baselining: Big Bend 2P 1U, Everglades 2P 1U, Yukon 2P 2U.

In the Box Scalability of Xeon Servers

[Chart: 71% scalability in the box.]


In the Box – Xeon (533 MHz FSB) Scaling

http://www.cs.utexas.edu/users/flame/goto/

[Chart: 32% performance improvement.]


Goto Comparison on Myrinet

[Chart: 37% improvement with Goto’s library.]


Goto Comparison on Gigabit Ethernet

[Chart, 64 nodes / 128 processors: 25% improvement with Goto’s library.]
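Both comparisons come down to which DGEMM the HPL binary is linked against. A hedged way to reproduce this kind of comparison outside HPL is to time the same BLAS call linked once against a reference or ATLAS build and once against Goto's library; the matrix size, the Fortran-style underscored symbol and the link lines below are illustrative assumptions, not Dell's benchmark harness.

/* dgemm_time.c: time one double-precision matrix multiply through the Fortran BLAS interface,
 * then relink against a different BLAS (e.g. Goto's libgoto) and compare.
 * Illustrative build lines:
 *   gcc -O2 dgemm_time.c -o dgemm_ref  -lblas
 *   gcc -O2 dgemm_time.c -o dgemm_goto libgoto.a -lpthread
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#define N 2000   /* illustrative problem size */

/* standard Fortran BLAS symbol; the trailing underscore assumes the usual g77/gfortran mangling */
extern void dgemm_(const char *transa, const char *transb,
                   const int *m, const int *n, const int *k,
                   const double *alpha, const double *a, const int *lda,
                   const double *b, const int *ldb,
                   const double *beta, double *c, const int *ldc);

static double wall_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

int main(void)
{
    int i, n = N;
    double one = 1.0, zero = 0.0, t0, secs;
    double *a = malloc(sizeof(double) * N * N);
    double *b = malloc(sizeof(double) * N * N);
    double *c = malloc(sizeof(double) * N * N);

    for (i = 0; i < N * N; i++) { a[i] = 1.0; b[i] = 2.0; c[i] = 0.0; }

    t0 = wall_seconds();
    dgemm_("N", "N", &n, &n, &n, &one, a, &n, b, &n, &zero, c, &n);
    secs = wall_seconds() - t0;

    /* a square matrix multiply performs about 2*N^3 floating-point operations */
    printf("dgemm %dx%d: %.2f s, %.2f GFLOPS\n",
           N, N, secs, 2.0 * (double)N * N * N / secs / 1e9);

    free(a); free(b); free(c);
    return 0;
}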


Process-to-Processor Mapping

[Diagram: two dual-CPU nodes connected by a switch, shown under two placement schemes.]

Round Robin (default): consecutive processes alternate between nodes, so processes 1 and 3 land on Node 1 and processes 2 and 4 on Node 2.

Process Mapped: consecutive processes fill a node before moving to the next, so processes 1 and 2 land on Node 1 and processes 3 and 4 on Node 2.
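Which of the two placements a launcher actually produced can be verified from inside the job. The sketch below is an illustrative check (not a Dell tool): each rank reports the host it landed on, so round-robin versus block ("size-major") placement driven by the machinefile or the mpirun options becomes visible at a glance.

/* where.c: print where each MPI rank runs, to confirm the process-to-processor mapping.
 * Illustrative usage:  mpicc where.c -o where && mpirun -np 4 -machinefile hosts ./where
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &len);

    /* Round robin: consecutive ranks print different hosts.
       Size-major:  consecutive ranks fill one host before moving to the next. */
    printf("rank %d of %d running on %s\n", rank, size, host);

    MPI_Finalize();
    return 0;
}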




HPL Results on the Xeon Cluster

[Chart comparing Fast Ethernet, Gigabit Ethernet and Myrinet on a balanced system designed for HPL-type applications. Size-major mapping is 7% better than round robin in one case and 35% better in another.]



Reservoir Simulation – Process Mapping – Gigabit Ethernet

[Chart: 11% improvement with GigE.]


How Hyper-Threading Technology Works

[Diagram (© Copyright 2002-2003 Intel Corporation): execution resource utilization over time for a first and a second thread/task, run without and then with Hyper-Threading Technology. With Hyper-Threading the two threads share the execution resources concurrently, saving up to 30% of the time.]

Greater resource utilization equals greater performance.
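One quick way to see what the diagram implies on a running Linux node is to compare the number of logical CPUs the OS schedules against the number of physical packages it actually has. The sketch below is illustrative and assumes a /proc/cpuinfo that exposes a "physical id" field, as HT-capable kernels of this era generally do; kernels without that field simply report no package count.

/* cpucount.c: compare logical CPUs seen by the scheduler with physical packages in /proc/cpuinfo. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long logical = sysconf(_SC_NPROCESSORS_ONLN);   /* logical CPUs visible to the OS */
    int ids[256], nids = 0, id, i, seen;
    char line[256];
    FILE *f = fopen("/proc/cpuinfo", "r");

    if (f) {
        while (fgets(line, sizeof line, f)) {
            /* count distinct "physical id" values = physical packages */
            if (sscanf(line, "physical id : %d", &id) == 1) {
                for (i = 0, seen = 0; i < nids; i++)
                    if (ids[i] == id) seen = 1;
                if (!seen && nids < 256)
                    ids[nids++] = id;
            }
        }
        fclose(f);
    }

    if (nids > 0)
        printf("%ld logical CPUs across %d physical packages\n", logical, nids);
    else
        printf("%ld logical CPUs (no physical id field found)\n", logical);
    /* A dual-Xeon with Hyper-Threading enabled typically reports 4 logical CPUs, 2 packages. */
    return 0;
}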


HPL Performance Comparison

Linpack performance results on a 16-node dual-Xeon 2.4 GHz cluster: Hyper-Threading provides ~6% improvement on the 16-node, 32-processor cluster.

[Chart: GFLOPS (0 to 90) versus problem size (2,000 to 56,000) for four configurations: 16x4 processes with HT on, 16x2 processes without HT, 16x2 processes with HT on, and 16x1 processes without HT.]


NPB-FT (Fast Fourier Transformation)

[Chart: Mop/sec (0 to 9,000) without HT and with HT, by configuration (number of nodes x number of processors): 1x2 (1x4 with HT) through 32x2 (32x4 with HT).]

  • Cache (L2) misses increased: 68% without HT, 76% with HT.


NPB-EP (Embarrassingly Parallel)

[Chart: Mop/sec (0 to 1,000) without HT and with HT, by configuration (number of nodes x number of processors): 1x2 (1x4 with HT) through 32x2 (32x4 with HT).]

  • EP requires almost no communication.
  • SSE and x87 utilization increased: 94% without HT, 99% with HT.


Observations

  • Compute-intensive applications with fine-tuned floating-point code are less likely to gain from Hyper-Threading, because the CPU's execution resources are already highly utilized.

  • Cache-friendly applications may suffer when Hyper-Threading is enabled, because processes on the two logical processors compete for the shared cache, which can degrade performance.

  • Communication-bound or I/O-bound parallel applications may benefit from Hyper-Threading if communication and computation can be interleaved between processes.

  • Current Linux support for Hyper-Threading is limited, and performance can degrade significantly if Hyper-Threading is not applied properly.

  • To the OS, the logical CPUs are almost indistinguishable from physical CPUs.

  • The current Linux scheduler treats each logical CPU as a separate physical CPU, which does not maximize multiprocessing performance (a CPU-affinity sketch follows this list).

  • A patch with better HT support is available (Source: "fully HT-aware scheduler" support, 2.5.31-BK-curr, by Ingo Molnar).
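Until an HT-aware scheduler is standard, one common workaround is to pin each process to a chosen logical CPU so that two compute processes do not end up sharing one physical package. The sketch below is illustrative and assumes a kernel and glibc that provide the cpu_set_t flavour of sched_setaffinity(); earlier 2.4/2.5 combinations used a different calling convention, so treat the interface as an assumption.

/* pin.c: bind the calling process to one logical CPU before doing its work.
 * Assumes the cpu_set_t interface to sched_setaffinity() (requires _GNU_SOURCE).
 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int cpu = (argc > 1) ? atoi(argv[1]) : 0;   /* logical CPU number to bind to */
    cpu_set_t mask;

    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);

    /* pid 0 means "the calling process"; the scheduler then keeps it on the chosen CPU */
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    printf("pid %d bound to logical CPU %d\n", (int)getpid(), cpu);
    /* ... the compute work (e.g. one MPI process per physical CPU) would run here ... */
    return 0;
}

Which logical CPU numbers map to which physical package varies by kernel version, so the numbering should be checked (for example with a /proc/cpuinfo inspection as sketched earlier) before choosing the CPUs to pin to.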



Thank You

Steve Smith

HPC Business Manager

Dell EMEA

[email protected]

