
High Performance Computing With Microsoft Compute Cluster Solution

Kyril Faenov (kyrilf@microsoft.com)

DAT301

Director of High Performance Computing

Microsoft Corporation

High Performance Computing
  • Cutting-edge problems in science, engineering, and business always require capabilities beyond those of standalone computers
  • Market pressures demand an accelerated innovation cycle, overall cost reduction, and thorough outcome modeling
    • Aircraft design utilizing composite materials
    • Vehicle fuel efficiency and safety improvements
    • Simulations of enzyme catalysis, protein folding
    • Targeted material and drug design
    • Simulation of nanoscale electronic devices
    • Financial portfolio risk modeling
    • Digital content creation and enhancement
    • Supply chain modeling and optimization
    • Long term climate projections

The volume economics of industry-standard hardware and commercial software applications are rapidly bringing HPC capabilities to a broader set of users

Evolution Of HPC Applications And Systems

[Slide diagram: HPC usage evolving from manual, batch execution overseen by an IT manager toward interactive computation and visualization integrated with databases (SQL).]
Microsoft Compute Cluster Solution

[Slide diagram: users submit jobs from a desktop application, user console, or command line to the head node, which provides job management, cluster management, scheduling, and resource management, authenticating against Active Directory; administrators use an admin console or command line for management, policy, and reports; compute nodes run a node manager that executes tasks (user applications communicating over MPI), connected by a high speed, low latency interconnect, with input and output data on database and file servers (DB/FS).]
CCS Product Summary
  • What is it?
    • A solution that lets compute-intensive applications scale their performance easily and cost-effectively on compute clusters
  • Core Platform
    • Based on Windows Server 2003 SP1 64-bit Edition
    • Ethernet, InfiniBand, and other interconnects supported via Winsock Direct
  • Administration
    • Prescriptive, simplified cluster setup and administration
    • Scripted, image-based compute node management
    • Active Directory based security
    • Scalable, extensible job scheduling and resource management
  • Development
    • Cluster scheduler programmable via .NET and DCOM
    • MPI2 stack with performance and security enhancements for parallel applications
    • Visual Studio 2005 – OpenMP, Parallel Debugger
Models Of Parallel Programming
  • Data Parallel
    • Shared memory (load, store, lock, unlock... )
      • Already present in Windows today!
    • Message Passing (send, receive, broadcast... )
      • MPI support in Compute Cluster Solution
    • Directive-based (compiler needs help... )
      • OpenMP support in Visual Studio 2005 (see the sketch after this list)
    • Transparent (compiler works magic... )
      • Holy grail
  • Task parallel
    • Data-flow and Vector
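
As a point of comparison, here is a minimal sketch of the directive-based model using the OpenMP support in Visual Studio 2005 (compiled with /openmp); the array size and loop body are illustrative:

#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void)
{
    static double a[N];
    double sum = 0.0;
    int i;

    /* The directive asks the compiler to split the loop across threads;
       reduction(+:sum) combines the per-thread partial sums safely. */
#pragma omp parallel for reduction(+:sum)
    for (i = 0; i < N; i++) {
        a[i] = 4.0 / (1.0 + ((i + 0.5) / N) * ((i + 0.5) / N));
        sum += a[i];
    }

    /* Midpoint rule for the integral of 4/(1+x^2) on [0,1]:
       sum/N approximates pi, foreshadowing the MPI example later. */
    printf("pi is approximately %.16f\n", sum / N);
    return 0;
}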
Job/Task Conceptual Model

[Slide diagram: a serial job contains a single task running one process; a parallel MPI job contains a task whose processes communicate through IPC; a parameter sweep job contains many independent tasks, each running its own process; a task flow job contains tasks with dependencies between them.]
About MPI
  • Early HPC message-passing systems (Intel’s NX, IBM’s EUI, etc.) were not portable
  • The MPI Forum was organized in 1992 with broad participation from
    • vendors: IBM, Intel, TMC, SGI, Convex, Meiko
    • portability library writers: PVM, p4
    • users: application scientists and library writers
  • MPI is a standard specification; there are many implementations
    • MPICH and MPICH2 reference implementations from Argonne
    • MS-MPI based on (and compatible with) MPICH2
    • Other implementations include LAM-MPI, OpenMPI, MPI-Pro, WMPI
  • Why did the MS HPC team choose MPI?
    • MPI has emerged as the de facto standard for parallel programming
  • MPI consists of three parts (sketched below)
    • Full-featured API of 160+ functions
    • Secure process launch and communication runtime
    • Command-line tool (mpiexec) to launch jobs
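
To make those three parts concrete, here is a minimal sketch of an MPI program; it uses only the core API and should build against any MPI-2 implementation, including MS-MPI (the program name hello.exe below is illustrative):

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);                  /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total process count */

    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();                          /* shut down cleanly */
    return 0;
}

Launched with, for example: mpiexec -n 4 hello.exe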
MS-MPI Leverages Winsock Direct

[Slide diagram: the HPC application calls MS-MPI, which issues calls through the Winsock DLL; the Winsock switch routes traffic by subnet either down the conventional kernel path (TCP/IP, NDIS, GigE and IP-over-IB miniport drivers) or to IHV-provided Winsock provider DLLs for InfiniBand or GigE with RDMA, which manage hardware resources such as send and receive queues in user space through a verbs-based user API and a user-mode host channel adapter driver, bypassing the kernel on the data path; kernel-mode counterparts (verbs-based kernel API, virtual bus driver, host channel adapter driver) sit above the networking hardware, with OS components and IHV-provided components distinguished in the diagram.]
Fundamental MPI Features
  • Programming with MPI
    • Communicators
      • Groups of processes used for communications
      • MPI_COMM_WORLD is your friend
    • Rank (a process’s ID)
      • Target communications
      • Segregate work
    • Collective operations
      • Collect and reduce data in a single call
      • sum, min, max, and/or, etc.
    • Fine control of comms and buffers if you like
      • MPI and derived data types
  • Launching jobs
    • mpiexec arguments
      • Number of processors required
      • Names of specific compute nodes to use
      • Launch and working directories
      • Environment variables to set for this job
        • Global values (for all compute nodes, not just the launch node)
      • Point to files of command-line arguments
      • env MPICH_NETMASK to control the network used for this MPI job (see the example below)
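
For example, to steer MPI traffic onto a private 10.x network before launching (the netmask value and application name are illustrative; the variable follows the MPICH convention named above):

C:\> set MPICH_NETMASK=10.0.0.0/255.0.0.0
C:\> mpiexec -n 8 myapp.exe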
Parallel Execution Visualization

[Slide visualization: each line represents thousands of messages; zooming in roughly 1000x to a detailed view shows opportunities for optimization.]
Job Scheduler Stack

[Slide diagram: jobs and tasks originate on the client node, pass through admission on the head node, then allocation, and finally activation on the compute nodes.]
End-To-End Security

[Slide diagram: the client logs on as the user and submits the job with credentials over a secure channel to the scheduler; credentials are protected with the Data Protection API and stored with the job in MSDE; the scheduler passes the credential over a secure channel to the node manager, which obtains a logon token through the LSA, authenticating against Active Directory via Kerberos with automatic ticket renewal, and spawns the task as the user; the task then accesses data on database and file servers (DB/FS) under that identity.]
Community Resources

At PDC, go see

    • FUN302: Programming with Concurrency – Multithreading Best Practices (9/13 2:45pm)
    • FUN405: Programming with Concurrency – Multithreading on Windows (9/13 4:15pm)
    • FUN323: Microsoft Research – Future Possibilities in Concurrency (9/16 8:00am)
    • Bob Muglia’s Windows Server keynote (9/15 8:30am)
    • Product Pavilion – meet HPC team members
    • Visit the Hands On Lab to try the demos yourself!

To Learn More

  • Microsoft Compute Cluster Solution Beta 1 – Released Today!
    • http://connect.microsoft.com/availableprograms.aspx
  • Microsoft HPC website
    • http://www.microsoft.com/hpc/
  • Public newsgroup
    • nntp://microsoft.public.windows.hpc/
  • MPICH home and documentation
    • http://www-unix.mcs.anl.gov/mpi/mpich/

© 2005 Microsoft Corporation. All rights reserved.

This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.

Welcome to the Multi-Core Era

[Slide chart: log transistors per die and log CPU clock frequency versus time (1975, 2003, 2015). Transistors per die keep growing at more than 30% per year: roughly 10,000 in 1975, 100 million in 2003, and a projected 5 billion by 2015. Clock frequency growth has slowed to under 10% per year: about 1 MHz in 1975, 3 GHz in 2003, and less than 10 GHz projected for 2015.]
  • How can programmers benefit from concurrency today?
  • How is concurrency supported and used in the Microsoft platform?
  • What techniques is Microsoft Research investigating for programming future highly-parallel hardware?
Example: Calculate pi

#include "mpi.h"
#include <math.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int done = 0, n, myid, numprocs, i;
    double PI25DT = 3.141592653589793238462643;  /* the "correct" pi */
    double mypi, pi, h, sum, x;

    MPI_Init(&argc, &argv);                      /* start MPI */
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);    /* # of procs assigned to this job */
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);        /* this proc's rank */

    while (!done) {
        /* on proc 0, ask the user for the number of intervals */
        if (myid == 0) {
            printf("Enter the number of intervals: (0 quits) ");
            scanf("%d", &n);
        }
        /* send the number of intervals to all procs */
        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
        if (n == 0) break;
Example: Calculate pi (2)

        /* compute: sum this proc's share of the intervals */
        h = 1.0 / (double) n;
        sum = 0.0;
        for (i = myid + 1; i <= n; i += numprocs) {
            x = h * ((double)i - 0.5);
            sum += 4.0 / (1.0 + x * x);
        }
        mypi = h * sum;

        /* sum all procs' partial results onto proc 0 */
        MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        /* report: on proc 0, print the estimate and its deviation
           from the "correct" value */
        if (myid == 0)
            printf("pi is approximately %.16f, Error is %.16f\n",
                   pi, fabs(pi - PI25DT));
    }

    MPI_Finalize();
    return 0;
}
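Assuming the program above is compiled to cpi.exe (the name is illustrative), it could be run on four processes with:

C:\> mpiexec -n 4 cpi.exe

with rank 0 prompting for the number of intervals.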
Job Submission: Serial Job

C:\> job submit /stdout:\\hn\users\foo\^%CCP_JOBID^%.txt myapp.exe
Job submitted, ID: 4

The ^ escape character keeps %CCP_JOBID% from being expanded by the local command shell, so the scheduler substitutes the job ID when the task runs.

job submit options:

[/scheduler:host]
[/jobname:JobName]
[/numprocessors:min[-max]]
[/runtime:{[[DAYS:]HOURS:]MINUTES|infinite}]
[/priority:{Highest|AboveNormal|Normal|BelowNormal|Lowest}]
[/projectname:name]
[/askednodes:node1[,node2[,...]]]
[/exclusive:{true | false}]
[/name:TaskName]
[/rerunnable:{true | false}]
[/checkpointable:{true | false}]
[/runtillcancelled:{true | false}]
[/stdin:file] [/stdout:file] [/stderr:file]
[/lic:feature1:amt1 /lic:feature2:amt2 ... /lic:featureN:amtN]
[/workdir:folder] command [arguments]
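
For instance, a submission combining several of these options might look like this (the job name, runtime, and paths are illustrative):

C:\> job submit /jobname:nightly /numprocessors:2 /runtime:2:00 /priority:BelowNormal /workdir:\\hn\users\foo myapp.exe input.dat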

Job Submission: MPI Job

C:\> job submit /numprocessors:4-8 mpiexec -hosts ^%CCP_NODES^% myapp.exe

C:\> job submit /numprocessors:4-8 mytask.bat

mytask.bat:

REM Set up environment variables
set Path=C:\program files\vendor\bin;%Path%
set LM_LICENSE_FILE=c:\program files\vendor\license.bat

REM Stage the input files
. . .

REM Invoke MPI
mpiexec -hosts %CCP_NODES% myapp.exe arg1 arg2 ...

REM Stage out the results
. . .

Job Submission: Parametric Sweep

# Create a job container
$str = `job new /numprocessors:4-8`;
if ($str =~ /ID: (\d+)/) {
    $jobid = $1;
}

# Add the parametric tasks
for ($i = 0; $i < 128; $i++) {
    `job add $jobid /stdout:\\\\hn\\users\\foo\\output.$i /stderr:\\\\hn\\users\\foo\\output.$i myapp.bat`;
}

# Submit the job
`job submit /id:$jobid`;

Job Submission: Task Flow

# Create a job container
$str = `job new /numprocessors:4-8`;
if ($str =~ /ID: (\d+)/) {
    $jobid = $1;
}

# Add a set-up task
`job add $jobid /name:setup setup.bat`;

# All these tasks wait for the setup task to complete
for ($i = 0; $i < 128; $i++) {
    `job add $jobid /name:compute /depend:setup compute.bat`;
}

# This task waits for all the "compute" tasks to complete
`job add $jobid /name:aggregate /depend:compute aggregate.bat`;

[Slide diagram: the "setup" task runs first, the 128 "compute" tasks run in parallel after it, and the "aggregate" task runs after all "compute" tasks finish.]