
BSP on the Origin2000

Lab for the course:

Seminar in Scientific Computing with BSP

Dr. Anne Weill – [email protected], phone: 4997



Origin2000 (SGI)

32 processors



Origin2000/3000 architecture features

Important hardware and software components:

* node board: processors + memory

* node interconnect topology and configurations

* scalability of the architecture

* directory-based cache coherency

* single system image components






Origin2000 interconnect

(diagrams: interconnect topologies for the 32-processor and 64-processor configurations)



Origin router interconnect

- Router chip has 6 CrayLink interfaces: 2 for connections to nodes (HUBs) and 4 for connections to other routers in the network
  * 4-dimensional interconnect
- Router links are point-to-point connections, 17+7 wires @ 400 MHz (that is, wire speed 800 MB/s)
- Wormhole routing with a static routing table loaded at boot
- Router delay is 50 ns in one direction
- The interconnect topology is determined by the size of the computer (number of nodes):
  * direct (back-to-back) connection for 2 nodes (4 CPUs)
  * strongly connected cube for up to 32 CPUs
  * hypercube for up to 64 CPUs
  * hypercube of hypercubes for up to 256 CPUs



Origin address space

- Physically, the memory is distributed and not contiguous
- Node id is assigned at boot time
- Logically, memory is a single shared contiguous address space; the virtual address space is 44 bits (16 TB)
- A program (the compiler) uses the virtual address space
- The CPU translates from the virtual to the physical address space

(diagram: physical address layout – bits 39–32 hold the 8-bit node id, bits 31–0 the 32-bit node offset (4 GB per node); virtual pages are mapped through the TLB – Translation Look-aside Buffer – to physical pages on whichever nodes have memory present)


Login to carmel

1. Open an ssh window to: carmel.technion.ac.il

2. Username: course01-course20
   Password: bsp2006

Contact: Dr. Anne Weill – [email protected], phone: 4997


Compiling and running codes

1. Setting the path

   set path=($path /u/tcc/anne/BSP/bin)

2. Compiling

   % bspcc prog1.c -o prog1
   % bspcc -flibrary-level 1 prog1.c -o prog1   (for a non-dedicated machine)

3. Running

   % bsprun -npes 4 prog1

(a minimal prog1.c to try these commands with is sketched below)
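prog1.c itself is not shown in the deck. Purely as an illustration (the file name and the printed message are assumptions), a minimal BSPlib program that these commands would compile and run could look like this:

    #include <stdio.h>
    #include "bsp.h"                      /* BSPlib header */

    int main(int argc, char **argv)
    {
        bsp_begin(bsp_nprocs());          /* start the SPMD part on all available processors */
        printf("Hello from processor %d of %d\n", bsp_pid(), bsp_nprocs());
        bsp_end();                        /* end the SPMD part */
        return 0;
    }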


Running on carmel

1. Interactive mode:

   % ./prog.exe <parameters>

2. NQE queues (a sketch of a possible script.bat follows below):

   % qsub -q qcourse script.bat
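The contents of script.bat are not shown in the deck. Assuming it is an ordinary shell script executed by the NQE queue (the #QSUB directive line and the working directory are guesses, not from the slides), it might look like:

    #!/bin/sh
    #QSUB -q qcourse            # course queue, as in the qsub command above (directive assumed)
    cd $HOME/BSPcourse          # hypothetical working directory
    bsprun -npes 4 prog1        # run the BSP program on 4 processors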





How it works

(diagram: bsprun starts a copy of prog.exe on each of the processors P0, P1, P2 and P3)


SPMD – single program multiple data

  • Each processor views only its local memory.

  • Contents of variable X are different on different processors (see the sketch after this list).

  • Transfer of data can occur in principle through one-sided or two-sided communication.
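As a minimal sketch of the "contents of variable X" point (the variable name x and the values are illustrative, not from the slides): every processor executes the same program, but each holds its own copy of x.

    #include <stdio.h>
    #include "bsp.h"

    int main(int argc, char **argv)
    {
        int x;

        bsp_begin(bsp_nprocs());
        x = 10 * bsp_pid();       /* same variable name, different contents on each processor */
        printf("processor %d: x = %d\n", bsp_pid(), x);
        bsp_end();
        return 0;
    }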


DRMA – direct remote memory access

  • All processors must register the space into which remote "reads" and "writes" will happen
  • Calls to bsp_put
  • Calls to bsp_get
  • Calls to bsp_sync – all processors synchronize, and all communication is completed after the call (see the sketch after this list)
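A minimal sketch of this pattern (the variable names and the cyclic neighbour-to-neighbour communication are assumptions for illustration): every processor registers y, puts its local x into y on the next processor, and only reads y after the bsp_sync that completes the communication.

    #include <stdio.h>
    #include "bsp.h"

    int main(int argc, char **argv)
    {
        int x, y;

        bsp_begin(bsp_nprocs());
        x = bsp_pid();
        y = -1;

        bsp_push_reg(&y, sizeof(int));    /* every processor registers y for remote access */
        bsp_sync();                       /* registration takes effect after a sync */

        /* one-sided write: put local x into y on the next processor (cyclically) */
        bsp_put((bsp_pid() + 1) % bsp_nprocs(), &x, &y, 0, sizeof(int));
        bsp_sync();                       /* all communication is completed here */

        printf("processor %d: y = %d\n", bsp_pid(), y);

        bsp_pop_reg(&y);
        bsp_end();
        return 0;
    }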







Another example

  • What does the following program do? (the program is shown on the slide but not reproduced in this transcript)
  • What will the program print?



Another example

  • Is there a problem with the following example? (again, the program is on the slide but not reproduced in this transcript)
  • What will the program print?


Answer

  • As it is written, the program will not print any output: the data is actually transferred only after the bsp_sync statement (a sketch of this pitfall follows below)
  • Additional question: what will the program print if bsp_sync is placed right after the put statement?
  • NB: the programs are in the directory /u/tcc/anne/BSPcourse, under prog2.c and prog2wrong.c – try them
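prog2wrong.c is not reproduced in this transcript; a sketch of the kind of mistake the answer describes (all names are assumptions) is reading the destination variable before the bsp_sync that completes the transfer:

    #include <stdio.h>
    #include "bsp.h"

    int main(int argc, char **argv)
    {
        int x, y;

        bsp_begin(bsp_nprocs());
        x = bsp_pid();
        y = -1;
        bsp_push_reg(&y, sizeof(int));
        bsp_sync();

        bsp_put((bsp_pid() + 1) % bsp_nprocs(), &x, &y, 0, sizeof(int));

        /* WRONG: y is read before bsp_sync(), so the put has not completed yet */
        printf("processor %d: y = %d\n", bsp_pid(), y);

        bsp_sync();    /* the transferred value is only guaranteed to be in y from here on */
        bsp_end();
        return 0;
    }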


Exercise 1 (due Nov. 26, 2006)

  • Copy the directory /u/tcc/anne/BSPcourse over to your directory. Take a look at the bspedupack.h file.
  • Write a C program in which each processor writes its pid into an array PIDS(0:p-1) on processor 0, so that PIDS(i) = i. (one possible skeleton is sketched after this list)
  • Run the program for p = 1, 2, 4, 8, 16 processors and print PIDS. You can run it interactively.
  • Do the same with a get instruction.
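One possible skeleton for the put version (the array size, names, and output format are only suggestions, not part of the assignment text):

    #include <stdio.h>
    #include "bsp.h"

    #define MAXP 16                       /* enough for the largest run requested */

    int main(int argc, char **argv)
    {
        int i, pid, PIDS[MAXP];

        bsp_begin(bsp_nprocs());
        pid = bsp_pid();

        bsp_push_reg(PIDS, MAXP * sizeof(int));   /* register PIDS on every processor */
        bsp_sync();

        /* each processor writes its pid into slot pid of PIDS on processor 0 */
        bsp_put(0, &pid, PIDS, pid * sizeof(int), sizeof(int));
        bsp_sync();

        if (pid == 0)
            for (i = 0; i < bsp_nprocs(); i++)
                printf("PIDS[%d] = %d\n", i, PIDS[i]);

        bsp_pop_reg(PIDS);
        bsp_end();
        return 0;
    }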

