BSP on the Origin2000

Lab for the course:

Seminar in Scientific Computing with BSP

Dr. Anne Weill – anne@tx.technion.ac.il, phone: 4997



Origin2000 (SGI)

32 processors



Origin2000/3000 architecture features

Important hardware and software components:

* node board: processors + memory

* node interconnect topology and configurations

* scalability of the architecture

* directory-based cache coherency

* single system image components






Origin2000 interconnect

[Diagrams: node/router interconnect for the 32-processor and 64-processor configurations.]



Origin router interconnect

- Router chip has 6 CrayLink interfaces: 2 for connections to nodes (HUBs) and 4 for connections to other routers in the network

  * 4-dimensional interconnect

- Router links are point-to-point connections, 17+7 wires @ 400 MHz (that is, wire speed 800 MB/s)

- Wormhole routing with a static routing table loaded at boot

- Router delay is 50 ns in one direction

- The interconnect topology is determined by the size of the computer (number of nodes):

  * direct (back-to-back) connection for 2 nodes (4 CPUs)

  * strongly connected cube up to 32 CPUs

  * hypercube for up to 64 CPUs

  * hypercube of hypercubes for up to 256 CPUs



Origin address space

- Physically, the memory is distributed and not contiguous

- Node id is assigned at boot time

- Logically, memory is a shared single contiguous address space; the virtual address space is 44 bits (16 TB)

- A program (compiler) uses the virtual address space

- The CPU translates from the virtual to the physical address space

[Diagram: virtual-to-physical address translation. A physical address carries the node id in bits 39-32 (8 bits) and the node offset in bits 31-0 (32 bits, i.e. 4 GB per node); node ids without memory present are empty slots. Virtual pages 0..n are mapped through the TLB (Translation Look-aside Buffer) to physical pages on the individual nodes.]
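
As a small worked illustration of that address layout (this helper is only a sketch based on the bit fields in the diagram above, not an IRIX interface), the node id and node offset can be pulled out of a physical address like this:

/* Hypothetical helper: split an Origin2000-style physical address into
 * node id (bits 39-32) and node offset (bits 31-0), as in the diagram above.
 * Illustration only, not a system API. */
#include <stdio.h>
#include <stdint.h>

static void decode_physical(uint64_t paddr, unsigned *node_id, uint32_t *offset)
{
    *node_id = (unsigned)((paddr >> 32) & 0xFF);   /* 8-bit node id */
    *offset  = (uint32_t)(paddr & 0xFFFFFFFFu);    /* 32-bit offset, up to 4 GB per node */
}

int main(void)
{
    unsigned node;
    uint32_t off;
    decode_physical(0x0000000312345678ULL, &node, &off);
    printf("node id = %u, offset = 0x%08x\n", node, off);   /* prints node id = 3 */
    return 0;
}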


Login to carmel

1. Open an ssh window to: carmel.technion.ac.il

2. Username: course01-course20
   Password: bsp2006

Contact: Dr. Anne Weill – anne@tx.technion.ac.il, phone: 4997


Compiling and running codes

1. Setting the path:

     set path=($path /u/tcc/anne/BSP/bin)

  2. Compiling:

     % bspcc prog1.c -o prog1
     % bspcc -flibrary-level 1 prog1.c -o prog1   (for a non-dedicated machine)

  3. Running:

     % bsprun -npes 4 prog1
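
For reference, prog1.c is not reproduced in the slides; a minimal BSPlib program of the kind bspcc compiles could look like the sketch below (the file name and output are assumptions, not the actual course file):

/* prog1.c -- hypothetical minimal BSPlib program (sketch, not the course file) */
#include <stdio.h>
#include "bsp.h"

int main(int argc, char **argv)
{
    bsp_begin(bsp_nprocs());     /* start the SPMD part on all available processors */
    printf("Hello from process %d of %d\n", bsp_pid(), bsp_nprocs());
    bsp_end();                   /* end of the SPMD part */
    return 0;
}

Compiled with bspcc and started with bsprun -npes 4, each of the 4 processes prints one line.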


Running on carmel

1. Interactive mode:

     % ./prog.exe <parameters>

  2. NQE queues:

     % qsub -q qcourse script.bat





How it works

[Diagram: bsprun launches a copy of prog.exe on each of the processors P0, P1, P2 and P3.]


SPMD – single program multiple data

  • Each processor views only its local memory.

  • Contents of variable X are different in different processors.

  • Transfer of data can occur in principle through one-sided or two-sided communication.
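
A minimal sketch of this idea (the file and variable names are made up for illustration): every process runs the same source, yet the variable x holds a different value in each local memory.

/* spmd_example.c -- illustrative sketch: same program, different local data */
#include <stdio.h>
#include "bsp.h"

int main(int argc, char **argv)
{
    bsp_begin(bsp_nprocs());
    int x = 10 * bsp_pid();      /* x lives in local memory; its contents differ per process */
    printf("process %d: x = %d\n", bsp_pid(), x);
    bsp_end();
    return 0;
}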


DRMA- direct remote memory access

  • All processors must register the space into which remote “read” and “write” will happen

  • Calls to bsp_put

  • Calls to bsp_get

  • Call to bsp_sync – all processors synchronize, all communication is completed after the call
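
A hedged sketch of this DRMA pattern, built from the calls listed above (plus bsp_pop_reg for cleanup; file and variable names are illustrative): each process registers a local integer, puts its pid into the registered variable of its right neighbour, and only reads the result after bsp_sync.

/* drma_example.c -- illustrative DRMA sketch (names are made up) */
#include <stdio.h>
#include "bsp.h"

int main(int argc, char **argv)
{
    bsp_begin(bsp_nprocs());
    int p = bsp_nprocs(), s = bsp_pid();
    int from_left = -1;

    bsp_push_reg(&from_left, sizeof(int));   /* register local memory for remote access */
    bsp_sync();                              /* registration takes effect after this sync */

    bsp_put((s + 1) % p, &s, &from_left, 0, sizeof(int));   /* write my pid into my right neighbour's from_left */
    bsp_sync();                              /* after this sync the communication is complete */

    printf("process %d received %d from its left neighbour\n", s, from_left);

    bsp_pop_reg(&from_left);
    bsp_end();
    return 0;
}

A bsp_get of the neighbour's value, followed by the same bsp_sync, would achieve the same result with the data pulled rather than pushed.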






Another example

  • What does the following program do?

  • What will the program print?



Another example

  • Is there a problem with the following example?

  • What will the program print?


Answer

  • As it is written, the program will not print any output: the data is actually transferred only after the bsp_sync statement.

  • Additional question: what will the program print if bsp_sync is placed right after the put statement?

  • NB: the programs are in the directory /u/tcc/anne/BSPcourse, under prog2.c and prog2wrong.c – try them.
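
prog2.c itself is not reproduced in the transcript; the sketch below is only an illustrative reconstruction of the pitfall being described (run with at least 2 processes), showing why the transferred value becomes visible only after the bsp_sync:

/* sync_example.c -- illustrative sketch of the bsp_sync pitfall (not prog2.c) */
#include <stdio.h>
#include "bsp.h"

int main(int argc, char **argv)
{
    bsp_begin(bsp_nprocs());
    int s = bsp_pid();
    int value = -1;

    bsp_push_reg(&value, sizeof(int));
    bsp_sync();

    if (s == 1)
        bsp_put(0, &s, &value, 0, sizeof(int));   /* process 1 writes its pid into process 0's value */

    /* Printing value on process 0 here would still show -1:
     * the put is only guaranteed to be complete after the next bsp_sync. */
    bsp_sync();

    if (s == 0)
        printf("process 0: value = %d\n", value); /* now prints 1 */

    bsp_pop_reg(&value);
    bsp_end();
    return 0;
}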


Exercise 1 (due Nov. 26, 2006)

  • Copy the directory /u/tcc/anne/BSPcourse over to your own directory. Take a look at the bspedupack.h file.

  • Write a C program in which each processor writes its pid into an array PIDS(0:p-1) on p0. (PIDS(i)=i).

  • Run the program for p=1,2,4,8,16 processors and print PIDS. You can run it interactively.

  • Do the same with a get instruction.