
Using hpc

Using hpc. Instructor: Seung Hun An, DCS Lab, School of EECSE, Seoul National University. What is hpc: an IBM RS/6000 SP system running AIX 4.3.3 with 9 nodes, 16 processors per node, 144 GByte memory, and 3 TByte. LoadLeveler and POE are available; LoadLeveler is recommended. Host: hpc.snu.ac.kr. Connect by telnet, ssh, or rsh.



Presentation Transcript


  1. Using hpc
     Instructor: Seung Hun An, DCS Lab, School of EECSE, Seoul National University

  2. What is hpc
     • System
       • IBM RS/6000 SP, AIX 4.3.3
       • 9 nodes and 16 processors per node
       • 144 GByte memory, 3 TByte
     • LoadLeveler & POE
       • LoadLeveler is recommended
     • hpc.snu.ac.kr
       • Connect by telnet, ssh, rsh
       • Tera Term is available at http://hpc.snu.ac.kr/download/ttermp23.zip

  3. System Setting & Using
     • Bourne shell family
       • ksh (default), bash
       • Use export instead of setenv
     • General steps
       • Edit the command (cmd) file
       • Compile the source file
       • Submit the job (the compiled executable) to the machine
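The "export instead of setenv" point can be illustrated with a minimal sketch. The variable names `MY_WORKDIR` and `MY_TASKS` are hypothetical and are not part of the hpc setup; the example only shows the Bourne-family syntax that ksh and bash accept in place of the csh-style `setenv`.

```shell
#!/bin/sh
# csh/tcsh style, which does NOT work in ksh or bash:
#   setenv MY_WORKDIR /tmp/work

# Bourne-family equivalent: assign, then export.
MY_WORKDIR=/tmp/work        # hypothetical variable, for illustration only
export MY_WORKDIR

# ksh and bash also accept the combined one-line form.
export MY_TASKS=15          # hypothetical variable, for illustration only

# A child process sees exported variables.
sh -c 'echo "workdir=$MY_WORKDIR tasks=$MY_TASKS"'
# prints: workdir=/tmp/work tasks=15
```

Without `export`, the assignment stays local to the current shell and programs started from it would not see the value.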

  4. Command file
     #!/bin/ksh
     # @ job_type = parallel
     # @ executable = ~/KISA/LLL/execution
     # @ input = /dev/null
     # @ output = $(Executable).$(Cluster).$(Process).out
     # @ error = $(Executable).$(Cluster).$(Process).err
     # @ initialdir = /u/dcslab
     # @ notify_user = shahn@arirang.snu.ac.kr
     # @ class = gold
     # @ step_name = LLL
     # @ notification = complete
     # @ checkpoint = no
     # @ restart = no
     # @ requirements = (Arch == "R6000") && (OpSys == "AIX43")
     # @ node = 4
     # @ total_tasks = 15
     # @ network.MPI = css0,shared,US,high
     # @ queue
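One practical detail when generating such a command file from a script: `$(Executable)`, `$(Cluster)` and `$(Process)` are LoadLeveler variables, not shell command substitutions, so they need to reach the file unexpanded. The sketch below writes a trimmed copy of the slide's file with a quoted here-document. The output path `lll_sketch.cmd` is hypothetical, and whether LoadLeveler on this system would accept the trimmed keyword set has not been verified.

```shell
#!/bin/sh
# Hypothetical output path; the slides use lll.cmd in ~/KISA/LLL.
cmdfile="${TMPDIR:-/tmp}/lll_sketch.cmd"

# Quoting the delimiter ('EOF') keeps the shell from expanding $(Executable) etc.
cat > "$cmdfile" <<'EOF'
#!/bin/ksh
# @ job_type = parallel
# @ executable = ~/KISA/LLL/execution
# @ output = $(Executable).$(Cluster).$(Process).out
# @ error = $(Executable).$(Cluster).$(Process).err
# @ class = gold
# @ node = 4
# @ total_tasks = 15
# @ queue
EOF

# Count the "# @" directive lines that were written.
grep -c '^# @ ' "$cmdfile"
# prints: 8
```

With an unquoted `EOF`, the shell would try to run commands named `Executable`, `Cluster` and `Process` and the output and error names would come out wrong.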

  5. Running example
     [sp01: ~/KISA/LLL] $ mpcc parallel_allswap.c
     [sp01: ~/KISA/LLL] $ mv a.out execution
     [sp01: ~/KISA/LLL] $ llsubmit lll.cmd
     llsubmit: The job "sp01.8681" has been submitted.
     [sp01: ~/KISA/LLL] $ llstatus
     Name  Schedd  InQ  Act  Startd  Run  LdAvg  Idle  Arch   OpSys
     sp01  Avail    12   11  Idle      0  18.49     3  R6000  AIX43
     sp02  Avail     1    1  Run       4  55.60   325  R6000  AIX43
     sp03  Avail     0    0  Run      21  18.02  9999  R6000  AIX43
     sp04  Avail     0    0  Run      16  12.23  9999  R6000  AIX43
     sp05  Avail     0    0  Run      21  21.23  9999  R6000  AIX43
     sp06  Avail     0    0  Run      16   9.00  9999  R6000  AIX43
     sp07  Avail     0    0  Run      22  22.04  7200  R6000  AIX43
     sp08  Avail     0    0  Run       6   2.02  9999  R6000  AIX43
     sp09  Avail     0    0  Run      17  13.05  9999  R6000  AIX43

     R6000/AIX43      9 machines  13 jobs  123 running
     Total Machines   9 machines  13 jobs  123 running

     The Central Manager is defined on sp02

     All machines on the machine_list are present.
     [sp01: ~/KISA/LLL] $
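The job name reported by llsubmit is what later commands such as llq and llcancel refer to, so a script may want to capture it. The following is a minimal sketch that extracts the name from the confirmation message shown on the slide. The message is stored in a variable here so the example runs without LoadLeveler; on the real system it would presumably come from the output of `llsubmit lll.cmd`, and the exact message format on other LoadLeveler versions is an assumption.

```shell
#!/bin/sh
# Confirmation line copied from the slide.
msg='llsubmit: The job "sp01.8681" has been submitted.'

# Keep only the text between the double quotes; print nothing if the line does not match.
job_id=$(printf '%s\n' "$msg" |
  sed -n 's/^llsubmit: The job "\([^"]*\)" has been submitted\.$/\1/p')

echo "$job_id"
# prints: sp01.8681
```

Note that llq lists the job step as `sp01.8681.0`, that is, the job name followed by a step number.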

  6. llq
     [sp01: ~/KISA/LLL] $ llq
     Id                       Owner      Submitted   ST PRI Class        Running On
     ------------------------ ---------- ----------- -- --- ------------ ----------
     sp01.8615.0              mrdlab1     9/14 03:04 R  50  long         sp02
     sp01.8648.0              spscs       9/15 16:16 R  50  silver       sp05
     sp01.8649.0              spscs       9/15 16:16 R  50  silver       sp07
     sp02.1291.0              flowsys1    9/15 17:00 R  50  silver       sp04
     sp01.8652.0              seongkim    9/15 22:37 R  50  gold         sp06
     sp01.8663.0              shinkj      9/16 12:11 R  50  gold         sp04
     sp01.8665.0              janggrp     9/16 12:28 R  50  gold         sp09
     sp01.8666.0              janggrp     9/16 12:28 R  50  gold         sp03
     sp01.8671.0              biosys      9/16 15:26 R  50  silver       sp03
     sp01.8678.0              hpcb0011    9/16 16:53 R  50  silver       sp03
     sp01.8679.0              microsys    9/16 17:25 R  50  silver       sp08
     sp01.8680.0              microsys    9/16 17:25 R  50  silver       sp08
     sp01.8681.0              dcslab      9/16 19:06 ST 50  gold         sp08

     13 job steps in queue, 0 waiting, 1 pending, 12 running, 0 held
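On a busy queue it can help to pick out only one owner's job steps. The sketch below filters a few rows copied from the llq listing with awk. It assumes the column layout shown on the slide (Id, Owner, date, time, ST, PRI, Class, Running On, separated by whitespace) and works on a saved sample rather than calling llq, so it runs anywhere.

```shell
#!/bin/sh
# Three rows copied from the slide's llq output (header lines omitted).
llq_rows='sp01.8679.0 microsys 9/16 17:25 R 50 silver sp08
sp01.8680.0 microsys 9/16 17:25 R 50 silver sp08
sp01.8681.0 dcslab 9/16 19:06 ST 50 gold sp08'

owner=dcslab

# Field 2 is the owner; field 5 is the status because "Submitted" spans two fields.
mine=$(printf '%s\n' "$llq_rows" | awk -v o="$owner" '$2 == o { print $1, $5 }')

echo "$mine"
# prints: sp01.8681.0 ST
```

In the slide's listing the dcslab job is the only step not yet in state R, which matches the "1 pending, 12 running" summary line.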

  7. llclass & llcancel
     • llclass
       [sp01: ~/KISA/LLL] $ llclass
       Name     MaxJobCPU   MaxProcCPU  Free   Max    Description
                d+hh:mm:ss  d+hh:mm:ss  Slots  Slots
       gold     -1          -1          52     112    Serial & parallel batch job
       silver   -1          -1          68     112    Serial & parallel batch job
       long     -1          -1          12     16     Long time job
       general  -1          -1          16     16     Test or Interactive job
     • llcancel
       • Cancels one or more jobs from the LoadLeveler queue
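Before submitting, the llclass listing can be used to check whether the chosen class has room for the job. The sketch below reads the "Free Slots" column for the gold class from rows copied off the slide and compares it with the `total_tasks = 15` value from the command file. Treating one free slot as one task is an assumption made for illustration; the slides do not state how slots map to tasks.

```shell
#!/bin/sh
# Rows copied from the slide's llclass output (the two header lines are omitted).
llclass_rows='gold -1 -1 52 112 Serial & parallel batch job
silver -1 -1 68 112 Serial & parallel batch job
long -1 -1 12 16 Long time job
general -1 -1 16 16 Test or Interactive job'

class=gold     # class requested in the command file
needed=15      # total_tasks from the command file

# Field 4 is the "Free Slots" column in this layout.
free=$(printf '%s\n' "$llclass_rows" | awk -v c="$class" '$1 == c { print $4 }')

if [ "$free" -ge "$needed" ]; then
  echo "$class: $free free slots, at least $needed needed"
else
  echo "$class: only $free free slots, $needed needed"
fi
# prints: gold: 52 free slots, at least 15 needed
```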
