
Astronomy Applications in the TeraGrid Environment

Roy Williams, Caltech

with thanks for material to: Sandra Bittner, ANL; Sharon Brunett, Caltech; Derek Simmel, PSC; John Towns, NCSA; Nancy Wilkins-Diehr, SDSC

NVO Summer School, September 2004

The TeraGrid Vision: distributing the resources is better than putting them at one site
  • Build new, extensible, grid-based infrastructure to support grid-enabled scientific applications
    • New hardware, new networks, new software, new practices, new policies
  • Expand centers to support cyberinfrastructure
    • Distributed, coordinated operations center
    • Exploit unique partner expertise and resources to make whole greater than the sum of its parts
  • Leverage homogeneity to make distributed computing easier and to simplify initial development and standardization
    • Run single job across entire TeraGrid
    • Move executables between sites

What is Grid Really?
  • A set of powerful Beowulf clusters
    • Lots of disk storage
    • Fast interconnection
    • Unified account management
    • Interesting software
  • The Grid is not
    • Magic
    • Infinite
    • Simple
    • A universal panacea
    • The hype that you have read

Grid as Federation
  • TeraGrid as a federation
    • independent centers → flexibility
    • unified interface → power and strength
  • Large/small state compromise

TeraGrid Wide Area Network

Quasar Science: an NVO-TeraGrid project (Penn State, CMU, Caltech)
  • 60,000 quasar spectra from the Sloan Digital Sky Survey
  • Each is 1 CPU-hour: submit to grid queue
  • Fits a complex model (173 parameters)

  • Derive black hole mass from line widths

[Diagram: a manager drives globusrun to submit jobs to clusters, which draw spectra from NVO data services]

N-point Galaxy Correlation: an NVO-TeraGrid project (Pitt, CMU)

  • Finding triple correlation in the 3D SDSS galaxy catalog (RA/Dec/z)
  • Lots of large parallel jobs
  • kd-tree algorithms

Palomar-Quest Survey (Caltech, NCSA, Yale)

  • Transient pipeline
    • computing reservation at sunrise
    • for immediate follow-up of transients
  • Synoptic survey
    • massive resampling (Atlasmaker)
    • for ultrafaint detection

[Diagram: the P48 telescope delivers 50 GB/night to Caltech, Yale, and NCSA over the TeraGrid (~5 TB archive), with ALERT messages for transients; NCSA, Caltech, and Yale run different pipelines on the same data]

Transient from PQ

from catalog pipeline

PQ stacked images

from image pipeline

Wide-area Mosaicking (Hyperatlas): an NVO-TeraGrid project (Caltech)

  • High-quality: flux-preserving, spatially accurate
  • Stackable: Hyperatlas
  • Edge-free: pyramid weight
  • Mining AND outreach: Griffith Observatory "Big Picture"

[Image: 15° DPOSS mosaic]

TeraGrid Components
  • Compute hardware
    • Intel/Linux Clusters, Alpha SMP clusters, POWER4 cluster, …
  • Large-scale storage systems
    • hundreds of terabytes for secondary storage
  • Very high-speed network backbone
    • bandwidth for rich interaction and tight coupling
  • Grid middleware
    • Globus, data management, …
  • Next-generation applications

Overview of Distributed TeraGrid Resources

Site Resources

Site Resources

HPSS

HPSS

External Networks

External Networks

Caltech

Argonne

External Networks

External Networks

NCSA/PACI

10.3 TF

240 TB

SDSC

4.1 TF

225 TB

Site Resources

Site Resources

HPSS

UniTree

NVO Summer School Sept 2004

Compute Resources – NCSA: 2.6 TF → ~10.6 TF, with 230 TB

[Cluster diagram: 30 Gbps to the TeraGrid network over a GbE fabric; 8 TF Madison (667 nodes) plus 2.6 TF Madison (256 nodes); 2-processor nodes with 4 GB memory and 2x73 GB disk (older 2p 1.3 GHz nodes have 4 or 12 GB memory, 73 GB scratch); storage I/O over Myrinet and/or GbE at 250 MB/s per node; Myrinet fabric and Brocade 12000 switches (256 and 92 2x FC links) serving 230 TB; login/FTP on 8 quad-processor Madison nodes, plus interactive and spare nodes]

Compute Resources – SDSC: 1.3 TF → ~4.3 + 1.1 TF, with 500 TB

[Cluster diagram: 30 Gbps to the TeraGrid network over a GbE fabric; 3 TF Madison (256 nodes) plus 1.3 TF Madison (128 nodes); 2-processor nodes with 4 GB memory and 2x73 GB disk (older 2p 1.3 GHz nodes have 73 GB scratch); storage I/O at 250 MB/s per node; Myrinet fabric and Brocade 12000 switches with 256 2x FC links to 500 TB; login/FTP on 6 quad-processor Madison nodes, plus interactive and spare nodes]

Compute Resources – Caltech: ~100 GF, with 100 TB

[Cluster diagram: 30 Gbps to the TeraGrid network over a GbE fabric; 72 GF Madison (36 IBM/Intel nodes) plus 34 GF Madison (17 HP/Intel nodes), 2-processor nodes with 6 GB memory; 6 quad-processor Opteron nodes with 8 GB memory fronting a 66 TB RAID5 HPSS Datawulf; 33 IA32 storage nodes serving 100 TB /pvfs; Myrinet fabric; a 2p IBM Madison interactive node for login/FTP; 13 tape drives, 1.2 PB raw silo capacity]

Wide Variety of Usage Scenarios
  • Tightly coupled jobs storing vast amounts of data, performing visualization remotely as well as making data available through online collections (ENZO)
  • Thousands of independent jobs using data from a distributed data collection (NVO)
  • Science Gateways – "not a Unix prompt"!
    • from a web browser, with security
    • from an application, e.g. IRAF or IDL

Traditional Parallel Processing
  • Single executable to be run on a single remote machine
    • big assumptions
      • runtime necessities (e.g. executables, input files, shared objects) available on the remote system!
    • login to a head node, choose a submission mechanism
  • Direct, interactive execution
    • mpirun –np 16 ./a.out
  • Through a batch job manager
    • qsub my_script
      • where my_script describes executable location, runtime duration, redirection of stdout/err, mpirun specification…
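For illustration, a minimal my_script might look like this (a hedged sketch; the resource values are made up, and a fuller, real example appears on the "PBS script" slide later):

#!/bin/sh
#PBS -l nodes=16,walltime=0:30:00
#PBS -o run.out
#PBS -e run.err
# run from the directory the job was submitted from
cd $PBS_O_WORKDIR
mpirun -np 16 -machinefile $PBS_NODEFILE ./a.out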

Traditional Parallel Processing II
  • Through globus
    • globusrun -r [some-teragrid-head-node].teragrid.org/jobmanager -f my_rsl_script
      • where my_rsl_script describes the same details as in the qsub my_script!
  • Through Condor-G
    • condor_submit my_condor_script
      • where my_condor_script describes the same details as the globus my_rsl_script!
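For illustration, the my_rsl_script mentioned above might contain something like the following Globus RSL (a hedged sketch; the executable path and values are made up):

& (executable = /home/roy/dposs-flat/flat/a.out)
  (count = 16)
  (jobType = mpi)
  (maxWallTime = 30)
  (stdout = run.out)
  (stderr = run.err)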

Distributed Parallel Processing
  • Decompose application over geographically distributed resources
    • functional or domain decomposition fits well
    • take advantage of load balancing opportunities
    • think about latency impact
  • Improved utilization of many resources
  • Flexible job management

Pipelined/dataflow processing
  • Suited for problems which can be divided into a series of sequential tasks where
    • multiple instances of problem need executing
    • series of data needs processing with multiple operations on each series
    • information from one processing phase can be passed to next phase before current phase is complete

Security
  • ssh with password
      • Too much password-typing
      • Not very secure: big break-in at TG, April 2004
        • One failure is a big failure
          • all of TG!
      • Caltech and Argonne no longer allow this
      • SDSC does not allow password change

Security
  • ssh with public key: single sign-on!
    • use ssh-keygen on Unix or puttygen on Windows
      • public key file (e.g. id_rsa.pub) AND
      • private key file (e.g. id_rsa) AND
      • passphrase
    • on remote machine, put public key in
      • .ssh/authorized_keys
    • on local machine, combine
      • private key and passphrase
      • ATM card model
    • On TG, can put public key on application form
      • immediate login, no snailmail
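The mechanics, as a hedged sketch (the hostname is illustrative):

% ssh-keygen -t rsa
% scp ~/.ssh/id_rsa.pub tg-login.sdsc.teragrid.org:
% ssh tg-login.sdsc.teragrid.org 'cat id_rsa.pub >> ~/.ssh/authorized_keys'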

Security
  • X.509 certificates: single sign-on!
    • from a Certificate Authority (e.g. Verisign, US Navy, DOE, etc.)
    • It consists of:
      • Distinguished Name (DN) AND
        • /C=US/O=National Center for Supercomputing Applications/CN=Roy Williams
      • private file (usercert.p12) AND
      • passphrase
    • Remote machine needs an entry in the gridmap file (maps DN to account)
      • use the gx-map command
    • Can create a certificate with ncsa-cert-request etc.
    • Certificates can be lodged in a web browser
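Day to day, one derives a short-lived proxy from the certificate before using the Globus tools; a hedged sketch of the session (output abridged, dates illustrative):

% grid-proxy-init
Your identity: /C=US/O=National Center for Supercomputing Applications/CN=Roy Williams
Enter GRID pass phrase for this identity:
Creating proxy .................................. Done
Your proxy is valid until: Fri Sep 17 02:34:56 2004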

3 Ways to Submit a Job

1. Directly to PBS Batch Scheduler

  • Simple, scripts are portable among PBS TeraGrid clusters

2. Globus common batch script syntax

  • Scripts are portable among other grids using Globus

3. Condor-G

  • Nice interface atop Globus, monitoring of all jobs submitted via Condor-G
  • Higher-level tools like DAGMan

PBS Batch Submission

ssh tg-login.[caltech|ncsa|sdsc|uc].teragrid.org

  • qsub flatten.sh -v "FILE=f544"
  • qstat or showq
  • ls *.dat
  • pbs.out, pbs.err files

globus-job-submit
  • For running batch/offline jobs
    • globus-job-submit Submit job
      • same interface as globus-job-run
      • returns immediately
    • globus-job-status Check job status
    • globus-job-cancel Cancel job
    • globus-job-get-output Get job stdout/err
    • globus-job-clean Cleanup after job
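A hedged example session (the hostname and the returned job-contact URL are illustrative):

% globus-job-submit tg-login.sdsc.teragrid.org/jobmanager-pbs /bin/hostname
https://tg-login.sdsc.teragrid.org:64001/12345/1094567890/
% globus-job-status https://tg-login.sdsc.teragrid.org:64001/12345/1094567890/
DONE
% globus-job-get-output https://tg-login.sdsc.teragrid.org:64001/12345/1094567890/
tg-c001.sdsc.teragrid.org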

Condor-G Job Submission

[Diagram: Condor-G on the local machine (mickey.disney.edu) talks via the Globus API to the Globus job manager on tg-login.sdsc.teragrid.org, which hands the job to PBS]

The submit file it runs:

executable = /wd/doit
universe = globus
globusscheduler = <…>
globusrsl = (maxtime=10)
queue

Condor-G
  • Combines the strengths of Condor and the Globus Toolkit

  • Advantages when managing grid jobs
    • full featured queuing service
    • credential management
    • fault-tolerance
    • DAGman (== pipelines)

Condor DAGMan
  • Manages workflow interdependencies
  • Each task is a Condor description file
  • A DAG file controls the order in which the Condor files are run
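For example, a hedged sketch of a DAG file (the job names and submit files are hypothetical): flatten two plates, then mosaic them only when both finish.

# two independent flattening jobs, then a mosaic that needs both
JOB flat1  flat1.submit
JOB flat2  flat2.submit
JOB mosaic mosaic.submit
PARENT flat1 flat2 CHILD mosaic

Run it with condor_submit_dag mydag.dag; DAGMan then submits each Condor job as its parents complete.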

Where's the disk?
  • Home directory
    • $TG_CLUSTER_HOME
      • example /home/roy
  • Shared writeable global areas
    • $TG_CLUSTER_PFS
      • example /pvfs/MCA04N009/roy
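A hedged sketch of moving between the two areas (the project directory follows the example above; file names are made up):

% echo $TG_CLUSTER_HOME
/home/roy
% echo $TG_CLUSTER_PFS
/pvfs/MCA04N009/roy
% cp $TG_CLUSTER_HOME/inputs/f544.fits $TG_CLUSTER_PFS/source/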

GridFTP
  • Moving a Test File

% globus-url-copy -s "`grid-cert-info -subject`" \
    gsiftp://localhost:5678/tmp/file1 \
    file:///tmp/file2

  • Also uberftp and scp
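GridFTP can also move data directly between two remote servers; a hedged sketch of a site-to-site transfer (hostnames and paths illustrative):

% globus-url-copy \
    gsiftp://tg-login.ncsa.teragrid.org/pvfs/mydata/source/f544.fits \
    gsiftp://tg-login.sdsc.teragrid.org/pvfs/mydata/source/f544.fits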

Storage Resource Broker (SRB)
  • Single logical namespace while accessing distributed archival storage resources
  • Effectively infinite storage (first to 1TB wins a t-shirt)
  • Data replication
  • Parallel Transfers
  • Interfaces: command-line, API, web/portal.
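A hedged sketch of a command-line session using the SRB Scommands (file names illustrative):

% Sinit                  # start an SRB session
% Sput f544.fits .       # store a file into the current collection
% Sls                    # list the collection
% Sget f544.fits /tmp/   # retrieve a copy
% Sexit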

Storage Resource Broker (SRB): Virtual Resources, Replication

[Diagram: an SRB client (command line or API) on a workstation addresses logical resources replicated across NCSA and SDSC: hpss-sdsc, sfs-tape-sdsc, hpss-caltech]

Allocations Policies
  • TG resources allocated via the PACI allocations and review process
    • modeled after the NSF process
    • TG considered as a single resource for grid allocations
  • Different levels of review for different allocation sizes
    • DAC: up to 10,000 SUs (minimal review, fast turnaround)
    • PRAC/AAB: <200,000 SUs/year
    • NRAC: 200,000+ SUs/year
  • Policies/procedures posted at http://www.paci.org/Allocations.html
  • Proposal submission through the PACI On-Line Proposal System (POPS): https://pops-submit.paci.org/

24/7 Consulting Support
  • help@teragrid.org
    • advanced ticketing system for cross-site support
    • staffed 24/7
    • 866-336-2357, 9-5 Pacific Time
  • http://news.teragrid.org/
  • Extensive experience solving problems for early access users
  • Networking, compute resources, extensible TeraGrid resources

Links
  • www.teragrid.org/userinfo
    • getting an account
  • help@teragrid.org
  • news.teragrid.org
    • site monitors

DPOSS flattening

[Diagram: source → target, 2650 x 1.1 GB files]

  • Cropping borders
  • Quadratic fit and subtract
  • Virtual data

Driving the Queues

Here is the driver that makes and submits jobs (imports and the filetime helper filled in; filetime is assumed to return a file's modification time):

import os

def filetime(path):
    # helper assumed by the slide: modification time of a file
    return os.path.getmtime(path)

for f in os.listdir(inputDirectory):
    # if the target file exists, with the right size and age, then we keep it
    ofile = outputDirectory + "/" + f
    if os.path.exists(ofile):
        osize = os.path.getsize(ofile)
        if osize != 1109404800:
            print " -- wrong target size, remaking", osize
        else:
            time_tgt = filetime(ofile)
            time_src = filetime(inputDirectory + "/" + f)
            if time_tgt < time_src:
                print " -- target too old, remaking"
            else:
                print " -- already have target file"
                continue
    cmd = "qsub flat.sh -v \"FILE=" + f + "\""
    print " -- submitting batch job: ", cmd
    os.system(cmd)

PBS script

A PBS script. Run it with: qsub script.sh -v "FILE=f345"

#!/bin/sh
#PBS -N dposs
#PBS -V
#PBS -l nodes=1
#PBS -l walltime=1:00:00

cd /home/roy/dposs-flat/flat
./flat \
  -infile /pvfs/mydata/source/${FILE}.fits \
  -outfile /pvfs/mydata/target/${FILE}.fits \
  -chop 0 0 1500 23552 \
  -chop 0 0 23552 1500 \
  -chop 0 22052 23552 23552 \
  -chop 22052 0 23552 23552 \
  -chop 18052 0 23552 4000

Atlasmaker: a service-oriented application on TeraGrid

  • Federated images: wavelength, time, ...

[Diagram: images found via the VO Registry and SIAP services are resampled with SWarp onto Hyperatlas pages, feeding source detection, average/max stacking, and subtraction]

Hyperatlas

  • Standard naming for atlases and pages: e.g. atlas TM-5-SIN-20, page 1589
  • Standard scales: scale s means 2^(20-s) arcseconds per pixel
  • Standard layouts: e.g. TM-5, HV-4
  • Standard projections: e.g. TAN, SIN

Hyperatlas is a Service

All Pages: <baseURL>/getChart?atlas=TM-5-SIN-20

0 2.77777778E-4 'RA---SIN' 'DEC--SIN' 0.0 -90.0
1 2.77777778E-4 'RA---SIN' 'DEC--SIN' 0.0 -85.0
2 2.77777778E-4 'RA---SIN' 'DEC--SIN' 36.0 -85.0
...
1731 2.77777778E-4 'RA---SIN' 'DEC--SIN' 288.0 85.0
1732 2.77777778E-4 'RA---SIN' 'DEC--SIN' 324.0 85.0
1733 2.77777778E-4 'RA---SIN' 'DEC--SIN' 0.0 90.0

Best Page: <baseURL>/getChart?atlas=TM-5-SIN-20&RA=182&Dec=62

1604 2.77777778E-4 'RA---SIN' 'DEC--SIN' 184.61538 60.0

Numbered Page: <baseURL>/getChart?atlas=TM-5-SIN-20&page=1604

1604 2.77777778E-4 'RA---SIN' 'DEC--SIN' 184.61538 60.0

Replicated implementations:

  • baseURL = http://mercury.cacr.caltech.edu:8080/hyperatlas (try services)
  • baseURL = http://virtualsky.org/servlet

GET services from Python

This code uses a service to find the best hyperatlas page for a given sky location:

import urllib

hyperatlasURL = self.hyperatlasServer + "/getChart?atlas=" + atlas \
    + "&RA=" + str(center1) + "&Dec=" + str(center2)
stream = urllib.urlopen(hyperatlasURL)

# result is a tab-separated line, so use split() to tokenize
tokens = stream.readline().split('\t')
print "Using page ", tokens[0], " of atlas ", atlas
self.scale = float(tokens[1])
self.CTYPE1 = tokens[2]
self.CTYPE2 = tokens[3]
rval1 = float(tokens[4])
rval2 = float(tokens[5])

VOTable parser in Python

From a SIAP URL, we get the XML and extract the columns that hold the image references, image format, and image RA/Dec:

import urllib
import xml.dom.minidom

stream = urllib.urlopen(SIAP_URL)
doc = xml.dom.minidom.parse(stream)

# Make a dictionary mapping column UCD to column index
col_ucd_dict = {}
col_counter = 0
for XML_TABLE in doc.getElementsByTagName("TABLE"):
    for XML_FIELD in XML_TABLE.getElementsByTagName("FIELD"):
        col_ucd = XML_FIELD.getAttribute("ucd")
        col_ucd_dict[col_ucd] = col_counter
        col_counter += 1

# (need exception catching here, e.g. KeyError for a missing UCD)
urlColumn = col_ucd_dict["VOX:Image_AccessReference"]
formatColumn = col_ucd_dict["VOX:Image_Format"]
raColumn = col_ucd_dict["POS_EQ_RA_MAIN"]
deColumn = col_ucd_dict["POS_EQ_DEC_MAIN"]

VOTable parser in Python

The table is a list of rows, and each row is a list of table cells:

table = []
for XML_TABLE in doc.getElementsByTagName("TABLE"):
    for XML_DATA in XML_TABLE.getElementsByTagName("DATA"):
        for XML_TABLEDATA in XML_DATA.getElementsByTagName("TABLEDATA"):
            for XML_TR in XML_TABLEDATA.getElementsByTagName("TR"):
                row = []
                for XML_TD in XML_TR.getElementsByTagName("TD"):
                    data = ""
                    for child in XML_TD.childNodes:
                        data += child.data
                    row.append(data)
                table.append(row)
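A hedged sketch of using the parsed table with the column indices from the previous slide (the output file naming is made up):

import urllib

for row in table:
    if row[formatColumn] == "image/fits":
        # fetch each FITS image referenced in the table
        print "retrieving", row[urlColumn]
        urllib.urlretrieve(row[urlColumn],
                           "tile_%s_%s.fits" % (row[raColumn], row[deColumn]))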

SOAP client in Python

WCSTools (xy2sky and sky2xy) as web services:

from SOAPpy import *

# fitsheader: FITS header string; x1, x2: pixel coordinates on the image
server = SOAPProxy("http://mercury.cacr.caltech.edu:9091")
wcsR = server.xy2sky(fitsheader, x1, x2)

ra = wcsR["c1"]
dec = wcsR["c2"]
status = wcsR["status"]
message = wcsR["message"]

print "Sky coordinates are:", ra, dec
print "status is: ", status
print "Message is: ", message

Future: Science Gateways


TeraGrid Impediments

[Staircase diagram: the hurdles a user climbs before doing any science]

1. Write proposal
2. Wait 3 months for account
3. Get logged in
4. Get certificate
5. Port code to Itanium
6. Learn PBS
7. Learn MPI
8. Learn Globus
9. ... and now do some science

A Better Way: Graduated Security for Science Gateways

[Ladder diagram: graduated security, from anonymous use to full accounts]

  • Web form (anonymous) → some science
  • Register (logging and reporting) → more science
  • Authenticate with X.509 (browser or command line) → big-iron computing
  • Write proposal (own account) → power user

Secure Web Services for TeraGrid Access

[Diagram: clients reach TeraGrid through a secure web-service layer (Clarens, BOSS, PBS, Gridport, Xforms), which distributes jobs on the grid. Clients include:]

  • a web form (the browser holds the certificate)
  • embedding in an existing client application (Root, IRAF, IDL, ...)
  • an auto-generated client API for scripted submission (certificate in .globus/)
  • embedding as part of another service (proxy agent)

Secure Web Services for TeraGrid Access

  • Shell command
  • List files, get files
  • Submit job to TG queue (Condor / DAGMan / globusrun)
  • Monitor running jobs

TeraGrid Wants YOU!
  • Your astronomy applications
  • Your science gateway projects
  • TeraGrid has hundreds of processors and hundreds of terabytes

Talk To Me!
