
Astronomy Applications in the TeraGrid Environment
Roy Williams, Caltech
with thanks for material to: Sandra Bittner, ANL; Sharon Brunett, Caltech; Derek Simmel, PSC; John Towns, NCSA; Nancy Wilkins-Diehr, SDSC
NVO Summer School Sept 2004
flexibility
derive black hole mass from line widths
clusters
NVO data
services
globusrun
manager
Finding triple correlation in 3D SDSS galaxy catalog (RA/Dec/z)
Lots of large parallel jobs
kd-tree algorithms
Transient pipeline
computing reservation at sunrise
for immediate followup of transients
Synoptic survey
massive resampling (Atlasmaker)
for ultrafaint detection
P48 Telescope
50 Gbyte/night
ALERT
Caltech
Yale
TG
NCSA
NCSA and Caltech and Yale run different pipelines on the same data
5 Tbyte
Wide-area Mosaicking (Hyperatlas)
An NVO-TeraGrid project
Caltech
DPOSS 15º
High-quality
flux-preserving, spatial accuracy
Stackable
Hyperatlas
Edge-free
Pyramid weight
Mining AND Outreach
Griffith Observatory "Big Picture"
2MASS Mosaicking portal
An NVO-TeraGrid project
Caltech IPAC
[Diagram: the TeraGrid sites (Caltech, Argonne, NCSA/PACI, SDSC). Each site is drawn with its own site resources, an archive (HPSS or UniTree), and external network connections. Labeled capacities: NCSA/PACI 10.3 TF and 240 TB; SDSC 4.1 TF and 225 TB.]
[Diagram: cluster layout, 30 Gbps to the TeraGrid network.
Compute: 8 TF Madison (667 nodes) and 2.6 TF Madison (256 nodes). Node types: 2p Madison with 4 GB memory and 2x73 GB disk; 2p 1.3 GHz with 4 or 12 GB memory and 73 GB scratch.
Interconnect: GbE fabric and Myrinet fabric. Storage I/O over Myrinet and/or GbE, labeled 250 MB/s/node * 256 nodes and 250 MB/s/node * 670 nodes.
Storage: 230 TB; Brocade 12000 switches; 256 2x FC and 92 2x FC links.
Other: 8 4p Madison nodes (login, FTP); interactive and spare nodes.]
[Diagram: cluster layout, 30 Gbps to the TeraGrid network.
Compute: 3 TF Madison (256 nodes) and 1.3 TF Madison (128 nodes). Node types: 2p Madison with 4 GB memory and 2x73 GB disk; 2p 1.3 GHz with 4 GB memory and 73 GB scratch.
Interconnect: GbE fabric and Myrinet fabric. Three groups of 128 nodes, each labeled 250 MB/s and 128 2x FC.
Storage: 500 TB; Brocade 12000 switches; 256 2x FC links.
Other: 6 4p Madison nodes (login, FTP); interactive and spare nodes.]
[Diagram: cluster layout, 30 Gbps to the TeraGrid network.
Compute: 72 GF Madison (36 IBM/Intel nodes) and 34 GF Madison (17 HP/Intel nodes). Node types: 2p Madison with 6 GB memory and 2x73 GB disk or 73 GB scratch.
Storage nodes: 33 IA32 storage nodes (2p ia32, 6 GB memory) serving 100 TB /pvfs; 6 Opteron nodes (4p Opteron, 8 GB memory) with 66 TB RAID5, labeled HPSS Datawulf.
Interconnect: GbE fabric and Myrinet fabric; groups of 33, 36, and 17 nodes, each labeled 250 MB/s; 13 2xFC links.
Archive: 13 tape drives; 1.2 PB silo raw capacity.
Other: 2p IBM Madison node; interactive node (login, FTP).]
It is:
1. Directly to PBS Batch Scheduler
2. Globus common batch script syntax
3. Condor-G
ssh tg-login.[caltech|ncsa|sdsc|uc].teragrid.org
mickey.disney.edu
tg-login.sdsc.teragrid.org
Globus API
Globus job manager
Condor-G
executable=/wd/doit
universe=globus
globusscheduler=<…>
globusrsl=(maxtime=10)
queue
PBS
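When many jobs are needed, the Condor-G submit description shown on this slide can be generated from a script. The following is a minimal sketch in Python 3 and is not part of the original talk. The function name `condor_g_submit_text` is hypothetical, and the `globusscheduler` value, which is elided on the slide, must be supplied by the caller from the site's own documentation.

```python
def condor_g_submit_text(executable, globusscheduler, maxtime=10):
    """Return the text of a Condor-G submit description (hypothetical helper).

    The keys mirror the five lines shown on the slide; nothing else is added.
    """
    lines = [
        "executable=%s" % executable,
        "universe=globus",
        "globusscheduler=%s" % globusscheduler,
        "globusrsl=(maxtime=%d)" % maxtime,
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The scheduler string below is a placeholder, not a real contact string.
    print(condor_g_submit_text("/wd/doit", "PLACEHOLDER-CONTACT-STRING"))
```

The resulting text would be written to a file and handed to `condor_submit` on the workstation where Condor-G is installed.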
and the Globus Toolkit
% globus-url-copy -s "`grid-cert-info -subject`" \
    gsiftp://localhost:5678/tmp/file1 \
    file:///tmp/file2
sfs-tape-sdsc
hpss-caltech
workstation
Storage Resource Broker (SRB): Virtual Resources, Replication
NCSA
SDSC
SRB Client (cmdline,
or API)
…
http://www.paci.org/Allocations.html
https://pops-submit.paci.org/
minimal review, fast turnaround
Source
Target
2650 x 1.1 Gbyte files
Cropping borders
Quadratic fit and subtract
Virtual data
Here is the driver that makes and submits jobs

import os

TARGET_SIZE = 1109404800  # expected size in bytes of a finished target file

for f in os.listdir(inputDirectory):
    # if the target exists, with the right size and age, then we keep it
    ifile = inputDirectory + "/" + f
    ofile = outputDirectory + "/" + f
    if os.path.exists(ofile):
        osize = os.path.getsize(ofile)
        if osize != TARGET_SIZE:
            print(" -- wrong target size, remaking %d" % osize)
        elif os.path.getmtime(ofile) < os.path.getmtime(ifile):
            print(" -- target too old, remaking")
        else:
            print(" -- already have target file")
            continue
    else:
        print(" -- target nonexistent, making")
    # qsub options go before the script name
    cmd = "qsub -v \"FILE=" + f + "\" flat.sh"
    print(" -- submitting batch job: " + cmd)
    os.system(cmd)
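The driver's keep-or-remake decision is the "virtual data" idea in miniature: a target is rebuilt only if it is missing, has the wrong size, or is older than its source. A minimal Python 3 sketch of that test as a standalone function follows. The name `needs_remake` is hypothetical, and the file modification time stands in for whatever the original `filetime` helper returned.

```python
import os
import tempfile


def needs_remake(src, tgt, expected_size):
    """Return a reason string if tgt must be (re)made from src, else None."""
    if not os.path.exists(tgt):
        return "target nonexistent"
    if os.path.getsize(tgt) != expected_size:
        return "wrong target size"
    if os.path.getmtime(tgt) < os.path.getmtime(src):
        return "target too old"
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "src.fits")
        tgt = os.path.join(d, "tgt.fits")
        with open(src, "wb") as fh:
            fh.write(b"x" * 10)
        print(needs_remake(src, tgt, 10))   # no target has been written yet
        with open(tgt, "wb") as fh:
            fh.write(b"x" * 10)
        os.utime(src, (1000, 1000))         # force the source to be older
        os.utime(tgt, (2000, 2000))
        print(needs_remake(src, tgt, 10))   # target is complete and newer
```

Keeping the test separate from job submission makes it easy to rerun the driver after a partial failure: only the files that still need work are resubmitted.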

A PBS script. Submit it with: qsub -v FILE=f345 script.sh
#!/bin/sh
#PBS -N dposs
#PBS -V
#PBS -l nodes=1
#PBS -l walltime=1:00:00
cd /home/roy/dposs-flat/flat
./flat \
-infile /pvfs/mydata/source/${FILE}.fits \
-outfile /pvfs/mydata/target/${FILE}.fits \
-chop 0 0 1500 23552 \
-chop 0 0 23552 1500 \
-chop 0 22052 23552 23552 \
-chop 22052 0 23552 23552 \
-chop 18052 0 23552 4000
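The first four `-chop` rectangles in the script follow a regular pattern: a 1500-pixel border on each side of a 23552 x 23552 plate. A minimal Python 3 sketch of generating them is below. It assumes the four numbers are `x0 y0 x1 y1`, which is inferred from the values on the slide and not from any documentation of `flat`; the function names are hypothetical. The fifth rectangle in the script (18052 0 23552 4000) is an extra region that this sketch does not generate.

```python
def border_rectangles(size, border):
    """Rectangles (x0, y0, x1, y1) covering the four edges of a square image.

    The (x0, y0, x1, y1) ordering is an assumption inferred from the slide.
    """
    return [
        (0, 0, border, size),             # low-x edge
        (0, 0, size, border),             # low-y edge
        (0, size - border, size, size),   # high-y edge
        (size - border, 0, size, size),   # high-x edge
    ]


def chop_arguments(rectangles):
    """Flatten rectangles into a list of '-chop x0 y0 x1 y1' arguments."""
    args = []
    for rect in rectangles:
        args.append("-chop")
        args.extend(str(v) for v in rect)
    return args


if __name__ == "__main__":
    print(" ".join(chop_arguments(border_rectangles(23552, 1500))))
```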
Federated Images:
wavelength, time, ...
VO Registry
SIAP
SWarp
Hyperatlas
source detection
average/max
subtraction
SIN projection
Hyperatlas
Standard naming for atlases and pages
TM-5-SIN-20
Page 1589
Standard Scales:
scale s means 2^(20-s) arcseconds per pixel
Standard
Layout
TM-5 layout
Standard
Projections
HV-4 layout
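The naming scheme can be unpacked mechanically. The Python 3 sketch below reads the scale rule on this slide as 2^(20-s) arcseconds per pixel, a reading that agrees with the 2.77777778E-4 degree pixel size the Hyperatlas service reports for scale 20. It also assumes an atlas name is always layout, projection, and scale joined by hyphens, as in TM-5-SIN-20. The function names are hypothetical.

```python
def arcsec_per_pixel(scale):
    """Pixel size in arcseconds for a Hyperatlas scale number: 2**(20 - scale)."""
    return 2.0 ** (20 - scale)


def degrees_per_pixel(scale):
    """The same pixel size in degrees (3600 arcseconds per degree)."""
    return arcsec_per_pixel(scale) / 3600.0


def parse_atlas_name(name):
    """Split a name such as 'TM-5-SIN-20' into (layout, projection, scale).

    Assumes the last two hyphen-separated fields are projection and scale.
    """
    layout, projection, scale = name.rsplit("-", 2)
    return layout, projection, int(scale)


if __name__ == "__main__":
    layout, projection, scale = parse_atlas_name("TM-5-SIN-20")
    print(layout, projection, scale, degrees_per_pixel(scale))
```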
All Pages: <baseURL>/getChart?atlas=TM-5-SIN-20
0 2.77777778E-4 'RA---SIN' 'DEC--SIN' 0.0 -90.0
1 2.77777778E-4 'RA---SIN' 'DEC--SIN' 0.0 -85.0
2 2.77777778E-4 'RA---SIN' 'DEC--SIN' 36.0 -85.0
...
1731 2.77777778E-4 'RA---SIN' 'DEC--SIN' 288.0 85.0
1732 2.77777778E-4 'RA---SIN' 'DEC--SIN' 324.0 85.0
1733 2.77777778E-4 'RA---SIN' 'DEC--SIN' 0.0 90.0
Best Page: <baseURL>/getChart?atlas=TM-5-SIN-20&RA=182&Dec=62
1604 2.77777778E-4 'RA---SIN' 'DEC--SIN' 184.61538 60.0
Numbered Page: <baseURL>/getChart?atlas=TM-5-SIN-20&page=1604
1604 2.77777778E-4 'RA---SIN' 'DEC--SIN' 184.61538 60.0
Replicated Implementations
baseURL = http://mercury.cacr.caltech.edu:8080/hyperatlas (try services)
baseURL = http://virtualsky.org/servlet
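The three request forms differ only in their query parameters, so one helper can build all of them. The Python 3 sketch below also parses a response line of the shape shown on this slide. The function names are hypothetical, and the parser assumes whitespace-separated fields with the two CTYPE strings in single quotes. No network request is made.

```python
import shlex
from urllib.parse import urlencode


def get_chart_url(base_url, atlas, ra=None, dec=None, page=None):
    """Build a getChart URL for all pages, the best page, or a numbered page."""
    params = [("atlas", atlas)]
    if page is not None:
        params.append(("page", page))
    elif ra is not None and dec is not None:
        params.extend([("RA", ra), ("Dec", dec)])
    return base_url + "/getChart?" + urlencode(params)


def parse_chart_line(line):
    """Parse one response line; shlex removes the quotes around CTYPE values."""
    page, scale, ctype1, ctype2, rval1, rval2 = shlex.split(line)
    return {
        "page": int(page),
        "scale": float(scale),   # degrees per pixel
        "ctype1": ctype1,
        "ctype2": ctype2,
        "rval1": float(rval1),   # page center, first coordinate
        "rval2": float(rval2),   # page center, second coordinate
    }


if __name__ == "__main__":
    base = "http://virtualsky.org/servlet"
    print(get_chart_url(base, "TM-5-SIN-20"))
    print(get_chart_url(base, "TM-5-SIN-20", ra=182, dec=62))
    print(get_chart_url(base, "TM-5-SIN-20", page=1604))
    print(parse_chart_line("1604 2.77777778E-4 'RA---SIN' 'DEC--SIN' 184.61538 60.0"))
```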
This code uses a service to find the best hyperatlas page for a given sky location

import urllib

hyperatlasURL = self.hyperatlasServer + "/getChart?atlas=" + atlas \
    + "&RA=" + str(center1) + "&Dec=" + str(center2)
stream = urllib.urlopen(hyperatlasURL)

# result is a tab-separated line, so use split() to tokenize
tokens = stream.readline().split('\t')
print("Using page %s of atlas %s" % (tokens[0], atlas))
self.scale = float(tokens[1])
self.CTYPE1 = tokens[2]
self.CTYPE2 = tokens[3]
rval1 = float(tokens[4])
rval2 = float(tokens[5])
From a SIAP URL, we get the XML, and extract the columns that have the image references, image format, and image RA/Dec

import urllib
import xml.dom.minidom

stream = urllib.urlopen(SIAP_URL)
doc = xml.dom.minidom.parse(stream)

# Make a dictionary mapping each column's UCD to its column number
col_ucd_dict = {}
for XML_TABLE in doc.getElementsByTagName("TABLE"):
    col_counter = 0
    for XML_FIELD in XML_TABLE.getElementsByTagName("FIELD"):
        col_ucd = XML_FIELD.getAttribute("ucd")
        col_ucd_dict[col_ucd] = col_counter
        col_counter += 1

# (need exception catching here: a missing UCD raises KeyError)
urlColumn = col_ucd_dict["VOX:Image_AccessReference"]
formatColumn = col_ucd_dict["VOX:Image_Format"]
raColumn = col_ucd_dict["POS_EQ_RA_MAIN"]
deColumn = col_ucd_dict["POS_EQ_DEC_MAIN"]
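One way to handle the missing-column case flagged above is to turn the lookup failure into an error that names the absent UCD. The Python 3 sketch below runs against a small inline sample, not a live SIAP service. The sample document and the function names are made up for illustration.

```python
import xml.dom.minidom

# Hypothetical, stripped-down VOTable header with only the FIELD elements.
SAMPLE = """<VOTABLE><RESOURCE><TABLE>
<FIELD name="url" ucd="VOX:Image_AccessReference"/>
<FIELD name="fmt" ucd="VOX:Image_Format"/>
<FIELD name="ra" ucd="POS_EQ_RA_MAIN"/>
<FIELD name="dec" ucd="POS_EQ_DEC_MAIN"/>
</TABLE></RESOURCE></VOTABLE>"""


def column_index_by_ucd(doc):
    """Map each FIELD's ucd attribute to its column position in its TABLE."""
    index = {}
    for table in doc.getElementsByTagName("TABLE"):
        for i, field in enumerate(table.getElementsByTagName("FIELD")):
            index[field.getAttribute("ucd")] = i
    return index


def require_column(index, ucd):
    """Return the column number for ucd, or raise a descriptive ValueError."""
    try:
        return index[ucd]
    except KeyError:
        raise ValueError("SIAP response has no column with ucd %r" % ucd)


if __name__ == "__main__":
    index = column_index_by_ucd(xml.dom.minidom.parseString(SAMPLE))
    print(require_column(index, "VOX:Image_AccessReference"))
    try:
        require_column(index, "VOX:Image_Title")
    except ValueError as err:
        print(err)
```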
Table is a list of rows, and each row is a list of table cells

table = []
for XML_TABLE in doc.getElementsByTagName("TABLE"):
    for XML_DATA in XML_TABLE.getElementsByTagName("DATA"):
        for XML_TABLEDATA in XML_DATA.getElementsByTagName("TABLEDATA"):
            for XML_TR in XML_TABLEDATA.getElementsByTagName("TR"):
                row = []
                for XML_TD in XML_TR.getElementsByTagName("TD"):
                    data = ""
                    for child in XML_TD.childNodes:
                        data += child.data
                    row.append(data)
                table.append(row)
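Once the table is a list of rows, the column numbers found from the UCDs select the image URL, format, and position in each row. The Python 3 sketch below works on a small inline sample. The sample rows, the example.org URLs, the default column numbers, and the function names are all made up for illustration. Unlike the loop above, it skips child nodes that are not text, such as comments.

```python
import xml.dom.minidom

# Hypothetical TABLEDATA section; columns are url, format, RA, Dec.
SAMPLE_ROWS = """<TABLE><DATA><TABLEDATA>
<TR><TD>http://example.org/a.fits</TD><TD>image/fits</TD><TD>182.0</TD><TD>62.0</TD></TR>
<TR><TD>http://example.org/b.jpg</TD><TD>image/jpeg</TD><TD>183.5</TD><TD>61.5</TD></TR>
</TABLEDATA></DATA></TABLE>"""


def table_rows(doc):
    """Return the table as a list of rows, each a list of cell strings."""
    rows = []
    for tr in doc.getElementsByTagName("TR"):
        row = []
        for td in tr.getElementsByTagName("TD"):
            text = "".join(
                c.data for c in td.childNodes
                if c.nodeType in (c.TEXT_NODE, c.CDATA_SECTION_NODE)
            )
            row.append(text.strip())
        rows.append(row)
    return rows


def fits_images(rows, url_col=0, format_col=1, ra_col=2, dec_col=3):
    """Select (url, ra, dec) for rows whose format cell is image/fits."""
    return [
        (row[url_col], float(row[ra_col]), float(row[dec_col]))
        for row in rows
        if row[format_col] == "image/fits"
    ]


if __name__ == "__main__":
    rows = table_rows(xml.dom.minidom.parseString(SAMPLE_ROWS))
    for url, ra, dec in fits_images(rows):
        print(url, ra, dec)
```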
WCSTools (xy2sky and sky2xy) as web services

from SOAPpy import SOAPProxy

# fitsheader: the FITS header as a string
# x1, x2: pixel coordinates on the image
server = SOAPProxy("http://mercury.cacr.caltech.edu:9091")
wcsR = server.xy2sky(fitsheader, x1, x2)
ra = wcsR["c1"]
dec = wcsR["c2"]
status = wcsR["status"]
message = wcsR["message"]
print("Sky coordinates are: %s %s" % (ra, dec))
print("status is: %s" % status)
print("Message is: %s" % message)
and now do some science....
Learn Globus
Learn MPI
Learn PBS
Port code to Itanium
Get certificate
Get logged in
Wait 3 months for account
Write proposal
power user
Write proposal
- own account
big-iron computing....
Authenticate X.509
- browser or cmd line
more science....
Register - logging and reporting
some science....
Web form - anonymous
web form
(browser has
certificate)
Clarens
BOSS
PBS
Gridport
Xforms
Embedded in existing
client application
(Root, IRAF, IDL, ...)
auto-generated client API
for scripted submission
(certificate in .globus/)
distribute jobs on grid
Embedded as part of other service
(proxy agent)
Secure Web services for TeraGrid Access
Shell command
List files, get files
Submit job to TG queue
(Condor / Dagman / globusrun)
Monitor running jobs
Talk To Me!