
Computing Facilities & Capabilities

Julian Borrill

Computational Research Division, Berkeley Lab

& Space Sciences Laboratory, UC Berkeley

Computing Issues
  • Data Volume
  • Data Processing
  • Data Storage
  • Data Security
  • Data Transfer
  • Data Format/Layout

It's all about the data

Data Volume
  • Planck data volume drives (almost) everything
    • LFI :
      • 22 detectors with 32.5, 45 & 76.8 Hz sampling
      • 4 x 10^10 samples per year
      • 0.2 TB time-ordered data + 1.0 TB full detector pointing data
    • HFI :
      • 52 detectors with 200 Hz sampling
      • 3 x 10^11 samples per year
      • 1.3 TB time-ordered data + 0.2 TB full boresight pointing data
    • LevelS (e.g. CTP “Trieste” simulations) :
      • 4 LFI detectors with 32.5 Hz sampling
      • 4 x 10^9 samples per year
      • 2 scans x 2 beams x 2 samplings x 7 components + 2 noises
      • 1.0 TB time-ordered data + 0.2 TB full detector pointing data
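
As a back-of-the-envelope check of these numbers, here is a small Python sketch; it assumes the usual LFI split of 4/6/12 detectors across the three sampling rates and 4-byte samples, neither of which is stated on the slide.

```python
SECONDS_PER_YEAR = 3.15e7
BYTES_PER_SAMPLE = 4          # assumption: single-precision samples

# LFI: assumed split of the 22 detectors across the three sampling rates
lfi_rate = 4 * 32.5 + 6 * 45.0 + 12 * 76.8          # Hz, all detectors combined
lfi_samples = lfi_rate * SECONDS_PER_YEAR           # ~4 x 10^10 samples/year
lfi_tod_tb = lfi_samples * BYTES_PER_SAMPLE / 1e12  # ~0.2 TB time-ordered data

# HFI: 52 detectors at 200 Hz
hfi_samples = 52 * 200.0 * SECONDS_PER_YEAR         # ~3 x 10^11 samples/year
hfi_tod_tb = hfi_samples * BYTES_PER_SAMPLE / 1e12  # ~1.3 TB time-ordered data

print(f"LFI: {lfi_samples:.1e} samples/yr, {lfi_tod_tb:.2f} TB TOD")
print(f"HFI: {hfi_samples:.1e} samples/yr, {hfi_tod_tb:.2f} TB TOD")
```

Both totals round to the sample counts and TOD volumes quoted above; the pointing volumes depend on how many angles are stored and at what precision, which the slide does not specify.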
Data Processing
  • Operation count scales linearly (& inefficiently) with
    • # analyses, # realizations, # iterations, # samples
    • 100 x 100 x 100 x 100 x 10^11 ~ O(10) Eflop (cf. '05 Day in the Life)
  • NERSC
    • Seaborg : 6080 CPU, 9 Tf/s
    • Jacquard : 712 CPU, 3 Tf/s (cf. Magique-II)
    • Bassi : 888 CPU, 7 Tf/s
    • NERSC-5 : O(100) Tf/s, first-byte in 2007
    • NERSC-6 : O(500) Tf/s, first-byte in 2010
    • Expect allocation of O(2 x 10^6) CPU-hours/year => O(4) Eflop/yr (10 GHz CPUs @ 5% efficiency)
  • USPDC cluster
    • Specification & location TBD, first-byte in 2007/8
    • O(100) CPU x 80% x 9000 hours/year => O(0.4) Eflop/yr (5 GHz CPUs @ 3% efficiency)
  • IPAC small cluster dedicated to ERCSC
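
The flop budgets above can be reproduced directly; the sketch below uses the clock rates and efficiencies quoted on the slide, and treating "x GHz at y% efficiency" as x * y Gflop/s per CPU is an assumption.

```python
# Total operation count: analyses x realizations x iterations x samples
ops = 100 * 100 * 100 * 100 * 1e11      # ~1e19 flop = O(10) Eflop

# NERSC allocation: O(2e6) CPU-hours/yr, 10 GHz CPUs at 5% efficiency
nersc_flops = 2e6 * 3600 * 10e9 * 0.05  # ~3.6e18 flop/yr = O(4) Eflop/yr

# USPDC cluster: 100 CPUs x 80% utilisation x 9000 h/yr, 5 GHz at 3% efficiency
uspdc_flops = 100 * 0.80 * 9000 * 3600 * 5e9 * 0.03   # ~4e17 flop/yr = O(0.4) Eflop/yr

for name, f in [("total need", ops), ("NERSC", nersc_flops), ("USPDC", uspdc_flops)]:
    print(f"{name}: {f / 1e18:.1f} Eflop")
```

The comparison makes the point of the slide explicit: even the full NERSC allocation covers only a fraction of the nominal O(10) Eflop requirement per year.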
Processing

[Diagram: compute resources for Planck: 9 Tf/s NERSC Seaborg, 3 Tf/s NERSC Jacquard, 7 Tf/s NERSC Bassi, 0.1 Tf/s ERCSC Cluster, 0.5 Tf/s USPDC Cluster, 100 Tf/s NERSC-5 (2007), 500 Tf/s NERSC-6 (2010)]

Data Storage
  • Archive at IPAC
    • mission data
    • O(10) TB
  • Long-term at NERSC using HPSS
    • mission + simulation data & derivatives
    • O(2) PB
  • Spinning disk at USPDC cluster & at NERSC using NGF
    • current active data subset
    • O(2 - 20) TB
  • Processor memory at USPDC cluster & at NERSC
    • running job(s)
    • O(1 - 10+) GB/CPU & O(0.1 - 10) TB total
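
For scale, a single year of HFI time-ordered data already strains these memory tiers, which is why the "current active data subset" lives on NGF and cluster disk rather than wholly in memory. A rough sketch, assuming 4-byte samples as above:

```python
# One year of HFI time-ordered data (Data Volume slide), assuming 4-byte samples
hfi_tod_tb = 3e11 * 4 / 1e12          # ~1.2 TB

# Memory tiers quoted on this slide
per_cpu_gb_range = (1, 10)            # O(1 - 10+) GB per CPU
aggregate_tb_range = (0.1, 10)        # O(0.1 - 10) TB total

print(f"HFI TOD/yr: {hfi_tod_tb:.1f} TB")
print(f"aggregate memory: {aggregate_tb_range[0]}-{aggregate_tb_range[1]} TB, "
      f"so a full year fits only at the high end; the rest stays on NGF/disk")
```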
Processing + Storage

[Diagram: compute resources with their memory/disk: NERSC Seaborg (9 Tf/s, 6 TB), NERSC Jacquard (3 Tf/s, 2 TB), NERSC Bassi (7 Tf/s, 4 TB), ERCSC Cluster (0.1 Tf/s, 50 GB), USPDC Cluster (0.5 Tf/s, 200 GB), NERSC-5 (100 Tf/s, 50 TB, 2007), NERSC-6 (500 Tf/s, 250 TB, 2010); storage systems: NERSC HPSS (2/20 PB), NERSC NGF (20/200 TB), IPAC Archive (10 TB), USPDC Cluster disk (2 TB)]

Data Security
  • UNIX filegroups
    • special account : user planck
    • permissions r--/---/--- : data readable only via the planck account
  • Personal keyfob to access the planck account
    • real-time grid-certification of individuals
    • keyfobs issued & managed by IPAC
    • single system for IPAC, NERSC & USPDC cluster
  • Allows securing of selected data
    • e.g. mission vs simulation
  • Differentiates access to facilities and to data
    • standard personal account & special planck account
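
A minimal sketch of how such a restriction could be checked from the planck account; the path below is hypothetical, and the owner-read-only pattern is taken from the permissions bullet above.

```python
import os
import stat

# Hypothetical path to a secured mission file
path = "/project/planck/mission/tod.fits"

def is_planck_secured(p):
    """Return True if the file matches the r--/---/--- pattern above,
    i.e. readable only by its owner (the planck account)."""
    mode = os.stat(p).st_mode
    return stat.S_IMODE(mode) == stat.S_IRUSR   # 0o400: owner read only

# Securing a file so only the planck account can read it:
# os.chmod(path, stat.S_IRUSR)
```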
Processing + Storage + Security

[Diagram: the same compute & storage layout as above, with the Planck data areas marked "PLANCK KEYFOB REQUIRED"]

Data Transfer
  • From DPCs to IPAC
    • transatlantic tests being planned
  • From IPAC to NERSC
    • 10 Gb/s over Pacific Wave, CENIC + ESNet
    • tests planned this summer
  • From NGF to/from HPSS
    • 1 Gb/s being upgraded to 10+ Gb/s
  • From NGF to memory (most real-time critical)
    • within NERSC
      • 8-64 Gb/s depending on system (& support for this)
    • offsite depends on location
      • 10Gb/s to LBL over dedicated data link on Bay Area MAN
    • fallback exists : stage data on local scratch space
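
To put the link speeds in context, here is a rough sketch of line-rate transfer times for one year of HFI time-ordered data (1.3 TB from the Data Volume slide); protocol overhead and contention are ignored.

```python
def transfer_hours(volume_tb, link_gbps):
    """Hours to move volume_tb terabytes over a link_gbps link at line rate."""
    bits = volume_tb * 1e12 * 8
    return bits / (link_gbps * 1e9) / 3600

hfi_year_tb = 1.3   # HFI time-ordered data per year
for label, gbps in [("NGF <-> HPSS (current)", 1),
                    ("IPAC -> NERSC", 10),
                    ("NGF -> memory (upper end)", 64)]:
    print(f"{label}: {transfer_hours(hfi_year_tb, gbps):.2f} h")
```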
Processing + Storage + Security + Networks

[Diagram: the same compute, storage & security layout, with network links added: 10 Gb/s from the DPCs and the IPAC Archive, 8-64 Gb/s between the NERSC NGF and the NERSC systems, and several links to the USPDC and ERCSC clusters still undetermined ("?")]

Project Columbia Update
  • Last year we advertised our proposed use of NASA's new Project Columbia (5 x 2048 CPU, 5 x 12 Tf/s), potentially including a WAN-NGF.
  • We were successful in pushing for Ames' connection to the Bay Area MAN, providing a 10Gb/s dedicated data connect.
  • We were unsuccessful in making much use of Columbia:
    • disk read performance varies from poor to atrocious, effectively disabling data analysis (although simulation is possible).
    • foreign nationals are not welcome, even if they have passed JPL security screening!
  • We have provided feedback to Ames and HQ, but for now we are not pursuing this resource.
Data Formats
  • Once data are on disk they must be read by codes that do not know (or want to know) their format/layout:
    • to analyze LFI, HFI, LevelS, WMAP, etc. data sets
      • both individually and collectively
    • to be able to operate on data while they are being read
      • e.g. weighted co-addition of simulation components
  • M3 provides a data abstraction layer to make this possible
  • Investment in M3 has paid huge dividends this year:
    • rapid (10 min) ingestion of new data formats, such as PIOLIB evolution and WMAP
    • rapid (1 month) development of interface to any compressed pointing, allowing on-the-fly interpolation & translation
    • immediate inheritance of improvements (new capabilities & optimization/tuning) by the growing number of M3-based codes
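
M3's actual interface is not shown here, but the kind of data abstraction layer described can be illustrated with a small sketch (all class and function names below are hypothetical): analysis codes ask a registry for a reader by format name and iterate over samples without ever seeing the on-disk layout, which is also what makes on-the-fly operations such as weighted co-addition possible.

```python
from abc import ABC, abstractmethod

class TODReader(ABC):
    """Format-agnostic view of a time-ordered data stream."""
    @abstractmethod
    def samples(self, start, count):
        """Return `count` samples beginning at sample index `start`."""

_READERS = {}

def register(fmt):
    """Class decorator: associate a reader class with a format name."""
    def wrap(cls):
        _READERS[fmt] = cls
        return cls
    return wrap

@register("piolib")
class PiolibReader(TODReader):
    def __init__(self, path):
        self.path = path
    def samples(self, start, count):
        # A real reader would parse the PIOLIB layout; here we fake the data.
        return [0.0] * count

def open_tod(fmt, path):
    """Analysis codes call this and never touch the underlying format."""
    return _READERS[fmt](path)

# Usage: weighted co-addition of two simulated components while reading
cmb = open_tod("piolib", "/data/sim/cmb")      # hypothetical paths
noise = open_tod("piolib", "/data/sim/noise")
coadd = [c + 0.5 * n for c, n in zip(cmb.samples(0, 1024), noise.samples(0, 1024))]
```

Adding a new format in this scheme means writing one reader class and registering it, which is consistent with the rapid ingestion of new formats described above.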