
Storage Resource Broker

Case Studies

George Kremenek

kremenek@sdsc.edu


Projects

Digital Sky Project (NPACI) {NVO (NSF)}

Hayden Planetarium Simulation & Visualization

ASCI - Data Visualization Corridor (DOE)

Visual Embryo Project (NLM)

Long Term Archiving Project (NARA)

Information Power Grid (NASA)

Particle Physics Data Grid (DOE) {GriPhyN (NSF)}

Biomedical Information Research Network (NIH)

RoadNet (NSF)

Grid Portal (NPACI)

NSDL – National Science Digital Library (NSF)

Knowledge Network for BioComplexity (NSF)

Tera Scale Computing (NSF)

Hyper LTER

Earth System Sciences – CEED, Bionome, SIO Explorer

Education – Transana (NPACI)

Mol Science – JCSG, AfCS

Digital Libraries – ADL, Stanford, UMichigan, UBerkeley, CDL




Problem

Transferring and sharing terabytes of data across the Internet

Simulation data produced at NCSA

Visualized at SDSC

Validated at AMNH, NCSA, UVa & other places

Consumed at AMNH

Data sizes ranged from 3 TB to 10 TB

Other sites (CalTech, BIRN) used as cache resources



Hayden Planetarium Project: “A Search for Life: Are We Alone?”

The animations were done for the new planetarium show “A Search for Life: Are We Alone?”, narrated by Harrison Ford.

The show opened Saturday, March 2nd.

Sites involved in the project:

AMNH = American Museum of Natural History

NCSA = National Center for Supercomputing Applications

SDSC = San Diego Supercomputer Center

University of Virginia

CalTech, NASA, UCSD



Hayden Credits

People involved

AMNH: Producer Anthony Braun; Director Carter Emmart; Erik Wesselak; Clay Budin; Ryan Wyatt; Asst. Curator, Dept. of Astrophysics, Mordecai Mac Low

NCSA: Stuart Levy, Bob Patterson

SDSC: David R. Nadeau, Erik Enquist, George Kremenek, Larry Diegel, Eva Hocks

U. Virginia: Professor John F. Hawley



Hayden Data in SRB

Disk accretion:

Simulation run at SDSC by John Hawley. Data stored in SRB.

Jet imagery:

Images from Hubble Space Telescope. Data stored in SRB.

Flight path:

Planned at NCSA and AMNH. Data stored in SRB.



Hayden Data Flow

[Diagram: data simulation at NCSA (SGI, UniTree, 2.5 TB); production, parameters, movies, and images at AMNH (NYC, NY); visualization at SDSC (IBM SP2, GPFS 7.5 TB, HPSS 7.5 TB); CalTech, BIRN, and UVa as additional sites.]


Hayden Data Involved

ISM = Interstellar Medium Simulation

Run by Mordecai Mac Low of AMNH at NCSA: 2.5 Terabytes sent from NCSA to SDSC. Data stored in SRB (HPSS, GPFS).

Ionization:

Simulation run at AMNH, 117 Gigabytes sent from AMNH to SDSC. Data stored in SRB.

Star motion:

Simulation run at AMNH by Ryan Wyatt. 38 Megabytes sent from AMNH to SDSC.



Hayden Totals

Data

total 3 * 2.5 TB = 7.5 TB

Files

3 * 9827 files + miscellaneous files

Duration

December 2001, January, February 2002



Hayden Conclusions

The SRB was used as a central repository for all original, processed or rendered data.

Location transparency was crucial for data storage, data sharing, and easy collaboration.

SRB was successfully used for a commercial project under an “impossible” production deadline dictated by the marketing department.

Collaboration across sites made feasible with SRB



ASCI - DOE


Advanced Simulation and Computing (ASCI)

Area

Advanced computations, three-dimensional modeling, simulation and visualization.

Problem

Evaluating SRB as an advanced data handling platform for the DOE Data Visualization Corridor.

Requirements

SRB working well with HPSS for handling large files as well as large numbers of small files.

Data movement in “bulk” by researchers



ASCI and DataCutter

ASCI is currently evaluating the DataCutter technology in SRB

DataCutter

Handles multidimensional data subsetting and filtering; developed by U. of Maryland and Ohio State (see the sketch below).

ASCI is interested in the integration of DataCutter with SRB for the advanced visualization corridor.

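As a rough illustration of what subsetting and filtering mean here (a toy sketch in plain Python/NumPy with an assumed array shape, not DataCutter's actual interface), the idea is to extract only the region of interest from a large multidimensional dataset and then keep only the values a downstream visualization needs:

import numpy as np

# Toy stand-in for a large simulation output: a 3-D scalar field.
volume = np.random.rand(256, 256, 256).astype(np.float32)

def subset(field, xr, yr, zr):
    # Subsetting: cut out only the requested region instead of shipping the whole volume.
    return field[xr[0]:xr[1], yr[0]:yr[1], zr[0]:zr[1]]

def filter_above(region, threshold):
    # Filtering: keep only the cells that pass a predicate (here, a simple threshold).
    return region[region > threshold]

region = subset(volume, (0, 64), (0, 64), (128, 192))
interesting = filter_above(region, 0.95)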


ASCI Data Flow

Data movement across 3 hosts

[Diagram: applications and SRB clients on one host; an SRB server with a local FS and data cache on a second; a further SRB server with MCAT (Oracle) and HPSS on a third.]


ASCI People

ASCI project - LLNL

Celeste Matarazzo, Punita Sinha

The Storage Resource Broker (SRB): SDSC

Michael Wan, Arcot Rajasekar, Reagan Moore

DataCutter: Univ. of Maryland, OSU

Joel Saltz, Tahsin Kurc, Alan Sussman



ASCI

Time-line

1999 - Dec 2002

Data Sizes

Very large files (multi GB)

Large number of small files (over a million files)

Total size exceeding 2 TB for each run

SRB Solution

SRB/HPSS interoperation – highly integrated

SRB data mover protocol adapted to HPSS parallel mover protocol



ASCI Parallel Protocol

HPSS server

directs the parallel data transfer scheme

uses the HPSS class-of-service feature

SRB server

utilizes HPSS's parallel mover protocol.

transfer rates of up to 40 MB/sec

a speedup of 2 to 5 times can be achieved using multiple threads (illustrated in the sketch below).

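A minimal sketch of the parallel-transfer idea (plain Python threads over a local file copy; the real SRB/HPSS mover protocol moves byte ranges over the network and negotiates the class of service, which is not shown here):

import os
from concurrent.futures import ThreadPoolExecutor

def copy_range(src, dst, offset, length, bufsize=8 * 1024 * 1024):
    # One worker per byte range; separate file handles avoid seek races between threads.
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        fin.seek(offset)
        fout.seek(offset)
        remaining = length
        while remaining > 0:
            chunk = fin.read(min(bufsize, remaining))
            if not chunk:
                break
            fout.write(chunk)
            remaining -= len(chunk)

def parallel_copy(src, dst, threads=4):
    size = os.path.getsize(src)
    with open(dst, "wb") as f:
        f.truncate(size)          # pre-size the target so each thread writes its own region
    stride = (size + threads - 1) // threads
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for i in range(threads):
            offset = i * stride
            pool.submit(copy_range, src, dst, offset, min(stride, size - offset))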


ASCI Small Files

Ingesting a very large number of small files into SRB

is time consuming if the files are ingested one at a time

greatly improved with the use of bulk ingestion.

Ingestion was broken down into two parts:

the registration of files with MCAT

the I/O operations (file I/O and network data transfer)

Multi-threading was used for both the registration and I/O operations (sketched below).

A new utility, Sbload, was created for this purpose.

This reduced the ASCI benchmark time for ingesting ~2,100 files from ~2.5 hours to ~7 seconds.

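The split between batched catalog registration and threaded I/O can be sketched roughly as follows (plain Python; register_batch() is a hypothetical stand-in for a bulk MCAT registration call, and the local copy stands in for the actual SRB data transfer):

import os
import shutil
from concurrent.futures import ThreadPoolExecutor

BATCH = 500  # files registered per catalog round trip, instead of one at a time

def register_batch(catalog, entries):
    # Hypothetical bulk registration: one call records a whole batch of new objects.
    catalog.extend(entries)

def bulk_ingest(files, staging_dir, catalog, threads=8):
    def copy_one(path):
        dest = os.path.join(staging_dir, os.path.basename(path))
        shutil.copy(path, dest)              # the I/O part, done in parallel
        return dest

    with ThreadPoolExecutor(max_workers=threads) as pool:
        ingested = list(pool.map(copy_one, files))

    # The registration part: ~2,100 files cost a handful of catalog calls, not 2,100.
    for i in range(0, len(ingested), BATCH):
        register_batch(catalog, ingested[i:i + BATCH])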


ASCI Conclusions

A very large number (2 million) of small/average-size files can be ingested into SRB (HPSS) in a short time

Sbload (with bulk SRB registration) can load and register up to 300 files a second

Sbload will be included in the next SRB release

Sbload can also be used for other resources



Digital Sky Project


Digital Sky

2MASS (Two Micron All Sky Survey):

Bruce Berriman, IPAC, Caltech; John Good, IPAC, Caltech; Wen-Piao Lee, IPAC, Caltech

NVO (National Virtual Observatory):

Tom Prince, Caltech; Roy Williams, CACR, Caltech; John Good, IPAC, Caltech

SDSC – SRB:

Arcot Rajasekar, Mike Wan, George Kremenek, Reagan Moore



Digital Sky - 2MASS

http://www.ipac.caltech.edu/2mass

The input data was on tapes in a random order.

Ingestion took nearly 1.5 years, running almost continuously

SRB performed a spatial sort on data insertion. The disk cache (800 GB) for the HPSS containers was utilized.



Digital Sky Data Ingestion

[Diagram: input tapes from the telescopes (10 TB) are read at IPAC Caltech, where a Sun host holds the Informix star catalog; data flows to SRB on a Sun E10K at SDSC and through an 800 GB data cache into HPSS.]


Digital Sky Data Ingestion

4 parallel streams (4 MB/sec per stream), 24*7*365

Total of 10+ TB: 5 million 2 MB images in 147,000 containers.

Ingestion speed limited by input tape reads

Only two tapes per day can be read

The workflow incorporated persistent features to deal with network outages and other failures (sketched below).

The C API was utilized for fine-grained control and to manipulate and insert metadata into the Informix catalog at IPAC Caltech.

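The persistence can be sketched as a simple retry loop (plain Python; ingest_file() and record_metadata() are hypothetical stand-ins for the SRB C API transfer and the Informix catalog update):

import time

def ingest_with_retry(ingest_file, path, max_attempts=10, backoff=60):
    # Retry a failed transfer (e.g. a network outage) instead of aborting the whole run.
    for attempt in range(1, max_attempts + 1):
        try:
            ingest_file(path)
            return True
        except OSError as err:
            print(f"attempt {attempt} failed for {path}: {err}")
            time.sleep(backoff * attempt)    # back off a little longer each time
    return False

def run_pass(ingest_file, record_metadata, pending):
    # Anything that still fails stays on the list for the next pass of the workflow.
    remaining = []
    for path in pending:
        if ingest_with_retry(ingest_file, path):
            record_metadata(path)
        else:
            remaining.append(path)
    return remaining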


Data Sorting

Sorting of 5 million files on the fly

Input tape files: temporal order

Stored SRB Containers: spatial order

Scientists view/analyze data by neighborhood

Data Flow:

Files from tape streamed to SRB

SRB puts them in proper ‘bins’ (containers)

Container cache management was a big problem

Files from a tape may go into more than 1000 bins

Cache space limitations (300-800 GB) made for a lot of thrashing

An SRB daemon managed the cache using watermarks (sketched below)

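The bin-and-flush idea can be sketched as follows (plain Python; spatial_bin() and flush() are hypothetical stand-ins for the sky-bin lookup and the migration of a container from the disk cache to HPSS, and the watermark fractions are illustrative, not SRB's actual settings):

class ContainerCache:
    """Group incoming files into spatial containers, flushing under cache pressure."""

    def __init__(self, spatial_bin, flush, limit_gb=800.0):
        self.spatial_bin = spatial_bin   # maps an image to its sky bin
        self.flush = flush               # migrates a full container out of the cache
        self.high = 0.9 * limit_gb       # start evicting above the high watermark
        self.low = 0.5 * limit_gb        # stop evicting once below the low watermark
        self.containers = {}             # bin id -> list of (name, size_gb)
        self.used = 0.0

    def ingest(self, name, size_gb):
        # Files arrive in temporal (tape) order but are grouped by spatial bin.
        bin_id = self.spatial_bin(name)
        self.containers.setdefault(bin_id, []).append((name, size_gb))
        self.used += size_gb
        if self.used > self.high:
            self._evict()

    def _evict(self):
        # Flush the fullest containers first until the cache drops below the low watermark.
        by_size = sorted(self.containers,
                         key=lambda b: sum(s for _, s in self.containers[b]),
                         reverse=True)
        for bin_id in by_size:
            files = self.containers.pop(bin_id)
            self.used -= sum(s for _, s in files)
            self.flush(bin_id, files)
            if self.used <= self.low:
                break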


Digital Sky Data Retrieval

average 3000 images a day

[Diagram: web clients at IPAC Caltech (Informix, Suns) and JPL (Suns, SGIs) access SRB on a Sun E10K at SDSC, which serves data from an 800 GB cache backed by HPSS (10 TB).]


Digital Sky Apps (e.g., sky mosaic, classification, ...)

Processing 10 TB on thousands of nodes

[Diagram: applications on the IBM SP2 (DTF) and Sun E15K at SDSC access SRB data on shared SAN disks (10+ TB) backed by HPSS (10 TB).]


DigSky Conclusion

SRB can handle a large number of files

Metadata access delay is still less than ½ sec

Replication of large collections

Single command for geographical replication

On-the-fly sorting (out-of-tape sorting)

Availability of data otherwise not possible

Near-line access to 5 million files (10 TB)

Successfully used for web access & large-scale analysis (daily)




Thank you for your attention.

Any questions?

http://www.npaci.edu/dice/srb

