Global Data Grids

The Need for Infrastructure

Paul Avery

University of Florida

http://www.phys.ufl.edu/~avery/

avery@phys.ufl.edu

Extending the Grid Reach in Europe

Brussels, Mar. 23, 2001
http://www.phys.ufl.edu/~avery/griphyn/talks/avery_brussels_23mar01.ppt

Global Data Grid Challenge

“Global scientific communities, served by networks with bandwidths varying by orders of magnitude, need to perform computationally demanding analyses of geographically distributed datasets that will grow by at least 3 orders of magnitude over the next decade, from the 100 Terabyte to the 100 Petabyte scale.”

Data Intensive Science: 2000-2015
  • Scientific discovery increasingly driven by IT
    • Computationally intensive analyses
    • Massive data collections
    • Rapid access to large subsets
    • Data distributed across networks of varying capability
  • Dominant factor: data growth (1 Petabyte = 1000 TB)
    • 2000: ~0.5 Petabyte
    • 2005: ~10 Petabytes
    • 2010: ~100 Petabytes
    • 2015: ~1000 Petabytes?

How to collect, manage, access, and interpret this quantity of data?
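
Taken at face value, these projections imply compound growth of roughly 60-80% per year. A minimal Python sketch of that arithmetic, using only the figures listed above (the derived per-year rates are not quoted on the original slide):

```python
# Growth implied by the slide's projections (petabytes per year shown).
# The year-over-year rates below are derived here, not quoted on the slide.
projections = {2000: 0.5, 2005: 10.0, 2010: 100.0, 2015: 1000.0}

years = sorted(projections)
for start, end in zip(years, years[1:]):
    factor = projections[end] / projections[start]
    annual = factor ** (1.0 / (end - start)) - 1.0
    print(f"{start}-{end}: x{factor:.0f} total, ~{annual:.0%} per year")

overall = (projections[2015] / projections[2000]) ** (1 / 15) - 1
print(f"2000-2015: ~{overall:.0%} compound annual growth, "
      f"about {projections[2015] / projections[2000]:.0f}x overall")
```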

Data Intensive Disciplines
  • High energy & nuclear physics
  • Gravity wave searches (e.g., LIGO, GEO, VIRGO)
  • Astronomical sky surveys (e.g., Sloan Sky Survey)
  • Global “Virtual” Observatory
  • Earth Observing System
  • Climate modeling
  • Geophysics

Data Intensive Biology and Medicine
  • Radiology data
  • X-ray sources (APS crystallography data)
  • Molecular genomics (e.g., Human Genome)
  • Proteomics (protein structure, activities, …)
  • Simulations of biological molecules in situ
  • Human Brain Project
  • Global Virtual Population Laboratory (disease outbreaks)
  • Telemedicine
  • Etc.

Commercial applications not far behind

The Large Hadron Collider at CERN

[Figure: the “Compact” Muon Solenoid (CMS) detector at the LHC, shown with a “standard man” for scale.]

LHC Computing Challenges
  • Complexity of LHC environment and resulting data
  • Scale: Petabytes of data per year (100 PB by 2010)
  • Global distribution of people and resources

CMS Experiment: 1800 physicists, 150 institutes, 32 countries


Global LHC Data Grid Hierarchy

[Figure: tiered hierarchy with Tier 0 at CERN fanning out to multiple Tier 1, Tier 2, Tier 3, and Tier 4 sites.]

  • Tier 0: CERN
  • Tier 1: National Lab
  • Tier 2: Regional Center at University
  • Tier 3: University workgroup
  • Tier 4: Workstation

GriPhyN:

  • R&D
  • Tier2 centers
  • Unify all IT resources


Global LHC Data Grid Hierarchy

[Figure: experiment-to-desktop data flow through the tiered hierarchy, with nominal rates.]

  • Experiment / Online System: one bunch crossing per 25 nsec; ~100 triggers per second; each event is ~1 MByte (raw detector output ~PBytes/sec)
  • Online System → Tier 0+1 (CERN Computer Center, > 20 TIPS, HPSS mass storage): ~100 MBytes/sec
  • Tier 0+1 → Tier 1 national centers (USA, Italy, UK, France; each with HPSS): 2.5-10 Gb/sec
  • Tier 1 → Tier 2 regional centers: 2.5-10 Gb/sec
  • Tier 2 → Tier 3 institutes (~0.25 TIPS, physics data cache): ~622 Mbits/sec
  • Tier 3 → Tier 4 workstations and other portals: 100-1000 Mbits/sec
  • Physicists work on analysis “channels”; each institute has ~10 physicists working on one or more channels
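
The rates in the diagram hang together: 100 triggers per second at ~1 MByte per event gives the quoted ~100 MBytes/sec into the CERN center. A minimal Python sketch of the tier structure and that arithmetic, using only numbers shown above (the `Tier` dataclass is illustrative, not part of any Grid software):

```python
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    role: str
    downlink: str  # nominal bandwidth of the link feeding this tier, as quoted above

# Tier structure and link speeds as quoted in the hierarchy above.
hierarchy = [
    Tier("Tier 0+1", "CERN Computer Center (>20 TIPS, HPSS)", "~100 MBytes/sec from online system"),
    Tier("Tier 1", "national centers: USA, Italy, UK, France (HPSS)", "2.5-10 Gb/sec"),
    Tier("Tier 2", "regional centers", "2.5-10 Gb/sec"),
    Tier("Tier 3", "institutes (~0.25 TIPS, physics data cache)", "~622 Mbits/sec"),
    Tier("Tier 4", "workstations, other portals", "100-1000 Mbits/sec"),
]

# Trigger arithmetic: 100 events/sec x ~1 MByte/event = ~100 MBytes/sec.
trigger_rate_hz = 100
event_size_mbytes = 1.0
print(f"Online system -> Tier 0: ~{trigger_rate_hz * event_size_mbytes:.0f} MBytes/sec")

for tier in hierarchy:
    print(f"{tier.name:<8} {tier.role:<48} in: {tier.downlink}")
```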


Global Virtual Observatory

Multi-wavelength astronomy, multiple surveys

[Figure: GVO components connected through common standards]
  • Image data
  • Source catalogs
  • Specialized data: spectroscopy, time series, polarization
  • Information archives: derived & legacy data (NED, Simbad, ADS, etc.)
  • Discovery tools: visualization, statistics

GVO: The New Astronomy
  • Large, globally distributed database engines
    • Integrated catalog and image databases
    • Multi-Petabyte data size
    • Gbyte/s aggregate I/O speed per site
  • High speed (>10 Gbits/s) backbones
    • Cross-connecting, correlating the major archives
  • Scalable computing environment
    • 100s–1000s of CPUs for statistical analysis and discovery
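
To see why Gbyte/s-class site I/O and >10 Gbit/s backbones go together at this scale, a small back-of-the-envelope sketch (the petabyte size, I/O rate, and backbone speed are from the list above; the scan time is derived, not quoted):

```python
# Back-of-the-envelope numbers behind the GVO requirements above.
dataset_gbytes = 1.0e6        # 1 Petabyte of a multi-petabyte archive, in GBytes
site_io_gbytes_per_s = 1.0    # "Gbyte/s aggregate I/O speed per site"

scan_seconds = dataset_gbytes / site_io_gbytes_per_s
print(f"Full scan of 1 PB at 1 GByte/s: ~{scan_seconds / 86400:.1f} days")

backbone_gbits_per_s = 10.0   # ">10 Gbits/s" backbone
print(f"10 Gbit/s backbone = {backbone_gbits_per_s / 8:.2f} GByte/s, "
      "comparable to one site's aggregate I/O")
```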


Infrastructure for Global Grids

Grid Infrastructure
  • Grid computing sometimes compared to electric grid
    • You plug in to get resource (CPU, storage, …)
    • You don’t care where resource is located
  • This analogy might have an unfortunate downside
  • You might need different sockets!

Role of Grid Infrastructure
  • Provide essential common Grid infrastructure
    • Cannot afford to develop separate infrastructures
  • Meet needs of high-end scientific collaborations
    • Already international and even global in scope
    • Need to share heterogeneous resources among members
    • Experiments drive future requirements
  • Be broadly applicable outside science
    • Government agencies: National, regional (EU), UN
    • Non-governmental organizations (NGOs)
    • Corporations, business networks (e.g., supplier networks)
    • Other “virtual organizations”
  • Be scalable to the Global level
    • But EU + US is a good starting point

A Path to Common Grid Infrastructure
  • Make a concrete plan
  • Have clear focus on infrastructure and standards
  • Be driven by high-performance applications
  • Leverage resources & act coherently
  • Build large-scale Grid testbeds
  • Collaborate with industry

Building Infrastructure from Data Grids
  • 3 Data Grid projects recently funded
  • Particle Physics Data Grid (US, DOE)
    • Data Grid applications for HENP
    • Funded 2000, 2001
    • http://www.ppdg.net/
  • GriPhyN (US, NSF)
    • Petascale Virtual-Data Grids
    • Funded 9/2000 – 9/2005
    • http://www.griphyn.org/
  • European Data Grid (EU)
    • Data Grid technologies, EU deployment
    • Funded 1/2001 – 1/2004
    • http://www.eu-datagrid.org/
  • HEP in common
  • Focus: infrastructure development & deployment
  • International scope

Background on Data Grid Projects
  • They support several disciplines
    • GriPhyN: CS, HEP (LHC), gravity waves, digital astronomy
    • PPDG: CS, HEP (LHC + current expts), Nuc. Phys., networking
    • DataGrid: CS, HEP, earth sensing, biology, networking
  • They are already joint projects
    • Each serving needs of multiple constituencies
    • Each driven by high-performance scientific applications
    • Each has international components
    • Their management structures are interconnected
  • Each project developing and deploying infrastructure
    • US$23M (additional proposals for US$35M)

What if they join forces?

A Common Infrastructure Opportunity
  • GriPhyN + PPDG + EU-DataGrid + national efforts
    • France, Italy, UK, Japan
  • Have agreed to collaborate, develop joint infrastructure
    • Initial meeting March 4 in Amsterdam to discuss issues
    • Future meetings in June, July
  • Preparing management document
    • Joint management, technical boards + steering committee
    • Coordination of people, resources
    • An expectation that this will lead to real work
  • Collaborative projects
    • Grid middleware
    • Integration into applications
    • Grid testbed: iVDGL
    • Network testbed (Foster): T3 = Transatlantic Terabit Testbed

iVDGL
  • International Virtual-Data Grid Laboratory
    • A place to conduct Data Grid tests at scale
    • A concrete manifestation of world-wide grid activity
    • A continuing activity that will drive Grid awareness
    • A basis for further funding
  • Scale of effort
    • For national, international scale Data Grid tests, operations
    • Computationally and data intensive computing
    • Fast networks
  • Who
    • Initially US-UK-EU
    • Other world regions later
    • Discussions w/ Russia, Japan, China, Pakistan, India, South America

iVDGL Parameters
  • Local control of resources vitally important
    • Experiments, politics demand it
    • US, UK, France, Italy, Japan, ...
  • Grid Exercises
    • Must serve clear purposes
    • Will require configuration changes, which is not trivial
    • “Easy”, intra-experiment tests first (10-20%, national, transatlantic)
    • “Harder” wide-scale tests later (50-100% of all resources)
  • Strong interest from other disciplines
    • Our CS colleagues (wide scale tests)
    • Other HEP + NP experiments
    • Virtual Observatory (VO) community in Europe/US
    • Gravity wave community in Europe/US/(Japan?)
    • Bioinformatics

Revisiting the Infrastructure Path
  • Make a concrete plan
    • GriPhyN + PPDG + EU DataGrid + national projects
  • Have clear focus on infrastructure and standards
    • Already agreed
    • COGS (Consortium for Open Grid Software) to drive standards?
  • Be driven by high-performance applications
    • Applications are manifestly high-performance: LHC, GVO, LIGO/GEO/Virgo, …
    • Identify challenges today to create tomorrow’s Grids

Revisiting the Infrastructure Path (cont)
  • Leverage resources & act coherently
    • Well-funded experiments depend on Data Grid infrastructure
    • Collab. with national laboratories: FNAL, BNL, RAL, Lyon, KEK, …
    • Collab. with other Data Grid projects: US, UK, France, Italy, Japan
    • Leverage new resources: DTF, CAL-IT2, …
    • Work through Global Grid Forum
  • Build and maintain large-scale Grid testbeds
    • iVDGL
    • T3
  • Collaboration with industry (see next slide)
  • EC investment in this opportunity
    • Leverage and extend existing projects, worldwide expertise
    • Invest in testbeds
    • Work with national projects (US/NSF, UK/PPARC, …)

Part of same infrastructure

Collaboration with Industry
  • Industry efforts are similar, but only in spirit
    • ASP, P2P, home PCs, …
    • IT industry mostly has not invested in Grid R&D
    • We have different motives, objectives, timescales
  • Still many areas of common interest
    • Clusters, storage, I/O
    • Low cost cluster management
    • High-speed, distributed databases
    • Local and wide-area networks, end-to-end performance
    • Resource sharing, fault-tolerance, …
  • Fruitful collaboration requires clear objectives
  • EC could play important role in enabling collaborations

Status of Data Grid Projects
  • GriPhyN
    • US$12M funded by NSF/ITR 2000 program (5 year R&D)
    • 2001 supplemental funds requested for initial deployments
    • Submitting 5-year proposal ($15M) to NSF
    • Intend to fully develop production Data Grids
  • Particle Physics Data Grid
    • Funded in 1999, 2000 by DOE ($1.2 M per year)
    • Submitting 3-year Proposal ($12M) to DOE Office of Science
  • EU DataGrid
    • 10M Euros funded by EU (3 years, 2001 – 2004)
    • Submitting proposal in April for additional funds
  • Other projects?

Grid References
  • Grid Book
    • www.mkp.com/grids
  • Globus
    • www.globus.org
  • Global Grid Forum
    • www.gridforum.org
  • PPDG
    • www.ppdg.net
  • EU DataGrid
    • www.eu-datagrid.org/
  • GriPhyN
    • www.griphyn.org

Summary
  • Grids will qualitatively and quantitatively change the nature of collaborations and approaches to computing
  • Global Data Grids provide the challenges needed to build tomorrow's Grids
  • We have a major opportunity to create common infrastructure
  • Many challenges during the coming transition
    • New grid projects will provide rich experience and lessons
    • Difficult to predict situation even 3-5 years ahead
