Athena & the Grid Architectural View

Craig E. Tull

HCG/NERSC/LBNL

ATLAS/LHCb/GridPP Workshop

Cosener's House - May 23, 2002

What this talk is:
  • What this talk is not:
    • Another presentation of GRAPPA.
    • See Rob's talk from yesterday.
  • What this talk is:
    • An ATLAS perspective on the view of the Grid from the Athena/Gaudi Framework.
    • A seat-of-the-pants distillation of some impressions from this workshop's presentations.
    • Food for thought and discussion in this afternoon's session.
    • … and slightly random.
Athena/GAUDI Architecture
[Figure: Athena/GAUDI Architecture. Components: Application Manager; Algorithms; Transient Event Store, Transient Detector Store and Transient Histogram Store, served by the Event Data, Detec. Data and Histogram Services; Persistency Services with Converters reading and writing Data Files; JobOptions, Message, Particle Prop. and Other Services.]
Bigger Picture
[Figure: The Athena/GAUDI architecture, now including an Event Selector, together with surrounding components: DataSet DB, OS, Job Service, Mass Storage, Monitoring Service, Config. Service, Event Database, PDG Database, Analysis Program, Histo Presenter, and other components.]
Grid: The new paradigm
  • The Grid offers a vision of computer resources that are: Distributed, Heterogeneous, Robust, and Integrated.
  • Some concepts are qualitatively new.
    • Resource Discovery, Virtual Data, Reserved QoS
  • Some concepts are quantitatively "new".
    • Number of sites/jobs/nodes/users.
  • Some concepts are old wine in new skins.
    • Distributed processing
  • Some are natural & "obvious" extensions of old concepts.
    • Unix GroupsVO, LFNs
Grid Projects: Integrated?
  • We've heard here about:
    • GANGA, GRAPPA, BOSS, AliEn
    • CMT, Pacman, Packman, DAR
    • WP1 JSS, GriPhyN Planner
    • Magda, WP2 Replica Service
    • NetLogger, Prophesy, GMA, R-GMA, GridView, Ganglia
    • VDL/IVDL, WP1 JDL, Condor ClassAds
    • EDG, PPDG, GriPhyN, GridPP, InfoGrid, CrossGrid, GGF, MONARC,…
  • How do we take advantage of Grid capabilities while protecting ourselves from potential duplication of, and conflicts between, roles & responsibilities?
Grid: Ready for PrimeTime?
  • CHEP'98: First HENP Grid (Clipper) talk
    • #237 Directions and Issues for High Data Rate Wide Area Network Environments
  • Many Grid projects are CS R&D, but production grids do exist (e.g. NASA InfoGrid), and there are indications that Grid computing is gaining momentum in the non-HENP (i.e. mainstream) world.
  • IBM/Globus Partnership - 12 developers
ATLAS SW & Grid Projects
  • The Grid does now offer advantages & functionality. More will certainly come.
  • We cannot afford to wait to be handed the solution.
  • APIs to Grid services need to be compatible with, or adapted to, Athena Services
  • ATLAS interests/requirements need to be communicated to Grid researchers/developers & DOE/NSF.
  • Timelines for ATLAS need to be defined.
    • The Grid timeline is not the same as some others
    • Available FTE resources are a critical input
  • Much current work concentrates on issues like:
    • Data Volume, Data Set Distribution, ATLAS Resources (Disk, CPU, HMS), Network Connectivity, $$$, FTE, etc.
  • Distributed Computing Model must be defined.
  • Control Framework
    • Grid-compatible / Grid-aware, but not Grid-dependent
Grid aware, but not dependent.
  • Interface Technologies
    • Programmatic API (eg. C, C++, etc)
    • Scripting as glue à la Stallman (e.g. Python)
    • JobOptions.{txt,py}
    • Sandbox
    • Others?
      • eg. SOAP, CORBA, RMI, DCOM, .NET, etc.
  • International Standards would help!
    • Global Grid Forum
  • Staged approach is called for.
    • Simple Batch model to begin. Add simple Grid functionality via Services. Continual feedback.
Athena/Grid Interface
  • For the programmatic interface to Grid services, we are thinking in terms of Gaudi services that capture and present the functionality of the Grid services (not necessarily as a one-to-one mapping, BTW).
  • I think it is important at this stage (maybe forever) to ensure that the framework is "grid-capable" without being "grid-dependent", i.e. we should always be able to run without Grid services available.
    • Gaudi's component architecture makes this approach to using the Grid quite natural.
    • How do we switch between Grid and non-Grid operation?
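One way to picture "grid-capable but not grid-dependent" is a single abstract service interface with a Grid-backed and a purely local implementation, chosen by a job-options setting, so Algorithms never know which one they got. The Python below is a minimal sketch under that assumption; `IFileCatalogSvc`, `LocalFileCatalogSvc`, `GridFileCatalogSvc`, `make_catalog_svc` and the `"FileCatalogSvc"` option key are all hypothetical, not Gaudi or EDG APIs.

```python
from abc import ABC, abstractmethod


class IFileCatalogSvc(ABC):
    """Hypothetical abstract service: resolve a dataset name to readable files."""

    @abstractmethod
    def resolve(self, name: str) -> list[str]:
        ...


class LocalFileCatalogSvc(IFileCatalogSvc):
    """Non-Grid (dummy) implementation: names are already local paths."""

    def resolve(self, name: str) -> list[str]:
        return [name]


class GridFileCatalogSvc(IFileCatalogSvc):
    """Grid implementation, backed here by an in-memory replica table."""

    def __init__(self, replicas: dict[str, list[str]]):
        self._replicas = replicas

    def resolve(self, name: str) -> list[str]:
        if name not in self._replicas:
            raise LookupError(f"no replicas registered for {name}")
        return list(self._replicas[name])


def make_catalog_svc(job_options: dict, replicas=None) -> IFileCatalogSvc:
    """Pick the implementation from job options. Fall back to the local one
    so the job still runs when no Grid services are available."""
    if job_options.get("FileCatalogSvc") == "Grid" and replicas is not None:
        return GridFileCatalogSvc(replicas)
    return LocalFileCatalogSvc()
```

The switch lives in one factory driven by configuration, which is the kind of substitution a component architecture makes cheap: the same job runs standalone with `make_catalog_svc({})` and Grid-enabled with `make_catalog_svc({"FileCatalogSvc": "Grid"}, replicas)`.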
Jul’01: PSEUDOCODE FOR ATLAS SHORT TERM UC01

Logical File Name:   LFN = "lfn://" hostname "/" any_string
Physical File Name:  PFN = "pfn://" hostname "/" path
Transfer File Name:  TFN = "gridftp://" PFN_hostname "/" path

JDL:
  InputData = {LFN[]}
  OutputSE = host.domain.name

Worker Node:
  LFN[] = WP1.LFNList()
  for (i = 0; i < LFN.length; i++) {
    PFN[] = ReplicaCatalog.getPhysicalFileNames(LFN[i])
    j = Athena.eventSelectionSrv.determineClosestPF(PFN[])
    localFile = GDMP.makeLocal(PFN[j], OutputSE)
    Athena.eventSelectionSrv.open(localFile)
  }

PFN[] = getPhysicalFileNames(LFN)
PFN = getBestPhysicalFileName(PFN[], String[] protocols)
TFN = getTransportFileName(PFN, String protocol)
filename = getPosixFileName(TFN)

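The worker-node loop above can be turned into a small runnable Python sketch. Everything Grid-facing is stubbed and hypothetical: a dict stands in for the ReplicaCatalog, "closest" is approximated by an assumed per-host distance table, `make_local` only computes where the copy would land (a scratch directory instead of OutputSE) rather than performing a GDMP transfer, and `open_file` stands in for the event selector's open call.

```python
import posixpath
from urllib.parse import urlparse


def parse_name(name: str, scheme: str) -> tuple[str, str]:
    """Split '<scheme>://hostname/rest' into (hostname, rest)."""
    url = urlparse(name)
    if url.scheme != scheme:
        raise ValueError(f"expected {scheme}:// name, got {name!r}")
    return url.netloc, url.path.lstrip("/")


def closest_pfn(pfns: list[str], distance: dict[str, int]) -> int:
    """Index of the replica whose host has the smallest assumed distance."""
    return min(range(len(pfns)),
               key=lambda i: distance.get(parse_name(pfns[i], "pfn")[0], 10**9))


def make_local(pfn: str, scratch_dir: str) -> str:
    """Stand-in for GDMP.makeLocal: returns where the local copy would be."""
    _, path = parse_name(pfn, "pfn")
    return posixpath.join(scratch_dir, posixpath.basename(path))


def process(lfns, replica_catalog, distance, scratch_dir, open_file):
    for lfn in lfns:
        pfns = replica_catalog[lfn]              # getPhysicalFileNames(LFN)
        j = closest_pfn(pfns, distance)          # determineClosestPF(PFN[])
        local_file = make_local(pfns[j], scratch_dir)
        open_file(local_file)                    # eventSelectionSrv.open(...)
```

For example, with replicas on `far.example` and `near.example` and distances `{"near.example": 1, "far.example": 50}`, the loop opens the copy derived from the `near.example` replica.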
WP2: Replica Manager API (old, pre-SFN terminology)
  • addPhysicalFileName(LogicalFileName, PhysicalFileName)
  • deletePhysicalFileName(LogicalFileName, PhysicalFileName)
  • SFN = getPhysicalFileNames(LogicalFileName)
  • copy(PhysicalFileName source, PhysicalFileName destination, String protocol)
  • copyAndAddPhysicalFile(PhysicalFileName source, PhysicalFileName destination, LogicalFileName lfn, String protocol)
  • generatePhysicalFileName(LogicalFileName filename, PhysicalFileNamePattern)
  • estimateCostForCopy(PhysicalFileName source, PhysicalFileName destination, String protocol)
  • SFN = getLocationOfBestReplica (LogicalFileName)
  • getBestPhysicalFileName (PhysicalFileNameList, ProtocolList)
  • getTransportFileName (PhysicalFileName, Protocol)
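To make the shape of this API concrete, here is a minimal in-memory Python sketch of a few of the calls listed above. It is a toy built on stated assumptions, not the EDG WP2 implementation: the catalog is a dict, each storage host advertises a hypothetical list of access protocols, "best" means the first replica reachable with the earliest protocol in the caller's preference list, and the transport name swaps the `pfn://` prefix for the protocol scheme, in the spirit of the TFN definition in the UC01 pseudocode.

```python
from collections import defaultdict
from urllib.parse import urlparse


class ReplicaManager:
    """Toy in-memory replica catalog using the (old) WP2 method names."""

    def __init__(self, host_protocols: dict[str, list[str]]):
        self._replicas = defaultdict(list)      # LFN -> [PFN, ...]
        self._host_protocols = host_protocols   # host -> protocols it serves

    def addPhysicalFileName(self, lfn: str, pfn: str) -> None:
        if pfn not in self._replicas[lfn]:
            self._replicas[lfn].append(pfn)

    def deletePhysicalFileName(self, lfn: str, pfn: str) -> None:
        self._replicas[lfn].remove(pfn)

    def getPhysicalFileNames(self, lfn: str) -> list[str]:
        return list(self._replicas.get(lfn, []))

    def getBestPhysicalFileName(self, pfns, protocols) -> str:
        # Protocol preference first, then catalog order among replicas.
        for protocol in protocols:
            for pfn in pfns:
                host = urlparse(pfn).netloc
                if protocol in self._host_protocols.get(host, []):
                    return pfn
        raise LookupError("no replica reachable with the requested protocols")

    def getTransportFileName(self, pfn: str, protocol: str) -> str:
        url = urlparse(pfn)
        return f"{protocol}://{url.netloc}{url.path}"
```

Method names deliberately mirror the slide rather than Python naming conventions, so the mapping to the listed API stays obvious.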
Athena Distributed Instrumentation
  • Part of SuperComputing 2002 ATLAS demo
  • IMonitorSvc  IChronoStatSvc extension?
    • Abstract application monitoring service.
  • Prophesy (http://prophesy.mcs.anl.gov/)
    • An Infrastructure for Analyzing & Modeling the Performance of Parallel & Distributed Applications
    • Normally a Parse & auto-instrument approach (C & FORTRAN).
  • NetLogger (http://www-didc.lbl.gov/NetLogger/)
    • End-to-End Monitoring & Analysisof Distributed Systems
    • C, C++, Java, Python, Perl, Tcl APIs
    • Web Service Activation
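A rough picture of what an abstract monitoring service could look like: chrono-style start/stop calls around a named section of work, each emitting one timestamped `KEY=value` line that a NetLogger-like collector could parse. The class, its method names and the record fields are hypothetical; the record layout is only loosely inspired by NetLogger's key=value logging and is not its actual API or format. The clock and output sink are injectable so the service can run with or without a distributed backend.

```python
import time


class MonitorSvc:
    """Hypothetical monitoring service: time named sections, emit records."""

    def __init__(self, host: str, prog: str, sink=print, clock=time.time):
        self._host, self._prog = host, prog
        self._sink, self._clock = sink, clock
        self._starts: dict[str, float] = {}

    def _emit(self, event: str, **fields) -> None:
        # One line per event: space-separated KEY=value pairs.
        record = {"TS": f"{self._clock():.6f}", "HOST": self._host,
                  "PROG": self._prog, "EVNT": event, **fields}
        self._sink(" ".join(f"{k}={v}" for k, v in record.items()))

    def chronoStart(self, name: str) -> None:
        self._starts[name] = self._clock()
        self._emit(f"{name}.start")

    def chronoStop(self, name: str) -> float:
        elapsed = self._clock() - self._starts.pop(name)
        self._emit(f"{name}.end", ELAPSED=f"{elapsed:.6f}")
        return elapsed
```

An Algorithm would bracket its work with `chronoStart("Reco")` and `chronoStop("Reco")`; swapping `sink` for a network sender is where a distributed collector would plug in.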
WP1: Sandbox
  • Working area (input & output) replicated on each CE to which a Grid job is submitted.
    • Very convenient & natural.
  • My Concerns:
    • Requires network access (with associated privileges) to all CEs on the Grid.
      • Could be a huge security issue for local administrators.
    • Not (yet) coordinated with WP2 services.
    • Sandbox contents not customizable to the local (CE/SE/PFN) environment.
    • Temptation to abuse it (it is not meant for data files)
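One way to resist the temptation to abuse the sandbox is to check its contents before submission. The sketch below assumes a hypothetical policy (a size cap and a short list of data-file extensions, both invented for illustration) and splits candidate files into those accepted for the sandbox and those rejected, with data files meant to be registered and referenced by LFN instead. WP1 itself is not claimed to provide such a check.

```python
import os

# Hypothetical policy values; real limits would be site or VO decisions.
MAX_SANDBOX_BYTES = 10 * 1024 * 1024
DATA_EXTENSIONS = {".root", ".db", ".dat"}


def split_sandbox(paths, size_of=os.path.getsize):
    """Return (sandbox_files, rejected), where rejected maps path -> reason."""
    sandbox, rejected, total = [], {}, 0
    for path in paths:
        ext = os.path.splitext(path)[1].lower()
        if ext in DATA_EXTENSIONS:
            rejected[path] = "data file: register it and pass its LFN instead"
            continue
        size = size_of(path)
        if total + size > MAX_SANDBOX_BYTES:
            rejected[path] = "would exceed the sandbox size limit"
            continue
        sandbox.append(path)
        total += size
    return sandbox, rejected
```

`size_of` is injectable so the policy can be exercised without touching the file system.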
Grid System
[Figure: Grid System. Labeled elements: Logical filenames; ATLAS planner; WP2 Rep Mgr; WP1 JSS Planner; Job; JDL (specify input); Sandbox; Physical File; JobOptions; GDB input and output fragment; Register output; Magda.]
ATLAS SW & the Grid
  • What are the implications of a distributed computing model and grids for:
  • The database domain?
    • Extensive in almost any case
  • The control framework?
    • Depends upon the model (e.g., distributed data sources versus distributing executables versus distributed execution)
  • Other ATLAS software infrastructure?
    • eg. Build & install tools & kits
Distributed Processing Models
  • Batch-like Processing (à la WP1)
  • Distributed Single Event (MPP)
  • Client-Server (interactive)
  • WAN Data Access (AMS, Clipper)
  • File Transfer and Local Processing (GDMP)
  • Agent-based Processing (distributed control)
  • Check-Point & Migrate (save & restore)
  • Scatter & Gather (parallel events)
  • Move the data or move the executable?
    • No experiment is planning to write PetaBytes of Code!
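Of the models listed, Scatter & Gather is the easiest to show in a few lines: independent event ranges are farmed out to workers and the partial results are merged. In the sketch below, threads from Python's `concurrent.futures` stand in for Grid nodes (an assumption for illustration only; real parallel speedup would need processes or remote workers), `reconstruct` is a hypothetical placeholder for per-event work, and the gather step is a `Counter` merge of partial histograms.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def reconstruct(event_range: tuple[int, int]) -> Counter:
    """Hypothetical per-chunk work: histogram a toy quantity per event."""
    start, stop = event_range
    hist = Counter()
    for event_id in range(start, stop):
        hist[event_id % 4] += 1  # stand-in for binning a physics quantity
    return hist


def scatter(n_events: int, n_chunks: int) -> list[tuple[int, int]]:
    """Split [0, n_events) into contiguous, nearly equal event ranges."""
    step, extra = divmod(n_events, n_chunks)
    ranges, start = [], 0
    for i in range(n_chunks):
        stop = start + step + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def scatter_gather(n_events: int, n_workers: int = 4) -> Counter:
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(reconstruct, scatter(n_events, n_workers)):
            total.update(partial)  # gather: add the partial histograms
    return total


if __name__ == "__main__":
    print(scatter_gather(1000))
```

Note that only event ranges and small histograms cross the worker boundary here; the event data is assumed to be readable where each range runs, which is exactly the "move the data or move the executable?" question.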
ATLAS Distributed Processing Model
  • At this point, it is still not clear what the final ATLAS distributed computing model will be. Although newer ideas like Agent-based Processing have a great deal of appeal, they are as yet unproven in a large-scale production environment.
  • A conservative approach would be some combination of Batch-like Processing and File Transfer and Local Processing for batch jobs, with perhaps a Client-Server or Scatter-Gather approach for interactive/analysis jobs.
    • PPDG CS-11 - Interfacing and Integrating Interactive Data Analysis Tools with the Grid and Identifying Common Components and Services
Data Access Patterns
  • Data access patterns of physics jobs also heavily influence our thinking about interacting with the Grid. It is likely that all possible data access patterns will be extant in ATLAS data processing at various stages in that processing. We may find that some data access patterns lend themselves to efficient use of the Grid much better than others.
  • Data access patterns include:
    • Sequential Access (reconstruction)
    • Random Access (interactive analysis)
    • File/Data Set Driven (LFN-friendly)
    • Navigational Driven (OODB-like)
    • Query Driven (SQL/OQL/JDO/etc)
DB Architectural Elements
  • Events are write-once
  • Three capabilities to support optimization:
      • Event sharing
      • Data sharing
      • Data placement (clustering)
  • Therefore, different storage formats
    • Does not mean different technologies!
    • Different ways to represent events and sets of events.
    • Possible because navigation is separated from storage.
    • Examples…

ATLAS DataBase Architecture - Ed Frank

Architectural Motif: Extract & Transform
  • Architecture will express many storage formats
    • Any job can read any of them without reconfiguration
  • Can always extract events for transport, regardless of format
    • Cost depends upon the storage format
  • Tier 0 is assigned responsibility for keeping a copy of the data in a format such that extraction costs are affordable
    • Archival data format
  • Can always transform (write) data into a new format
    • Store in format for local optimization
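The extract-and-transform motif can be illustrated with two toy storage formats behind one read/write interface: a reader can extract whole events from either (at different cost), and extracted events can always be written into whichever format suits local use. The row-wise and column-wise classes below are hypothetical stand-ins chosen for illustration, not ATLAS storage technologies, and the column store assumes every event carries the same attributes.

```python
Event = dict[str, float]


class RowStore:
    """Events stored whole, one record per event (cheap to extract)."""

    def __init__(self) -> None:
        self.rows: list[Event] = []

    def write(self, events) -> None:
        self.rows.extend(dict(e) for e in events)

    def extract(self):
        yield from (dict(r) for r in self.rows)


class ColumnStore:
    """One list per attribute: cheap to scan a single attribute, costlier to
    reassemble whole events for transport. Assumes identical keys per event."""

    def __init__(self) -> None:
        self.columns: dict[str, list[float]] = {}
        self.n = 0

    def write(self, events) -> None:
        for e in events:
            for key, value in e.items():
                self.columns.setdefault(key, []).append(value)
            self.n += 1

    def extract(self):
        for i in range(self.n):
            yield {key: col[i] for key, col in self.columns.items()}


def transform(source, target):
    """Extract events from any format and write them into another."""
    target.write(source.extract())
    return target
```

Because both classes expose the same `extract`/`write` pair, a job at a receiving site only needs `transform(received, PreferredStore())` regardless of the format the data arrived in.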
Extract and Transform
[Figure: Events moving from Site 1 to Sites 2 and 3. Labeled steps: "Extract & transform", "Just Extract", "Transport, transform & Install", "Transport & Install".]
ATLAS DataBase Architecture - Ed Frank

Object Access vs File Access
  • In ATLAS (like others), we are basing our Event Data Model (EDM) on a (transient) Object Data Model.
    • This transient model maps onto a persistent Object Model (not necessarily 1-to-1)
  • We require users to think of objects in the transient store at the Algorithm level.
    • The Transient Data Store has data access proxy concepts built in to read objects in from persistent storage to the TDS.
  • Current Grid products heavily oriented towards LFN-like view of data.
    • Perfectly natural as this is the system-level view of data & convenient unit for atomic data transfer across the network. (eg. FTP, gridFTP)
  • BUT, if we want users to think objects, the object to LFN/PFN mapping has to be somewhere.
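One place the object-to-LFN/PFN mapping could live is inside the transient store's data-access proxies: the Algorithm asks for an object by key, and only the proxy knows which logical file, and which entry in it, holds that object. The sketch below is hypothetical throughout: `ObjectRef` and `TransientStore` are invented names, `resolve` stands in for a replica lookup (LFN to a readable file name), and `read` stands in for a persistency converter.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ObjectRef:
    lfn: str    # logical file holding the persistent object
    entry: str  # location of the object inside that file


class TransientStore:
    """Algorithms see keys and objects; files stay hidden behind proxies."""

    def __init__(self, resolve: Callable[[str], str],
                 read: Callable[[str, str], Any]):
        self._resolve = resolve            # LFN -> physical/local file name
        self._read = read                  # (file name, entry) -> object
        self._refs: dict[str, ObjectRef] = {}
        self._cache: dict[str, Any] = {}

    def register(self, key: str, ref: ObjectRef) -> None:
        self._refs[key] = ref

    def retrieve(self, key: str) -> Any:
        if key not in self._cache:         # first access: load on demand
            ref = self._refs[key]
            self._cache[key] = self._read(self._resolve(ref.lfn), ref.entry)
        return self._cache[key]
```

The Algorithm only ever calls `retrieve("Tracks")`; whether `resolve` consults a Grid replica service or a local file table is decided when the store is configured.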
Ganga Scenarios
  • Scenario 1
    • User makes a "high-level" selection of data to process and defines processing job.
      • "High-level" means based on event characteristics and not on file or even event identity.
    • High-level event selection uses ATLAS Bookkeeping DataBase (similar to current LArC Bookkeeping data base or BNL's Magda) to select event & logical file identities.
    • Construct JDL for WP1 using LFNs
    • Construct jobOptions.py using PFNs (w/ WP2)
    • Submit job(s) using JDL & jobOptions.py in sandbox.
  • Scenario 2 - The same except jobOptions.py now contains LFNs. This requires the Replica Service API-enabled EvtSelector or ConversionSrv.
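Scenario 1 can be sketched as two small text generators: one writes the JDL from LFNs, using the `InputData`/`OutputSE` attributes shown in the UC01 pseudocode, and the other writes a jobOptions.py fragment from PFNs chosen by a replica lookup. The ClassAd-like quoting, the `Executable` and `InputSandbox` lines, the `EventSelector.InputCollections` property, and the `resolve_best` callable are assumptions for illustration, not a checked EDG JDL or Athena specification.

```python
def make_jdl(executable: str, lfns: list[str], output_se: str) -> str:
    """JDL text listing input data by logical name (ClassAd-like syntax)."""
    input_data = ", ".join(f'"{lfn}"' for lfn in lfns)
    return "\n".join([
        f'Executable = "{executable}";',
        f"InputData = {{{input_data}}};",
        f'OutputSE = "{output_se}";',
        'InputSandbox = {"jobOptions.py"};',
    ])


def make_job_options(lfns: list[str], resolve_best) -> str:
    """jobOptions.py fragment naming one chosen physical file per LFN."""
    pfns = [resolve_best(lfn) for lfn in lfns]
    return f"EventSelector.InputCollections = {pfns!r}\n"
```

In Scenario 2, `make_job_options` would simply write the LFNs themselves, leaving resolution to a Replica Service API-enabled EvtSelector or ConversionSrv at run time.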
Observation about GUIs
  • Several projects are promoting GUIs.
    • WP1, Grappa, AliEn, others.
  • Independently written "native" GUIs are notoriously difficult to integrate/make coherent.
  • Web-based GUIs are easier to integrate, but offer limited functionality.
Rule #1: Protect the User
  • Real Data vs. Virtual Data
  • LFN vs. PFN/TFN/SFN
  • Grid Enabled vs. Standalone
  • We do not want the user of the Framework to know or care about details like this.
    • Implies: Uniform, abstract access to/specification of data sets (ie. if Real and Virtual Data are to be used).
    • Dummy (non-Grid) implementations of Grid-enabled Services?
Way Forward/Discussion
  • Goal: Give direction to new hires funded by GridPP to ensure that their work has the widest applicability in both ATLAS & LHCb.
  • Discussion Questions:
    • Data-File or Data-Object level access?
    • Heterogeneity - How much? (Client vs. Server)
    • Communication Protocols?
    • How to synchronize/coordinate?
      • ATLAS world-wide & Large Active US effort
      • LHCb - no US component => more EDG-centric
    • GAUDI/Athena - Where to draw the line?
      • Grid middleware/Svc Interfaces/Implementations
    • Balance Short-term Usability vs. Long-term Functionality - Remember the mainstream.