CHEP 2004 April 28

Data Management in EGEE gLite. Krzysztof Nienartowicz, on behalf of the JRA1 DM Cluster: Paolo Badino, Ricardo Brito da Rocha, Akos Frohner, Peter Kunszt, Gavin McCance, Krzysztof Nienartowicz. CHEP 2004, April 28.


Presentation Transcript


  1. Data Management in EGEE gLite. Krzysztof Nienartowicz, on behalf of the JRA1 DM Cluster: Paolo Badino, Ricardo Brito da Rocha, Akos Frohner, Peter Kunszt, Gavin McCance, Krzysztof Nienartowicz. CHEP 2004, April 28

  2. Contents • Guiding Principles • Overview of Services and concepts • I/O • Catalogs • File movement • Databases

  3. Guiding Principles • Service Oriented Architecture • Interoperability • Portability • Building on existing components in a lightweight manner • Web Services • Modularity • Scalability [Diagram: AliEn, LCG, Condor, Globus, SRM, ...]

  4. Product Overview
     • Storage Element: SRM interface; POSIX I/O interface; supports several protocols (bbftp, https, ftp, gsiftp, rfio, dcap, aiod, …)
     • Site transfer queueing service: manages the transfers to a site. It is equivalent to the batch queues on some local farms, except that the resource it manages is the network. Policies concerning network usage can be specified here (e.g. the maximum bandwidth to be used by certain organisations).
     • VO transfer queueing service: fetches the scheduled transfers targeting this site from the VO scheduler and puts them into the site transfer queue; enforces VO policies concerning the local storage.
     • VO Data Scheduler: the top-level scheduler for data transfers. There may be many such schedulers.
     • Data Placement Optimizers: based on the list of planned transfers, optimize the source and the network, check target space, resolve logical names, etc.
     • Data Placement Policy Enforcers: modify the scheduler's list based on various policies, such as the exclusion of certain targets.
     • Event-based schedulers: put entries into the scheduler based on some triggering event (time, monitoring events).
     • Catalogs: File Catalog, Replica Catalog, File Authorization, Metadata; distribution of catalogs, conflict resolution.
     • I/O library
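The division of labour among the scheduling components on this slide can be illustrated with a short Python sketch. Everything in it is hypothetical and not part of the gLite API: the `Transfer` record, the functions `enforce_policies`, `fetch_for_site` and `admit`, the site and VO names, and the per-VO cap expressed in MB/s over a fixed scheduling window. It shows a policy enforcer excluding a target, a VO queue selecting the transfers for one site, and a site queue treating the network as a managed resource by deferring work that would exceed an organisation's bandwidth cap.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    source: str    # hypothetical source SURL
    target: str    # destination site ID
    vo: str        # virtual organisation that owns the transfer
    size_mb: float


def enforce_policies(planned, excluded_targets):
    """Policy enforcer (sketch): drop planned transfers whose target is excluded."""
    return [t for t in planned if t.target not in excluded_targets]


def fetch_for_site(scheduled, site):
    """VO transfer queue (sketch): pick the scheduled transfers targeting this site."""
    return [t for t in scheduled if t.target == site]


def admit(transfers, cap_mb_per_s, window_s):
    """Site transfer queue (sketch): admit transfers for one scheduling window while
    each VO stays within its bandwidth cap; the rest are deferred to a later window.
    VOs without an entry in cap_mb_per_s are treated as uncapped."""
    used = {}
    admitted, deferred = [], []
    for t in transfers:
        budget = cap_mb_per_s.get(t.vo, float("inf")) * window_s  # MB per window
        if used.get(t.vo, 0.0) + t.size_mb <= budget:
            used[t.vo] = used.get(t.vo, 0.0) + t.size_mb
            admitted.append(t)
        else:
            deferred.append(t)
    return admitted, deferred


planned = [
    Transfer("srm://se.site-a.example/f1", "SITE-B", "vo-a", 400),
    Transfer("srm://se.site-a.example/f2", "SITE-C", "vo-a", 100),
    Transfer("srm://se.site-a.example/f3", "SITE-B", "vo-b", 300),
    Transfer("srm://se.site-a.example/f4", "SITE-B", "vo-a", 300),
]
site_queue = fetch_for_site(enforce_policies(planned, {"SITE-C"}), "SITE-B")
admitted, deferred = admit(site_queue, {"vo-a": 10}, window_s=60)
```

With a 10 MB/s cap over a 60 s window, vo-a may move 600 MB, so f1 (400 MB) is admitted, f4 (a further 300 MB) is deferred, and vo-b, which has no cap, is unaffected. A real site queue would also account for transfers still in flight rather than a single isolated window.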

  5. How to access data: I/O
     • POSIX and a POSIX-like (richer) I/O library
     • Use with Logical File Names (LFNs) or GUIDs
     • Authentication and authorization based on Grid credentials (VOMS proxy)
     • Extended permissions
     • GUID/LFN access logic embedded in the catalogs or exported for use by external ones
     • Implementation based on AliEn I/O and GFAL
     [Diagram: Internals, with the I/O lib, gLite I/O server, Cache, SRM, File Catalog, Replica Catalog and File Authorization]
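The LFN-or-GUID access path with credential-based authorization might look roughly like the sketch below. It assumes, hypothetically, an in-memory LFN-to-GUID map and a per-GUID access list keyed on VOMS group names (real VOMS FQANs also carry role and capability fields). `resolve`, `authorize` and `open_checked` are illustrative names only; the actual gLite I/O client delegates these steps to the remote catalog and File Authorization services.

```python
# Hypothetical in-memory stand-ins for the File Catalog and File Authorization
# services; the real gLite services are remote.
LFN_TO_GUID = {"/grid/myvo/run1/data.root": "guid-0001"}
ACL = {
    "guid-0001": {
        "read": {"/myvo", "/myvo/analysis"},
        "write": {"/myvo/production"},
    }
}


def resolve(name):
    """Accept either an LFN (an absolute path) or a GUID and return the GUID."""
    if name.startswith("/"):
        if name not in LFN_TO_GUID:
            raise FileNotFoundError(name)
        return LFN_TO_GUID[name]
    return name  # assume anything else is already a GUID


def authorize(guid, groups, mode):
    """Allow access if any VOMS group from the user's proxy is in the ACL for mode."""
    allowed = ACL.get(guid, {}).get(mode, set())
    return any(g in allowed for g in groups)


def open_checked(name, groups, mode="read"):
    """Resolve the name, check permissions, and return the GUID to open."""
    guid = resolve(name)
    if not authorize(guid, groups, mode):
        raise PermissionError(f"{mode} access denied on {guid}")
    return guid  # a real client would now contact the gLite I/O server
```

Because the access logic works on GUIDs after resolution, the same permission check applies whether the user names the file by LFN or by GUID.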

  6. Catalogs
     • Replica Catalog: keeps information at a site; based on RLS
     • (Metadata Catalog): attributes of files on the logical level; the boundary between generic middleware and the application layer
     • File Catalog: filesystem-like view on logical file names; keeps track of the sites where data is stored; conflict resolution; based on the AliEn FC
     [Diagram: a Replica Catalog at each site (Site A, Site B) holding LFN, GUID and SURL entries; a File Catalog holding LFN, GUID and Site ID entries; a Metadata Catalog holding metadata]
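A two-level lookup in the spirit of this slide, with a central File Catalog recording which sites hold a file and per-site Replica Catalogs mapping GUIDs to SURLs, can be sketched as follows. The dictionaries, SURLs and the `list_replicas` helper are hypothetical stand-ins for the real catalog services; the local site is queried first to reflect locality of reference.

```python
# Hypothetical central File Catalog: LFN -> (GUID, site IDs holding replicas)
FILE_CATALOG = {
    "/grid/myvo/run1/data.root": ("guid-0001", ["SITE-A", "SITE-B"]),
}

# Hypothetical per-site Replica Catalogs: GUID -> SURLs stored at that site
REPLICA_CATALOGS = {
    "SITE-A": {"guid-0001": ["srm://se.site-a.example/myvo/data.root"]},
    "SITE-B": {"guid-0001": ["srm://se1.site-b.example/myvo/d.root",
                             "srm://se2.site-b.example/myvo/d.root"]},
}


def list_replicas(lfn, local_site=None):
    """Resolve an LFN to its GUID, then collect SURLs from each site's catalog."""
    guid, sites = FILE_CATALOG[lfn]
    # False sorts before True, so the local site (if any) is queried first;
    # sorted() is stable, so the remaining sites keep their catalog order.
    ordered = sorted(sites, key=lambda s: s != local_site)
    surls = []
    for site in ordered:
        surls.extend(REPLICA_CATALOGS.get(site, {}).get(guid, []))
    return guid, surls
```

Keeping only site IDs in the central catalog means each site can add or move its own replicas without updating a global table, which matches the split between the File Catalog and site-level Replica Catalogs.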

  7. File Movement and Management • Data scheduling and high-level optimization • Job-like data transfers (queuing, ordering, etc.) • Possibility to use reliable managed file transfer • Site self-consistency (locality of reference) • SRM-based managed storage (permanent and volatile) • Implementation based on Stork and TMDB (now PhEDEx)
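Job-like handling of transfers (queueing, ordering and retrying on failure) can be illustrated with a small priority queue. This is a hypothetical sketch, not Stork or PhEDEx: `TransferQueue`, its `max_attempts` parameter and the caller-supplied `copy` function are assumptions, and a real managed transfer service would persist its queue and back off between attempts.

```python
import heapq
import itertools


class TransferQueue:
    """Hypothetical job-like transfer queue: highest priority first, FIFO within a
    priority, and up to max_attempts tries per transfer."""

    def __init__(self, max_attempts=3):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: preserves submission order
        self.max_attempts = max_attempts
        self.done = []
        self.failed = []

    def submit(self, src, dst, priority=0, attempts=0):
        # heapq is a min-heap, so store -priority to pop the highest priority first
        heapq.heappush(self._heap, (-priority, next(self._seq), src, dst, attempts))

    def run(self, copy):
        """Drain the queue; copy(src, dst) is caller-supplied and returns True on success."""
        while self._heap:
            neg_prio, _, src, dst, attempts = heapq.heappop(self._heap)
            if copy(src, dst):
                self.done.append((src, dst))
            elif attempts + 1 < self.max_attempts:
                # requeue behind transfers of the same priority that are already waiting
                self.submit(src, dst, -neg_prio, attempts + 1)
            else:
                self.failed.append((src, dst))
```

Higher-priority requests run first, requests of equal priority keep their submission order, and a failing transfer is retried until it has been tried `max_attempts` times, after which it is reported as failed instead of blocking the queue.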

  8. File Movement and Management • Internals

  9. Databases, metadata access
     • We expect that communities will want to use EXISTING databases through the grid.
     • We provide two options for database integration:
        • Metadata catalog interface (to relate files to database queries)
        • Grid database interface (in collaboration with the OGSA-DAI project)
     • We are also looking into user-space (user-defined) metadata management and metadata extensions to the File Catalog.
     • We provide a ready-to-use access control service that could be used by wrappers of existing services.
     • We are working with the LCGDDD project to provide distributed databases.
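The metadata catalog interface, which relates files to database queries, can be sketched with Python's built-in `sqlite3` module standing in for an existing community database. The two-table schema and the helpers `make_metadata_catalog`, `add_file`, `set_attr` and `query_files` are hypothetical; they only show the idea of attaching attributes to files by GUID and answering an attribute query with a list of LFNs.

```python
import sqlite3


def make_metadata_catalog():
    """Hypothetical schema: files keyed by GUID, plus name/value attributes per file.
    An in-memory SQLite database stands in for an existing relational database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE files (guid TEXT PRIMARY KEY, lfn TEXT UNIQUE)")
    db.execute("CREATE TABLE attrs (guid TEXT REFERENCES files(guid), "
               "name TEXT, value TEXT, PRIMARY KEY (guid, name))")
    return db


def add_file(db, guid, lfn):
    db.execute("INSERT INTO files VALUES (?, ?)", (guid, lfn))


def set_attr(db, guid, name, value):
    # The (guid, name) primary key means setting an attribute again overwrites it.
    db.execute("INSERT OR REPLACE INTO attrs VALUES (?, ?, ?)", (guid, name, value))


def query_files(db, name, value):
    """Relate a database query to files: return the LFNs whose attribute matches."""
    rows = db.execute(
        "SELECT f.lfn FROM files f JOIN attrs a ON a.guid = f.guid "
        "WHERE a.name = ? AND a.value = ? ORDER BY f.lfn",
        (name, value),
    )
    return [lfn for (lfn,) in rows]
```

Keying attributes on the GUID rather than the LFN keeps the metadata valid if a file is renamed in the File Catalog.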

  10. Current Status
     • Fully integrated in the gLite prototype: File Catalog (original AliEn), Metadata Catalog, gLite I/O
     • Being integrated, first implementation ready: Replica Catalog, File Access Service, Transfer Service
     • Being completed: Data Scheduler, distributed catalogs

  11. Summary
     • Basic services from the user's point of view: File I/O, Data Scheduling, Cataloging services
     • Requires a stack of services to achieve its functionality
     • The modules in this stack are customizable or replaceable
     • Currently these services are being rolled into the gLite prototype testbed.
