egee data management peter kunszt n.
Skip this Video
Loading SlideShow in 5 Seconds..
EGEE Data Management Peter Kunszt PowerPoint Presentation
Download Presentation
EGEE Data Management Peter Kunszt

EGEE Data Management Peter Kunszt

111 Views Download Presentation
Download Presentation

EGEE Data Management Peter Kunszt

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

  1. Diligent – EGEE JRA1 Meeting, 2004 December 16 EGEE Data ManagementPeter Kunszt

  2. Contents • Component Overview • gLite Catalogs • Overview • Concepts • Implementations • Distribution • gLite Transfer Management • Scheduling model • Implementation • Deployment models • Distribution mechanisms • Discussion Dec. 16 Diligent – JRA1 Workshop Peter Kunszt2

  3. Guiding Principles Service Oriented Architecture Interoperability Portability Building on existingcomponents in alightweight manner Web Services Modularity AliEn LCG Condor Scalability Globus SRM ... Dec. 16 Diligent – JRA1 Workshop Peter Kunszt3

  4. Data Management Tasks • File Management • Storage • Access • Placement • Cataloguing • Security • Metadata Management • Secure database access • Schema management • File-based metadata • Generic metadata Dec. 16 Diligent – JRA1 Workshop Peter Kunszt4

  5. Product Overview • File Storage • Storage Elements with SRM (Storage Resource Manager) interface • Posix I/O interface through glite-io • Supports transfer protocols (bbftp, https, ftp, gsiftp, rfio, dcap, …) • Catalogs • File and Replica Catalog • File Authorization Service • Metadata Catalog • Distribution of catalogs, conflicts resolution (messaging) • Transfer • Top-level Data Scheduler as global entry point (there may be many). • Site File Placement Service managing transfers and catalog interactions • Site File Transfer Service managing incoming transfers (the network resource) Dec. 16 Diligent – JRA1 Workshop Peter Kunszt5

  6. File Movement and Management • Data scheduling and high-level optimization • Job-like data transfers (queuing, ordering, etc) • Possibility to use reliable managed file transfer • Site self-consistency (locality of reference) • SRM-based managed storage (permanent and volatile) Dec. 16 Diligent – JRA1 Workshop Peter Kunszt6

  7. File Movement and Management • Internals Dec. 16 Diligent – JRA1 Workshop Peter Kunszt7

  8. Metadata Catalog Contents Storage URL SymLink LogicalFile Name Global UniqueIDentifyer Storage URL SymLink Storage URL Unique User-defined Mutable Unique System-defined Immutable UUID Dec. 16 Diligent – JRA1 Workshop Peter Kunszt8

  9. Concepts • Directories • Symlinks • Authorization: ACL and base (unix) permissions • File metadata (size, ctime, mtime, checksum, status, type) • File-based metadata (key-value pairs on files), the schema is associated per directory • Extensible metadata including schema manipulation • Maybe virtual directories (cached metadata queries) in the future Dec. 16 Diligent – JRA1 Workshop Peter Kunszt9

  10. End-userInterface Interface Design FAS FiReMan MetadataCatalog FileCatalog ReplicaCatalog MetadataSchema FASBase MetadataBase ServiceBase SEIndex Base Interfaces Service Interfaces Feature Interfaces Dec. 16 Diligent – JRA1 Workshop Peter Kunszt10

  11. Metadata Capabilities • Metadata directly in the File Catalog • Like POSIX file metadata: key-value pairs stored. • Metadata Schema (description of key-value pairs) may be different for each directory, but all files in the same directory share the same keys • Limited query and search capabilities to single directory or single schema: the hierarchy has to restrict the query (we don’t allow a global find-like operation on metadata) • Unconstrained Metadata • Any schema possible • Schema manipulation interface available • Generic query interface (just pass in a query string) • Application-specific Metadata • On top of any of these two gLite specifications, applications can build their own metadata interface Dec. 16 Diligent – JRA1 Workshop Peter Kunszt11

  12. gLite Catalog Implementations • Fireman Interface • Oracle 9i implementation • MySQL implementation • MetadataCatalog Interface • MySQL implementation • Oracle 9i implementation • MetadataSchema Interface • MySQL implementation • Oracle 9i implementation • Apply interfaces to existing implementations • Will have a Fireman interface also over the AliEn FC • Fireman interface over the LCG FC • MetadataCatalog and MetadataSchema over existing application catalogs • … DONE In progress or planning Dec. 16 Diligent – JRA1 Workshop Peter Kunszt12

  13. Catalog Deployment Models • Single central catalog (AliEn, LCG-2 model) • All operations go there • Local catalogs with a central component • Update operation only on local catalogs • Update operation on both local and central catalogs • Local catalogs, no central component – only indices for certain queries Dec. 16 Diligent – JRA1 Workshop Peter Kunszt13

  14. Distribution Mechanism 1 • Data Scheduler (global and local schedulers) • Global scheduler (VO-specific) takes requests like • Copy set of files from A to B • Make set of files available at C • Upload files from GSIFTP server to D • Delete files • Maybe also metadata operations • Local scheduler fetches tasks from known global schedulers • Coupled tightly to a local transfer service • Manage transfer where the local site is a target • Assure atomicity of transfer and catalog operations • Transfer Service • Queue data transfers to/from a given Storage Element (SRM) • Receives jobs from local scheduler • Manages transfers through a set of states Dec. 16 Diligent – JRA1 Workshop Peter Kunszt14

  15. Distribution Mechanism 2 • Certainly possible to just rely on DB replication • Middleware distribution of updates between catalogs • Using a messaging system (JMS using JORAM) • Publish updates to message queue locally • Subscribe to updates at central catalogs / index nodes • Asynchronous messaging queues take care of update delivery • Scales well to the number of sites we deal with • However, error messages have to be queued for retrieval as well Dec. 16 Diligent – JRA1 Workshop Peter Kunszt15

  16. To be understood • What to distribute and how • All of the data? (Replication) • Just parts? (Indexing) • Read-write mechanisms and updates between many copies (Policies) • Metadata usage • Schema manipulation capabilities – what is really needed • Metadata services by experiments may interface with gLite or implement the gLite interfaces themselves • Are a set of canned queries good enough? If yes, user does not need to have a generic query interface. • Does all of the metadata need to be local? Or will some metadata have to be fetched from remote sites? • What kinds of distributed queries are necessary at all? • What kind of metadata is for local/laptop usage? • What kinds of update semantics are needed if at all? (Single instance, single master, multi master) Dec. 16 Diligent – JRA1 Workshop Peter Kunszt16

  17. Summary • gLite Data Management provides a complete set of file management middleware including data and catalog distribution • Many extensible modules based on simple interfaces. Capabilities may easily be extended if needed. • Actual usage patterns need to be understood in order to set up an efficient deployment scenario. • Still many difficult open questions which have to be answered individually for each Grid VO. We are looking forward to work with the community to address these issues. Dec. 16 Diligent – JRA1 Workshop Peter Kunszt17