CLRC e-Science Centre - PowerPoint PPT Presentation

Presentation Transcript

  1. CLRC e-Science Centre • SRB • Kerstin Kleese-van Dam

  2. Special thanks to: George Kremenek - kremenek@sdsc.edu • Alasdair Earl -

  3. Contents • Introduction • Architecture description • What is good • What needs improving • What can it be used for

  4. Introduction • More and more information is available today. It can be: • Random Information (e.g. news items) • Scientific Data • Commercial or Administrative Data • Data about Data (metadata describing the content of the actual data) • The information is generally available via/from: Web sites, Filesystems, Databases, Tape Libraries, or on Paper and other non-digital media.

  5. Introduction (2) • How do you find the information: Search Engines, Catalogue Systems or Hard Work (big bucket) • How do you evaluate the information: Combine, Compare, Present • How do you manage the information: Preservation, Sharing, Replicating, Transferring, Securing

  6. Where does SRB fit into this Scenario? • SRB - the Storage Resource Broker can: • Integrate distributed, heterogeneous storage devices • Make data access transparent for the user • Help to share, replicate, transfer and preserve data • SRB cannot: • Replace metadata catalogues • Provide high-level information services

  7. How does SRB fit into a Grid Environment? • SRB can be used to: • Manage information required internally by Portals • Integrate data across various media • Integrate data across sites • SRB can be used: • For a particular site • In a research collaboration • In a wider Grid community

  8. General Facts • Storage Resource Broker - SRB • Developed by the San Diego Supercomputer Center (SDSC) from the mid-1990s for the US government's National Partnership for Advanced Computational Infrastructure (NPACI). • Initial release 1997 • Latest version V1.1.8 - released February 2001 • In the US approximately 200 TB of data are shared via SRB between 30 participating universities. • Used by the HPCPortal developed by Mary Thomas's group at SDSC.

  9. The SRB/MCAT Core Team • SDSC Team • Reagan Moore, Arcot Rajasekar, Michael Wan, George Kremenek, Charlie Coward, Sheau Yen Chen, Roman Olschanowski • SRB Expertise at SDSC: • Michael Wan (SRB client/server, drivers, srbBrowser) • Arcot Rajasekar (MCAT, DB drivers) • George Kremenek (SRB Client Modules, Security, DAM, application design) • Charlie Coward – Windows Servers and Browser • Sheau Yen Chen – administration • Roman Olschanowski - testing

  10. What is SRB? • SRB is an Intelligent Data Access System • SRB provides protocol transparency to diverse and distributed storage systems • SRB provides location transparency to distributed datasets • SRB provides access transparency to remote users • Extends File Systems • Extends Database Systems • Extends I/O protocols

  11. SRB Access • SRB can be accessed in three ways: • High Level graphical Java interface - SRB Browser • Application Programming interface - SRB API (high and low level) • Unix shell Command Line Interface - SRB Scommands

  12. SRB Concepts(1) • Provide Scalability (Hosts, Resource Types, Resources, Collections, Data Objects - size and number, Users & Groups) • Provide Uniform Interfaces (to Resources, Collections and Datasets, authentication across SRB Space) • Replication of Datasets • Access Control Lists • Ticket-based Access • Authentication and Encryption (text password, encrypted password, SEA and GSI) • Server-side proxy Operations • Metadata-based Discovery

  13. SRB Concepts(2) • Provide Logical Abstractions • srbSpace - an abstract storage space • Resource Types - resource defined by properties • Resources - resource identified by name and type • multiple resources tied together as a single resource • Collections - abstraction over directory structure • distributed & curated • Datasets - identified by properties • Users - authenticated across hosts/networks • Domain - abstraction over physical domains • Metadata Schema/Attributes

  14. What is MCAT? • Cataloging System • Metadata Repository • Digital Object Metadata • type, format, lineage, usage methods, domain-specific attributes, collection info, etc • System-level Metadata • access control, audit trails, location, replication, resource types, user groups, etc • Schema-level Metadata • ontology, relationships among attributes/schemas, semantics of attributes, etc • Uniform Access and Federation interface

  15. Contents • Introduction • Architecture description • What is good • What needs improving • What can it be used for

  16. SRB V1.x Features • Multi-platform (clients and servers) • SunOS/Solaris, AIX, Cray C90, SGI, OSX • API and command line interfaces • “Low-level” and “high-level” APIs • Storage systems supported • Oracle, DB2, Sybase, HPSS, UNIX FS, W2000/NT FS • Support for distributed servers, GSI authentication, password encryption

  17. The Storage Resource Broker (diagram) • Application (SRB client) • SRB Server • MCAT • Distributed Storage Resources: database systems (DB2, Oracle, Sybase, ObjectStore), archival storage systems (HPSS, UniTree), file systems (UNIX), ftp

  18. How does SRB work? • The SRB Server spawns an SRB Agent to authenticate the User/Application (SRB Client) by comparing its credentials with information stored in MCAT • Find the file location in MCAT • Check the user request against permissions stored in MCAT • The SRB Agent contacts the user with the result of his/her request • The SRB Agent communicates with the user through a port specific to this client session; it can handle one or more requests from the client.
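
The request flow on slide 18 can be sketched as a small in-memory simulation. This is not SRB code: the `Mcat` class, its fields, the sample user and paths, and `handle_request` are hypothetical names chosen for illustration, and a real agent would query a relational MCAT over the network.

```python
from dataclasses import dataclass, field


@dataclass
class Mcat:
    """Hypothetical in-memory stand-in for the MCAT catalogue."""
    passwords: dict = field(default_factory=dict)    # user -> password
    locations: dict = field(default_factory=dict)    # logical name -> physical location
    permissions: dict = field(default_factory=dict)  # (user, logical name) -> allowed actions


def handle_request(mcat, user, password, logical_name, action):
    """Mimic the steps an SRB Agent performs for one client request."""
    # Step 1: authenticate the client against information held in MCAT.
    if mcat.passwords.get(user) != password:
        return ("denied", "authentication failed")
    # Step 2: find the file location in MCAT.
    location = mcat.locations.get(logical_name)
    if location is None:
        return ("error", "unknown data object")
    # Step 3: check the request against the stored permissions.
    if action not in mcat.permissions.get((user, logical_name), set()):
        return ("denied", "permission denied")
    # Step 4: report the result back to the client.
    return ("ok", location)


mcat = Mcat(
    passwords={"alice": "secret"},
    locations={"/home/alice/run1.dat": "hpss-host:/archive/0001"},
    permissions={("alice", "/home/alice/run1.dat"): {"read"}},
)
print(handle_request(mcat, "alice", "secret", "/home/alice/run1.dat", "read"))
print(handle_request(mcat, "alice", "secret", "/home/alice/run1.dat", "write"))
```

The first call passes all three checks and returns the stored location; the second fails at the permission check because only `read` was granted.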

  19. The SRB Process Model (diagram) • Application (host, port) • SRB Master (port) • SRB agents • MCAT

  20. How does SRB handle remote Data Access? • Steps 1-3 are the same as in the simple case: spawn an SRB Agent on the local machine, authenticate, check the user request, locate the file • The SRB Agent contacts a remote SRB Agent via the SRB Server on the remote machine where the data is stored • The second SRB Agent returns the pointer to the data item to the first SRB Agent, which passes it on to the user • The SRB Client can then interact with the data item directly, as described before; however, all communication still runs via the first SRB Agent and the machine it is situated on
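
The relay behaviour on slide 20 can be illustrated with a toy model in which the first agent forwards a read to the agent on the host that owns the data. The `Agent` class, the host names and the paths are hypothetical; authentication and permission checks are left out, and the sketch returns the bytes directly rather than a pointer to the data item.

```python
class Agent:
    """Hypothetical SRB Agent bound to one host and its local storage."""

    def __init__(self, host, storage, catalogue):
        self.host = host
        self.storage = storage      # logical name -> bytes held on this host
        self.catalogue = catalogue  # shared MCAT stand-in: logical name -> owning host
        self.peers = {}             # host -> Agent reachable via that host's SRB Server

    def read(self, logical_name):
        owner = self.catalogue[logical_name]   # locate the data via the catalogue
        if owner == self.host:
            return self.storage[logical_name]  # simple case: the data is local
        # Remote case: ask the agent on the owning host and relay its answer,
        # so the client only ever talks to this first agent.
        return self.peers[owner].read(logical_name)


catalogue = {"/proj/a.dat": "site-a", "/proj/b.dat": "site-b"}
agent_a = Agent("site-a", {"/proj/a.dat": b"local bytes"}, catalogue)
agent_b = Agent("site-b", {"/proj/b.dat": b"remote bytes"}, catalogue)
agent_a.peers["site-b"] = agent_b

print(agent_a.read("/proj/a.dat"))  # served from site-a
print(agent_a.read("/proj/b.dat"))  # relayed from site-b through agent_a
```

Because every remote read passes through the first agent, the extra hop is visible in the model; this is the cost the later "Current Obstacles" slide refers to as interim steps.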

  21. Remote SRB Operation (diagram) • Components: Application, two SRB servers, two SRB agents, MCAT • Numbered arrows 1-6 connect the components

  22. SRB Space • The SRB Space consists of: • A number of SRB Servers (possibly across multiple sites) • Many heterogeneous Storage Resources linked to SRB Servers via SRB Media Drivers • One MCAT System • Many Users • The SRB Space provides a single view on all the data within the Space.

  23. SRB Space (diagram) • Several SRB servers linking the following node types: • DR - Data Repository • DL - Digital Library • MC - Meta Catalog • CP - Comp Process / SRB Client

  24. MCAT: Metadata Catalog • Stores metadata about • Users, Data sets, Resources, Methods • Provides “collection” abstraction • Stores detailed access control information • Maintains audit trail information on data sets • Implemented as a relational database with referential integrity constraints (currently uses Oracle, DB2, Sybase)

  25. MCAT Architecture (diagram) • MCAT Interface Functions • Schema to MAPS Convertor • MAPS to Schema Convertor • MAPS Initialization • MAPS Semantics • Answer Extractor & Cursor Control • Dynamic Query Generator • Schema Initialization • Schema Semantics • Oracle Query System • DB2 Query System

  26. Federated Catalog Architecture (diagram) • Diagram labels: MAPS, MCAT CATALOG, Semantics & Definitions, Local Routines, Internal Catalogs, External CATALOG Interface, CATALOG MAPS Interface, Local Interface, CAT-1, CAT-2

  27. New MCAT Features • Meta-Schema to hold System and User metadata schema information • Extensible metadata schema • Distributed metadata schema • Metadata exchange Interface Protocol • MAPS - Metadata Attribute Presentation Structure • query, update and result structures • Close to Z39.50

  28. New MCAT Features (contd.) • Core Schema Implemented • MCAT Core - Data, Resources, Users and Methods • Dublin Core • IV Core - Image Visualization attributes • Web-based Prototype User Interface • extensible schema functions • query, insert and update of metadata • integrated presentation of metadata and data

  29. SRB Data Replication Support • Replication via Resource Set definition • Replication support integrated into write function • srbObjReplicate API can be used for post facto replication • Synchronous replication across all sites. Can choose any k out of n • Can choose specific replica on read operation
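
One possible reading of "any k out of n" on slide 29 is that a write goes to every available member of a resource set and counts as successful only if at least k copies can be made. The sketch below models that reading with plain dictionaries. It is an assumption-laden illustration, not the behaviour of `srbObjReplicate`; the function names and resource names are hypothetical.

```python
def replicated_write(stores, name, data, k):
    """Write `data` to every available resource; require at least k copies.

    `stores` maps a resource name to a dict acting as that store;
    a value of None marks a resource that is currently unavailable.
    """
    available = [res for res, store in stores.items() if store is not None]
    if len(available) < k:
        # Refuse before writing anything, so no partial copies are left behind.
        raise RuntimeError(
            f"only {len(available)} of {len(stores)} resources available, need {k}"
        )
    for res in available:
        stores[res][name] = data
    return available


def read_replica(stores, name, resource):
    """Read the copy held on one specific resource of the set."""
    return stores[resource][name]


# Hypothetical resource set with one member offline.
stores = {"hpss-sdsc": {}, "hpss-caltech": {}, "unix-ncsa": None}
copies = replicated_write(stores, "run1.dat", b"payload", k=2)
print(copies)
print(read_replica(stores, "run1.dat", "hpss-caltech"))
```

Checking availability before the first write keeps the model simple: a failed write never has to be rolled back.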

  30. NWS Data Replication Example (diagram) • Sites: SAIC, SDSC, Caltech, NCSA • Components: Application, MCAT, three SRB servers • Logical resources: LogRsrc1, LogRsrc2 • Storage: HPSS, HPSS, Oracle, DB2, Unix

  31. Ticket-based Access Control • Owner can request ticket for a data set • Ticket can be issued for a data set or a collection • Ticket controls access by • time period (start and expire timestamps) • number of accesses (count) • user names (any, single or group users) • Non-registered users can also access using tickets • Useful for sharing data and access through the web • Tickets generated and stored in MCAT • Currently supports read-only tickets
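
The three ticket controls on slide 31 (time period, access count, user names) can be modelled in a few lines. The `Ticket` class and its field names are hypothetical and do not reflect how MCAT stores tickets; the sketch covers a ticket for a single data set only, and every redemption is treated as a read.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass
class Ticket:
    """Hypothetical read-only ticket for one data set."""
    target: str                     # logical name of the data set
    start: float                    # start timestamp (seconds)
    expire: float                   # expiry timestamp (seconds)
    remaining: int                  # number of accesses still allowed
    users: Optional[FrozenSet[str]] = None  # None means any user may redeem it

    def redeem(self, user, target, now):
        """Return True and consume one access if the ticket permits this read."""
        if target != self.target:
            return False
        if not (self.start <= now <= self.expire):
            return False
        if self.remaining <= 0:
            return False
        if self.users is not None and user not in self.users:
            return False
        self.remaining -= 1
        return True


ticket = Ticket("/proj/run1.dat", start=100.0, expire=200.0, remaining=2,
                users=frozenset({"guest"}))
print(ticket.redeem("guest", "/proj/run1.dat", now=150.0))    # inside the window
print(ticket.redeem("mallory", "/proj/run1.dat", now=150.0))  # user not listed
print(ticket.redeem("guest", "/proj/run1.dat", now=250.0))    # after expiry
```

A rejected attempt does not consume an access, so only successful reads count against the limit.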

  32. SRB API • Programmatic API • High-level API • Low-level API • SRB Manager API • Command Level Interface - Scommands • Graphical User Interface - srbBrowser • Web Utilities

  33. SRB API Interface (diagram) • Application • SRB Master • MCAT

  34. High & Low-level API • Low-level API • talks to resource drivers • no registration of data sets in MCAT • no authentication through MCAT • User provides all information • High-level API • Uses low-level API to access resources • Registers data management information in MCAT • Uses MCAT for authentication and meta information • Uses MCAT for resource and data discovery • Access/store data in remote SRB

  35. System Manager API • srbChkMdasAuth(conn, userName, userAuth, domain) • srbChkMdasSysAuth(conn, userName, userAuth, domain) • srbRegisterUser(conn, userName, domain, password, userType, userAddress, userPhone, userEmail) • srbRegisterUserGrp(conn, userGrpName, userGrpPassword, userGrpType, userGrpAddress, userGrpPhone, userGrpEmail)

  36. srbBrowser - An SRB Graphical Interface • A Java GUI • Interfaces with SRB servers using the client API library • Performs most SRB operations - cp, replicate, import, export, metadata query, etc. • Diagram labels: USER; Windows or Java GUI (obtain user’s metadata information via SRB, invoke SRB operations); SRB Agent; MCAT; Proxy operation

  37. SRB Command Line Interface • Diagram labels: USER; Environment File; SRB “shell” commands (Sls, Scp, Scat, Sput, Sget, ...); SRB Agent; MCAT; Proxy operation

  38. Scommands • Sinit - initialize S-environment • Sexit - clean up • Sman - get manpage for Scommand • Scat - display srbObject on screen • Sput - copy local file into srbSpace • Sget - copy srbObject to local space • Sappend - append to srbObject • Srename - change srbObject name • Srm - remove srbObject • Schmod - change/grant access to srbObject • Scd - change collection • Spwd - display current collection • Sls - list collection • Smkdir - make new collection • Srmdir - remove old collection • SgetD - get srbObject information • SgetR - get resource information • SgetU - get user information • SmodD - modify srbObject info • SmodU - modify user info • Stoken - get native type information • Scopy - copy srbObject into another collection and under another name • Sreplicate - clone object in new resource - same internal id • Smove - move srbObject to new collection or resource

  39. Scommands (contd …) • ingestUser - adding a new user or group • ingestResource - adding a new resource • ingestLogicalResource - making a new resource grouping • addLogicalResource - adding to a resource grouping • ingestLocation - adding new location information • ingestToken - adding new native types (e.g. resourceType, objectType, userType, domainName, ActionType, . . .)

  40. Scommands • Sls • Sls [-h] [-L number] [-Y number] [-r|-f] [collection ...] • Sls [-L number] [-Y number] srbObj … • Sput • Sput [-p] [-D dataType] [-R resourceName] [-P pathName] localFileName ... TargetName • Sput [-p] [-D dataType] [-R resourceName] [-P pathName] -i TargetName • Sget • Sget [-C_n ] [-p] srbObj ... localFile • Sreplicate • Sreplicate [-Cn] [-p] [-R resourceName] [-P pathName] srbObj ...

  41. SRBIO • open, creat, read, write, close, lseek • fopen, fread, fwrite, fclose, fseek, fflush • fgetc, fgets, fputc, fputs, getc, putc, ungetc, rewind • vfprintf, fprintf, fscanf

  42. Contents • Introduction • Architecture description • What is good • What needs improving • What can it be used for

  43. Useful features • Easy interfaces to access data held in SRB • Transparent access independent of location or type • Support for replication of data • Support for logical structuring of data • Database support to locate data • Ticket system • Enhanced access right structure • Modular SRB Media Drivers • Useful to users and system administrators

  44. Contents • Introduction • Architecture description • What is good • What needs improving • What can it be used for

  45. Current Obstacles • Only one MCAT catalogue - single point of failure, performance, ownership • All MCAT metadata is visible to everyone • Data access at remote sites - too many interim steps • Documentation not up to date • Installation not straightforward - patches needed, dependent on other software • Licence required

  46. Contents • Introduction • Architecture description • What is good • What needs improving • What can it be used for

  47. Grid Applications within CLRC • Various Portals to access experimental, data and computing facilities within CLRC and outside. • Issues: • Data held widely distributed across the site and in community owned facilities • Data required where it is not stored • Data located through service that is not local to data holding

  48. Planned Structure of CLRC - Services (diagram) • Diagram labels: DataPortal, Local Archives, Remote Archives, CLRC Authentication, Problem Solving Environments, Computing Applications, Experimental Facilities, HPCPortal, Remote systems, Local systems

  49. Integrated Solution for Earth Science (diagram) • Diagram labels: ES User, Application, HPC, DataPortal, RasDaMan, Data Storage, Disk, Tape, BADC Catalogue, SRB, HPCPortal

  50. General CLRC DataPortal Architecture CLRC DataPortal Server XML wrapper XML wrapper Local metadata Common metadata catalogue database Local data Facility 1