Metadata Services on the GRID

Nuno Santos, ACAT’05, May 25th, 2005

Presentation Transcript


  1. Metadata Services on the GRID
    Nuno Santos, ACAT’05, May 25th, 2005

  2. Contents
    • Metadata on the GRID
    • ARDA-gLite Metadata Interface
    • The ARDA Implementation
    • Performance study: SOAP vs TCP Streaming

  3. Metadata on the GRID
    • Metadata is data about data
    • Metadata on the GRID
        • Mainly information about files
        • Other information necessary for running jobs
        • Usually stored in databases
    • Need a simple interface for metadata access
    • Advantages
        • Easier for clients to use – no SQL, only metadata concepts
        • Common interface – clients don’t have to reinvent the wheel
    • Must be integrated with the File Catalogue
    • Also suitable for storing information about other resources

  4. ARDA-gLite Metadata Interface
    • ARDA proposed an interface for metadata access on the GRID
        • Designed jointly with the gLite/EGEE team
        • Incorporates feedback from GridPP
        • Endorsed by the EGEE standards committee (PTF)
        • Being implemented in the gLite File Catalog (FiReMan)
    • Interface concepts
        • Metadata – key-value pairs
        • Entry – an entity to which metadata is attached
        • Attribute – holds information about an entry
        • Schema – a collection of attributes
        • Type – the attribute's type (int, float, string, …)
        • Name/Key – the name of the attribute
        • Value – the value of an entry's attribute
    • Entries are associated with schemas
    • Think of schemas as tables, attributes as columns, entries as rows
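To make the table analogy concrete, here is a minimal sketch using Python's built-in sqlite3 module (SQLite is one of the prototype's backends). It assumes a one-table-per-schema layout with an "entry" key column; the function names, the int/float/string type mapping, and the sample schema "runs" are hypothetical and are not the ARDA storage layout.

```python
import sqlite3

# Hypothetical mapping from interface attribute types to SQLite column types.
SQL_TYPES = {"int": "INTEGER", "float": "REAL", "string": "TEXT"}


def create_schema(conn, schema_name, attributes):
    """Schema -> table; each (name, type) attribute -> column; entries become rows."""
    cols = ", ".join(f'"{name}" {SQL_TYPES[typ]}' for name, typ in attributes)
    conn.execute(f'CREATE TABLE "{schema_name}" (entry TEXT PRIMARY KEY, {cols})')


def create_entry(conn, schema_name, entry, values):
    """Insert one entry (row) with the given attribute values (columns)."""
    names = ["entry"] + list(values)
    quoted = ", ".join(f'"{n}"' for n in names)
    placeholders = ", ".join("?" for _ in names)
    conn.execute(
        f'INSERT INTO "{schema_name}" ({quoted}) VALUES ({placeholders})',
        [entry] + list(values.values()),
    )


conn = sqlite3.connect(":memory:")
create_schema(conn, "runs", [("events", "int"), ("energy", "float"), ("detector", "string")])
create_entry(conn, "runs", "/data/run001", {"events": 1200, "energy": 13.5, "detector": "VELO"})
row = conn.execute(
    'SELECT events, energy, detector FROM "runs" WHERE entry = ?', ("/data/run001",)
).fetchone()
print(row)  # (1200, 13.5, 'VELO')
```

A real implementation would also need to escape or validate schema and attribute names before interpolating them into SQL; the sketch trusts its inputs.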

  5. Interface Operations
    • Schema management
        void createSchema(String schemaName, Attribute[] attributes)
        void dropSchema(String schemaName)
        void removeSchemaAttributes(String schemaName, String[] attributeNames)
        void addSchemaAttributes(String schemaName, Attribute[] attributes)
    • Entry management
        void createEntry(MDEntry[] entries, String[] schemas)
        void removeEntry(String query)
        int setAttributes(String query, Attribute[] attributes)
        Attribute[] listAttributes(String entry)
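The schema and entry operations above can be exercised with a toy in-memory model. This is a sketch, not the FiReMan or ARDA code: it drops the schema field from Attribute, passes entry names instead of MDEntry objects to createEntry, pairs each entry with one schema by position, assumes the int returned by setAttributes is the number of entries updated, and, because the slide does not define the query syntax, treats a query as a hypothetical shell-style glob over entry names.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass
class Attribute:
    name: str
    type: str = "string"
    value: str = ""


class InMemoryMetadata:
    """Toy single-process model of the schema and entry operations."""

    def __init__(self):
        self.schemas = {}  # schema name -> {attribute name: type}
        self.entries = {}  # entry name -> (schema name, {attribute name: value})

    # --- Schema management ---
    def createSchema(self, schemaName, attributes):
        self.schemas[schemaName] = {a.name: a.type for a in attributes}

    def dropSchema(self, schemaName):
        del self.schemas[schemaName]
        # Entries attached to the dropped schema are removed with it.
        self.entries = {e: v for e, v in self.entries.items() if v[0] != schemaName}

    def addSchemaAttributes(self, schemaName, attributes):
        self.schemas[schemaName].update({a.name: a.type for a in attributes})

    def removeSchemaAttributes(self, schemaName, attributeNames):
        for name in attributeNames:
            self.schemas[schemaName].pop(name, None)
            for schema, values in self.entries.values():
                if schema == schemaName:
                    values.pop(name, None)

    # --- Entry management ---
    def createEntry(self, entries, schemas):
        # Pairs entries[i] with schemas[i]: an assumption of this sketch.
        for entry, schema in zip(entries, schemas):
            self.entries[entry] = (schema, {})

    def _match(self, query):
        # Hypothetical query syntax: a shell-style glob over entry names.
        return [e for e in self.entries if fnmatch(e, query)]

    def removeEntry(self, query):
        for e in self._match(query):
            del self.entries[e]

    def setAttributes(self, query, attributes):
        """Set values on every matching entry; return how many entries were updated."""
        matched = self._match(query)
        for e in matched:
            schema, values = self.entries[e]
            for a in attributes:
                if a.name not in self.schemas[schema]:
                    raise KeyError(f"attribute {a.name!r} not in schema {schema!r}")
                values[a.name] = a.value
        return len(matched)

    def listAttributes(self, entry):
        schema, values = self.entries[entry]
        return [Attribute(n, t, values.get(n, "")) for n, t in self.schemas[schema].items()]


md = InMemoryMetadata()
md.createSchema("runs", [Attribute("events", "int"), Attribute("detector")])
md.createEntry(["/runs/001", "/runs/002"], ["runs", "runs"])
updated = md.setAttributes("/runs/*", [Attribute("detector", value="VELO")])
print(updated)  # 2
print(md.listAttributes("/runs/001"))
```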

  6. Interface Operations (continued)
    • Searching and retrieving entries
        MDResult query(MDQuery query)
        MDResult nextQuery(String token, MDQuery query)
        void endQuery(String token)
    • Datatypes
        • Allow either stateful or stateless server implementations
        Attribute { String schema; String name; String type; String value }
        MDEntry { String entry; Attribute[] attributes }
        MDQuery { String query; String queryType }
        MDResult { MDEntry[] entries; String token; Boolean done }
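The token-based retrieval can be sketched in Python as a client loop over a toy stateful server. MDEntry, MDQuery and MDResult mirror the datatypes on the slide; ToyServer, fetch_all, the chunk size and the "glob" query type are hypothetical, and the server ignores the query string so that only the paging protocol is shown.

```python
import itertools
import uuid
from dataclasses import dataclass, field


@dataclass
class MDEntry:
    entry: str
    attributes: list = field(default_factory=list)


@dataclass
class MDQuery:
    query: str
    queryType: str = "glob"  # hypothetical query type


@dataclass
class MDResult:
    entries: list
    token: str
    done: bool


class ToyServer:
    """Stateful sketch: each session token maps to an open iterator standing in for a DB cursor."""

    CHUNK = 3  # entries per response (hypothetical)

    def __init__(self, rows):
        self.rows = rows
        self.cursors = {}

    def query(self, q):
        token = uuid.uuid4().hex
        self.cursors[token] = iter(self.rows)  # q.query is ignored in this sketch
        return self.nextQuery(token, q)

    def nextQuery(self, token, q):
        chunk = list(itertools.islice(self.cursors[token], self.CHUNK))
        # A short chunk means the cursor is exhausted. If the row count is a
        # multiple of CHUNK, the client pays one extra call that returns nothing.
        done = len(chunk) < self.CHUNK
        if done:
            self.endQuery(token)
        return MDResult([MDEntry(r) for r in chunk], token, done)

    def endQuery(self, token):
        self.cursors.pop(token, None)


def fetch_all(server, q):
    """Client loop: first query(), then nextQuery() with the returned token until done."""
    result = server.query(q)
    entries = list(result.entries)
    while not result.done:
        result = server.nextQuery(result.token, q)
        entries.extend(result.entries)
    return entries


server = ToyServer([f"/runs/{i:03d}" for i in range(7)])
print([e.entry for e in fetch_all(server, MDQuery("/runs/*"))])
```

A stateless server could instead avoid the cursor table by encoding the position (for example an offset) in the token itself, at the cost of re-running the query on every call.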

  7. ARDA Prototype
    • Validate the proposed interface
    • Architecture:
        • Metadata organized in a hierarchy
            • Schemas can contain sub-schemas
            • Sub-schemas can inherit attributes
            • Analogy to a file system: Schema → Directory; Entry → File
        • Stability with large responses
            • Send large responses in chunks
            • Otherwise, preparing large responses could crash the server
        • Stateful server
            • DB → Server – data streamed using DB cursors
            • Server → Client – response sent in chunks
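Hierarchical schemas with inheritance can be modelled like a directory tree. A minimal sketch, assuming schemas are named by slash-separated paths and that a child's attribute definition overrides an inherited one with the same name; the slide only says sub-schemas can inherit attributes, so the override rule and the SchemaTree class are hypothetical.

```python
import posixpath


class SchemaTree:
    """Schemas addressed like directories; a sub-schema inherits its ancestors' attributes."""

    def __init__(self):
        self.own = {"/": {}}  # schema path -> attributes defined directly on it

    @staticmethod
    def _parent(path):
        return posixpath.dirname(path.rstrip("/")) or "/"

    def create_schema(self, path, attributes):
        parent = self._parent(path)
        if parent not in self.own:
            raise KeyError(f"parent schema {parent} does not exist")
        self.own[path] = dict(attributes)

    def attributes(self, path):
        """Merge attributes from the root down to `path`; deeper definitions win."""
        chain = [path]
        while chain[-1] != "/":
            chain.append(self._parent(chain[-1]))
        merged = {}
        for p in reversed(chain):
            merged.update(self.own[p])
        return merged


tree = SchemaTree()
tree.create_schema("/lhcb", {"experiment": "string"})
tree.create_schema("/lhcb/runs", {"events": "int"})
print(tree.attributes("/lhcb/runs"))  # {'experiment': 'string', 'events': 'int'}
```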

  8. ARDA Implementation
    • Backends
        • Currently: Oracle, PostgreSQL, SQLite
    • Two frontends
        • TCP Streaming – chosen for performance
        • SOAP – a formal requirement of EGEE
        • Compare SOAP with TCP Streaming
    • Also implemented as a standalone Python library
        • Data stored on the filesystem

  9. TCP Streaming Frontend
    • Text-based protocol (like SMTP, POP3, …)
    • Data streamed to the client over a single connection
    • Implementation
        • Server – C++, multiprocess
        • Clients – C++, Java, Python, Perl, Ruby
    • Example exchange:
        Client: listattr entry
        Server: 0 entry value1 value2 … <EOT>
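A client for a line-oriented protocol of this kind only needs to send a command and read until the terminator. The sketch below assumes, beyond what the slide shows, that commands and data items are newline-terminated, that the first response line is a numeric status where 0 means success, and that <EOT> appears alone on the last line; request() and read_response() are hypothetical names, and a socket pair with a canned reply stands in for the C++ server.

```python
import socket

EOT = "<EOT>"


def read_response(stream):
    """Read one response: a status line, then data lines up to the <EOT> marker.

    Assumed framing (not confirmed by the slide): one item per line, status 0 = success.
    """
    status = int(stream.readline().strip())
    lines = []
    for line in stream:
        line = line.rstrip("\n")
        if line == EOT:
            break
        lines.append(line)
    return status, lines


def request(sock_file, command):
    """Send one newline-terminated command and read its streamed response."""
    sock_file.write(command + "\n")
    sock_file.flush()
    return read_response(sock_file)


server, client = socket.socketpair()
# Canned reply standing in for the real server's output.
server.sendall(b"0\n/data/run001\nVELO\n1200\n<EOT>\n")
with client.makefile("rw", encoding="utf-8", newline="\n") as f:
    status, values = request(f, "listattr /data/run001")
print(status, values)  # 0 ['/data/run001', 'VELO', '1200']
server.close()
client.close()
```

Because the whole response arrives on one connection and is read incrementally, the client can start processing lines before the server has finished producing them.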

  10. SOAP Frontend
    • Most operations in the interface implemented as simple SOAP calls
    • query() – based on iterators
        • Initial request – create session
            • Open cursor on DB
            • Return initial chunk of data and session token
        • Subsequent requests
            • Client calls nextQuery() using session token
        • Termination – session closed on:
            • End of data
            • Client calling endQuery()
            • Client timeout
    • Implementations
        • Server – gSOAP (C++)
        • Clients – WSDL tested with gSOAP, ZSI (Python) and Axis (Java)
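The session lifecycle described above (open on the first query(), advance with nextQuery(), close at end of data, on endQuery(), or on client timeout) can be sketched as a token table. This assumes the timeout is an idle time measured since the client's last request and uses a Python iterator in place of the database cursor; SessionTable, its methods, and the 300 s default are hypothetical. The clock is injectable so expiry can be demonstrated without waiting.

```python
import itertools
import time
import uuid


class SessionTable:
    """Iterator sessions closed at end of data, on endQuery(), or after an idle timeout."""

    def __init__(self, timeout_s=300.0, clock=time.monotonic):
        self.timeout_s = timeout_s  # hypothetical idle timeout
        self.clock = clock
        self.sessions = {}  # token -> [iterator, time of last request]

    def open(self, cursor):
        """Initial request: register the cursor and hand back a session token."""
        token = uuid.uuid4().hex
        self.sessions[token] = [cursor, self.clock()]
        return token

    def next_chunk(self, token, size):
        """Subsequent request: return up to `size` items and whether data is exhausted."""
        self.expire()
        if token not in self.sessions:
            raise KeyError("unknown or expired session token")
        session = self.sessions[token]
        session[1] = self.clock()
        chunk = list(itertools.islice(session[0], size))
        done = len(chunk) < size
        if done:
            self.close(token)  # end of data
        return chunk, done

    def close(self, token):
        """What endQuery() would call; closing twice is harmless."""
        self.sessions.pop(token, None)

    def expire(self):
        """Drop sessions idle for longer than the timeout (the client went away)."""
        now = self.clock()
        for token, (_, last) in list(self.sessions.items()):
            if now - last > self.timeout_s:
                self.close(token)


# Usage with a fake clock so expiry can be shown without waiting.
now = [0.0]
table = SessionTable(timeout_s=10, clock=lambda: now[0])
token = table.open(iter(range(5)))
print(table.next_chunk(token, 2))  # ([0, 1], False)
now[0] = 11.0  # the client stays silent past the timeout
table.expire()
print(token in table.sessions)  # False
```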

  11. Current Uses of the ARDA Prototype
    • Evaluated by LHCb bookkeeping
        • Migrated bookkeeping metadata to the ARDA prototype
            • 20M entries, 15 GB
        • Feedback valuable in improving the interface and fixing bugs
        • Interface found to be complete
        • ARDA prototype showing good scalability
    • Ganga (LHCb, ATLAS)
        • User analysis job management system
        • Stores job status on the ARDA prototype
        • Highly dynamic metadata

  12. Performance Study
    • SOAP increasingly used as a standard protocol for GRID computing
        • Promising web services standard – interoperability
    • Some potential weaknesses
        • XML encoding increases message size (typically 4x to 10x)
        • XML processing is compute- and memory-intensive
    • How significant are these weaknesses? What is the cost of using SOAP?
    • The ARDA metadata implementation is ideal for comparing SOAP with a traditional RPC protocol

  13. Benchmark Description
    • Protocols
        • TCP-S – TCP Streaming
        • SOAP – clients with gSOAP (C++), Axis (Java) and ZSI (Python)
    • Operations
        • ping – a null RPC
        • add – adds an entry
        • get – gets all attributes of an entry
        • get (bulk) – gets all attributes of several entries in a single operation
    • Entries
        • 60 attributes (ints, floats and strings)
        • 700 bytes on average
    • HTTP Keepalive/persistent connections
        • HTTP Keepalive increases HTTP performance, so it should improve SOAP performance
        • gSOAP supports Keepalive; Axis and ZSI don’t
        • TCP-S uses persistent TCP connections for comparison with HTTP Keepalive

  14. SOAP Data Overhead
    • Measure size overhead of XML encoding
    • Ping
        • 1000 requests
        • Minimal payload – less than 5 bytes per request
        • SOAP overhead around 8 times
    • Get attributes in bulk
        • Retrieve 1000 entries
        • Around 800 KB of application data
        • Streaming in TCP
        • Iterators with SOAP – 4 KB average SOAP packet payload
        • With keepalive
        • SOAP overhead around 2.5 times
    [Chart: total data transferred (in KB)]
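The direction of the size overhead is easy to see by wrapping the same data in a SOAP 1.1 envelope with Python's xml.etree.ElementTree. This sketch counts only message bodies, not HTTP headers or TCP/IP framing, so its ratios are not comparable to the measured totals above; the service namespace, the "ping" command text, and the 60-attribute sample entry are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:metadata"  # hypothetical service namespace


def soap_envelope(operation, args):
    """Serialize an RPC-style call as a minimal SOAP 1.1 envelope (body only, no HTTP)."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in args.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


ping_plain = b"ping\n"  # hypothetical text-protocol command
ping_soap = soap_envelope("ping", {})

entry = {f"attr{i}": i * 0.5 for i in range(60)}  # 60 hypothetical attributes
entry_plain = (" ".join(str(v) for v in entry.values()) + "\n").encode()
entry_soap = soap_envelope("getAttrResponse", entry)

for label, plain, soap in [("ping", ping_plain, ping_soap), ("entry", entry_plain, entry_soap)]:
    print(f"{label}: {len(plain)} B plain, {len(soap)} B SOAP, ratio {len(soap) / len(plain):.1f}")
```

Each value is wrapped in an opening and a closing tag, so short values carry proportionally more markup, and a near-empty call like ping is dominated by the fixed envelope.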

  15. SOAP Toolkits Performance
    • Test protocol performance
        • No work done on the backend
        • Switched 100 Mbit/s LAN
    • Language comparison
        • TCP-S with similar performance in all languages
        • SOAP performance varies strongly with toolkit
    • Protocol comparison
        • Keepalive improves performance significantly
        • On Java and Python, SOAP is several times slower than TCP-S
    [Chart: 1000 pings]

  16. Single Client Results (LAN)
    • Compare performance of different operations
    • C++ clients (gSOAP)
    • When the backend must do work, differences between gSOAP and TCP-S are small
    • Bulk operations very important for performance
        • getBulk 4x faster than get
    [Chart: 1000 pings / 1000 entries]

  17. Single Client Results (WAN)
    • Client at CERN, server in Taiwan
        • ≈300 ms latency
    • Results dominated by latency
        • Execution time at the server irrelevant
    • Large performance boost from latency-hiding techniques:
        • keepalive – fewer TCP handshakes
        • bulk operations – fewer client/server interactions
    [Chart: 1000 pings / 1000 entries]
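Why latency dominates can be shown with back-of-the-envelope arithmetic. This is a latency-only model, not the benchmark's method: it assumes the ≈300 ms figure is the round-trip time, one round trip per call, one extra round trip for the TCP handshake whenever a connection is not kept alive, and it ignores server time and bandwidth. The batch size of 100 entries per bulk call is hypothetical.

```python
import math


def wan_time(n_items, rtt_s, items_per_call=1, keepalive=True):
    """Latency-only estimate (seconds) of fetching n_items over a high-latency link."""
    calls = math.ceil(n_items / items_per_call)
    if keepalive:
        round_trips = 1 + calls  # one handshake, then one round trip per call
    else:
        round_trips = 2 * calls  # handshake plus request/response for every call
    return round_trips * rtt_s


RTT = 0.3  # assumed round-trip time, taken from the ≈300 ms on the slide
scenarios = [
    ("get, no keepalive", dict(keepalive=False)),
    ("get, keepalive", dict(keepalive=True)),
    ("bulk get (100 entries/call), keepalive", dict(items_per_call=100)),
]
for label, kwargs in scenarios:
    print(f"{label}: ~{wan_time(1000, RTT, **kwargs):.0f} s")
```

Under these assumptions keepalive halves the cost by removing per-call handshakes, and batching removes most of the remaining round trips, which is the same direction as the slide's conclusion; the model makes no claim about the measured numbers.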

  18. Scalability with Multiple Clients – Pings
    • Measure scalability of protocols
    • Switched 100 Mbit/s LAN
    • TCP-S 3x faster than gSOAP (with keepalive)
    • Poor performance without keepalive
        • Around 1,000 ops/sec (both gSOAP and TCP-S)
    [Chart: 1000 pings]

  19. Scalability with Multiple Clients – getAttr
    • Measure scalability with a realistic payload
    • Switched 100 Mbit/s LAN
    • All tests with keepalive
    • Smaller difference between gSOAP and TCP-S
        • TCP-S 2x faster (1000 vs 500 entries/sec)
    • Poor performance of non-bulk operations
        • 100 entries/sec
    [Chart: 1000 entries]
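A multi-client throughput measurement of this kind can be sketched as a small thread-based harness: start N clients together, let each issue a fixed number of operations, and divide the total by the elapsed wall-clock time. This is a sketch under stated assumptions, not the harness used for these results: measure_throughput and fake_ping are hypothetical, and fake_ping only sleeps to mimic waiting on the network. In a real run each client thread would hold its own persistent connection, and CPU-bound clients would need processes rather than Python threads.

```python
import threading
import time


def measure_throughput(operation, n_clients, ops_per_client):
    """Run n_clients threads, each calling operation() ops_per_client times; return ops/sec."""
    barrier = threading.Barrier(n_clients + 1)  # clients plus the timing thread

    def client():
        barrier.wait()  # all clients start together
        for _ in range(ops_per_client):
            operation()

    threads = [threading.Thread(target=client) for _ in range(n_clients)]
    for t in threads:
        t.start()
    barrier.wait()
    start = time.perf_counter()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return n_clients * ops_per_client / elapsed


lock = threading.Lock()
completed = [0]


def fake_ping():
    """Stand-in for a real ping or getAttr call; sleeps to mimic network wait."""
    time.sleep(0.002)
    with lock:
        completed[0] += 1


for n in (1, 4, 16):
    print(f"{n:2d} clients: {measure_throughput(fake_ping, n, 50):,.0f} ops/sec")
```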

  20. Conclusions
    • A common Metadata Interface was developed by ARDA and gLite
        • Endorsed by the EGEE standards committee
        • Interface validated by the ARDA prototype
        • Prototype in use by LHCb (bookkeeping, Ganga) and ATLAS (Ganga)
    • SOAP performance studied using the ARDA implementation
        • Toolkit performance varies widely
        • Large SOAP overhead (over 100%)
