XML for Data Grid Applications


Presentation Transcript


  1. XML for Data Grid Applications
     Chip Watson, Thomas Jefferson National Accelerator Facility
     PPDG Meeting

  2. Why XML? -- Industry Trends
     Strategy: use web technologies and follow the success of the web.
     E-commerce companies (especially B2B) are currently investing heavily in XML technologies.
     Example news items:
     • [December 11, 2000] "iPlanet Unveils Industry's First Full-Up B2B Commerce Platform… [based upon XML]"
     • [December 08, 2000] "Schemantix (formerly Praxis) to Launch Schemantix Development Platform (SxDP) at XML 2000."
     • "Microsoft is augmenting its OLE DB for OLAP protocol with new interfaces based on XML… 'The brass tacks on this is we're all going to run our analytical apps over the Internet, and the language these apps will use to communicate with their data sources will be XML,' says Clay Young, VP of marketing at online analytical processing software vendor Knosys Inc." (InformationWeek, Dec 7, 2000)

  3. What is XML? eXtensible Markup Language
     • Like HTML, but with user-defined tags
     • Tags refer to content, not presentation:

         <?xml version='1.0' encoding='ISO-8859-1'?>
         <directory name="/clas" owner="root" group="other" modified="Aug 22 08:34">
           <file name='97-12'/>
           <file name='98-02'/>
           <file name='98-03'/>
           <directory name='comm97'/>
           <directory name='e1'/>
         </directory>

     Slide callouts: "Properties of node" (the attributes on the opening tag) and "Node contents" (the nested elements).
     XML has a tree data model.
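To make the tree data model concrete, here is a minimal sketch that reads the slide's directory listing with Python's standard `xml.etree.ElementTree` module. The function name `summarize` is illustrative and not part of the talk; the XML declaration is left out for brevity.

```python
import xml.etree.ElementTree as ET

# Directory listing from the slide (XML declaration omitted for brevity).
LISTING = """
<directory name="/clas" owner="root" group="other" modified="Aug 22 08:34">
  <file name='97-12'/>
  <file name='98-02'/>
  <file name='98-03'/>
  <directory name='comm97'/>
  <directory name='e1'/>
</directory>
"""

def summarize(xml_text):
    """Split a node into its properties (attributes) and contents (children)."""
    root = ET.fromstring(xml_text.strip())
    properties = dict(root.attrib)
    contents = [(child.tag, child.get("name")) for child in root]
    return properties, contents

properties, contents = summarize(LISTING)
print("owner:", properties["owner"])
for kind, name in contents:
    print(kind, name)
```

Each child element is itself a node that could carry its own attributes and children, which is what makes the model a tree.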

  4. XML vs CORBA
     • XML is more verbose
       • data transported as character strings (~2x for float)
       • data is self-describing, with string tags (~2x)
         (however, lists are separated by single whitespace, so string lists are carried with little overhead)
     • CORBA is harder to deploy
       • requires ORB, complex libraries, name server, etc.
     • Both are language neutral
       • XML supported in C/C++, Java, Perl, etc.
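The "~2x for float" estimate can be checked roughly with a short sketch. It compares 8-byte binary doubles against whitespace-separated decimal strings; the binary packing here is only a stand-in for a binary wire format and does not model CORBA's actual encoding, and the sample values are arbitrary.

```python
import struct

def binary_size(values):
    """Bytes needed to send the values as 8-byte IEEE 754 doubles."""
    return len(struct.pack(f"<{len(values)}d", *values))

def text_size(values):
    """Bytes needed to send the values as a whitespace-separated string list."""
    return len(" ".join(repr(v) for v in values).encode("ascii"))

# Arbitrary full-precision sample values (an assumption for illustration).
samples = [3.141592653589793, 2.718281828459045, 1.4142135623730951]
ratio = text_size(samples) / binary_size(samples)
print(f"text/binary size ratio: {ratio:.2f}")
```

The ratio depends heavily on how many significant digits are written, so short values such as `1.5` would come out smaller as text than as binary.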

  5. What about SOAP? Simple Object Access Protocol
     SOAP is a protocol specification for invoking methods on servers, services, components and objects (RPC system).
     SOAP codifies the existing practice of using XML and HTTP as a method invocation mechanism.
     The SOAP specification mandates a small number of HTTP headers that facilitate firewall/proxy filtering.
     The SOAP specification also mandates an XML vocabulary that is used for representing method parameters, return values, and exceptions.

  6. Simple POST vs SOAP
     • Simple POST
       • query contains tagged string values, like http://xxx.yyy.zzz/page?name=xyzzy&owner=watson
     • SOAP
       • query contains structured arguments, even user-defined types (example to follow)
     In either case, the response is an HTTP response of type xml, with arbitrary (tree-like) structure.
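The "simple" style carries only flat name/value strings. A minimal sketch using Python's standard `urllib.parse` builds and decodes the query from the slide; the helper name `build_query` is illustrative, and the host is the slide's placeholder.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE = "http://xxx.yyy.zzz/page"  # placeholder host from the slide

def build_query(base, **params):
    """Append tagged string values to the base URL as a query string."""
    return base + "?" + urlencode(params)

url = build_query(BASE, name="xyzzy", owner="watson")
print(url)

# Server side: recover the tagged values; every value arrives as a string.
args = parse_qs(urlsplit(url).query)
print(args)
```

Anything richer than strings (nested records, user-defined types) has to be flattened by hand in this style, which is the gap SOAP's structured arguments fill.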

  7. SOAP structure example

         <SOAP-ENV:Envelope
             xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
             SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
           <SOAP-ENV:Body>
             <ppdg:AddFile xmlns:ppdg="http://schemas.ppdg.org/soap/xmlns.ppdg">
               <directory>/clas/90-03/</directory>
               <file>test7.dat
                 <owner name="watson"/>
                 <activity name="calibration"/>
               </file>
             </ppdg:AddFile>
           </SOAP-ENV:Body>
         </SOAP-ENV:Envelope>
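As an illustration of how a receiver might unpack such a message, the sketch below parses the envelope with the standard `xml.etree.ElementTree` module and pulls out the method name and its structured arguments. The function `decode_call` and the shape of its result are assumptions for illustration, not part of the talk or of any SOAP toolkit.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

ENVELOPE = """<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SOAP-ENV:Body>
    <ppdg:AddFile xmlns:ppdg="http://schemas.ppdg.org/soap/xmlns.ppdg">
      <directory>/clas/90-03/</directory>
      <file>test7.dat
        <owner name="watson"/>
        <activity name="calibration"/>
      </file>
    </ppdg:AddFile>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

def decode_call(xml_text):
    """Return the method name and arguments carried in the SOAP body."""
    root = ET.fromstring(xml_text)
    body = root.find(f"{{{SOAP_NS}}}Body")
    call = body[0]                           # first child of Body is the method element
    method = call.tag.split("}", 1)[1]       # drop the "{namespace}" prefix
    file_el = call.find("file")
    return {
        "method": method,
        "directory": call.findtext("directory"),
        "file": file_el.text.strip(),        # text before the nested elements
        "owner": file_el.find("owner").get("name"),
        "activity": file_el.find("activity").get("name"),
    }

call = decode_call(ENVELOPE)
print(call)
```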

  8. Analysis: Simple vs SOAP
     • ReplicaCatalog & ReplicaHost (OO api)
       • need to send method name & [0-2] string args
     • Future catalog queries
       • may need to send many selection criteria, but this could be done as a simple query string (hence 1 argument)
       • question: may want to "batch" requests, sending, for example, an array of file names to resolve? [could be done as many single calls, and let TCP buffer]
     • Conclusion: Requirements do NOT dictate SOAP
       • May still choose SOAP for standardization reasons… although the proposer does not have a good track record here

  9. Prototyping XML at Jlab
     Goals:
     • Get experience w/ XML
     • Get experience w/ using XML in servlets
     • Demonstrate feasibility of using XML as web protocol for ReplicaCatalog and ReplicaHost
     • Deploy prototype replica system for experimental physics data stored in Jlab silo
       • currently OSM + custom java infrastructure
       • plan to replace OSM, resulting in pure java infrastructure

  10. XML & HTML
      Two types of servlets are used: one generates xml; the other calls the first and uses a library (few calls) to apply a style sheet to the xml and generate html.
      [Diagram labels: sql db, ldap db, corba obj, XML servlet, xml client, HTML servlet, style sheet, html client]
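The two-stage idea (one component emits XML, a second turns it into HTML for browsers) can be sketched as follows. The prototype applies an XSL style sheet through a library; Python's standard library has no XSLT processor, so this sketch swaps in a small hand-written rendering function as a stand-in. The function name `render_html` and the HTML layout are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

def render_html(xml_text):
    """Stand-in for the style-sheet step: turn a directory listing into HTML."""
    root = ET.fromstring(xml_text)
    items = "".join(
        f"<li>{escape(child.tag)}: {escape(child.get('name', ''))}</li>"
        for child in root
    )
    return f"<h1>{escape(root.get('name', ''))}</h1><ul>{items}</ul>"

# Output of the (hypothetical) XML stage, fed to the HTML stage.
xml_text = '<directory name="/clas"><file name="97-12"/><directory name="e1"/></directory>'
page = render_html(xml_text)
print(page)
```

Keeping the XML stage separate means non-browser clients can consume the same data without any HTML parsing.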

  11. Prototype Components
      • ReplicaCatalog
        • java servlet producing XML
        • xsl style sheet to translate this to html for browsers
        • servlet to do formatting (via style sheet)
      • ReplicaHost
        • java servlet producing XML
        • xsl style sheet to translate this to html for browsers
        • servlet to do formatting (via style sheet)
      • Simple file transfer servers
        • currently bbftpd, but soon httpd, gsiftpd

  12. Replica Catalog
      • Implemented as Java servlet (Apache + Tomcat)
        • currently uses fork rsh ls /mss … to get listing of silo contents for demo purposes
        • will use mysql via jdbc for persistent store (very soon)
        • supports tree data model (maps existing silo system)
      • Produces XML output
        for directory:
        • listing of one directory, contents are files + subdirectories
        • includes properties of this directory (owner, etc.)
        for file:
        • properties of the file (owner, etc.)
        • ReplicaHost(s) holding the file
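A minimal sketch of the directory response described above, written in Python rather than as the Java servlet the prototype uses. It emits one directory's properties as attributes and its files and subdirectories as child elements, following the tag names in the earlier listing; the function name `directory_listing` and its arguments are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def directory_listing(path, properties, files, subdirs):
    """Build the XML for one directory: properties as attributes, contents as children."""
    root = ET.Element("directory", {"name": path, **properties})
    for name in files:
        ET.SubElement(root, "file", {"name": name})
    for name in subdirs:
        ET.SubElement(root, "directory", {"name": name})
    return ET.tostring(root, encoding="unicode")

xml_text = directory_listing(
    "/clas",
    {"owner": "root", "group": "other"},
    files=["97-12", "98-02"],
    subdirs=["comm97"],
)
print(xml_text)
```

Only one level is listed per response; a client walks the tree by requesting each subdirectory in turn.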

  13. Replica Host
      • Gives access information (disk-resident, offline, etc.)
      • If disk resident, locally translates file name (virtual path) to URL(s), indicating supported protocols, such as
          http://xxx.jlab.org/diskcache9/clas/file7.dat
          bbftp://bbftp.jlab.org/diskcache9/clas/file7.dat
          gsiftp://xxx.jlab.org/diskcache9/clas/file7.dat
      • Future (within 1-2 months):
        • support request to stage to disk
        • support request to "pin" a file (advisory only)
        • support request to store a file (push and/or pull?)
        • manage update to catalog in response to local deletions of files
        • web pages to fetch any file via browser
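The translation from a virtual path to per-protocol URLs might look like the sketch below. The server table and the `/diskcache9` cache root are taken from the example URLs on the slide; the function name `to_urls` and the assumption that every protocol shares one cache root are illustrative only.

```python
# Protocol -> host, taken from the example URLs on the slide.
SERVERS = {
    "http": "xxx.jlab.org",
    "bbftp": "bbftp.jlab.org",
    "gsiftp": "xxx.jlab.org",
}

def to_urls(virtual_path, cache_root="/diskcache9", servers=SERVERS):
    """Map a catalog (virtual) path to one URL per supported protocol."""
    local_path = cache_root + virtual_path
    return [f"{scheme}://{host}{local_path}" for scheme, host in servers.items()]

urls = to_urls("/clas/file7.dat")
for url in urls:
    print(url)
```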

  14. Demo
      • xml test of ReplicaCatalog viewed as xml
      • processed with style sheet & viewed as html

  15. Note: Directory Model Changed
      Recommendation:
      • Change the catalog data model to allow file system (tree) semantics in the logical name space.
        • Hierarchical (apparently) containers
        • Actual containers may still be flat:
            /a/b/c is one container
            /a/b/c/d/e is a separate container
            /a/b/c appears to contain "d" (even if not implemented that way in storage)
      This will probably be more attractive to physicists and other users.
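One way to present flat containers as an apparent hierarchy is to derive each directory's children from the container names alone. The sketch below does this for the slide's example; the function name `apparent_children` and the use of a plain list as the container store are assumptions for illustration.

```python
def apparent_children(containers, parent):
    """List the names that appear directly under `parent`, given flat container names."""
    prefix = parent.rstrip("/") + "/"
    children = set()
    for name in containers:
        if name.startswith(prefix):
            first = name[len(prefix):].split("/", 1)[0]
            if first:  # ignore a bare trailing slash
                children.add(first)
    return sorted(children)

# Flat storage: two independent containers, as on the slide.
containers = ["/a/b/c", "/a/b/c/d/e"]
print(apparent_children(containers, "/a/b/c"))
print(apparent_children(containers, "/a/b"))
```

Here "d" shows up under /a/b/c even though no container named /a/b/c/d exists in storage.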

  16. Future Activities
      1. Finish SQL database for ReplicaCatalog
      2. Finish integration of ReplicaHost and Jlab silo
      3. Create exportable package for ReplicaHost
         • Disk cache manager (java based)
           • mountable by local clients
         • ReplicaHost (java servlet based)
         • File transfer daemons
           • http
           • bbftp
           • gsiftp
           • gridftp

  17. PPDG Sub-project (1)
      Protocol standardization
      • choice of simple or SOAP
      • standardization of method names and/or arguments for requests
      • XML tag name standardization
      • response standardization (e.g. one directory listing)

  18. PPDG Sub-project (2)
      1. Shared ReplicaCatalog servlet implementation
         • standardize java interface to local persistent store
         • implement reference implementations:
           1. above LDAP (compatible w/ or extending Globus solution)
           2. above JDBC (Jlab design, open to revisions of schema)
      2. Shared ReplicaHost servlet implementation
         • standardize java interface to local silo, disk managers
         • implement reference implementations:
           1. CORBA calls to SRB
           2. RMI calls to Jlab disk & silo managers
           3. other?

  19. PPDG Sub-project (2)
      3. C/C++ and Java client libraries
         • for Java & C++, implementing an OO api with local browsing of xml data
      4. Extend ReplicaHost to support queueing of transfer requests...
         ...to/from other ReplicaHosts
         • negotiate transfer protocol with other host
         • negotiate push/pull with other host
         ...to/from remote transfer daemon
         • protocol and direction fixed
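The talk does not say how two ReplicaHosts would negotiate a transfer protocol. One simple possibility, sketched here purely as an assumption, is for the requesting host to pick the first entry in its own preference list that the other host also supports.

```python
def negotiate(local_preferences, remote_supported):
    """Return the most-preferred local protocol the remote host supports, or None."""
    remote = set(remote_supported)
    for protocol in local_preferences:
        if protocol in remote:
            return protocol
    return None

# Hypothetical preference lists for two hosts.
choice = negotiate(["gsiftp", "bbftp", "http"], ["http", "bbftp"])
print(choice)
```

A transfer to or from a plain remote daemon would skip this step, since the slide notes that protocol and direction are fixed in that case.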
