
OGSA-DAI Status and Benchmarks

OGSA-DAI Status and Benchmarks. All Hands Meeting 2005, Nottingham, 22 September 2005. Overview: the all-new OGSA-DAI; benchmarking and profiling work; project collaboration; future plans. OGSA-DAI team: ESNW, Manchester; NeSC, Edinburgh; EPCC Team, Edinburgh; NEReSC, Newcastle.


Presentation Transcript


  1. OGSA-DAI Status and Benchmarks • All Hands Meeting 2005, Nottingham, 22 September 2005

  2. Overview • The all-new OGSA-DAI: overview • Benchmarking and profiling work • Project collaboration • Future plans AHM2005

  3. The OGSA-DAI team • ESNW, Manchester • NeSC, Edinburgh • EPCC Team, Edinburgh • NEReSC, Newcastle • IBM Dissemination Team • IBM Development Team, Hursley

  4. OGSA-DAI In One Slide • An extensible framework for data access and integration. • Expose heterogeneous data resources to a grid through web services. • Interact with data resources: • Queries and updates. • Data transformation / compression • Data delivery. • Customise for your project using • Additional Activities • Client Toolkit APIs • Data Resource handlers • A base for higher-level services • federation, mining, visualisation,…

  5. Extensibility Example • [Diagram: an OGSA-DAI service (GDS) whose engine runs a SQLQuery activity over JDBC to MySQL, alongside a Multiple SQL activity that issues SQL over several JDBC connections]
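The extensibility example can be illustrated with a small Java sketch. It assumes a hypothetical `QueryActivity` interface (not the real OGSA-DAI activity API) and replaces JDBC connections with in-memory functions so that it runs without a database; the only point being made is that a Multiple SQL activity can be built by composing existing SQLQuery-style activities.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class MultipleSqlDemo {
    public static void main(String[] args) {
        QueryActivity siteA = new SqlQueryActivity(q -> List.of(List.of("1", "Alice")));
        QueryActivity siteB = new SqlQueryActivity(
                q -> List.of(List.of("2", "Bob"), List.of("3", "Carol")));
        QueryActivity multi = new MultipleSqlActivity(List.of(siteA, siteB));
        System.out.println(multi.execute("SELECT id, name FROM littleblackbook"));
    }
}

// Hypothetical activity interface; not the OGSA-DAI API.
interface QueryActivity {
    List<List<String>> execute(String sql);
}

// Stand-in for a single-database SQLQuery activity. The "database" is a
// function from query text to rows, so no JDBC driver is needed.
class SqlQueryActivity implements QueryActivity {
    private final Function<String, List<List<String>>> database;

    SqlQueryActivity(Function<String, List<List<String>>> database) {
        this.database = database;
    }

    @Override
    public List<List<String>> execute(String sql) {
        return database.apply(sql);
    }
}

// Hypothetical Multiple SQL activity: sends the same query to every wrapped
// activity and concatenates the rows in the order the targets were given.
class MultipleSqlActivity implements QueryActivity {
    private final List<QueryActivity> targets;

    MultipleSqlActivity(List<QueryActivity> targets) {
        this.targets = targets;
    }

    @Override
    public List<List<String>> execute(String sql) {
        List<List<String>> merged = new ArrayList<>();
        for (QueryActivity target : targets) {
            merged.addAll(target.execute(sql));
        }
        return merged;
    }
}
```

Composition keeps the new activity independent of how each underlying activity reaches its data.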

  6. Timeline • [Timeline chart, 2003 to 2005: OGSI Releases 1, 1 interim, 2, 2 interim, 3, 3.1, 4, 5 and 6; OGSA-DAI WSRF 1.0; OGSA-DAI WS-I 1.0 / OGSA-DAI WS-I 1.1 (OMII)]

  7. Out with the old… • [Diagram: clients use the Client Toolkit API to talk SOAP to the server, which hosts the DAISGR registry and GDSFs creating GDSs that front relational, XML and file data]

  8. … in with the new! • [Diagram: a client uses the generic Client Toolkit API to talk SOAP to WSRF and WS-I data services on the server; both sit on the DAI core, whose DSRs front relational, XML and file data]

  9. Changes in moving to WSRF/WS-I • Registry component (DAISGR) no longer supported • Hope to leverage third-party registration services • GRIMOIRES (http://www.omii.ac.uk/mp/mp_grimoires.htm) • Others … • GDS/GDSF roles combined • Use data services • Currently static services, but • Reconfigurable services • Improvements to the GDS • Data resource abstraction decoupled from the service • Renaming (consistent naming across platform versions) • Ability to enforce control-flow constraints (ordering activities) • Refactored exception framework • Temporary set-backs (we promise we’ll fix them) • No security model • No concurrency • Previously used GDSs for concurrency • Support now moving to the engine
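The slide lists the ability to enforce control-flow constraints (ordering activities) but does not say how it is done. As an illustrative assumption, ordering constraints can be treated as a dependency graph and resolved with Kahn's topological sort, rejecting cyclic constraints. `ActivityOrdering` and the activity names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical scheduler; not the OGSA-DAI engine.
class ActivityOrdering {
    // runsAfter maps every activity name to the activities that must finish
    // before it starts. Returns an order that satisfies all constraints.
    static List<String> order(Map<String, List<String>> runsAfter) {
        Map<String, Integer> pending = new LinkedHashMap<>(); // unmet predecessors
        Map<String, List<String>> successors = new HashMap<>();
        for (String activity : runsAfter.keySet()) {
            pending.put(activity, 0);
            successors.put(activity, new ArrayList<>());
        }
        for (Map.Entry<String, List<String>> entry : runsAfter.entrySet()) {
            for (String before : entry.getValue()) {
                if (!pending.containsKey(before)) {
                    throw new IllegalArgumentException("Unknown activity: " + before);
                }
                successors.get(before).add(entry.getKey());
                pending.merge(entry.getKey(), 1, Integer::sum);
            }
        }
        // Kahn's algorithm: repeatedly emit an activity with no unmet predecessors.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> entry : pending.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }
        List<String> result = new ArrayList<>();
        while (!ready.isEmpty()) {
            String next = ready.poll();
            result.add(next);
            for (String successor : successors.get(next)) {
                if (pending.merge(successor, -1, Integer::sum) == 0) {
                    ready.add(successor);
                }
            }
        }
        if (result.size() != pending.size()) {
            throw new IllegalStateException("Ordering constraints contain a cycle");
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> runsAfter = new LinkedHashMap<>();
        runsAfter.put("deliverToFTP", List.of("gzip"));
        runsAfter.put("sqlQuery", List.of());
        runsAfter.put("gzip", List.of("sqlQuery"));
        System.out.println(order(runsAfter)); // [sqlQuery, gzip, deliverToFTP]
    }
}
```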

  10. Benchmarking/Profiling • Establish benchmark suite to: • Measure performance gains/losses between releases • Reveal implementation issues • Allows focused improvements • Establish best practice • Summer intern (Heather Kelly) produced results • Profiling allows us to identify particular areas which are causing poor performance in the benchmarks • Summer intern (Radoslaw Ostrowski) extended NetLogger and did some profiling • Most of the results are for OGSA-DAI R6 • One slide shows what is happening in R7

  11. Configuration • [Diagram: Tomcat 4.1.29, GT 3.2.1, OGSA-DAI OGSI R6.0, j2sdk 1.4.2_01; Windows XP Pro SP2, Intel PIII 863 MHz, 512 MB RAM; 10 Mbit network; SunOS 5.9, UltraSPARC-IIe 502 MHz, 128 MB RAM] • Measure the time to: • Send SQL query to server • Return nRows • Sum the values in one of the columns • Do this 30 times • Calculate mean and standard deviation • Repeat the process having increased nRows by stepsize • Try various databases • Notes: • Time to establish connection in JDBC runs not included • JDBC does not return results in WebRowSet format • Server is already running • Data source: littleblackbook • Test database included in distributions
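The measurement procedure above can be written as a small timing harness. This is a sketch, not the harness used for these results: the workload is a stand-in that sums integers locally rather than sending a query to the server, the sweep is assumed to start at nRows = stepsize, and the sample (n - 1) standard deviation is assumed because the slide does not say which variant was used. The values 10000, 500 and 30 come from the comparison slides.

```java
import java.util.function.IntConsumer;

// Hypothetical harness; not the one used to produce the slide results.
class QueryBenchmark {
    // Mean of the recorded run times.
    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Sample standard deviation (n - 1 denominator); an assumption, see above.
    static double stdDev(double[] xs) {
        double m = mean(xs);
        double squares = 0;
        for (double x : xs) {
            squares += (x - m) * (x - m);
        }
        return Math.sqrt(squares / (xs.length - 1));
    }

    // For nRows = stepSize, 2 * stepSize, ..., maxRows: time `runs` executions
    // of the workload and report the mean and standard deviation in ms.
    static void sweep(IntConsumer workload, int maxRows, int stepSize, int runs) {
        for (int nRows = stepSize; nRows <= maxRows; nRows += stepSize) {
            double[] millis = new double[runs];
            for (int i = 0; i < runs; i++) {
                long start = System.nanoTime();
                workload.accept(nRows);
                millis[i] = (System.nanoTime() - start) / 1e6;
            }
            System.out.printf("nRows=%d mean=%.3f ms sd=%.3f ms%n",
                    nRows, mean(millis), stdDev(millis));
        }
    }

    public static void main(String[] args) {
        // Stand-in workload: sums nRows values locally instead of querying a
        // server and summing one column of the returned rows.
        long[] sink = new long[1];
        sweep(nRows -> {
            long total = 0;
            for (int r = 0; r < nRows; r++) {
                total += r;
            }
            sink[0] += total; // keep the result live so the loop is less likely to be optimised away
        }, 10000, 500, 30);
    }
}
```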

  12. Some benchmarks • Relational query • StreamServlet requires two communications • Could be improved • FTP not iterating over result set • JDBC scales much better than SOAP • ResultSet implementations • Forwards-backwards implementation builds DOM tree; larger memory footprint
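The memory trade-off between the two ResultSet implementations can be seen in a simplified sketch. Both cursor classes are hypothetical: the forward-only cursor holds only the current row, while the forwards-backwards cursor must buffer every row before it can move backwards (an in-memory list stands in for the DOM tree the slide mentions).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.IntStream;

class CursorComparison {
    public static void main(String[] args) {
        ForwardOnlyCursor forward = new ForwardOnlyCursor(IntStream.range(0, 10000).iterator());
        long sum = 0;
        while (forward.next()) {
            sum += forward.value();
        }
        ScrollableCursor scrollable = new ScrollableCursor(IntStream.range(0, 10000).iterator());
        System.out.println("sum=" + sum + ", rows buffered by scrollable cursor="
                + scrollable.bufferedRows());
    }
}

// Hypothetical forward-only cursor: pulls one row at a time, so only the
// current row is held.
class ForwardOnlyCursor {
    private final Iterator<Integer> source;
    private Integer current;

    ForwardOnlyCursor(Iterator<Integer> source) {
        this.source = source;
    }

    boolean next() {
        if (!source.hasNext()) {
            return false;
        }
        current = source.next();
        return true;
    }

    int value() {
        return current;
    }
}

// Hypothetical forwards-backwards cursor: buffers every row up front so that
// previous() is possible.
class ScrollableCursor {
    private final List<Integer> rows = new ArrayList<>();
    private int position = -1; // -1 means "before the first row"

    ScrollableCursor(Iterator<Integer> source) {
        source.forEachRemaining(rows::add);
    }

    boolean next() {
        if (position + 1 >= rows.size()) {
            return false;
        }
        position++;
        return true;
    }

    boolean previous() {
        if (position <= 0) {
            position = -1;
            return false;
        }
        position--;
        return true;
    }

    int value() {
        return rows.get(position);
    }

    int bufferedRows() {
        return rows.size();
    }
}
```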

  13. Database comparison (OGSA-DAI WSRF 1.0, nRows = 10000, number of runs = 30, stepsize = 500)

  14. Platform comparison (MySQL database, nRows = 10000, number of runs = 30, stepsize = 500)

  15. Profiling: better RowSet conversion • [Chart: ResultSet to RowSet conversion]
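The slide shows only the profiling result. As an assumption about the kind of change involved (not a description of the actual fix), this sketch writes each row directly to a `Writer` as it is read, instead of first building a DOM tree. The element names `data`, `currentRow` and `columnValue` follow the data section of the JDBC WebRowSet XML format, but the output is only a fragment: the properties and metadata sections and null handling are omitted. `StreamingRowWriter` is hypothetical.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Iterator;
import java.util.List;

// Hypothetical converter; not the OGSA-DAI implementation.
class StreamingRowWriter {
    // Writes each row as soon as it is read; apart from whatever `out`
    // buffers, only the current row is held (no intermediate DOM tree).
    static void write(Iterator<List<String>> rows, Writer out) {
        try {
            out.write("<data>");
            while (rows.hasNext()) {
                out.write("<currentRow>");
                for (String value : rows.next()) {
                    out.write("<columnValue>");
                    out.write(escape(value));
                    out.write("</columnValue>");
                }
                out.write("</currentRow>");
            }
            out.write("</data>");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Escapes XML special characters in element text.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        write(List.of(List.of("1", "Alice"), List.of("2", "Bob & Co")).iterator(), out);
        System.out.println(out);
    }
}
```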

  16. R6->R7: removal of RowSet

  17. Challenges • Intermediate representation • between multiple models (relational, XML, …) • XML WebRowSet is flexible (cf. GridMiner) but expensive • DFDL and GridFTP/parallel HTTP? • Query definition • translation of queries • Data transport and workflow • workflow is typically compute-driven • Move computation to data • mobile code activities? • data services hosted on DBMS?

  18. caBIG: “Object-Oriented” view of data • Data types are well-defined and registered in a repository • Standardized metadata facilitates discovery • Custom query language implemented as an activity

  19. LEAD • [Diagram: satellite catalogs at IU, UA Huntsville, Okla Univ, Millersville, UCAR Unidata and NCSA Illinois, plus a master catalog] • Each satellite replicates its contents to the master catalog

  20. Users Group and DIALOGUE Workshops • 3rd Users Group meeting • June 1st • http://www.ogsadai.org.uk/docs/UG3/ • DIALOGUE Workshops • Data Integration Applications: Linking Organisations to Gain Understanding and Experience • Columbus, Edinburgh, Vienna, Indiana • Bringing together Data Integration middleware and application providers with users • http://www.datagrids.org

  21. Future plans • A new version of the OGSA-DAI Engine • should look mostly the same externally • better support for concurrency, sessions and monitoring • see Architecture paper/talk presented on Monday • Implementing new versions of specifications • DAIS Specifications • Key things that we will be addressing after Release 7: • Performance • A Security Model which can be applied across platforms • Full transaction provision, including implementation of compensatory activities and distributed transactions • More data integration facilities • Better abstraction over DBMS variation
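Compensatory activities undo the effects of steps that have already completed when a later step fails, which helps where a distributed transaction cannot simply be rolled back. The sketch below shows the general compensation pattern only; `CompensableStep`, `SimpleStep` and `CompensatingExecutor` are hypothetical names, and the slide does not describe how OGSA-DAI would implement this.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical executor illustrating the compensation pattern.
class CompensatingExecutor {
    // Runs the steps in order. If a step throws, the steps that already
    // completed are compensated in reverse order and false is returned.
    static boolean execute(List<? extends CompensableStep> steps) {
        Deque<CompensableStep> completed = new ArrayDeque<>();
        for (CompensableStep step : steps) {
            try {
                step.run();
                completed.push(step);
            } catch (Exception failure) {
                while (!completed.isEmpty()) {
                    completed.pop().compensate();
                }
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        boolean ok = execute(List.of(
                new SimpleStep(() -> log.add("insert into A"), () -> log.add("delete from A")),
                new SimpleStep(() -> log.add("insert into B"), () -> log.add("delete from B")),
                new SimpleStep(() -> { throw new Exception("resource C unavailable"); },
                        () -> log.add("never runs"))));
        System.out.println(ok + " " + log);
    }
}

// Hypothetical step pairing an action with the action that undoes it.
interface CompensableStep {
    void run() throws Exception;

    void compensate();
}

// Convenience implementation built from two lambdas.
class SimpleStep implements CompensableStep {
    interface Action {
        void run() throws Exception;
    }

    private final Action action;
    private final Runnable undo;

    SimpleStep(Action action, Runnable undo) {
        this.action = action;
        this.undo = undo;
    }

    @Override
    public void run() throws Exception {
        action.run();
    }

    @Override
    public void compensate() {
        undo.run();
    }
}
```

The failed step itself is not compensated, since it never completed; only earlier steps are undone, newest first.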

  22. Conclusions • OGSA-DAI has had to undergo significant refactoring to keep stakeholders happy • Refactoring has allowed us to create an extensible framework which can be used for many data-related tasks • We need to identify the components and improvements which will be useful to users • There is obviously room for improvement in performance, and we are working on it

  23. Further information • The OGSA-DAI Project Site: • http://www.ogsadai.org.uk • The DAIS-WG site: • http://forge.gridforum.org/projects/dais-wg/ • OGSA-DAI Users Mailing list • users@ogsadai.org.uk • General discussion on grid DAI matters • Formal support for OGSA-DAI releases • http://www.ogsadai.org.uk/support • support@ogsadai.org.uk • OGSA-DAI training courses

  24. Core features of OGSA-DAI – I • A framework for building applications • Supports data access, insert and update • Relational: MySQL, Oracle, DB2, SQL Server, Postgres • XML: Xindice, eXist • Files – CSV, BinX, EMBL, OMIM, SWISSPROT,… • Supports data delivery • SOAP over HTTP • FTP; GridFTP • E-mail • Inter-service • Supports data transformation • XSLT • ZIP; GZIP • Supports security • X.509 certificate based security
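GZIP is one of the listed transformations. The sketch below uses the standard `java.util.zip` GZIP streams to compress a result before delivery and restore it on the client; `GzipTransform` is a hypothetical helper, not the OGSA-DAI GZIP activity.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper; not the OGSA-DAI GZIP activity.
class GzipTransform {
    // Compresses the bytes; closing the GZIP stream writes the trailer,
    // so the buffer is read only after the try block.
    static byte[] compress(byte[] data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Restores the original bytes from GZIP data.
    static byte[] decompress(byte[] data) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Repetitive XML, such as a WebRowSet-style result, compresses well.
        byte[] result = "<currentRow><columnValue>42</columnValue></currentRow>"
                .repeat(1000).getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(result);
        System.out.println(result.length + " bytes -> " + packed.length + " bytes");
    }
}
```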

  25. Core features of OGSA-DAI – II • A framework for building data clients • Client toolkit library for application developers • A framework for developing functionality • Extend existing activities, or implement your own • Mix and match activities to provide functionality you need • Highly-extensible • Customise our out-of-the-box product • Provide your own services, client-side support and data-related functionality • Comprehensive documentation and tutorials • Latest release supports GT3.2 (to be deprecated), GT4.0, and Axis 1.2 / OMII_2 using Java 1.4

  26. OGSA-DAI Design Principles – I • Efficient client-server communication • Minimise where possible • One request specifies multiple operations • No unnecessary data movement • Move computation to the data • Utilise third-party delivery • Apply transforms (e.g., compression) • Build on existing standards • Fill in gaps where necessary
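The first two principles can be contrasted in a small sketch: the client sends one request carrying a chain of operations, the chain runs next to the data, and only the final result travels back. `DataServiceStub` is hypothetical, and Java `Function` composition stands in for the request document that OGSA-DAI clients send.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical service stub; counts requests so the styles can be compared.
class DataServiceStub {
    private final List<Integer> table;
    int requests = 0; // number of client-server round trips

    DataServiceStub(List<Integer> table) {
        this.table = table;
    }

    // Chatty style: one request returns every row to the client.
    List<Integer> fetchAll() {
        requests++;
        return table;
    }

    // Batched style: one request carries the whole chain of operations,
    // which runs next to the data; only the final result is returned.
    <R> R perform(Function<List<Integer>, R> operations) {
        requests++;
        return operations.apply(table);
    }

    public static void main(String[] args) {
        DataServiceStub service = new DataServiceStub(List.of(1, 2, 3, 4, 5));
        Function<List<Integer>, List<Integer>> select =
                rows -> rows.stream().filter(v -> v > 2).collect(Collectors.toList());
        Function<List<Integer>, Integer> sum =
                rows -> rows.stream().mapToInt(Integer::intValue).sum();
        int total = service.perform(select.andThen(sum));
        System.out.println("total=" + total + " in " + service.requests + " request");
    }
}
```

With `fetchAll`, every row would cross the network and the filtering and summing would happen on the client instead.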

  27. OGSA-DAI Design Principles – II • Do not hide underlying data model • Users must know where to target queries • Data virtualisation is hard • Extensible architecture • Modular and customisable • e.g., to accommodate stronger security • Extensible activity framework • Cannot anticipate all desired functionality • Activity = unit of functionality • Allow users to plug-in their own
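The plug-in idea can be sketched as a registry that maps activity names to factories, so user-supplied activities are looked up exactly like built-in ones. `Activity` and `ActivityRegistry` are hypothetical simplifications; the real framework's configuration mechanism is not shown.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical registry; not the OGSA-DAI configuration mechanism.
class ActivityRegistry {
    private final Map<String, Supplier<Activity>> factories = new HashMap<>();

    // Registers a factory under a unique activity name.
    void register(String name, Supplier<Activity> factory) {
        if (factories.putIfAbsent(name, factory) != null) {
            throw new IllegalArgumentException("Duplicate activity: " + name);
        }
    }

    // Creates a fresh instance of the named activity.
    Activity create(String name) {
        Supplier<Activity> factory = factories.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown activity: " + name);
        }
        return factory.get();
    }

    public static void main(String[] args) {
        ActivityRegistry registry = new ActivityRegistry();
        // "Built-in" activity shipped with the framework (hypothetical).
        registry.register("toUpperCase", () -> input -> input.toUpperCase());
        // User plug-in registered in exactly the same way.
        registry.register("reverse", () -> input -> new StringBuilder(input).reverse().toString());
        System.out.println(registry.create("reverse").process("OGSA-DAI"));
    }
}

// Hypothetical unit of functionality: transforms one input into one output.
interface Activity {
    String process(String input);
}
```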

  28. Data Integration challenges • Metadata extraction • define a common model for, e.g., database schemas? • Intermediate representation • between multiple models (relational, XML, …) • XML WebRowSet is flexible (cf. GridMiner) but expensive • DFDL and GridFTP/parallel HTTP? • Query definition • translation of queries • Data transport and workflow • workflow is typically compute-driven • Move computation to data • mobile code activities? • data services hosted on DBMS?

  29. Contributing to OGSA-DAI • Additional functionality: • Provide activities which implement specific functionality • Provide extra client functionality • Provide different security mechanisms • Provide higher level components and applications • Different levels of contributions • Based on OGSA-DAI? • Works with OGSA-DAI? • Part of OGSA-DAI?
