
NeSC Data Projects and Initiatives

NeSC Data Projects and Initiatives. Dr. Dave Berry Research Manager. Contents. The Data Deluge Web Services The DAI vision The OGSA-DAI Project and GGF The OGSA-DAI Software Edikt Other relevant projects in the UK. Acknowledgements. This talk includes material prepared by:


Presentation Transcript


  1. NeSC Data Projects and Initiatives Dr. Dave Berry Research Manager

  2. Contents • The Data Deluge • Web Services • The DAI vision • The OGSA-DAI Project and GGF • The OGSA-DAI Software • Edikt • Other relevant projects in the UK

  3. Acknowledgements This talk includes material prepared by: • The OGSA-DAI project • The e-Diamond project • The BRIDGES project • The GGF OGSA Working Group • and others…

  4. The Data Deluge
     • Entering an age of data
       • CERN: LHC will generate 1 GB/s = 10 PB/y
       • VLBA (NRAO) generates 1 GB/s today
       • Pixar generate 100 TB/Movie
     • Data stored in many different ways
       • Relational databases
       • XML databases
       • Flat files
     • Need ways to facilitate
       • Data discovery
       • Data access
       • Data integration
     [Image labels: Mont Blanc (4810 m); Downtown Geneva]
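The LHC figures on this slide can be checked with a little arithmetic. The sketch below assumes decimal units (1 PB = 10^6 GB); the slide does not say how many seconds per year the detector records data, so the duty cycle computed here is an inference from the two quoted numbers, not a figure from the talk.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s in a non-leap year


def petabytes_per_year(rate_gb_per_s, live_seconds):
    """Convert a sustained rate in GB/s into PB accumulated over live_seconds."""
    return rate_gb_per_s * live_seconds / 1e6  # 1 PB = 1e6 GB (decimal units)


# Recording 1 GB/s around the clock would give far more than 10 PB/y.
continuous_pb = petabytes_per_year(1.0, SECONDS_PER_YEAR)

# Seconds of data taking per year implied by "1 GB/s = 10 PB/y".
implied_live_seconds = 10 * 1e6 / 1.0

# Fraction of the year spent recording under that reading of the slide.
duty_cycle = implied_live_seconds / SECONDS_PER_YEAR

print(round(continuous_pb, 1), implied_live_seconds, round(duty_cycle, 2))
```

Read this way, the slide's equation holds if data is recorded for roughly ten million seconds a year, about a third of the calendar year.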

  5. Astronomical Databases
     • No. & sizes of data sets as of mid-2002, grouped by wavelength
     • 12 waveband coverage of large areas of the sky
     • Total about 200 TB data
     • Doubling every 12 months
     • Largest catalogues near 1B objects
     Data and images courtesy Alex Szalay, Johns Hopkins
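To make "doubling every 12 months" concrete, here is a small sketch that projects the 200 TB total forward. It assumes the doubling rate stays constant, which the slide does not claim; the function names are illustrative only.

```python
import math


def projected_size_tb(initial_tb, years, doubling_period_years=1.0):
    """Size after `years` of steady exponential growth."""
    return initial_tb * 2 ** (years / doubling_period_years)


def years_to_reach(target_tb, initial_tb, doubling_period_years=1.0):
    """Years of steady doubling needed to grow from initial_tb to target_tb."""
    return doubling_period_years * math.log2(target_tb / initial_tb)


size_after_three_years = projected_size_tb(200, 3)  # 200 TB doubled three times
years_to_one_petabyte = years_to_reach(1000, 200)   # 1 PB = 1000 TB

print(size_after_three_years, round(years_to_one_petabyte, 2))
```

Under that assumption the 200 TB of mid-2002 would pass 1 PB in a little over two years.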

  6. Bioinformatics Databases
     • Bibliographic (MedLine, …)
     • Amino Acid Seq (SWISS-PROT, …)
     • 3D Molecular Structure (PDB, …)
     • Nucleotide Seq (GenBank, EMBL, …)
     • Biochemical Pathways (KEGG, WIT…)
     • Molecular Classifications (SCOP, CATH,…)
     • Motif Libraries (PROSITE, Blocks, …)
     [Chart: PDB Content Growth]

  7. Web Services • Using the protocols and ideas that have made the web a success for humans… • And applying them to distributed programming • HTTP • Single networking port • Autonomy & Failure handling • Open standards • Tools & Platforms • Apache Axis • WebSphere, .NET, Oracle Application Server, Sun ONE, …

  8. From Browsing to Programming

  9. A Perspective on WS Specifications

  10. Open Grid Services Architecture
      The architecture of the Global Grid Forum
      [Diagram labels: Share resource; Access resource; Manage resource; Continuous Availability; Applications on demand; Resources on demand; Secure and universal access; Global Accessibility; Business integration; Vast resource scalability; Web Services; Grid Protocols]

  11. GGF11: OGSA specification informational document
      [Diagram labels, in extraction order: Cataloging; Provisioning; VO Mgmt; Integration; Policy Mgmt; Access; Context Services; Information Services; Data Services; Troubleshooting; Event Mgmt; Discovery; Logging; Execution Mgmt Services; Infrastructure Services; Application Mgmt; Workflow Mgmt; Workload Mgmt; Execution Planning; Job Mgmt; WSRF; WSN; WSDM; Naming; Self Mgmt Services; Resource Mgmt Services; Reservation; Configuration; Deployment; Provisioning; Security Services; Heterogeneity Mgmt; Authentication; Optimization; Authorization; Service Level Attainment; Integrity; QoS Mgmt; Boundary Traversal]

  12. Data Access and Integration • Web Services for querying and integrating structured data resources • The foundation framework for: • Building tailored DAI applications • Higher-level services: • Replication: Data located in multiple locations • Federation: Composition of multiple sources • Provenance: How was data generated?

  13. The OGSA-DAI Project
      • Funded by the Grid Core Programme
      • OGSA-DAI
        • £3 million, 18 months, from Feb 2002
        • Three major releases, three interim releases
      • DAIT (DAI-Two)
        • Keep the OGSA-DAI brand name
        • £1.5 million, 24 months, from Oct 2003
        • Four major releases
      [Logo caption: Powered by ….]

  14. DAI in GGF and OGSA • Data Access and Integration Services WG • Strong involvement from OGSA-DAI members • Standardise the interfaces – WS-DAI • OGSA-DAI a reference implementation • Experience informing specification work • OGSA WG Data Design Team • Designing the data-oriented aspects of OGSA • Created after GGF10 (March 2004) • Led by NeSC

  15. OGSA Design Teams
      [Diagram: the OGSA capability map of slide 11 (Context Services, Info Services, Data Services, Execution Mgmt Services, Infra Services, Self Mgmt Services, Rsrc Mgmt Services, Security Services and their sub-capabilities), annotated with the OGSA-WG design teams]
      • Data Service design team
      • Information Service design team
      • EMS design team
      • Naming design team
      • Self Mgmt design team
      • Resource Mgmt design team
      • Security Service design team
      • Core (roadmap) design team

  16. Data Services design team
      • Informal domain expert groups within OGSA
      • May include co-chairs of other WG/RGs
      • Output is included in OGSA specification
      [Diagram labels: DAIS-WG; OGSA Data Service Design team; GSM-WG; GFS-WG; OGSA-WG; Telecons, F2F meetings; Info-D WG; ADF, OREP, …]

  17. OGSA v2 Document Deliverables Root Documents Glossary Usecase doc Architecture v2 Design team Documents Service descriptions Scenarios Working Group Specifications GGF Recommendation documents

  18. How OGSA-DAI works
      1a. Request to Registry for sources of data about “x”
      1b. Registry responds with Factory handle
      2a. Request to Factory for access to database
      2b. Factory creates GridDataService to manage access
      2c. Factory returns handle of GDS to client
      3a. Client queries GDS with XPath, SQL, etc
      3b. GDS interacts with database
      3c. Results of query returned to client as XML
      [Diagram labels: Client; Registry; Factory; Grid Data Service; XML / Relational database. Legend: SOAP/HTTP; service creation; API interactions]
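The Registry, Factory and Grid Data Service interaction on this slide can be mimicked in a few lines. This is an in-process sketch only: the class and method names are hypothetical and are not the OGSA-DAI API, there is no SOAP/HTTP layer, and an in-memory SQLite database stands in for the real data resource.

```python
import sqlite3
import xml.etree.ElementTree as ET


class GridDataService:
    """Hypothetical stand-in for a GDS; it manages access to one database."""

    def __init__(self, connection):
        self._connection = connection

    def perform(self, query):
        # 3b. The GDS interacts with the database on the client's behalf.
        cursor = self._connection.execute(query)
        columns = [column[0] for column in cursor.description]
        # 3c. Results of the query are returned to the client as XML.
        results = ET.Element("results")
        for row in cursor:
            row_element = ET.SubElement(results, "row")
            for name, value in zip(columns, row):
                ET.SubElement(row_element, name).text = str(value)
        return ET.tostring(results, encoding="unicode")


class Factory:
    """Hypothetical factory that creates a GDS for its database."""

    def __init__(self, connection):
        self._connection = connection

    def create_service(self):
        # 2b/2c. Create a GDS and hand it (its "handle") back to the client.
        return GridDataService(self._connection)


class Registry:
    """Hypothetical registry mapping a topic to the factory that serves it."""

    def __init__(self):
        self._factories = {}

    def register(self, topic, factory):
        self._factories[topic] = factory

    def find(self, topic):
        # 1a/1b. Look up sources of data about a topic; return the factory.
        return self._factories[topic]


# Set-up: one toy database published under the topic "proteins".
database = sqlite3.connect(":memory:")
database.execute("CREATE TABLE proteins (id INTEGER, name TEXT)")
database.execute("INSERT INTO proteins VALUES (1, 'lysozyme')")
registry = Registry()
registry.register("proteins", Factory(database))

# Client side: steps 1a to 3c.
factory = registry.find("proteins")
service = factory.create_service()
xml_result = service.perform("SELECT id, name FROM proteins WHERE id = 1")
print(xml_result)
```

The client never touches the database connection directly; it only sees the registry, the factory and the service, which is the indirection the slide describes.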

  19. OGSA-DAI compared to JDBC • Language independence at the client end • Platform independence • Do not have to worry about connection technology, drivers, etc • Can handle XML resources • Can embed additional functionality at the service end • Transformations • Third party delivery • Avoiding unnecessary data movement • Provision of Metadata is powerful • Usefulness of the Registry for service discovery • Dynamic service binding process

  20. Future DAI Services
      1a. Request to Registry for sources of data about “x” & “y”
      1b. Registry responds with Factory handle
      2a. Request to Factory for access and integration from resources Sx and Sy
      2b. Factory creates GridDataServices network
      2c. Factory returns handle of GDS to client
      3a. Client submits sequence of scripts, each has a set of queries to GDS with XPath, SQL, etc
      3b. Client tells analyst
      3c. Sequences of result sets returned to analyst as formatted binary described in a standard XML notation
      [Diagram labels: Application Code; Registry; Data Access & Integration master; Client; Analyst; GDS; GDTS; XML database; Relational database; Sx; Sy. Legend: SOAP/HTTP; service creation; API interactions]

  21. Activities are the drivers
      • Express a task to be performed by a GDS
      • Three broad classes of activities:
        • Statement
        • Transformations
        • Delivery
      • Extensible:
        • Easy to add new functionality
        • Does not require modification to the service interface
        • Extensions operate within the OGSA-DAI framework
      • Functionality:
        • Implemented at the service
        • Work where the data is (no need to move data back)

  22. OGSA-DAI Deck

  23. Building Applications • Activities are grouped together • Perform document • Data can flow between activities • Optimisation • Avoids multiple message exchanges • Can deliver to other GDSs • Prerequisite for data integration • Base middleware for projects requiring data access • Some capability for data integration
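The idea of grouping activities so that data flows between them in a single request can be sketched as a simple pipeline. The functions below are hypothetical illustrations of the three activity classes (statement, transformation, delivery); they do not reproduce the OGSA-DAI perform document format or its activity names.

```python
import gzip
import sqlite3


def sql_statement(connection, query):
    """Statement activity: run a query and emit the rows as CSV bytes."""
    def run(_unused_input):
        rows = connection.execute(query).fetchall()
        lines = [",".join(str(value) for value in row) for row in rows]
        return "\n".join(lines).encode("utf-8")
    return run


def gzip_transform():
    """Transformation activity: compress whatever the previous activity emitted."""
    def run(data):
        return gzip.compress(data)
    return run


def deliver_to(sink):
    """Delivery activity: hand the data to a sink (a list stands in for FTP, email, ...)."""
    def run(data):
        sink.append(data)
        return data
    return run


def perform(activities):
    """Run the activities in order, feeding each one the previous output."""
    data = None
    for activity in activities:
        data = activity(data)
    return data


database = sqlite3.connect(":memory:")
database.execute("CREATE TABLE readings (sensor TEXT, value INTEGER)")
database.executemany("INSERT INTO readings VALUES (?, ?)", [("a", 1), ("b", 2)])

delivered = []
perform([
    sql_statement(database, "SELECT sensor, value FROM readings ORDER BY sensor"),
    gzip_transform(),
    deliver_to(delivered),
])
print(gzip.decompress(delivered[0]).decode("utf-8"))
```

Because the whole chain runs in one call, the query result is compressed and delivered without an intermediate round trip to the client, which is the optimisation the slide points to.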

  24. Release 4, April 2004
      • Provides Data Access components, an extensible framework for building applications and some integration components
      • Built on top of Globus Toolkit 3.2
      • Supports relational, XML and some files
        • MySQL, Oracle, DB2, SQL Server, Postgres, Xindice, CSV
      • Supports various delivery options
        • SOAP, FTP, GridFTP, HTTP, files, email, inter-service
      • Supports various transforms
        • XSLT, ZIP, GZip
      • Supports message level security using X509 certificates
      • Client Toolkit library for application developers
      • GUI data browser (contributed by FirstDIG project)
      • Separate Distributed Query Processing components
      • Comprehensive documentation and tutorials in XHTML format

  25. Downloads by Release 2746 downloads (~4.7 downloads a day)

  26. Downloads by country 792 registered users @ 23/8/04

  27. Release 5, October 2004 • Re-engineered interface-independent core OGSA-DAI functionality. • Improved dependability and security integration. • New file data resources representing flat files queried using full text searches (e.g. EMBL format). • Installation and Configuration Wizard, including “all-in-one installer” • Improved Data Browser which allows XPath querying. • Set of standard benchmarks. • JSP Quick View interface. • Support for other databases (e.g. Access, eXist, HSQL).

  28. Release 6, April 2006 • Data Integration applications supporting identified scenarios • OGSA-DQP as an integrated part of release • Fully compliant JDBC Driver for OGSA-DAI • Support for WS-Security implementations • Support for stored procedures on all supported databases • Improved support for different database specific SQL types • SQL translation between vendor dialects for subset of queries • Support for XQuery data resources • We expect to comply with a version of the emerging DAIS specification at this release.

  29. Who is Using OGSA-DAI?
      • N2Grid (http://www.cs.univie.ac.at/institute/index.html?project-80=80)
      • Bridges (http://www.brc.dcs.gla.ac.uk/projects/bridges/)
      • BioSimGrid (http://www.biosimgrid.org/)
      • INWA (http://www.epcc.ed.ac.uk/projects/inwa/)
      • BioGrid (http://www.biogrid.jp/)
      • AstroGrid (http://www.astrogrid.org/)
      • eDiaMoND (http://www.ediamond.ox.ac.uk/)
      • OGSA-DAI (http://www.ogsadai.org.uk)
      • GEON (http://www.geongrid.org/)
      • myGrid (http://www.mygrid.org.uk/)
      • MCS (http://www.isi.edu/~deelman/MCS/)
      • ODD-Genes (http://www.epcc.ed.ac.uk/oddgenes/)
      • OGSA-WebDB (http://www.gtrc.aist.go.jp/dbgrid/)
      • GridMiner (http://www.gridminer.org/)
      • FirstDig (http://www.epcc.ed.ac.uk/~firstdig/)
      • GeneGrid (http://www.qub.ac.uk/escience/projects.php#genegrid)
      • IU RGRBench (http://www.cs.indiana.edu/~plale/projects/RGR/OGSA-DAI.html)

  30. Project classification

  31. Edikt
      • The team: 8 professional software engineers, support staff, project manager, commercialisation manager, architect, and SAB
      • SHEFC funded research and development grant
      • 3 years funding: May 2002 – 2005
      • +3 years funding upon successful project and review
      [Diagram labels: Standards; E-Science Apps; CS Research; Grid Services for e-Science Data Management; Commercial SW components and skills; Requirements analysis; Technology matchmaking; Edikt project; Gap filling; Rigorous engineering]

  32. ELDAS – Data Access Service
      • Implemented using Enterprise Java Beans
      • Data Access Components interface to distinct DBMSs
      • Accessible as a grid data service or a web data service
      • Another (partial) implementation of the GGF WS-DAI specifications
      [Diagram labels: Web User1; Grid User1; Grid User2; Grid Proxy; Web Servlet; Java Framework; ELDAS EJB - DAS; DAC (×4); Xindice DB; MySQL DB; DB2 DB; Oracle 9i DB]

  33. BinX – accessing legacy binary data
      • The Problem:
        • Many binary data files
        • Applications must “know” the data format
        • Binary data formats are machine-specific
      • The Solution:
        • Write a “stand-aside” format description in XML
        • Provide a library to
          • Interpret the description
          • Provide file access across different machines
        • Build higher-level services
      [Diagram labels: BinX file describes binary file structure; simulations; Binary Data File (×3); BinX Library; e-Science Application]
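The "stand-aside" description idea can be illustrated with a toy example: an XML document names the fields and byte order of a binary record, and a small library function uses it to decode the bytes on any machine. The XML vocabulary below (`dataset`, `field`, `byteOrder`) is invented for this sketch and is not the actual BinX schema.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical description format; the real BinX schema uses its own element names.
DESCRIPTION = """
<dataset byteOrder="bigEndian">
  <field name="step" type="int32"/>
  <field name="temperature" type="float64"/>
</dataset>
"""

TYPE_CODES = {"int32": "i", "float64": "d"}         # struct format characters
BYTE_ORDERS = {"bigEndian": ">", "littleEndian": "<"}


def read_record(description_xml, payload):
    """Decode one binary record using only the XML description of its layout."""
    root = ET.fromstring(description_xml)
    fields = root.findall("field")
    # An explicit byte-order prefix makes struct use standard sizes and no
    # padding, so the same description decodes identically on every platform.
    layout = BYTE_ORDERS[root.get("byteOrder")] + "".join(
        TYPE_CODES[field.get("type")] for field in fields
    )
    values = struct.unpack(layout, payload)
    return {field.get("name"): value for field, value in zip(fields, values)}


# A record as a big-endian simulation code might have written it.
payload = struct.pack(">id", 42, 273.15)
record = read_record(DESCRIPTION, payload)
print(record)
```

The application no longer needs to "know" the layout: changing the XML description is enough to read a file written with different field types or byte order.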

  34. Mammography A prototype of a national database of mammographic images in support of the UK breast screening programme Temporal mammography Computer Aided Detection Standard Mammo Format Mammograms have different appearances, depending on image settings and acquisition systems 3D View

  35. CHU KCL UED UCL Training Application Data Load Training App Data Load Training App Data Load Training App Data Load Training App Core API Training API Core & Training API Core & Training API Core & Training API Core & Training API Training Services Core Services Core Services Core Services Core Services Content Manager Content Manager Content Manager Content Manager DB2 DB2 DB2 DB2 OGSA-DAI OGSA-DAI OGSA-DAI OGSA-DAI OGSA-DAI OGSA-DAI DB2 Federation Files Database

  36. The BRIDGES Project • Biomedical Research Informatics Delivered by Grid Enabled Services • NeSC (Edinburgh and Glasgow) and IBM • www.brc.dcs.gla.ac.uk/projects/bridges • Supporting project for CFG project • Generating data on hypertension • Rat, Mouse, Human genome databases • Variety of tools used • BLAST, BLAT, Gene Prediction, visualisation, … • Variety of data sources and formats • Microarray data, genome DBs, project partner research data, medical records, … • Aim is integrated infrastructure supporting • Data federation • Security

  37. Information Integrator OGSA-DAI SyntenyGrid Service blast + BRIDGES VO Authorisation

  38. INWA Project • Innovation Node Western Australia • Informing Business & Regional Policy: Grid-enabled fusion of global data and local knowledge • Involved 10 partners (6 UK + 4 Australia) • Aim • Data mine commercially sensitive data • Security an absolute MUST • Employ Grid technologies • Need access to data and computational resources • OGSA-DAI • Access data resources • SunDCG's TOG (Transfer-queue Over Globus) • Handle job submission to analyse micro array data

  39. TOG EPCC,UK user@australia OGSA-DAI OGSA-DAI Bank data UK Property Grid Engine Grid Engine TOG Curtin,Australia Bank Bank Telco Telco user@edinburgh OGSA-DAI OGSA-DAI Telco data Australian property Data Browser Data Browser INWA

  40. Further Information on OGSA-DAI • The OGSA-DAI Project Site: • http://www.ogsadai.org.uk • The DAIS-WG site: • http://cs.man.ac.uk/grid-db • OGSA-DAI Users Mailing list • users@ogsadai.org.uk • General discussion on grid DAI matters • Formal support for OGSA-DAI releases • http://www.ogsadai.org.uk/support • support@ogsadai.org.uk • OGSA-DAI training courses
