
DIALOGUE


walda

Presentation Transcript


  1. DIALOGUE

  2. Dialogue DataGrid • Relational databases, files, XML databases, object stores • Strongly typed • Multi-tiered metadata management system • Incorporates elements from OGSA-DAI, Mobius, caGrid, STORM, DataCutter, GT4 … • Scales to very large data, high end platforms

  3. Requirements • Support or interoperate with caGrid, eScience infrastructure • Interoperate with or replace SRB • Well defined relationship to Globus Alliance • Services to support high end large scale data applications • Design should include semantic metadata management • Well thought out relationship to commercial products (e.g. Information Integrator, Oracle)

  4. Topics • Overview of products • Identify standard interfaces, where things could be done the same way • Ship results from one product to another • Interface for a data service (DAIS mapping etc.) • What scenarios do we have? • What patterns of queries will be run on a federated system? • How do we make all our products look like they come from a single vendor? • All are fairly non-overlapping • Should client APIs look similar, as well as service interfaces? • Should security be similar, handled in a similar way? • Other products, how do they fit? • ROOT (from CERN) for data access (object-oriented data analysis framework) • II, SRB, GridFTP, RFT

  5. Distributed querying • Is this part of DIALOGUE? • Multi-product data integration • E.g. join across Mobius and OGSA-DAI • What’s missing from DAIS specs to let us do this? • Output format • Standard interfaces for e.g. query planners, plugins, … • Locating relevant data • How to do data discovery • Common user tool at top level to aid installation, configuration and service development • E.g. “Introduce” suite for deploying strongly typed GT4 services
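
The "join across Mobius and OGSA-DAI" bullet can be pictured with a minimal sketch. It does not use either product's API. It assumes, hypothetically, that one source returns XML and the other returns relational rows, and stands in for them with Python's standard `xml.etree` and `sqlite3` modules. All names (`patients`, `samples`, `patient_id`) are invented for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML result, standing in for an XML data service.
XML_RESULT = (
    "<patients>"
    '<patient id="p1"><name>Ann</name></patient>'
    '<patient id="p2"><name>Bob</name></patient>'
    "</patients>"
)


def xml_rows(xml_text):
    """Flatten the XML result into (patient_id, name) tuples."""
    root = ET.fromstring(xml_text)
    return [(p.get("id"), p.findtext("name")) for p in root.findall("patient")]


def relational_rows():
    """Hypothetical relational result, standing in for a relational data service."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (sample_id TEXT, patient_id TEXT)")
    conn.executemany(
        "INSERT INTO samples VALUES (?, ?)",
        [("s1", "p1"), ("s2", "p1"), ("s3", "p3")],
    )
    rows = conn.execute(
        "SELECT sample_id, patient_id FROM samples ORDER BY sample_id"
    ).fetchall()
    conn.close()
    return rows


def hash_join(left, right):
    """Join (patient_id, name) with (sample_id, patient_id) on patient_id."""
    names = {pid: name for pid, name in left}
    return [(sid, pid, names[pid]) for sid, pid in right if pid in names]


# Both inputs are first converted to plain tuples: the common "output format"
# that the slide identifies as missing.
joined = hash_join(xml_rows(XML_RESULT), relational_rows())
```

The join itself is trivial once both sides share a row format; the open questions on the slide (output format, locating data, standard planner interfaces) are about everything that happens before that point.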

  6. Metadata • Minimal amounts of metadata for • Discovery / Registries • Integration • Querying • Performance • Hindering uptake • Is web services / SOAP / XML the right technology for data integration? • Branding • How to do distributions outside of just the academic area • What’s the model for contributing to something like this? • Awareness of components • Common vision • Understanding how to cross-sell projects

  7. Multi-project data integration • Collaboration difficulties • Engaging the right people • Choosing the right process • Quality assurance, brand assurance • Agreeing a model of QA together • Integration and ease of use/install • Which product components are generic, which are application/distribution specific? • Getting more effort • Target joint funding

  8. Strongly typed, strongly classified data • Is this necessary for data integration? • caBIG approach is probably not generic enough • Need programmatic interoperability, rather than user-instructed • Too much burden on data provider? • What’s the minimum barrier to entry? • Generic components and tools • Which could be leveraged between products? • Are there other projects which use our products together? • If not, why not? • Are the problem spaces too far apart? • They are both generic, but they have a different focus • Not contradictory, but not aligned • Does it make sense to develop a tightly integrated set of products? • Possibly, if funding allows: the DIALOGUE software

  9. Common Vision • Standard interfaces and where DAIS is not enough • Naming • User tools that help using products together • Metadata • Binding metadata and data, internally and externally • Collaboration

  10. Common Vision • Either: • Plug and play world where components fit together, but no restrictions on what sets • A single generic data service powerful enough to satisfy all applications • Combinations of tightly integrated components which satisfy a targeted application area • DIALOGUE should produce a convincing demonstration of how things should work • A portfolio of how our bits work, what needs to be changed, translated, etc. • Which could later be made robust

  11. Standard Interfaces: Is DAIS enough? • Data exploration tools, administration of sets of data resources, discovery of data resources • Can we do all of this with data integration tech on top of DAIS interfaces? • Does DAIS give us the minimal set of metadata we require? • Don’t want to force a particular representation • But all, say, XML operations should compose well • Also need to define transfer operators between representational models (structured binary, semi-structured textual, XML, relational, objects (tbd)) • Is RDF different, or a special case of XML? • Do you need to force a set of formats for each representation? • Assume small set to allow proof of concept • A standard way of specifying • Query languages • Representational models • Representational formats • Transfer mechanisms • Endpoints • A way of binding data constraints and rules to data
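
One way to picture "a standard way of specifying" these five items is a small service descriptor. The sketch below is an illustration only: the class, its field names, and the example endpoints are hypothetical and do not come from the DAIS specifications or from any of the products named in these slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceDescription:
    """Hypothetical descriptor; fields mirror the bullet list, not any DAIS schema."""
    endpoint: str
    query_languages: frozenset
    representational_model: str
    formats: frozenset
    transfer_mechanisms: frozenset


def shared_capabilities(producer, consumer):
    """Return the formats and transfer mechanisms that both services support.

    If either set is empty, results cannot be shipped directly and a
    transfer operator (a conversion step) would be needed.
    """
    return (
        producer.formats & consumer.formats,
        producer.transfer_mechanisms & consumer.transfer_mechanisms,
    )


# Two made-up services with example.org endpoints.
relational = ServiceDescription(
    endpoint="http://example.org/relational",
    query_languages=frozenset({"SQL"}),
    representational_model="relational",
    formats=frozenset({"CSV", "XML"}),
    transfer_mechanisms=frozenset({"SOAP", "GridFTP"}),
)
xml_store = ServiceDescription(
    endpoint="http://example.org/xml",
    query_languages=frozenset({"XPath", "XQuery"}),
    representational_model="XML",
    formats=frozenset({"XML"}),
    transfer_mechanisms=frozenset({"SOAP"}),
)

shared_formats, shared_transfers = shared_capabilities(relational, xml_store)
```

Keeping the set of formats small, as the slide suggests for a proof of concept, makes it more likely that any two descriptors share at least one format.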

  12. An aside • If data contains details of how it can be represented as a service, plus rules and constraints on it • How do constraints change as you do operations • E.g. what happens when you derive, copy data • Operations which change the rules
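
The questions in this aside can be made concrete with a toy model. The sketch below assumes one possible policy, chosen purely for illustration: a copy keeps every constraint, and a derivation that drops columns also drops the constraints attached to those columns. The `Dataset` type and the rule labels are hypothetical; the slides do not prescribe any policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    columns: tuple
    rows: tuple
    # Set of (column, rule) pairs, e.g. ("name", "no-redistribution").
    constraints: frozenset


def copy_dataset(ds):
    """Assumed policy: a copy carries every constraint of the original."""
    return Dataset(ds.columns, ds.rows, ds.constraints)


def project(ds, keep):
    """Assumed policy: dropping a column drops the constraints bound to it."""
    idx = [ds.columns.index(c) for c in keep]
    rows = tuple(tuple(r[i] for i in idx) for r in ds.rows)
    kept = frozenset((c, rule) for c, rule in ds.constraints if c in keep)
    return Dataset(tuple(keep), rows, kept)


original = Dataset(
    columns=("name", "age"),
    rows=(("Ann", 30),),
    constraints=frozenset({("name", "no-redistribution"), ("age", "range:0-120")}),
)
copied = copy_dataset(original)
derived = project(original, ["age"])
```

Other policies are equally plausible, for example a rule that survives projection because it applies to the whole record rather than to one column; that is an instance of the "operations which change the rules" question.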

  13. Friday Breakout groups • Lunchtime • Stocktaking of components / Collaborative work / low hanging fruit (Ally, Steve, Peter, Lucas) - Cramond • Metadata (Jessie, Mario, Scott, Alex, Leena, Larry, Elias) - Newhaven • Movement of data between components (Kostas, Neil, Umit, Shannon, Ivan) - Breakout • Beyond DIALOGUE (Joel, Malcolm, Peter) - Dean • Afternoon • Metadata • Collaborative architecture • Wrapup • Organising next meeting • Unassigned • Mapping of components to scenarios • Schema federation / integration collaboration • Data Warehousing needs • Interface to bulk data / metadata

  14. Actions • Share commonalities between toolkits • White paper on choke points common to models (editor Shannon) e.g. • Common Data Model - Representing models (HDM and GME) • Schema Mappings (?+IQL and Java->XPath?) • Query translation • What’s gained and lost by each combination / layering of components • Expressed as use case, maybe tied to application scenarios (publish on our web sites, Ally) • Cross-referencing of these between sites (each group chooses the 5 or so papers which describe them) • Later expand to include “external” components • Define a glossary of agreed terminology (editor Neil) • E.g. data model, data integration, global schema • Informational document in DAIS/OGSA Data • Are there common things needed from “the Grid”? • Common schema format representation (across our projects) from data access services (Amy) e.g. XSD for XML, CIM for relational • Component linkups • Explore integration of OGSA-DAI and DataCutter for image processing (Edinburgh MSc project?) • STORM and OGSA-DAI, with MRC Human Genetics Unit application (Edinburgh Summer Intern?) • Send across grad students from Ohio

  15. Actions • Metadata • What added functionality would you get if you added semantics to the registry as opposed to an external ontology? • Describe how to insert semantic annotations into the OGSA-DAI data resource configuration (Larry) • Can you uniformly present histograms and data required for optimisation (Alex) • Compare against Susan Malaika’s set of statistics • Send reference to survey of scalability techniques for reasoning with ontologies (Alex) • Produce strawman documents for a set of metadata required for access; optimisation; discovery and integration to be provided by a data service (Mario to ask for examples) • How can we maintain metadata for access (asked by Dave Berry) • Proposals for future projects • Send notes of discussions to participants and subscribe them to mailing list • Put it up on datagrids.org site
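
As background for the action on presenting histograms uniformly for optimisation, here is a minimal sketch of one common representation: an equi-width histogram with a selectivity estimate for a range predicate. The dictionary layout and function names are assumptions for illustration, not a format agreed at the meeting or used by any of the products.

```python
def build_histogram(values, lo, hi, buckets):
    """Equi-width histogram over [lo, hi); returns bounds and per-bucket counts."""
    width = (hi - lo) / buckets
    counts = [0] * buckets
    for v in values:
        if lo <= v < hi:
            # Clamp guards against floating-point rounding at the upper edge.
            counts[min(int((v - lo) / width), buckets - 1)] += 1
    return {"lo": lo, "hi": hi, "counts": counts}


def estimate_selectivity(hist, upper):
    """Estimate the fraction of values below `upper`.

    Assumes values are spread uniformly within each bucket.
    """
    counts = hist["counts"]
    total = sum(counts)
    if total == 0:
        return 0.0
    width = (hist["hi"] - hist["lo"]) / len(counts)
    covered = 0.0
    for i, count in enumerate(counts):
        start = hist["lo"] + i * width
        overlap = min(max(upper - start, 0.0), width)
        covered += count * overlap / width
    return covered / total


hist = build_histogram([1, 5, 30, 60, 90, 95], lo=0, hi=100, buckets=4)
fraction_below_50 = estimate_selectivity(hist, 50)
```

A query planner working across several data services could use such estimates to decide which side of a join to ship, provided every service publishes its statistics in the same shape.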

  16. DIALOGUE 3/4 • Venue: near GGF, Washington DC • Date: 15–16 September 2006 • Focus: • Proposal generation • Update on documents • Venue: Vienna • Date: 28–30 March 2007 (PB to confirm) • Focus: • Small group discussion and document production • Finish off deliverables
