Scaling Heterogeneous Databases and Design of DISCO




Anthony Tomasic

Louiqa Raschid

Patrick Valduriez

DISCO Architecture

A: Application
M: Mediator
C: Catalog
W: Wrapper
D: Data Source

Problems with the Architecture
  • Fragile mediator problem - The mediator schema may have to be changed when a new source is added.
  • Source capability problem - Different wrappers may have different functionality.
  • Graceless failure - The query cannot be processed in the presence of unavailable data sources.
Overview
  • Mediator Query Processing
  • Describing Source Capabilities
  • Mediator Cost Model
  • Partial Evaluation of Queries
Incorporating Source Capabilities
  • Describing the operators: the wrapper exports information about which operators it can execute and on which collections.

select [publications 1 {
    bind Author (=)
    bind KeywordTitle (=)
}]

project [publications 2 {
    bind combine Author ()
    bind combine Title ()
}]

scan [ ALL ]

  • The mediator can also accept a context-free grammar that describes the functionality of the wrapper.
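
To make the capability description above concrete, here is a minimal Python sketch of how a mediator might consult it before pushing an operator down to a wrapper. The dictionary layout, the function names, and the reading of "bind combine" as "any subset of the listed attributes may be projected" are assumptions made for illustration; they are not taken from DISCO itself.

```python
# Hypothetical in-memory form of the capability description shown above.
# The slide's "publications 1" / "publications 2" are both treated as the
# collection "publications" (an assumption).
CAPABILITIES = {
    "select": {"publications": {"Author": {"="}, "KeywordTitle": {"="}}},
    "project": {"publications": {"Author", "Title"}},
    "scan": "ALL",
}


def can_push_select(caps, collection, attribute, comparator):
    """True if the wrapper can evaluate `attribute comparator value` itself."""
    bindings = caps.get("select", {}).get(collection, {})
    return comparator in bindings.get(attribute, set())


def can_push_project(caps, collection, attributes):
    """True if every requested attribute may be projected by the wrapper."""
    allowed = caps.get("project", {}).get(collection, set())
    return set(attributes) <= allowed


def can_scan(caps, collection):
    """True if the wrapper can scan the collection ("ALL" means any collection)."""
    scan = caps.get("scan")
    if scan == "ALL":
        return True
    return isinstance(scan, (set, frozenset, list, tuple)) and collection in scan


# A selection the wrapper cannot execute stays at the mediator.
print(can_push_select(CAPABILITIES, "publications", "Author", "="))
print(can_push_select(CAPABILITIES, "publications", "Author", "<"))
```

Operators that fail these checks would be kept in the composition query and evaluated by the mediator over whatever the wrapper can return.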
Mediator Cost Model
  • The mediator has a generic cost model:
    • Unary operators:
      • sequential scan and index scan
      • cost formulae derived using a calibration approach
    • Binary operators:
      • index join, nested loops, and sort-merge join
      • if an index is available, index join is chosen; otherwise the cheaper of the other two is chosen
  • A wrapper can override the mediator's model by exporting statistics and/or cost formulae.
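
The join selection rule above can be sketched as follows. The two cost functions are textbook-style placeholders chosen for illustration, not the calibrated formulae the mediator actually uses, and all function names are hypothetical.

```python
import math


def nested_loops_cost(outer_rows, inner_rows):
    """Placeholder cost: one comparison per pair of rows."""
    return outer_rows * inner_rows


def sort_merge_cost(outer_rows, inner_rows):
    """Placeholder cost: sort both inputs, then merge them in one pass."""
    def sort_cost(n):
        return n * math.log2(n) if n > 1 else 0.0
    return sort_cost(outer_rows) + sort_cost(inner_rows) + outer_rows + inner_rows


def choose_join(outer_rows, inner_rows, index_available):
    """Index join when an index exists, otherwise the cheaper of the other two."""
    if index_available:
        return "index join"
    candidates = {
        "nested loops": nested_loops_cost(outer_rows, inner_rows),
        "sort-merge join": sort_merge_cost(outer_rows, inner_rows),
    }
    return min(candidates, key=candidates.get)


print(choose_join(1000, 1000, index_available=False))
```

With these placeholder formulae, small inputs favor nested loops and large inputs favor sort-merge; a wrapper that exports its own formulae would replace the two cost functions.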
Cost Communication
  • Exporting statistics - A wrapper can export statistics through two special methods, attribute and extent, attached to each interface description.
  • Exporting formulae - Wrapper-specific cost formulae can be described using rules.

For example,

select(C, A = V) <==
    CountObject = C.CountObject * selectivity(A, V)
    TotalSize = CountObject * C.ObjectSize
    TotalTime = C.TotalTime + C.TotalSize * 25

  • The mediator selects the most specific rule.
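
As a worked illustration, the sketch below evaluates the select rule above on sample statistics and then picks the most specific of several matching rules. The `Stats` class, the pattern representation, and the definition of "most specific" as "fewest wildcards" are assumptions for this example; the slides do not say how DISCO measures specificity.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    count_object: float  # number of objects in the collection
    object_size: float   # size of one object
    total_size: float    # size of the whole collection
    total_time: float    # time needed to produce the collection


def select_cost(c, selectivity):
    """Apply the rule: select(C, A = V) with a given selectivity(A, V)."""
    count_object = c.count_object * selectivity
    total_size = count_object * c.object_size
    total_time = c.total_time + c.total_size * 25
    return Stats(count_object, c.object_size, total_size, total_time)


# Hypothetical rule patterns; None acts as a wildcard.
RULES = [
    ({"op": "select", "collection": None, "attribute": None}, "generic select rule"),
    ({"op": "select", "collection": "person0", "attribute": None}, "person0 select rule"),
    ({"op": "select", "collection": "person0", "attribute": "name"}, "person0.name select rule"),
]


def matches(pattern, call):
    return all(v is None or call.get(k) == v for k, v in pattern.items())


def specificity(pattern):
    """Assumed measure: the number of non-wildcard positions."""
    return sum(v is not None for v in pattern.values())


def most_specific_rule(rules, call):
    applicable = [(p, name) for p, name in rules if matches(p, call)]
    if not applicable:
        return None
    return max(applicable, key=lambda rule: specificity(rule[0]))[1]


result = select_cost(Stats(1000, 100, 100000, 5.0), selectivity=0.25)
print(result.count_object, result.total_size, result.total_time)
```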
Partial Evaluation of Queries
  • If a data source is unavailable, DISCO evaluates as much of the query as possible and returns another query.
  • Example:

Consider the following query, run when person2 is unavailable:

select x.name
from x in person0, y in person1, z in person2
where x.name = y.name and y.name = z.name

DISCO returns the following query as the result (where t0 is person0 join person1):

select w.name
from w in t0, z in person2
where w.name = z.name
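
A minimal Python sketch of this behavior, assuming each source is a list of rows with a name attribute and an unavailable source is represented by None. The function names and the generated variable names (w, z0) are hypothetical; the sketch only mirrors the example, not DISCO's actual evaluator.

```python
def join_on_name(left, right):
    """Equi-join two row lists on "name", keeping only the name attribute."""
    by_name = {}
    for row in right:
        by_name.setdefault(row["name"], []).append(row)
    return [{"name": l["name"]} for l in left for _ in by_name.get(l["name"], [])]


def partial_evaluate(sources, order):
    """Join every available source; report the ones that could not be reached.

    sources maps a source name to its rows, or to None if it is unavailable.
    Returns (materialized rows or None, list of unavailable source names).
    """
    unavailable = [name for name in order if sources[name] is None]
    result = None
    for name in order:
        rows = sources[name]
        if rows is None:
            continue
        result = rows if result is None else join_on_name(result, rows)
    return result, unavailable


def residual_query(unavailable, temp="t0"):
    """Build the query that remains to be run once the sources come back."""
    if not unavailable:
        return None
    froms = [f"w in {temp}"] + [f"z{i} in {s}" for i, s in enumerate(unavailable)]
    conds = [f"w.name = z{i}.name" for i in range(len(unavailable))]
    return "select w.name from " + ", ".join(froms) + " where " + " and ".join(conds)


sources = {
    "person0": [{"name": "ann"}, {"name": "bob"}],
    "person1": [{"name": "bob"}, {"name": "cy"}],
    "person2": None,  # unavailable
}
t0, missing = partial_evaluate(sources, ["person0", "person1", "person2"])
print(t0, residual_query(missing))
```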

Extracting Information
  • Opaque Partial Answers : No extraction possible.
  • Transparent Partial Answers : Can ask a “parachute” query which is related to the original query.

For example, a parachute query for the earlier example can be:

select x.name
from x in person0, y in person1
where x.name = y.name

  • The parachute query is evaluated by rewriting it over the materialized relations.
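
One way to picture the rewriting step is the sketch below. It assumes the partial answer keeps a table of materialized relations, each tagged with the set of sources it joins, and that a parachute query can be answered when its sources match one of them exactly. Matching on the source set alone (ignoring join predicates) is a simplification made for this example, and all names are hypothetical.

```python
def rewrite_parachute(parachute_sources, materialized):
    """Rewrite a parachute query over a materialized relation, if one fits.

    materialized maps a temporary name to (frozenset of joined sources, rows).
    Returns (rewritten query text, answer) or None if no relation matches.
    """
    wanted = frozenset(parachute_sources)
    for temp_name, (sources, rows) in materialized.items():
        if sources == wanted:
            query = f"select w.name from w in {temp_name}"
            return query, [row["name"] for row in rows]
    return None


# t0 is the materialized join of person0 and person1 from the earlier example.
materialized = {"t0": (frozenset({"person0", "person1"}), [{"name": "bob"}])}
print(rewrite_parachute(["person0", "person1"], materialized))
```

A parachute query whose sources are not covered by any materialized relation gets no answer from this sketch, which corresponds to the case where extraction is not possible.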
Constrained Evaluation of Queries
  • The optimizer tries to ensure that the parachute queries can always be evaluated (if possible at all) in case of failures.

For example, if the parachute query is (A join C), then it will not be possible to evaluate it if B fails.

Partial Evaluation of Queries
  • Open issues:
    • Semantics with updates to data sources
    • Tradeoffs between materializing partial answers and resubmitting the original queries
    • Aggregate queries?
    • APPROXIMATE?
The Good
  • It can handle wrappers with different capabilities.
  • Mediator uses a generic cost model which can be overridden by the wrapper.
  • It supports partial evaluation of queries and extraction of information from partial answers.
The Bad
  • Queries involving different wrappers have to be processed at the mediator.
  • Only a relational subset of the model is implemented.
  • Data replication is not addressed.
The Ugly
  • Arbitrary source capabilities cannot be easily handled.
  • Proliferation of wrapper-specific cost rules can make query optimization very expensive.
  • Centralized query optimization - wrappers don’t have much control over it.
  • Autonomous data sources?
Mediator Query Processing
  • Reformulate the query into local schemas.
  • Transform the query into logical operator trees.
  • Decompose each query into wrapper sub-queries and a composition query.
  • Modify the wrapper sub-queries and the composition query to reflect the capabilities of the wrappers.
  • Generate distributed execution plans.
  • Select the minimum cost plan.
  • Send the wrapper sub-trees to the wrappers and execute the composition query on the results.
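
Two of the steps above, decomposition into wrapper sub-queries and selection of the minimum cost plan, can be sketched in a few lines of Python. The data shapes (a list of extents per query, a dictionary per plan with a cost entry) and the function names are assumptions for illustration only; reformulation, capability checks, and plan generation are left out.

```python
def decompose(query_extents, extent_to_wrapper):
    """Group the extents a query touches by the wrapper that serves them.

    Each group stands for one wrapper sub-query; combining the groups is
    the job of the composition query run at the mediator.
    """
    subqueries = {}
    for extent in query_extents:
        wrapper = extent_to_wrapper[extent]
        subqueries.setdefault(wrapper, []).append(extent)
    return subqueries


def needs_composition(subqueries):
    """A composition query is needed when more than one wrapper is involved."""
    return len(subqueries) > 1


def select_min_cost_plan(plans):
    """Pick the plan with the lowest estimated cost."""
    return min(plans, key=lambda plan: plan["cost"])


extent_to_wrapper = {"person0": "w0", "person1": "w1"}
subqueries = decompose(["person0", "person1"], extent_to_wrapper)
print(subqueries, needs_composition(subqueries))
```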
Mediator Data Model
  • Extensions to ODMG 2.0
    • multiple extents per interface using MetaExtents

interface MetaExtent {
    attribute String name;
    attribute Extent e;
    attribute Type interface;
    attribute Wrapper wrapper;
    attribute Map map;
}

    • type mapping
Accessing Data Sources
  • Define a wrapper object.

wrapper w0 rmi://rodin.inria.fr/PersonWrapper

  • Define a wrapper schema.

extent p0 of Person;

interface Person {
    attribute String name;
    attribute Short salary;
}

The wrapper exports this schema to the mediator.

  • Define the mediator schema.
Accessing Data Sources
  • Define the mediator extents.

extent person0 of Person wrapper w0 extent p0;

extent person1 of Person wrapper w1 extent s1
    map (name = sname);

  • Subtyping and views can be used to define more complex transformations on the data sources.

define double as
select struct (name: x.name, salary: x.salary + y.salary)
from x in person0, y in person1
where x.name = y.name
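
The effect of the map clause and of the double view can be imitated in Python as follows. This is a sketch over in-memory rows, assuming that map (name = sname) simply renames the source attribute sname to the mediator attribute name and leaves other attributes untouched; the function names and sample rows are made up for the example.

```python
def apply_map(rows, mapping):
    """Rename source attributes to mediator attributes.

    mapping follows the direction of the map clause: mediator name -> source name.
    Attributes that are not mentioned keep their names.
    """
    source_to_mediator = {src: med for med, src in mapping.items()}
    return [
        {source_to_mediator.get(key, key): value for key, value in row.items()}
        for row in rows
    ]


def double(person0, person1):
    """The view: sum the salaries of people who appear in both extents."""
    return [
        {"name": x["name"], "salary": x["salary"] + y["salary"]}
        for x in person0
        for y in person1
        if x["name"] == y["name"]
    ]


# Rows as wrapper w1 might return them from extent s1 (sample data).
s1 = [{"sname": "ann", "salary": 10}]
person1 = apply_map(s1, {"name": "sname"})
person0 = [{"name": "ann", "salary": 5}, {"name": "bob", "salary": 7}]
print(double(person0, person1))
```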