Generic model management a database infrastructure for schema manipulation
Download
1 / 33

Generic Model Management A Database Infrastructure for Schema Manipulation - PowerPoint PPT Presentation


  • 91 Views
  • Uploaded on

Generic Model Management A Database Infrastructure for Schema Manipulation. Philip A. Bernstein Microsoft Corporation September 6, 2001. The Problem. There is 30 years of DB Research on meta data But we don’t have great infrastructure to offer

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Generic Model Management A Database Infrastructure for Schema Manipulation' - rivka


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Generic model management a database infrastructure for schema manipulation

Generic Model ManagementA Database Infrastructure for Schema Manipulation

Philip A. Bernstein

Microsoft Corporation

September 6, 2001


The problem
The Problem

  • There is 30 years of DB Research on meta data

  • But we don’t have great infrastructure to offer

    • Most design tools and web services store meta data in files, not DBs

    • OODBMS’s are not a huge success

    • Most meta data driven tools use their own infrastructure

  • Goal: generic meta data manipulation infrastructure

    • Reduce the amount of programming required to build meta data driven applications.

  • Proposal: Model Management

    • Define an algebra to manipulate meta data in large chunks, called models and mappings.


Outline
Outline

  • Overview of Model Management

  • Solutions to classical meta data problems

  • Recent technical results


Models and mappings
Models and Mappings

  • Model – a complex information structure

    • XML schema, SQL schema, OO interface, UML model, web site map, make script, ….

  • Mapping – a transformation from one model into another

    • Map between two XML schemas

    • Map a SQL schema to an XML schema

    • Map data sources to a data warehouse

    • Map an ER diagram to a SQL schema

    • Map a process defn to a workflow script


Representation

Relational

Schema

XSD

map1

Emp

Emp

E#

E#

Dept#

Dept#

Name

Name

First

A mapping is a model each of whose nodes connects nodes of two other models

Last

Representation

A model is a directed graph with one root.


Model management algebra
Model Management Algebra

  • Enumerate

  • ApplyFunction

  • Copy

  • Update operations

  • Match

  • Merge

  • Compose

  • Select

  • Diff


Map match m 1 m 2

map1

=

=

map = Match(M1, M2, )

  • Match(M1, M2, ) returns the best mapping between M1and M2, w.r.t. to 

M2

Emp

M1

Emp

E#

E#

Dept#

Dept#

Phone

Name

Name

Addr

First

Last


M 3 merge m 1 m 2 map

Emp

Emp

Emp

mapC

Addr

Name

Phone

Name

Phone

Addr

Name

=

M3 = Merge(M1, M2, map)

  • Return the union of models M1 and M2

    • Use map to guide the Merge

    • If elements x = y in map, then collapse them into one element


Left composition f

Emp

Emp

mapC

mapC = mapA f• mapB

Addr

c1

StAddr

Street

c2

Town

c3

City

Left Composition ( f •)

Emp

Emp

Emp

mapB

mapA

Addr

Name

Name

a1

b1

Street

Street

StAddr

b2

a2

City

Town

b3

a3

City

M2

M1

M3


Model management algebra1
Model Management Algebra

  • list = Enumerate(M)

  • ApplyFunction(M, f )

  • M2= Copy(M1)

  • Update operations

  • map = Match (M1, M2, )

  • M3 = Merge (M1, M2, map)

  • map3 = Compose(map1, map2)

  • M2= Select(M1, pred)

  • M2= Diff(M1, map)

They’re generic = data model independent

… well … implemented on an extended ER model with an extensibility story


Example

xsd1

1. map2

2. map3

1. map2= Match(xsd1, xsd2)

3. map4

map1

rdb1

2. map3 = map1map2

rdb2

Example

  • Given

    • map1 from SQL schema rdb1 to xsd1,

    • xsd2, which is similar to xsd1

  • Produce

    • a map between xsd2 and a relational schema.

xsd2

3. <map4, rdb2 > = Copy(map3)

4. Use ApplyFunction(map4) to map each x in Diff(xsd2,map4) into rdb2


Theme
Theme

  • Classic meta data problems can be solved using Model Management operations

    • Schema integration

    • Schema evolution

    • Data migration

    • Reverse engineering

    • Data reintegration (3-way merging)

  • Published solutions to these problems help us produce generic implementations of model mgmt operations


Outline1
Outline

  • Overview of Model Management

  • Solutions to classical meta data problems

    • Schema integration

    • Schema evolution

    • Reverse engineering

    • Data reintegration (3-way merging)

    • Data migration

  • Recent technical results


Schema integration

S

S

1. map= Match(V1, V2)

map

2. S = Merge(V1, V2 , map)

2.

3. ApplyFunction(S) // to resolve conflicts in S, producing S

1.map

V2

V1

Schema Integration

  • Given

    • two view schemas, V1 and V2

  • Produce

    • an integrated schema, S


map

f

=

L

Name

FirstName

=

R

LastName

S

Emp

E#

1. map= Match(V1, V2)

FirstName

2. S = Merge(V1, V2 , map)

Dept#

LastName

3. Use ApplyFunction(S) to re-solve conflicts, producing S

Addr

Phone

V1

V2

Emp

Emp

E#

E#

Dept#

Dept#

Phone

Addr

FirstName

Name

LastName


Merging knowledge bases ontologies
Merging Knowledge Bases (Ontologies)

  • Same as schema integration, but applied to ontologies

  • The literature on merging ontologies focuses mostly on Match.


Schema evolution

2. mapSV

V

1. mapSS= Match(S, S)

mapSV

2. mapS V = mapS SmapSV

S

S

1. mapSS

Schema Evolution

  • Given

    • mapSV from schema S to view V

    • a modified version S of S

  • Produce

    • a mapping mapSV from S to V (i.e. a view defn for V over S).

  • Use ApplyFunction(V) to delete elements not derivable from S


Outline2
Outline

  • Overview of Model Management

  • Solutions to classical meta data problems

    • Schema integration

    • Schema evolution

    • Reverse engineering

    • Data reintegration (3-way merging)

    • Data migration

  • Recent technical results


Reverse engineering

M

M

1. mapGG= Match(G, G)

2. mapMG = mapMG  mapGG

2. mapMG

mapMG

3.mapMG

3. <M, mapG M > = Copy(mapMG)

1. mapGG

G

G

Reverse Engineering

  • Given

    • Model M (e.g., an ER model)

    • Model G (e.g., SQL) generated via mapMG from M

    • A modified version G of G

  • Produce

    • A modified version M of M that generates G

4. Use ApplyFunction(mapMG), to reverse engineer each g in Diff(G,mapMG) into M


3 way merge aka reintegration

S0

  • MapOA = Match(O, A) (based on OIDs)

  • MapOB = Match (O, B) (based on OIDs)

  • MapOA = ApplyFunction(MapOA) such that if eMapOA if domain(e) = range(e), then delete e  (i.e. things changed in A)

  • MapOB = ApplyFunction(MapOB) such that if eMapOB if domain(e) = range(e), then delete e (i.e. things changed in B)

  • ChangedA = range(MapOA)

  • ChangedB = range(MapOB)

  • MapChAChB = Match(ChangedA, ChangedB)

  • MapChBChA = invert(MapChAChB)

  • A = Diff(ChangedA,  ChangedB, MapChAChB) (changed in A but not changed in B)

  • B = Diff(ChangedB, ChangedA, MapChBChA)

  • MapAB =  Match (A,B) (by OIDs)

  • G = Merge (A,B, MapAB)

  • MapGA =Match(G,A)

  • GA = Merge (G, A, MapGA) with preference for A

  • MapGAB =Match(GA,B)

  • GAB = Merge (GA’, B’, MapGA’B’) with preference for B

  • DeletedA = Diff(O,A,MapOA)

  • DeletedB = Diff(O, B, MapOB)

  • MapDeletedAChangedB = Match(DeletedA, ChangedB)

  • MapDeletedBChangedA = Match(DeletedB, ChangedA)

  • ShouldDeleteA = Diff(DeletedA, ChangedB, MapDeletedAChangedB)

  • ShouldDeleteB = Diff(DeletedB, ChangedA, MapDeletedBChangedA)

  • MapGABSDA = Match(GAB, ShouldDeleteA)

  • GABSDA = Diff(GAB, ShouldDeleteA, MapGABSDA)

  • MapGABSDASDB = Match(GABSDA,ShouldDeleteB)

  • Final result = Diff(GABSDA, ShouldDeleteB, MapGABSDASDB)

S1

S2

S3

3-Way Merge (aka Reintegration)

  • Given

    • a source schema S0

    • two derived schemas S1 and S2

  • Produce

    • a schema S3 that merges the changes of S1 and S2


Data migration

D

Enum

Generate

Migration

Script

1. mapSS= Match(S, S)

Run

2. Use Enum(S) to generate a data migration script

1. mapSS

D

S

S

Data Migration

  • Given

    • a schema S and its database D

    • an evolved schema S

  • Produce

    • a procedure for mapping D into an S database D


Data translation
Data Translation

  • Like data migration, except S and S are expressed in different data models.


Outline3
Outline

  • Overview of Model Management

  • Solutions to classical meta data problems

  • Recent technical results


Status report
Status Report

  • Vision

    • [Bernstein, Halevy, & Pottinger, SIGMOD Record 12/00]

  • Data Warehouse Examples

    • [Bernstein & Rahm, ER ’00]

  • Match Operation

    • Survey: [Rahm & Bernstein, MSR Tech Report]

    • Prototype: [Madhavan, Bernstein, & Rahm, VLDB ’01]

  • Merge Operation

    • coming soon …

  • Theory

    • [Alagić & Bernstein, DBPL ’01]


Schema matching approaches

Individual matchers

Combined matchers

Schema-based

Content-based

Hybrid

Composite

Per-Element

Structural

Per-Element

Manual composition

Automatic composition

Constraint-based

Constraint-based

Constraint-based

Linguistic

Linguistic

  • Names

  • Descriptions

  • Types

  • Keys

  • Graph matching

  • IR (word frequencies, key terms)

  • Value pattern and ranges

Schema Matching Approaches

  • About a dozen published algorithms.

  • Many good ideas, but none are robust.


The cupid algorithm

PurchaseOrder

PO

DeliverTo

InvoiceTo

POShipTo

POBillTo

Address

Address

City

City

Street

Street

City

City

Street

Street

The CUPID Algorithm

  • Computes linguistic similarity of element pairs

  • Computes structural similarity of element pairs

  • Generates a mapping

ssim++


M 3 merge m 1 map m 2

X

X

Y

Z

a

a

a

Y

W

Z

M3 = Merge(M1, map, M2)

  • [Buneman, Davidson, Kosky, EDBT 92]

    • Meta-model has aggregation & generalization only

    • Do a union and collapse objects having the same name

    • Fix-up step for inconsistencies created by merging

X

X

a

a

Y

Z

  • Successive fixups lead to different results 

  • Batch them at the end, to produce a unique minimal result

  • Now enrich the meta-model (containment, complex mappings) & merge semantics (conflicts, deletes)


  • A formal semantics for model mgt
    A Formal Semantics for Model Mgt

    • Use category theory for a data-model-independent characterization of models and mappings

    • Models and their DBs are categories

    • Model and data transformations are morphisms

    • Mappings between models & data are functors

    • Utility

      • Define formal semantics for Match and Merge

      • Explain when Match & Merge preserve constraints.

      • Check that implementation satisfies the semantics


    Theory

    Match

    Categories

    f

    Db

    Schm

    Db(Sch1)

    Sch1

    p

    g

    Functor

    q

    Sch12

    Sch2

    Db

    Db

    Merge

    Db(Sch12)

    Db(Sch2)

    Theory

    • Goal – a mathematical semantics of MM algebra


    Implementation vision

    • Editors

    • Catalogs

    Update

    Marketing

    cust

    Model-Driven

    UI Generator

    emp

    Authorize

    Credit

    Order

    Entry

    dept

    dno

    Bill

    Customer

    dna

    Schedule

    Delivery

    select all

    Inventory

    Compose

    Match

    Copy

    Merge

    Apply

    MM Meta-Model

    Inferencing

    Engine

    OR Mapper

        

    Implementation Vision

    Operation

    Speciali-

    zations

    Model

    Manager

    Object-Oriented

    Repository

    SQL

    DBMS


    Related work
    Related Work

    • There’s a lot of it. Apply it to model management!

    • Platforms – OODBs, datalog, deductive OODBs (Telos/ConceptBase, F-Logic)

    • Inferencing on mappings – AQUV, description logic

    • Transitive closure and recursive QP

    • Differencing – text, trees, graphs

    • Data translation – algebras, schema evolution

    • Data integration – schema match, view generation


    Summary
    Summary

    • Raise the level of abstraction of meta-data programming by using:

      • models and mappings as objects

      • an algebra that manipulates models and mappings on a generic meta-model

    • Classical meta data problems can be expressed using this algebra

    • Implementations of classic problems offer guidance on implementing the algebra


    References
    References

    • http://www.research.microsoft.com/~philbe

    • P. Bernstein & E. Rahm, “Data Warehouse Scenarios for Model Management”, ER 2000 Conference

    • P. Bernstein, A. Levy, R. Pottinger, “A Vision for Manage-ment of Complex Models”, SIGMOD Record, Dec. 2000

    • E. Rahm, P. Bernstein, “On Matching Schemas Automatically,” MSR Tech Report

    • J. Madhavan, P. Bernstein, E. Rahm, “Generic Schema Matching with Cupid”, VLDB 2001

    • S. Alagić, P. Bernstein, “A Model Theory for Generic Schema Management”, DBPL 2001


    ad