generic model management a database infrastructure for schema manipulation
Download
Skip this Video
Download Presentation
Generic Model Management A Database Infrastructure for Schema Manipulation

Loading in 2 Seconds...

play fullscreen
1 / 33

Generic Model Management A Database Infrastructure for Schema Manipulation - PowerPoint PPT Presentation


  • 91 Views
  • Uploaded on

Generic Model Management A Database Infrastructure for Schema Manipulation. Philip A. Bernstein Microsoft Corporation September 6, 2001. The Problem. There is 30 years of DB Research on meta data But we don’t have great infrastructure to offer

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Generic Model Management A Database Infrastructure for Schema Manipulation' - rivka


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
generic model management a database infrastructure for schema manipulation

Generic Model ManagementA Database Infrastructure for Schema Manipulation

Philip A. Bernstein

Microsoft Corporation

September 6, 2001

the problem
The Problem
  • There is 30 years of DB Research on meta data
  • But we don’t have great infrastructure to offer
    • Most design tools and web services store meta data in files, not DBs
    • OODBMS’s are not a huge success
    • Most meta data driven tools use their own infrastructure
  • Goal: generic meta data manipulation infrastructure
    • Reduce the amount of programming required to build meta data driven applications.
  • Proposal: Model Management
    • Define an algebra to manipulate meta data in large chunks, called models and mappings.
outline
Outline
  • Overview of Model Management
  • Solutions to classical meta data problems
  • Recent technical results
models and mappings
Models and Mappings
  • Model – a complex information structure
    • XML schema, SQL schema, OO interface, UML model, web site map, make script, ….
  • Mapping – a transformation from one model into another
    • Map between two XML schemas
    • Map a SQL schema to an XML schema
    • Map data sources to a data warehouse
    • Map an ER diagram to a SQL schema
    • Map a process defn to a workflow script
representation
Relational

Schema

XSD

map1

Emp

Emp

E#

E#

Dept#

Dept#

Name

Name

First

A mapping is a model each of whose nodes connects nodes of two other models

Last

Representation

A model is a directed graph with one root.

model management algebra
Model Management Algebra
  • Enumerate
  • ApplyFunction
  • Copy
  • Update operations
  • Match
  • Merge
  • Compose
  • Select
  • Diff
map match m 1 m 2
map1

=

=

map = Match(M1, M2, )
  • Match(M1, M2, ) returns the best mapping between M1and M2, w.r.t. to 

M2

Emp

M1

Emp

E#

E#

Dept#

Dept#

Phone

Name

Name

Addr

First

Last

m 3 merge m 1 m 2 map

Emp

Emp

Emp

mapC

Addr

Name

Phone

Name

Phone

Addr

Name

=

M3 = Merge(M1, M2, map)
  • Return the union of models M1 and M2
    • Use map to guide the Merge
    • If elements x = y in map, then collapse them into one element
left composition f
Emp

Emp

mapC

mapC = mapA f• mapB

Addr

c1

StAddr

Street

c2

Town

c3

City

Left Composition ( f •)

Emp

Emp

Emp

mapB

mapA

Addr

Name

Name

a1

b1

Street

Street

StAddr

b2

a2

City

Town

b3

a3

City

M2

M1

M3

model management algebra1
Model Management Algebra
  • list = Enumerate(M)
  • ApplyFunction(M, f )
  • M2= Copy(M1)
  • Update operations
  • map = Match (M1, M2, )
  • M3 = Merge (M1, M2, map)
  • map3 = Compose(map1, map2)
  • M2= Select(M1, pred)
  • M2= Diff(M1, map)

They’re generic = data model independent

… well … implemented on an extended ER model with an extensibility story

example
xsd1

1. map2

2. map3

1. map2= Match(xsd1, xsd2)

3. map4

map1

rdb1

2. map3 = map1map2

rdb2

Example
  • Given
    • map1 from SQL schema rdb1 to xsd1,
    • xsd2, which is similar to xsd1
  • Produce
    • a map between xsd2 and a relational schema.

xsd2

3. = Copy(map3)

4. Use ApplyFunction(map4) to map each x in Diff(xsd2,map4) into rdb2

theme
Theme
  • Classic meta data problems can be solved using Model Management operations
    • Schema integration
    • Schema evolution
    • Data migration
    • Reverse engineering
    • Data reintegration (3-way merging)
  • Published solutions to these problems help us produce generic implementations of model mgmt operations
outline1
Outline
  • Overview of Model Management
  • Solutions to classical meta data problems
    • Schema integration
    • Schema evolution
    • Reverse engineering
    • Data reintegration (3-way merging)
    • Data migration
  • Recent technical results
schema integration

S

S

1. map= Match(V1, V2)

map

2. S = Merge(V1, V2 , map)

2.

3. ApplyFunction(S) // to resolve conflicts in S, producing S

1.map

V2

V1

Schema Integration
  • Given
    • two view schemas, V1 and V2
  • Produce
    • an integrated schema, S
slide15
map

f

=

L

Name

FirstName

=

R

LastName

S

Emp

E#

1. map= Match(V1, V2)

FirstName

2. S = Merge(V1, V2 , map)

Dept#

LastName

3. Use ApplyFunction(S) to re-solve conflicts, producing S

Addr

Phone

V1

V2

Emp

Emp

E#

E#

Dept#

Dept#

Phone

Addr

FirstName

Name

LastName

merging knowledge bases ontologies
Merging Knowledge Bases (Ontologies)
  • Same as schema integration, but applied to ontologies
  • The literature on merging ontologies focuses mostly on Match.
schema evolution
2. mapSV

V

1. mapSS= Match(S, S)

mapSV

2. mapS V = mapS SmapSV

S

S

1. mapSS

Schema Evolution
  • Given
    • mapSV from schema S to view V
    • a modified version S of S
  • Produce
    • a mapping mapSV from S to V (i.e. a view defn for V over S).
  • Use ApplyFunction(V) to delete elements not derivable from S
outline2
Outline
  • Overview of Model Management
  • Solutions to classical meta data problems
    • Schema integration
    • Schema evolution
    • Reverse engineering
    • Data reintegration (3-way merging)
    • Data migration
  • Recent technical results
reverse engineering
M

M

1. mapGG= Match(G, G)

2. mapMG = mapMG  mapGG

2. mapMG

mapMG

3.mapMG

3. = Copy(mapMG)

1. mapGG

G

G

Reverse Engineering
  • Given
    • Model M (e.g., an ER model)
    • Model G (e.g., SQL) generated via mapMG from M
    • A modified version G of G
  • Produce
    • A modified version M of M that generates G

4. Use ApplyFunction(mapMG), to reverse engineer each g in Diff(G,mapMG) into M

3 way merge aka reintegration
S0
  • MapOA = Match(O, A) (based on OIDs)
  • MapOB = Match (O, B) (based on OIDs)
  • MapOA = ApplyFunction(MapOA) such that if eMapOA if domain(e) = range(e), then delete e  (i.e. things changed in A)
  • MapOB = ApplyFunction(MapOB) such that if eMapOB if domain(e) = range(e), then delete e (i.e. things changed in B)
  • ChangedA = range(MapOA)
  • ChangedB = range(MapOB)
  • MapChAChB = Match(ChangedA, ChangedB)
  • MapChBChA = invert(MapChAChB)
  • A = Diff(ChangedA,  ChangedB, MapChAChB) (changed in A but not changed in B)
  • B = Diff(ChangedB, ChangedA, MapChBChA)
  • MapAB =  Match (A,B) (by OIDs)
  • G = Merge (A,B, MapAB)
  • MapGA =Match(G,A)
  • GA = Merge (G, A, MapGA) with preference for A
  • MapGAB =Match(GA,B)
  • GAB = Merge (GA’, B’, MapGA’B’) with preference for B
  • DeletedA = Diff(O,A,MapOA)
  • DeletedB = Diff(O, B, MapOB)
  • MapDeletedAChangedB = Match(DeletedA, ChangedB)
  • MapDeletedBChangedA = Match(DeletedB, ChangedA)
  • ShouldDeleteA = Diff(DeletedA, ChangedB, MapDeletedAChangedB)
  • ShouldDeleteB = Diff(DeletedB, ChangedA, MapDeletedBChangedA)
  • MapGABSDA = Match(GAB, ShouldDeleteA)
  • GABSDA = Diff(GAB, ShouldDeleteA, MapGABSDA)
  • MapGABSDASDB = Match(GABSDA,ShouldDeleteB)
  • Final result = Diff(GABSDA, ShouldDeleteB, MapGABSDASDB)

S1

S2

S3

3-Way Merge (aka Reintegration)
  • Given
    • a source schema S0
    • two derived schemas S1 and S2
  • Produce
    • a schema S3 that merges the changes of S1 and S2
data migration
D

Enum

Generate

Migration

Script

1. mapSS= Match(S, S)

Run

2. Use Enum(S) to generate a data migration script

1. mapSS

D

S

S

Data Migration
  • Given
    • a schema S and its database D
    • an evolved schema S
  • Produce
    • a procedure for mapping D into an S database D
data translation
Data Translation
  • Like data migration, except S and S are expressed in different data models.
outline3
Outline
  • Overview of Model Management
  • Solutions to classical meta data problems
  • Recent technical results
status report
Status Report
  • Vision
    • [Bernstein, Halevy, & Pottinger, SIGMOD Record 12/00]
  • Data Warehouse Examples
    • [Bernstein & Rahm, ER ’00]
  • Match Operation
    • Survey: [Rahm & Bernstein, MSR Tech Report]
    • Prototype: [Madhavan, Bernstein, & Rahm, VLDB ’01]
  • Merge Operation
    • coming soon …
  • Theory
    • [Alagić & Bernstein, DBPL ’01]
schema matching approaches
Individual matchers

Combined matchers

Schema-based

Content-based

Hybrid

Composite

Per-Element

Structural

Per-Element

Manual composition

Automatic composition

Constraint-based

Constraint-based

Constraint-based

Linguistic

Linguistic

  • Names
  • Descriptions
  • Types
  • Keys
  • Graph matching
  • IR (word frequencies, key terms)
  • Value pattern and ranges
Schema Matching Approaches
  • About a dozen published algorithms.
  • Many good ideas, but none are robust.
the cupid algorithm
PurchaseOrder

PO

DeliverTo

InvoiceTo

POShipTo

POBillTo

Address

Address

City

City

Street

Street

City

City

Street

Street

The CUPID Algorithm
  • Computes linguistic similarity of element pairs
  • Computes structural similarity of element pairs
  • Generates a mapping

ssim++

m 3 merge m 1 map m 2
X

X

Y

Z

a

a

a

Y

W

Z

M3 = Merge(M1, map, M2)
  • [Buneman, Davidson, Kosky, EDBT 92]
    • Meta-model has aggregation & generalization only
    • Do a union and collapse objects having the same name
    • Fix-up step for inconsistencies created by merging

X

X

a

a

Y

Z

    • Successive fixups lead to different results 
    • Batch them at the end, to produce a unique minimal result
  • Now enrich the meta-model (containment, complex mappings) & merge semantics (conflicts, deletes)
a formal semantics for model mgt
A Formal Semantics for Model Mgt
  • Use category theory for a data-model-independent characterization of models and mappings
  • Models and their DBs are categories
  • Model and data transformations are morphisms
  • Mappings between models & data are functors
  • Utility
    • Define formal semantics for Match and Merge
    • Explain when Match & Merge preserve constraints.
    • Check that implementation satisfies the semantics
theory
Match

Categories

f

Db

Schm

Db(Sch1)

Sch1

p

g

Functor

q

Sch12

Sch2

Db

Db

Merge

Db(Sch12)

Db(Sch2)

Theory
  • Goal – a mathematical semantics of MM algebra
implementation vision
Generic Tools
  • Browser
  • Import/export
  • Scripting
  • Editors
  • Catalogs

Update

Marketing

cust

Model-Driven

UI Generator

emp

Authorize

Credit

Order

Entry

dept

dno

Bill

Customer

dna

Schedule

Delivery

select all

Inventory

Compose

Match

Copy

Merge

Apply

MM Meta-Model

Inferencing

Engine

OR Mapper

    

Implementation Vision

Operation

Speciali-

zations

Model

Manager

Object-Oriented

Repository

SQL

DBMS

related work
Related Work
  • There’s a lot of it. Apply it to model management!
  • Platforms – OODBs, datalog, deductive OODBs (Telos/ConceptBase, F-Logic)
  • Inferencing on mappings – AQUV, description logic
  • Transitive closure and recursive QP
  • Differencing – text, trees, graphs
  • Data translation – algebras, schema evolution
  • Data integration – schema match, view generation
summary
Summary
  • Raise the level of abstraction of meta-data programming by using:
    • models and mappings as objects
    • an algebra that manipulates models and mappings on a generic meta-model
  • Classical meta data problems can be expressed using this algebra
  • Implementations of classic problems offer guidance on implementing the algebra
references
References
  • http://www.research.microsoft.com/~philbe
  • P. Bernstein & E. Rahm, “Data Warehouse Scenarios for Model Management”, ER 2000 Conference
  • P. Bernstein, A. Levy, R. Pottinger, “A Vision for Manage-ment of Complex Models”, SIGMOD Record, Dec. 2000
  • E. Rahm, P. Bernstein, “On Matching Schemas Automatically,” MSR Tech Report
  • J. Madhavan, P. Bernstein, E. Rahm, “Generic Schema Matching with Cupid”, VLDB 2001
  • S. Alagić, P. Bernstein, “A Model Theory for Generic Schema Management”, DBPL 2001
ad