1 / 33

Generic Model Management A Database Infrastructure for Schema Manipulation

Generic Model Management A Database Infrastructure for Schema Manipulation. Philip A. Bernstein Microsoft Corporation September 6, 2001. The Problem. There is 30 years of DB Research on meta data But we don’t have great infrastructure to offer

rivka
Download Presentation

Generic Model Management A Database Infrastructure for Schema Manipulation

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Generic Model ManagementA Database Infrastructure for Schema Manipulation Philip A. Bernstein Microsoft Corporation September 6, 2001

  2. The Problem • There is 30 years of DB Research on meta data • But we don’t have great infrastructure to offer • Most design tools and web services store meta data in files, not DBs • OODBMS’s are not a huge success • Most meta data driven tools use their own infrastructure • Goal: generic meta data manipulation infrastructure • Reduce the amount of programming required to build meta data driven applications. • Proposal: Model Management • Define an algebra to manipulate meta data in large chunks, called models and mappings.

  3. Outline • Overview of Model Management • Solutions to classical meta data problems • Recent technical results

  4. Models and Mappings • Model – a complex information structure • XML schema, SQL schema, OO interface, UML model, web site map, make script, …. • Mapping – a transformation from one model into another • Map between two XML schemas • Map a SQL schema to an XML schema • Map data sources to a data warehouse • Map an ER diagram to a SQL schema • Map a process defn to a workflow script

  5. Relational Schema XSD map1 Emp Emp E# E# Dept# Dept# Name Name First A mapping is a model each of whose nodes connects nodes of two other models Last Representation A model is a directed graph with one root.

  6. Model Management Algebra • Enumerate • ApplyFunction • Copy • Update operations • Match • Merge • Compose • Select • Diff

  7. map1 = =  map = Match(M1, M2, ) • Match(M1, M2, ) returns the best mapping between M1and M2, w.r.t. to  M2 Emp M1 Emp E# E# Dept# Dept# Phone Name Name Addr First Last

  8. Emp Emp Emp mapC Addr Name Phone Name Phone Addr Name = M3 = Merge(M1, M2, map) • Return the union of models M1 and M2 • Use map to guide the Merge • If elements x = y in map, then collapse them into one element

  9. Emp Emp mapC mapC = mapA f• mapB Addr c1 StAddr Street c2 Town c3 City Left Composition ( f •) Emp Emp Emp mapB mapA Addr Name Name a1 b1 Street Street StAddr b2 a2 City Town b3 a3 City M2 M1 M3

  10. Model Management Algebra • list = Enumerate(M) • ApplyFunction(M, f ) • M2= Copy(M1) • Update operations • map = Match (M1, M2, ) • M3 = Merge (M1, M2, map) • map3 = Compose(map1, map2) • M2= Select(M1, pred) • M2= Diff(M1, map) They’re generic = data model independent … well … implemented on an extended ER model with an extensibility story

  11. xsd1 1. map2 2. map3 1. map2= Match(xsd1, xsd2) 3. map4 map1 rdb1 2. map3 = map1map2 rdb2 Example • Given • map1 from SQL schema rdb1 to xsd1, • xsd2, which is similar to xsd1 • Produce • a map between xsd2 and a relational schema. xsd2 3. <map4, rdb2 > = Copy(map3) 4. Use ApplyFunction(map4) to map each x in Diff(xsd2,map4) into rdb2

  12. Theme • Classic meta data problems can be solved using Model Management operations • Schema integration • Schema evolution • Data migration • Reverse engineering • Data reintegration (3-way merging) • Published solutions to these problems help us produce generic implementations of model mgmt operations

  13. Outline • Overview of Model Management • Solutions to classical meta data problems • Schema integration • Schema evolution • Reverse engineering • Data reintegration (3-way merging) • Data migration • Recent technical results

  14. S S 1. map= Match(V1, V2) map 2. S = Merge(V1, V2 , map) 2. 3. ApplyFunction(S) // to resolve conflicts in S, producing S 1.map V2 V1 Schema Integration • Given • two view schemas, V1 and V2 • Produce • an integrated schema, S

  15. map f = L Name FirstName = R LastName  S Emp E# 1. map= Match(V1, V2) FirstName 2. S = Merge(V1, V2 , map) Dept# LastName 3. Use ApplyFunction(S) to re-solve conflicts, producing S Addr Phone V1 V2 Emp Emp E# E# Dept# Dept# Phone Addr FirstName Name LastName

  16. Merging Knowledge Bases (Ontologies) • Same as schema integration, but applied to ontologies • The literature on merging ontologies focuses mostly on Match.

  17. 2. mapSV V 1. mapSS= Match(S, S) mapSV 2. mapS V = mapS SmapSV S S 1. mapSS Schema Evolution • Given • mapSV from schema S to view V • a modified version S of S • Produce • a mapping mapSV from S to V (i.e. a view defn for V over S). • Use ApplyFunction(V) to delete elements not derivable from S

  18. Outline • Overview of Model Management • Solutions to classical meta data problems • Schema integration • Schema evolution • Reverse engineering • Data reintegration (3-way merging) • Data migration • Recent technical results

  19. M M 1. mapGG= Match(G, G) 2. mapMG = mapMG  mapGG 2. mapMG mapMG 3.mapMG 3. <M, mapG M > = Copy(mapMG) 1. mapGG G G Reverse Engineering • Given • Model M (e.g., an ER model) • Model G (e.g., SQL) generated via mapMG from M • A modified version G of G • Produce • A modified version M of M that generates G 4. Use ApplyFunction(mapMG), to reverse engineer each g in Diff(G,mapMG) into M

  20. S0 • MapOA = Match(O, A) (based on OIDs) • MapOB = Match (O, B) (based on OIDs) • MapOA = ApplyFunction(MapOA) such that if eMapOA if domain(e) = range(e), then delete e  (i.e. things changed in A) • MapOB = ApplyFunction(MapOB) such that if eMapOB if domain(e) = range(e), then delete e (i.e. things changed in B) • ChangedA = range(MapOA) • ChangedB = range(MapOB) • MapChAChB = Match(ChangedA, ChangedB) • MapChBChA = invert(MapChAChB) • A = Diff(ChangedA,  ChangedB, MapChAChB) (changed in A but not changed in B) • B = Diff(ChangedB, ChangedA, MapChBChA) • MapAB =  Match (A,B) (by OIDs) • G = Merge (A,B, MapAB) • MapGA =Match(G,A) • GA = Merge (G, A, MapGA) with preference for A • MapGAB =Match(GA,B) • GAB = Merge (GA’, B’, MapGA’B’) with preference for B • DeletedA = Diff(O,A,MapOA) • DeletedB = Diff(O, B, MapOB) • MapDeletedAChangedB = Match(DeletedA, ChangedB) • MapDeletedBChangedA = Match(DeletedB, ChangedA) • ShouldDeleteA = Diff(DeletedA, ChangedB, MapDeletedAChangedB) • ShouldDeleteB = Diff(DeletedB, ChangedA, MapDeletedBChangedA) • MapGABSDA = Match(GAB, ShouldDeleteA) • GABSDA = Diff(GAB, ShouldDeleteA, MapGABSDA) • MapGABSDASDB = Match(GABSDA,ShouldDeleteB) • Final result = Diff(GABSDA, ShouldDeleteB, MapGABSDASDB) S1 S2 S3 3-Way Merge (aka Reintegration) • Given • a source schema S0 • two derived schemas S1 and S2 • Produce • a schema S3 that merges the changes of S1 and S2

  21. D Enum Generate Migration Script 1. mapSS= Match(S, S) Run 2. Use Enum(S) to generate a data migration script 1. mapSS D S S Data Migration • Given • a schema S and its database D • an evolved schema S • Produce • a procedure for mapping D into an S database D

  22. Data Translation • Like data migration, except S and S are expressed in different data models.

  23. Outline • Overview of Model Management • Solutions to classical meta data problems • Recent technical results

  24. Status Report • Vision • [Bernstein, Halevy, & Pottinger, SIGMOD Record 12/00] • Data Warehouse Examples • [Bernstein & Rahm, ER ’00] • Match Operation • Survey: [Rahm & Bernstein, MSR Tech Report] • Prototype: [Madhavan, Bernstein, & Rahm, VLDB ’01] • Merge Operation • coming soon … • Theory • [Alagić & Bernstein, DBPL ’01]

  25. Individual matchers Combined matchers Schema-based Content-based Hybrid Composite Per-Element Structural Per-Element Manual composition Automatic composition Constraint-based Constraint-based Constraint-based Linguistic Linguistic • Names • Descriptions • Types • Keys • Graph matching • IR (word frequencies, key terms) • Value pattern and ranges Schema Matching Approaches • About a dozen published algorithms. • Many good ideas, but none are robust.

  26. PurchaseOrder PO DeliverTo InvoiceTo POShipTo POBillTo Address Address City City Street Street City City Street Street The CUPID Algorithm • Computes linguistic similarity of element pairs • Computes structural similarity of element pairs • Generates a mapping ssim++

  27. X X Y Z a a a Y W Z M3 = Merge(M1, map, M2) • [Buneman, Davidson, Kosky, EDBT 92] • Meta-model has aggregation & generalization only • Do a union and collapse objects having the same name • Fix-up step for inconsistencies created by merging X X a a Y Z • Successive fixups lead to different results  • Batch them at the end, to produce a unique minimal result • Now enrich the meta-model (containment, complex mappings) & merge semantics (conflicts, deletes)

  28. A Formal Semantics for Model Mgt • Use category theory for a data-model-independent characterization of models and mappings • Models and their DBs are categories • Model and data transformations are morphisms • Mappings between models & data are functors • Utility • Define formal semantics for Match and Merge • Explain when Match & Merge preserve constraints. • Check that implementation satisfies the semantics

  29. Match Categories f Db Schm Db(Sch1) Sch1 p g Functor q Sch12 Sch2 Db Db Merge Db(Sch12) Db(Sch2) Theory • Goal – a mathematical semantics of MM algebra

  30. Generic Tools • Browser • Import/export • Scripting • Editors • Catalogs Update Marketing cust Model-Driven UI Generator emp Authorize Credit Order Entry dept dno Bill Customer dna Schedule Delivery select all Inventory Compose Match Copy Merge … Apply MM Meta-Model Inferencing Engine OR Mapper         Implementation Vision Operation Speciali- zations Model Manager Object-Oriented Repository SQL DBMS

  31. Related Work • There’s a lot of it. Apply it to model management! • Platforms – OODBs, datalog, deductive OODBs (Telos/ConceptBase, F-Logic) • Inferencing on mappings – AQUV, description logic • Transitive closure and recursive QP • Differencing – text, trees, graphs • Data translation – algebras, schema evolution • Data integration – schema match, view generation

  32. Summary • Raise the level of abstraction of meta-data programming by using: • models and mappings as objects • an algebra that manipulates models and mappings on a generic meta-model • Classical meta data problems can be expressed using this algebra • Implementations of classic problems offer guidance on implementing the algebra

  33. References • http://www.research.microsoft.com/~philbe • P. Bernstein & E. Rahm, “Data Warehouse Scenarios for Model Management”, ER 2000 Conference • P. Bernstein, A. Levy, R. Pottinger, “A Vision for Manage-ment of Complex Models”, SIGMOD Record, Dec. 2000 • E. Rahm, P. Bernstein, “On Matching Schemas Automatically,” MSR Tech Report • J. Madhavan, P. Bernstein, E. Rahm, “Generic Schema Matching with Cupid”, VLDB 2001 • S. Alagić, P. Bernstein, “A Model Theory for Generic Schema Management”, DBPL 2001

More Related