1 / 1

Muse: A System for Understanding and Designing Mappings

Step 1: Probing on the cid attribute. Example source: Companies 11 IBM NY 12 IBM NY Projects P1 DB 11 e4 P2 Web 12 e5 Employees e4 John x234 e5 Anna x888. Target instances: Scenario 1: OrgDB Orgs IBM Projects :SK(11, y ) DB e4 IBM

rock
Download Presentation

Muse: A System for Understanding and Designing Mappings

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Step 1: Probing on the cid attribute Example source: Companies 11 IBM NY 12 IBM NY Projects P1 DB 11 e4 P2 Web 12 e5 Employees e4 John x234 e5 Anna x888 Target instances: Scenario 1: OrgDB Orgs IBM Projects:SK(11,y) DB e4 IBM Projects:SK(12,y) Web e5 Employees e4 John e5 Anna Scenario 2: OrgDB Orgs IBM Projects:SK(y) DB e4 Web e5 Employees e4 John e5 Anna y subset of {IBM,NY} The designer chooses scenario 2 (excludes cid from the grouping function) Muse: A System for Understanding and Designing Mappings Bogdan Alexe Laura Chiticariu Renée J. Miller Daniel Pepper Wang-Chiew Tan UC Santa Cruz U. of Toronto UC Santa Cruz Motivation Muse Overview • Muse is a mapping design wizard that uses data examples to help designers understand, design and refine schema mappings • In Muse, the designer works with data examples rather than with complex specifications to understand the semantics of a mapping • Muse uses real data examples whenever possible, otherwise it constructs synthetic examples • Muse consists of two components: Muse-G (design of desired nesting semantics for mappings) and Muse-D (choosing the desired interpretation of ambiguous mappings) • Schema mapping = relationship between a source database schema and a target database schema • Designing a schema mapping is a fundamental problem in information integration • Specifying a semantically correct schema mapping is usually a complex task • Automatic tools can suggest potential mappings • Ensuring mapping correctness still requires intricate manual work • Few tools are available for helping a designer understand and design alternative mappings Designing Nesting Semantics with Muse-G • Nesting semantics are expressed through grouping functions, which are defined for each nested set in the target schema • A grouping function is a form of Skolem function, with atomic attributes as parameters • Example grouping function from mapping m2 • SKProjs(<…all attributes of c, p and e …>) : target Project records are grouped according to the values of all attributes of the Company, Project and Employee source records Step 2: Probing on the cname attribute OrgDB: Rcd Orgs: Set of Org: Rcd oname Projects: Set of Project: Rcd pname manager Employees: Set of Employee: Rcd eid ename CompDB: Rcd Companies: Set of Company: Rcd cid cname location Projects: Set of Project: Rcd pid pname cid manager Employees: Set of Employee: Rcd eid ename contact Scenario 2: OrgDB Orgs IBM Projects:SK(y) DB e4 WiFi e6 SBC Projects:SK(y) DB e4 WiFi e6 Employees e4 John e6 Kat Example source: Companies 11 IBM NY 14 SBC NY Projects P1 DB 11 e4 P4 WiFi 14 e6 Employees e4 John x234 e6 Kat x331 Target instances: Scenario 1: OrgDB Orgs IBM Projects:SK(IBM,y) DB e4 SBC Projects:SK(SBC,y) WiFi e6 Employees e4 John e6 Kat f1 • Example: Designing the grouping function for the target Projects set • Suppose the set of possible arguments is S = {cid, cname, location} • Muse-G probes every attribute in S • At each probe, a small carefully chosen source instance is considered, from which two differentiating target instances are obtained: one includes the probed attribute in the grouping function (Scenario 1 below), and the other omits it (Scenario 2 below). f2 y subset of {NY} The designer chooses scenario 1 (includes cname in the grouping function) Step 3: Probing on the location attribute m1: forcin CompDB.Companies existsoin OrgDB.Orgs wherec.cname=o.oname and o.Projects = SKProjs(c.cid,c.cname,c.location) m2: forcin CompDB.Companies, pin CompDB.Projects, ein CompDB.Employees satisfyp.cid=c.cid ande.eid=p.manager existsoin OrgDB.Orgs, p1ino.Projects, e1in OrgDB.Employees satisfyp1.manager=e1.eid wherec.cname=o.oname ande.eid=e1.eid ande.ename=e1.ename andp.pname=p1.pname and o.Projects = SKProjs(<…all attributes of c, p and e …>) m3: forein CompDB.Employees existse1in OrgDB.Employees wheree.eid = e1.eid ande.ename=e1.ename Example source: Companies 11 IBM NY 13 IBM SF Projects P1 DB 11 e4 P2 Web 13 e5 Employees e4 John x234 e5 Anna x888 Target instances: Scenario 1: OrgDB Orgs IBM Projects:SK(IBM,NY) DB e4 IBM Projects:SK(IBM,SF) Web e5 Employees e4 John e5 Anna Scenario 2: OrgDB Orgs IBM Projects:SK(IBM) DB e4 Web e5 Employees e4 John e5 Anna The designer chooses scenario 2 (excludes location from the grouping function) Conclusion: the desired grouping function for Projects is SK(cname) Choosing Desired Mapping Interpretation with Muse-D Extensions CompDB: Rcd Projects: Set of Project: Rcd pid pname manager tech-lead Employees: Set of Employee: Rcd eid ename contact OrgDB: Rcd Projects: Set of Project: Rcd pname supervisor email • The mapping scenario on the left is ambiguous: it can be interpreted in several ways • e.g. the project supervisor can be either the manager or the tech-lead • In total, there are four alternative interpretations • Key idea of Muse-D: provide an example source instance to illustrate the four interpretations in a compact way • Muse-G can take advantage of constraints on the source schema (such as keys, and more generally, functional dependencies) • The designer can refine the desired nesting semantics incrementally Example source: Projects P1 DB e4 e5 Employees e4 John john@ibm e5 Anna anna@ibm Target instance: Orgs: Projects: DB John john@ibm Anna anna@ibm Ambiguous mapping: ma: forpin CompDB.Projects, e1in CompDB.Employees, e2in CompDB.Employees satisfye1.eid=p.manager and e2.eid=p.tech-lead existsp1in OrgDB.Projects wherep.pname=p1.pname and (e1.ename=p1.supervisor ore2.ename=p1.supervisor) and (e1.contact=p1.email ore2.contact=p1.email) Choice values for supervisor and email (the designer makes one selection for each attribute)

More Related