Muse: A System for Understanding and Designing Mappings

Step 1: Probing on the cid attribute Example source: Companies 11 IBM NY 12 IBM NY Projects P1 DB 11 e4 P2 Web 12 e5 Employees e4 John x234 e5 Anna x888 Target instances: Scenario 1: OrgDB Orgs IBM Projects:SK(11,y) DB e4 IBM Projects:SK(12,y) Web e5 Employees e4 John e5 Anna Scenario 2: OrgDB Orgs IBM Projects:SK(y) DB e4 Web e5 Employees e4 John e5 Anna y subset of {IBM,NY} The designer chooses scenario 2 (excludes cid from the grouping function) Muse: A System for Understanding and Designing Mappings Bogdan Alexe Laura Chiticariu Renée J. Miller Daniel Pepper Wang-Chiew Tan UC Santa Cruz U. of Toronto UC Santa Cruz Motivation Muse Overview • Muse is a mapping design wizard that uses data examples to help designers understand, design and refine schema mappings • In Muse, the designer works with data examples rather than with complex specifications to understand the semantics of a mapping • Muse uses real data examples whenever possible, otherwise it constructs synthetic examples • Muse consists of two components: Muse-G (design of desired nesting semantics for mappings) and Muse-D (choosing the desired interpretation of ambiguous mappings) • Schema mapping = relationship between a source database schema and a target database schema • Designing a schema mapping is a fundamental problem in information integration • Specifying a semantically correct schema mapping is usually a complex task • Automatic tools can suggest potential mappings • Ensuring mapping correctness still requires intricate manual work • Few tools are available for helping a designer understand and design alternative mappings Designing Nesting Semantics with Muse-G • Nesting semantics are expressed through grouping functions, which are defined for each nested set in the target schema • A grouping function is a form of Skolem function, with atomic attributes as parameters • Example grouping function from mapping m2 • SKProjs(<…all attributes of c, p and e …>) : target Project records are grouped according to the values of all attributes of the Company, Project and Employee source records Step 2: Probing on the cname attribute OrgDB: Rcd Orgs: Set of Org: Rcd oname Projects: Set of Project: Rcd pname manager Employees: Set of Employee: Rcd eid ename CompDB: Rcd Companies: Set of Company: Rcd cid cname location Projects: Set of Project: Rcd pid pname cid manager Employees: Set of Employee: Rcd eid ename contact Scenario 2: OrgDB Orgs IBM Projects:SK(y) DB e4 WiFi e6 SBC Projects:SK(y) DB e4 WiFi e6 Employees e4 John e6 Kat Example source: Companies 11 IBM NY 14 SBC NY Projects P1 DB 11 e4 P4 WiFi 14 e6 Employees e4 John x234 e6 Kat x331 Target instances: Scenario 1: OrgDB Orgs IBM Projects:SK(IBM,y) DB e4 SBC Projects:SK(SBC,y) WiFi e6 Employees e4 John e6 Kat f1 • Example: Designing the grouping function for the target Projects set • Suppose the set of possible arguments is S = {cid, cname, location} • Muse-G probes every attribute in S • At each probe, a small carefully chosen source instance is considered, from which two differentiating target instances are obtained: one includes the probed attribute in the grouping function (Scenario 1 below), and the other omits it (Scenario 2 below). f2 y subset of {NY} The designer chooses scenario 1 (includes cname in the grouping function) Step 3: Probing on the location attribute m1: forcin CompDB.Companies existsoin OrgDB.Orgs wherec.cname=o.oname and o.Projects = SKProjs(c.cid,c.cname,c.location) m2: forcin CompDB.Companies, pin CompDB.Projects, ein CompDB.Employees satisfyp.cid=c.cid ande.eid=p.manager existsoin OrgDB.Orgs, p1ino.Projects, e1in OrgDB.Employees satisfyp1.manager=e1.eid wherec.cname=o.oname ande.eid=e1.eid ande.ename=e1.ename andp.pname=p1.pname and o.Projects = SKProjs(<…all attributes of c, p and e …>) m3: forein CompDB.Employees existse1in OrgDB.Employees wheree.eid = e1.eid ande.ename=e1.ename Example source: Companies 11 IBM NY 13 IBM SF Projects P1 DB 11 e4 P2 Web 13 e5 Employees e4 John x234 e5 Anna x888 Target instances: Scenario 1: OrgDB Orgs IBM Projects:SK(IBM,NY) DB e4 IBM Projects:SK(IBM,SF) Web e5 Employees e4 John e5 Anna Scenario 2: OrgDB Orgs IBM Projects:SK(IBM) DB e4 Web e5 Employees e4 John e5 Anna The designer chooses scenario 2 (excludes location from the grouping function) Conclusion: the desired grouping function for Projects is SK(cname) Choosing Desired Mapping Interpretation with Muse-D Extensions CompDB: Rcd Projects: Set of Project: Rcd pid pname manager tech-lead Employees: Set of Employee: Rcd eid ename contact OrgDB: Rcd Projects: Set of Project: Rcd pname supervisor email • The mapping scenario on the left is ambiguous: it can be interpreted in several ways • e.g. the project supervisor can be either the manager or the tech-lead • In total, there are four alternative interpretations • Key idea of Muse-D: provide an example source instance to illustrate the four interpretations in a compact way • Muse-G can take advantage of constraints on the source schema (such as keys, and more generally, functional dependencies) • The designer can refine the desired nesting semantics incrementally Example source: Projects P1 DB e4 e5 Employees e4 John john@ibm e5 Anna anna@ibm Target instance: Orgs: Projects: DB John john@ibm Anna anna@ibm Ambiguous mapping: ma: forpin CompDB.Projects, e1in CompDB.Employees, e2in CompDB.Employees satisfye1.eid=p.manager and e2.eid=p.tech-lead existsp1in OrgDB.Projects wherep.pname=p1.pname and (e1.ename=p1.supervisor ore2.ename=p1.supervisor) and (e1.contact=p1.email ore2.contact=p1.email) Choice values for supervisor and email (the designer makes one selection for each attribute)

Muse: A System for Understanding and Designing Mappings

Muse: A System for Understanding and Designing Mappings

Presentation Transcript

Debugging Schema Mappings with Routes

User interface design

IDSF New Judging System

Understanding Cancer

User interface design

Understanding the USEPA’s AERMOD Modeling System for Environmental Managers

Understanding Operating Systems Sixth Edition

Designing Effective Classrooms

Understanding Operating Systems Sixth Edition

Bullying: From Understanding to Action

B1 Understanding Ourselves

Understanding by Design

Features, Code Mappings and User Access

Designing Out Waste

Creating semantic mappings

Drugs Affecting the Respiratory System

Understanding, N o tation, and Dimension of Matrix Types of Matrix

Chapter 2. Analytic Functions

Number System

Designing Schools to Support Teachers’ Ongoing Learning

Stroke: Understanding Management for EMS