10 likes | 83 Views
Step 1: Probing on the cid attribute. Example source: Companies 11 IBM NY 12 IBM NY Projects P1 DB 11 e4 P2 Web 12 e5 Employees e4 John x234 e5 Anna x888. Target instances: Scenario 1: OrgDB Orgs IBM Projects :SK(11, y ) DB e4 IBM
E N D
Step 1: Probing on the cid attribute Example source: Companies 11 IBM NY 12 IBM NY Projects P1 DB 11 e4 P2 Web 12 e5 Employees e4 John x234 e5 Anna x888 Target instances: Scenario 1: OrgDB Orgs IBM Projects:SK(11,y) DB e4 IBM Projects:SK(12,y) Web e5 Employees e4 John e5 Anna Scenario 2: OrgDB Orgs IBM Projects:SK(y) DB e4 Web e5 Employees e4 John e5 Anna y subset of {IBM,NY} The designer chooses scenario 2 (excludes cid from the grouping function) Muse: A System for Understanding and Designing Mappings Bogdan Alexe Laura Chiticariu Renée J. Miller Daniel Pepper Wang-Chiew Tan UC Santa Cruz U. of Toronto UC Santa Cruz Motivation Muse Overview • Muse is a mapping design wizard that uses data examples to help designers understand, design and refine schema mappings • In Muse, the designer works with data examples rather than with complex specifications to understand the semantics of a mapping • Muse uses real data examples whenever possible, otherwise it constructs synthetic examples • Muse consists of two components: Muse-G (design of desired nesting semantics for mappings) and Muse-D (choosing the desired interpretation of ambiguous mappings) • Schema mapping = relationship between a source database schema and a target database schema • Designing a schema mapping is a fundamental problem in information integration • Specifying a semantically correct schema mapping is usually a complex task • Automatic tools can suggest potential mappings • Ensuring mapping correctness still requires intricate manual work • Few tools are available for helping a designer understand and design alternative mappings Designing Nesting Semantics with Muse-G • Nesting semantics are expressed through grouping functions, which are defined for each nested set in the target schema • A grouping function is a form of Skolem function, with atomic attributes as parameters • Example grouping function from mapping m2 • SKProjs(<…all attributes of c, p and e …>) : target Project records are grouped according to the values of all attributes of the Company, Project and Employee source records Step 2: Probing on the cname attribute OrgDB: Rcd Orgs: Set of Org: Rcd oname Projects: Set of Project: Rcd pname manager Employees: Set of Employee: Rcd eid ename CompDB: Rcd Companies: Set of Company: Rcd cid cname location Projects: Set of Project: Rcd pid pname cid manager Employees: Set of Employee: Rcd eid ename contact Scenario 2: OrgDB Orgs IBM Projects:SK(y) DB e4 WiFi e6 SBC Projects:SK(y) DB e4 WiFi e6 Employees e4 John e6 Kat Example source: Companies 11 IBM NY 14 SBC NY Projects P1 DB 11 e4 P4 WiFi 14 e6 Employees e4 John x234 e6 Kat x331 Target instances: Scenario 1: OrgDB Orgs IBM Projects:SK(IBM,y) DB e4 SBC Projects:SK(SBC,y) WiFi e6 Employees e4 John e6 Kat f1 • Example: Designing the grouping function for the target Projects set • Suppose the set of possible arguments is S = {cid, cname, location} • Muse-G probes every attribute in S • At each probe, a small carefully chosen source instance is considered, from which two differentiating target instances are obtained: one includes the probed attribute in the grouping function (Scenario 1 below), and the other omits it (Scenario 2 below). f2 y subset of {NY} The designer chooses scenario 1 (includes cname in the grouping function) Step 3: Probing on the location attribute m1: forcin CompDB.Companies existsoin OrgDB.Orgs wherec.cname=o.oname and o.Projects = SKProjs(c.cid,c.cname,c.location) m2: forcin CompDB.Companies, pin CompDB.Projects, ein CompDB.Employees satisfyp.cid=c.cid ande.eid=p.manager existsoin OrgDB.Orgs, p1ino.Projects, e1in OrgDB.Employees satisfyp1.manager=e1.eid wherec.cname=o.oname ande.eid=e1.eid ande.ename=e1.ename andp.pname=p1.pname and o.Projects = SKProjs(<…all attributes of c, p and e …>) m3: forein CompDB.Employees existse1in OrgDB.Employees wheree.eid = e1.eid ande.ename=e1.ename Example source: Companies 11 IBM NY 13 IBM SF Projects P1 DB 11 e4 P2 Web 13 e5 Employees e4 John x234 e5 Anna x888 Target instances: Scenario 1: OrgDB Orgs IBM Projects:SK(IBM,NY) DB e4 IBM Projects:SK(IBM,SF) Web e5 Employees e4 John e5 Anna Scenario 2: OrgDB Orgs IBM Projects:SK(IBM) DB e4 Web e5 Employees e4 John e5 Anna The designer chooses scenario 2 (excludes location from the grouping function) Conclusion: the desired grouping function for Projects is SK(cname) Choosing Desired Mapping Interpretation with Muse-D Extensions CompDB: Rcd Projects: Set of Project: Rcd pid pname manager tech-lead Employees: Set of Employee: Rcd eid ename contact OrgDB: Rcd Projects: Set of Project: Rcd pname supervisor email • The mapping scenario on the left is ambiguous: it can be interpreted in several ways • e.g. the project supervisor can be either the manager or the tech-lead • In total, there are four alternative interpretations • Key idea of Muse-D: provide an example source instance to illustrate the four interpretations in a compact way • Muse-G can take advantage of constraints on the source schema (such as keys, and more generally, functional dependencies) • The designer can refine the desired nesting semantics incrementally Example source: Projects P1 DB e4 e5 Employees e4 John john@ibm e5 Anna anna@ibm Target instance: Orgs: Projects: DB John john@ibm Anna anna@ibm Ambiguous mapping: ma: forpin CompDB.Projects, e1in CompDB.Employees, e2in CompDB.Employees satisfye1.eid=p.manager and e2.eid=p.tech-lead existsp1in OrgDB.Projects wherep.pname=p1.pname and (e1.ename=p1.supervisor ore2.ename=p1.supervisor) and (e1.contact=p1.email ore2.contact=p1.email) Choice values for supervisor and email (the designer makes one selection for each attribute)