
Top-K Generation of Integrated Schemas Based on Directed and Weighted Correspondences

Top-K Generation of Integrated Schemas Based on Directed and Weighted Correspondences, by Ahmed Radwan, Lucian Popa, Ioana R. Stanoi, Akmal Younis. Presented by Prasanna Kunchavaram (800690762), ITCS 6265, 3rd November 2009.

Presentation Transcript


  1. Top-K Generation of Integrated Schemas Based on Directed and Weighted Correspondences, by Ahmed Radwan, Lucian Popa, Ioana R. Stanoi, Akmal Younis. Presented by Prasanna Kunchavaram (800690762), ITCS 6265, 3rd November 2009

  2. Introduction: Schema integration is the process of unifying heterogeneous data sources into a single non-redundant, consistent data source; equivalently, it combines local schemas into a global, integrated schema. Examples: combining databases or tables after a merger or acquisition of companies; combining two products into one, with the resulting combination of historic sales data; creating a new employee table from employment data and medical records.

  3. Introduction (continued): A correspondence is a match between elements of heterogeneous schemas. Each correspondence has a weight and a direction that reflect the confidence of the match. For example, the weight of the correspondence from A to B may differ from the weight of the correspondence from B to A. Previous schema integration tools offer an interactive option that lets users select a desired integrated schema using the surviving correspondences.

  4. Problem: Previous schema integration techniques do not consider the direction and weight of correspondences, and they need user interaction for the final integration decision. Users must laboriously select both the easy and the difficult integrations, which is time consuming and resource intensive.

  5. Problem (continued): Example of weighted and directed correspondences, and the options for schema integration.

  6. Solution:
     1. Relationships in the integrated schema are defined using the direction and weight of the correspondences between elements.
     2. Such relationships are ranked by priority of similarity and coverage to produce the top k schemas.
     3. Easy integrations are adopted without user interaction.
     4. For difficult integrations, the user is given an option to select constraints on the schemas involved.
     5. The system generates revised top k schemas that satisfy the constraints.
     6. Steps 4 and 5 are repeated until the final schema is obtained.

  7. Example of easy integration (integration without user interaction)

  8. Concept and concept graph: A concept is a relation name associated with a set of attributes in a schema. Correspondences between schemas are expressed using a concept graph. A concept graph is a pair (V, has), where V is a set of concepts and has is a set of directed, labeled edges between concepts. The correspondence of concepts across schemas is defined by a pair of weights, one for each direction. Consider a pair (C1, C2) of concepts, where C1 is from schema S1 and C2 is from schema S2. The weight of the directed correspondence C1 → C2 is denoted ŝ(C1, C2), and the weight of the directed correspondence C2 → C1 is denoted ŝ(C2, C1). The correspondence of the concepts across S1 and S2 is then defined by the pair [ŝ(C1, C2), ŝ(C2, C1)].
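The definitions on this slide can be sketched as a small data structure. This is only an illustration: the class names `Concept`, `ConceptGraph`, and `Correspondences`, the example concepts, and the choice to treat a missing weight as 0.0 are all assumptions, not part of the presentation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    """A relation name with its set of attributes (hypothetical representation)."""
    schema: str
    name: str
    attributes: frozenset


@dataclass
class ConceptGraph:
    """The pair (V, has): concepts plus directed, labeled edges between them."""
    concepts: set = field(default_factory=set)
    has: set = field(default_factory=set)  # elements are (parent, child, label)

    def add_has(self, parent, child, label):
        self.concepts.update((parent, child))
        self.has.add((parent, child, label))


class Correspondences:
    """Directed weights between concepts of two schemas."""

    def __init__(self):
        self._weights = {}

    def set_weight(self, source, target, weight):
        self._weights[(source, target)] = weight

    def pair(self, c1, c2):
        # Returns [s(C1, C2), s(C2, C1)]; a missing direction is assumed to be 0.0.
        return (self._weights.get((c1, c2), 0.0),
                self._weights.get((c2, c1), 0.0))


# Made-up example data.
dept = Concept("S1", "dept", frozenset({"dno", "dname"}))
emp = Concept("S1", "emp", frozenset({"eno", "ename"}))
org = Concept("S2", "org", frozenset({"oid", "oname"}))

g1 = ConceptGraph()
g1.add_has(dept, emp, "emps")

corr = Correspondences()
corr.set_weight(dept, org, 0.8)
corr.set_weight(org, dept, 0.6)
print(corr.pair(dept, org))
```

Keeping the two directions as separate dictionary entries makes the asymmetry explicit: the weight from C1 to C2 is stored independently of the weight from C2 to C1.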

  9. Example: Concept graph

  10. Assignment: An assignment A is a fixed-size, ordered vector of bits in which each bit X represents the state of one correspondence: the value 1 represents a correspondence and the value 0 represents the absence of one. The set of assignments is ranked to obtain the top K assignments. For each bit with value 1, the concepts involved in the respective correspondence should be combined. Concepts can be combined in two ways, merge and has, depending on the weight and direction of the similarity. A threshold λ decides which method is used for the combination.
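A rough sketch of how an assignment and the threshold λ could interact is given below. The transcript does not state the exact rule for choosing between merge and has, so the rule used here is an assumption made for illustration only: merge when both directed weights reach λ, has when exactly one does. The function names are also hypothetical.

```python
def combine_method(w12, w21, lam):
    """Pick a combination method for one selected correspondence.

    ASSUMED rule (not stated in the slides): both directions >= lam means
    "merge"; exactly one direction >= lam means "has"; otherwise "none".
    """
    strong12 = w12 >= lam
    strong21 = w21 >= lam
    if strong12 and strong21:
        return "merge"
    if strong12 or strong21:
        return "has"
    return "none"


def decisions(assignment, weights, lam):
    """Map each bit of the assignment to a decision; 0 bits are skipped."""
    return [
        combine_method(w12, w21, lam) if bit == 1 else "skip"
        for bit, (w12, w21) in zip(assignment, weights)
    ]


# Made-up weights [s(C1, C2), s(C2, C1)] for three correspondences.
example_weights = [(0.9, 0.8), (0.9, 0.2), (0.9, 0.9)]
print(decisions([1, 1, 0], example_weights, lam=0.5))
```

Under this assumed rule, raising λ turns merges into has relationships and lowering it does the opposite, which is the kind of effect the next slide illustrates.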

  11. Example of λ effect on integration decision

  12. Algorithm

  13. Cost function (used to rank assignments). In the cost formula, n is the total number of non-zero correspondences. Example: assignment with weights.

  14. Top K algorithm: Calculate ^Si and ^Di, and assign 1 to each correspondence where ^Si > ^Di and 0 where ^Si < ^Di. The result is the optimal assignment A1, for k = 1. The next k-1 best assignments are obtained by deciding which bits of the assignment vector to flip. Let the vector Δf be the difference between ^Si and ^Di; for each i, Δf quantifies the increase in cost with respect to cost(A1) if bit i of A1 were flipped. Sort Δf in increasing order and denote the result Δfs. The 2nd best assignment is obtained by flipping the bit Xi that gives the least cost increase. To compute the 3rd best assignment, flip the variable with the next smallest cost increase and leave Xi unflipped. If there are two choices, select the one that gives the smaller cost increase. The remaining assignments are calculated likewise.
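The procedure on this slide can be sketched in Python under some stated assumptions. The cost formula itself is not reproduced in the transcript, so the sketch assumes that flipping bit i raises the cost by |^Si - ^Di| and that the increases of several flipped bits add up. Ties (^Si equal to ^Di) are assigned 0, which the slide leaves unspecified. The heap-based enumeration of flip sets is a standard way to list subsets in order of increasing total, used here as a stand-in rather than as the authors' exact procedure; `top_k_assignments` is a hypothetical name.

```python
import heapq


def top_k_assignments(s_hat, d_hat, k):
    """Return up to k (cost_increase, assignment) pairs, cheapest first."""
    n = len(s_hat)
    # Optimal assignment A1: bit is 1 where S > D, else 0 (ties assumed 0).
    best = [1 if s > d else 0 for s, d in zip(s_hat, d_hat)]
    # Assumed cost increase for flipping each bit of A1.
    delta = [abs(s - d) for s, d in zip(s_hat, d_hat)]
    # Bit positions sorted by increasing flip cost (the sorted vector).
    order = sorted(range(n), key=lambda i: delta[i])

    results = [(0.0, best)]
    if n == 0:
        return results[:k]

    # Heap entries: (total increase, positions in `order` that are flipped).
    heap = [(delta[order[0]], (0,))]
    while heap and len(results) < k:
        increase, flips = heapq.heappop(heap)
        assignment = best[:]
        for pos in flips:
            assignment[order[pos]] ^= 1
        results.append((increase, assignment))

        last = flips[-1]
        if last + 1 < n:
            nxt = delta[order[last + 1]]
            # Choice 1: keep the current flips and also flip the next bit.
            heapq.heappush(heap, (increase + nxt, flips + (last + 1,)))
            # Choice 2: unflip the last bit and flip the next bit instead.
            heapq.heappush(
                heap,
                (increase - delta[order[last]] + nxt, flips[:-1] + (last + 1,)),
            )
    return results


# Made-up scores for three correspondences.
for increase, assignment in top_k_assignments([0.9, 0.2, 0.6], [0.1, 0.7, 0.5], 4):
    print(assignment, round(increase, 2))
```

The two pushes after each pop correspond to the "two choices" mentioned on the slide: extend the current set of flips, or swap the most recent flip for the next cheapest one. The heap then always yields whichever pending choice has the smaller cost increase.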

  15. Top K algorithm: Example

  16. Tuning λ: As stated before, λ is the threshold used for combining concepts in an integration. Steps to calculate λ:
     1. Iteratively scan all the correspondences in E, where E is the set of correspondences selected by at least one of the top k assignments.
     2. For each such correspondence, record max(ŝ1, ŝ2) and add this value to a list L.
     3. Set λ to the minimum of the values in L.
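The three steps for calculating λ translate almost directly into code. In this minimal sketch, the function name `tune_lambda`, the input layout (assignments as bit lists, weights as pairs of directed weights), and the choice to return `None` when no correspondence is selected are assumptions.

```python
def tune_lambda(top_k_assignments, weights):
    """Compute lambda from the top-k assignments.

    top_k_assignments: list of bit vectors, one per assignment.
    weights: list of (s1, s2) directed weight pairs, one per correspondence.
    """
    # Step 1: E = correspondences selected by at least one top-k assignment.
    selected = {
        i
        for assignment in top_k_assignments
        for i, bit in enumerate(assignment)
        if bit == 1
    }
    if not selected:
        return None  # assumed behaviour; the slides do not cover this case

    # Step 2: record max(s1, s2) for each correspondence in E.
    values = [max(weights[i]) for i in sorted(selected)]

    # Step 3: lambda is the minimum of the recorded values.
    return min(values)


# Made-up weights for three correspondences.
example_weights = [(0.9, 0.4), (0.3, 0.6), (0.8, 0.7)]
print(tune_lambda([[1, 0, 1], [1, 1, 0]], example_weights))
```

Because λ is the minimum of the per-correspondence maxima, every selected correspondence has at least one direction whose weight reaches λ.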

  17. Example of Schema Integration with different λ values

  18. Results

  19. Conclusion: A top K algorithm for schema integration that executes in polynomial time was developed. The weight and direction of correspondences are used to reduce user interaction. Easy integrations are performed by the system without any user interaction while keeping the data consistent. The results indicate that the algorithm can reduce user interaction and thus the time taken to achieve complex schema integration. Future work includes automation (integration without user interaction) and enhancements so that the algorithm can handle a couple of hundred schemas.

  20. Questions?
