
Entity Resolution with Evolving Rules

Youzhong Ma, Lab of WAMDM, 2010-9-25. Outline: Motivations; ER related concepts; ER properties; Conclusions. Topics include the entity resolution background and the naïve ER approach vs. the new approach.


Presentation Transcript


  1. Entity Resolution with Evolving Rules • Youzhong Ma • 2010-9-25 • Lab of WAMDM

  2. Outline • Motivations • ER Related concepts • ER properties • Conclusions

  3. Entity Resolution background

  4. Entity Resolution background

  5. Naïve ER Approach Vs. New Approach

  6. Outline • Motivations • ER Related concepts • ER properties • Conclusions

  7. ER Related concepts • Suppose market A merges with market B • They have to combine their customer records • The same person may appear in both markets' customer databases, but with some attributes that differ • How should this be handled?

  8. ER Rule • Boolean functions • Determine whether two records represent the same entity: true or false. • Distance functions • Measure how different (or similar) two records are.
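
The two kinds of rule on the slide above can be sketched as follows. This is a minimal illustration, not the presentation's own code: the record fields, the sample values, and the function names `same_name_and_zip` and `name_distance` are all assumptions, and the standard-library `difflib.SequenceMatcher` stands in for whatever similarity measure a real system would use.

```python
from difflib import SequenceMatcher

def same_name_and_zip(a: dict, b: dict) -> bool:
    """Boolean rule: True only when name and zip both agree exactly."""
    return a["name"] == b["name"] and a["zip"] == b["zip"]

def name_distance(a: dict, b: dict) -> float:
    """Distance rule: 0.0 for identical names, approaching 1.0 as they diverge."""
    return 1.0 - SequenceMatcher(None, a["name"], b["name"]).ratio()

# Hypothetical customer records, invented for illustration.
r1 = {"name": "John", "zip": "54321"}
r2 = {"name": "John", "zip": "54321"}
r3 = {"name": "Jon", "zip": "11111"}

exact_match = same_name_and_zip(r1, r2)   # both fields agree
near_miss = name_distance(r1, r3)         # small but non-zero distance
```

A Boolean rule gives a yes/no answer directly, while a distance rule needs a threshold before it can decide whether two records match.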

  9. ER Example

  10. ER procedure • Original record set S = {r1, r2, r3, r4} • ER input Pi = {{r1}, {r2}, {r3}, {r4}} • B1: Pname → E1 = {{r1, r2, r3}, {r4}} (6 comps) • B2: Pname ∧ Pzip → E2 = {{r1, r2}, {r3}, {r4}} • Naïve approach: 6 comps; evolving rule approach: 3 comps • The evolving rule approach only works if the ER algorithm satisfies certain properties and B2 is stricter than B1. • So one contribution of this paper is to explore under what conditions, and for which ER algorithms, incremental approaches are feasible.
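
The comparison counts on the slide above can be reproduced with a small sketch. It assumes a simple match-based clustering (pairwise comparison plus transitive merging), which may differ from the algorithms the paper studies. The attribute values are hypothetical, chosen only so that the partitions match the slide, and `cluster` is an illustrative helper name.

```python
from itertools import combinations

# Hypothetical attribute values chosen to reproduce the slide's partitions.
RECORDS = {
    "r1": {"name": "John", "zip": "54321"},
    "r2": {"name": "John", "zip": "54321"},
    "r3": {"name": "John", "zip": "11111"},
    "r4": {"name": "Bob", "zip": "22222"},
}

def p_name(a, b):
    return RECORDS[a]["name"] == RECORDS[b]["name"]

def p_zip(a, b):
    return RECORDS[a]["zip"] == RECORDS[b]["zip"]

def b2(a, b):
    """B2 = Pname AND Pzip, which is stricter than B1 = Pname."""
    return p_name(a, b) and p_zip(a, b)

def cluster(ids, match):
    """Compare every pair once, merge matching groups, and count comparisons."""
    groups = [{i} for i in ids]
    comps = 0
    for a, b in combinations(ids, 2):
        comps += 1
        if match(a, b):
            ga = next(g for g in groups if a in g)
            gb = next(g for g in groups if b in g)
            if ga is not gb:
                groups.remove(gb)
                ga |= gb
    return sorted(sorted(g) for g in groups), comps

ids = sorted(RECORDS)

# Naive approach: rerun ER from scratch with B2 on all records.
naive_e2, naive_comps = cluster(ids, b2)

# Evolving rule approach: reuse E1 and apply B2 only inside each E1 cluster.
e1, _ = cluster(ids, p_name)
evolved_e2, evolved_comps = [], 0
for group in e1:
    parts, n = cluster(group, b2)
    evolved_e2 += parts
    evolved_comps += n
evolved_e2.sort()
```

Under these assumptions both approaches reach the same E2, but the evolving approach only compares records inside {r1, r2, r3}, because a stricter rule can only split existing clusters.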

  11. Materialization! • Original record set S = {r1, r2, r3, r4} • ER input Pi = {{r1}, {r2}, {r3}, {r4}} • B1: Pname ∧ Pzip → E1 = {{r1, r2}, {r3}, {r4}} (6 comps) • Materialized results: Pname → Ename = {{r1, r2, r3}, {r4}}; Pzip → Ezip = {{r1, r2}, {r3}, {r4}} • B2: Pname ∧ Pphone → E2 = {{r1}, {r2, r3}, {r4}} (3 comps)
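
A rough sketch of the materialization idea on the slide above: the ER result of each conjunct of B1 is stored, so that a later rule sharing a conjunct can start from the stored partition. The record values, the `materialized` dictionary, and the `cluster` helper are assumptions made for illustration, and the clustering is a plain pairwise transitive merge rather than any specific algorithm from the paper.

```python
from itertools import combinations

# Hypothetical attribute values chosen to reproduce the slide's partitions.
RECORDS = {
    "r1": {"name": "John", "zip": "54321", "phone": "123-4567"},
    "r2": {"name": "John", "zip": "54321", "phone": "987-6543"},
    "r3": {"name": "John", "zip": "11111", "phone": "987-6543"},
    "r4": {"name": "Bob", "zip": "22222", "phone": "121-1212"},
}

def same(field):
    """Build a predicate that compares one field of two records by id."""
    return lambda a, b: RECORDS[a][field] == RECORDS[b][field]

p_name, p_zip, p_phone = same("name"), same("zip"), same("phone")

def cluster(ids, match):
    """Compare every pair once, merge matching groups, and count comparisons."""
    groups = [{i} for i in ids]
    comps = 0
    for a, b in combinations(ids, 2):
        comps += 1
        if match(a, b):
            ga = next(g for g in groups if a in g)
            gb = next(g for g in groups if b in g)
            if ga is not gb:
                groups.remove(gb)
                ga |= gb
    return sorted(sorted(g) for g in groups), comps

ids = sorted(RECORDS)

# Materialize one partition per conjunct of B1 = Pname AND Pzip.
materialized = {
    "name": cluster(ids, p_name)[0],
    "zip": cluster(ids, p_zip)[0],
}

# New rule B2 = Pname AND Pphone shares Pname, so start from Ename.
def b2(a, b):
    return p_name(a, b) and p_phone(a, b)

e2, comps = [], 0
for group in materialized["name"]:
    parts, n = cluster(group, b2)
    e2 += parts
    comps += n
e2.sort()
```

The design trade-off is storage for speed: keeping Ename and Ezip costs extra space, but B2 no longer has to be evaluated across clusters that Pname already separated.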

  12. Outline • Motivations • ER Related concepts • ER properties • Conclusions

  13. Two important properties for ER algorithms that enable efficient rule evolution for match-based clustering • Rule Monotonicity (RM) • Context Free (CF)

  14. Pname ∧ Pzip ≤ Pname

  15. Rule Monotonicity (RM) • B1: Pname ∧ Pzip → E1 = {{r1, r2}, {r3}, {r4}} • B2: Pname → E2 = {{r1, r2, r3}, {r4}}
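
One way to read the slide above is that, because B1 is stricter than B2, the result E1 should refine E2: every cluster of E1 sits inside some cluster of E2. The check below is a small sketch of that reading; the function name `refines` is an assumption, and the partitions are copied from the slide.

```python
def refines(fine, coarse):
    """True if every cluster of `fine` is contained in some cluster of `coarse`."""
    return all(any(set(f) <= set(c) for c in coarse) for f in fine)

# Partitions from the slide: E1 under Pname AND Pzip, E2 under Pname.
E1 = [["r1", "r2"], ["r3"], ["r4"]]
E2 = [["r1", "r2", "r3"], ["r4"]]

stricter_refines_looser = refines(E1, E2)
looser_refines_stricter = refines(E2, E1)
```

Here E1 refines E2 but not the other way round, which is the behaviour a rule-monotonic algorithm would be expected to show for this pair of rules.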

  16. Context Free (CF)

  17. Existing properties in the literature • General Incremental vs. Context Free • Order Independent vs. Rule Monotonicity • An ER algorithm is order independent if the ER result is the same regardless of the order in which the records are processed.

  18. Experiments

  19. Outline • Motivations • ER Related concepts • ER properties • Conclusions

  20. Conclusions • Proposes a new ER approach with evolving rules • Exploits the properties (RM, CF) of ER algorithms that enable efficient rule evolution • Provides guidance to designers of ER algorithms

  21. Some problems • How are the comparison rules generated? • How can ER algorithms be designed so that the RM and CF properties hold? • How can the ER algorithms be implemented in the MapReduce framework?

  22. Sincere thanks to everyone in the Web Group

  23. Thank you!
