Exploiting Context Analysis for Combining Multiple Entity Resolution Systems
Zhaoqi Chen, Dmitri V. Kalashnikov, Sharad Mehrotra
- Ramu Bandaru
Table of Contents: Introduction, Related Work, Problem Definition
co-refer. The output of an ER system is a clustering of references, where each cluster is supposed to represent one distinct entity.
if (K is large) then high threshold
if (K is small) then low threshold
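The rule above can be sketched as a small helper. This is only an illustration under assumptions: the excerpt does not define K, so the sketch treats it as an estimate of the number of distinct entities, and the names `pick_threshold` and `should_merge` as well as the values `low`, `high`, and `cutoff` are hypothetical.

```python
def pick_threshold(k_estimate, n_references, low=0.5, high=0.8, cutoff=0.5):
    """Choose a merge threshold from an estimate of K.

    Assumption: K is "large" when it is at least `cutoff` times the
    number of references. The cutoff and both thresholds are
    illustrative values, not taken from the source.
    """
    if n_references <= 0:
        raise ValueError("n_references must be positive")
    ratio = k_estimate / n_references
    # Large K (many distinct entities): be conservative about merging.
    return high if ratio >= cutoff else low


def should_merge(similarity, threshold):
    """Merge two references only if their similarity reaches the threshold."""
    return similarity >= threshold
```

With these illustrative values, a pair with similarity 0.6 is merged when K is estimated to be small but kept apart when K is estimated to be large.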
• A predictive way of selecting context features that relies on estimating the unknown parameters of the data being processed.
• An extensive empirical evaluation of the proposed
• Majority Voting: a popular rule.
• Bagging & Boosting: combine classifiers of the same type.
• Stacking: perform cross-validation on the base-level dataset and create a meta-level dataset from the predictions of the base-level classifiers.
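Two of the combination schemes listed above, majority voting and stacking, can be sketched in a few lines. This is a toy illustration, not the method evaluated in the presentation: the base learners `train_midpoint` and `train_fixed`, the helper `build_meta_dataset`, and the interleaved fold split are all assumptions made for the example.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label that most base classifiers predicted."""
    return Counter(predictions).most_common(1)[0][0]


def train_midpoint(xs, ys):
    """Toy base learner: cut halfway between the two class means.

    Assumes both classes occur in the training data.
    """
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: int(x >= cut)


def train_fixed(xs, ys):
    """Toy base learner that ignores the data and uses a fixed cut."""
    return lambda x: int(x >= 0.85)


def build_meta_dataset(xs, ys, trainers, n_folds=2):
    """Stacking: cross-validate the base learners and record their
    held-out predictions as the features of a meta-level dataset."""
    n = len(xs)
    meta_rows = [None] * n
    for fold in range(n_folds):
        train_idx = [i for i in range(n) if i % n_folds != fold]
        held_out = [i for i in range(n) if i % n_folds == fold]
        models = [t([xs[i] for i in train_idx], [ys[i] for i in train_idx])
                  for t in trainers]
        for i in held_out:
            # Meta-level row: base predictions as features, true label as target.
            meta_rows[i] = ([m(xs[i]) for m in models], ys[i])
    return meta_rows
```

A meta-level classifier would then be trained on the rows returned by `build_meta_dataset`, so it can learn when to trust which base classifier instead of weighting them equally as majority voting does.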
$r_i \sim r_j$ if they refer to the same object, that is, if $o_{r_i} = o_{r_j}$.
(a) the similarity metrics chosen to compare pairs of references and (b) the clustering algorithms that
generate the final clusters based on similarities.
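The two ingredients named above, a similarity metric and a clustering algorithm, can be sketched as follows. The specific choices here are assumptions made for illustration: token-level Jaccard similarity as the metric and threshold-based connected components as the clustering step. The source does not say which metrics or algorithms the base systems use.

```python
def jaccard(a, b):
    """Token-level Jaccard similarity between two reference strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def cluster(references, threshold):
    """Group references whose pairwise similarity reaches the threshold.

    Uses union-find, so the result is the set of connected components
    of the "similar enough" graph. Returns lists of reference indices.
    """
    n = len(references)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if jaccard(references[i], references[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Swapping either the metric or the clustering step yields a different ER system, which is why several base-level systems can disagree on the same pair of references.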
algorithm consists of several parts:
• running the base-level ER systems and loading their decisions onto the edges,
• running the two regression classifiers to derive the context features,
• applying meta-level classification to predict the edge classes, and
• creating the final clusters.
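The parts listed above can be wired together roughly as below. This is a minimal sketch under assumptions, not the authors' implementation: `combine_er_systems` and the callable signatures for base systems, context regressors, and the meta-level classifier are hypothetical, and the final clusters are taken to be connected components of the edges predicted positive.

```python
from itertools import combinations


def combine_er_systems(references, base_systems, context_regressors, meta_classifier):
    """Run the four parts in order and return clusters of reference indices."""
    edges = list(combinations(range(len(references)), 2))

    # 1. Run the base-level ER systems and load their decisions on the edges.
    decisions = {
        (i, j): [system(references[i], references[j]) for system in base_systems]
        for i, j in edges
    }

    # 2. Run the regression classifiers to derive context features
    #    (assumed here to take the base decisions of an edge as input).
    context = {e: [reg(decisions[e]) for reg in context_regressors] for e in edges}

    # 3. Apply meta-level classification to predict each edge's class.
    positive = [e for e in edges if meta_classifier(decisions[e] + context[e])]

    # 4. Create the final clusters as connected components of positive edges.
    parent = list(range(len(references)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in positive:
        parent[find(i)] = find(j)

    groups = {}
    for i in range(len(references)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Any callables with matching signatures can be plugged in; for example, two toy base systems that compare strings exactly and by first letter, with a meta-level classifier that thresholds the sum of all features.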
the efficiency of our algorithms.
combining multiple base-level entity resolution systems.