Advisors: Prof. 陳彥良, Prof. 許秉瑜. Presenters: 楊詠喬, 龍晶珠

Web People Search via Connection Analysis
Dmitri V. Kalashnikov, Zhaoqi (Stella) Chen, Sharad Mehrotra, Member, IEEE, and Rabia Nuray-Turan
IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 11, November 2008



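Introduction (1/2)
  • People searches account for more than 5% of today's web searches.
  • Search engines such as Google or Yahoo, given a person's name as the query keyword, return a long list of web pages about different people who share that name.
  • Next-generation search engines will use clustering when searching for people, making it easier to find the right person.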
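Introduction (2/2)

This paper presents:

1. A new Web people search approach that produces high-quality clustering results

2. A thorough empirical evaluation of the approach

3. The impact of the proposed approach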

Outline
  • Overview of the approach
  • Generating a graph representation
  • Disambiguation algorithm
  • Interpreting clustering results
  • Related works
  • Experimental results
  • Conclusions and future work
Overview of the approach (1/3)
  • User input
  • Web page retrieval
    • retrieves a fixed number (top K) of relevant pages
  • Preprocessing
    • compute TF/IDF
    • extract named entities (NEs) and Web-related information
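
The TF/IDF step of preprocessing can be sketched as follows. This is a minimal illustration, assuming whitespace tokenization and the common weighting tf × log(N / df); the slides do not say which TF/IDF variant the paper uses, and the function name `tf_idf` and the sample pages are hypothetical.

```python
import math
from collections import Counter

def tf_idf(pages):
    """Return one {term: weight} dict per page, using tf * log(N / df).

    Assumption: plain whitespace tokenization and this particular
    TF/IDF variant; the slides do not specify either.
    """
    tokenized = [page.lower().split() for page in pages]
    n_pages = len(tokenized)
    # Document frequency: number of pages that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero on empty pages
        weights.append({
            term: (count / total) * math.log(n_pages / df[term])
            for term, count in tf.items()
        })
    return weights

# Hypothetical top-K pages returned for one name query.
pages = [
    "john smith machine learning",
    "john smith music",
    "machine learning course",
]
for vector in tf_idf(pages):
    print(vector)
```

Terms that occur on only one page (such as "music") receive the largest weights, while a term appearing on every page would receive weight 0.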

Overview of the approach (2/3)
  • Graph creation
    • the entity-relationship (ER) graph
  • Clustering
  • Cluster processing
    • (1) sketch
    • (2) cluster ranking
    • (3) web page ranking
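
As an illustration of the graph creation step, the sketch below builds a small graph that links each page to the named entities extracted from it. The node types, the adjacency-set representation, and the names `build_er_graph` and `shared_entities` are assumptions made for this example; the slides do not describe the paper's actual graph schema.

```python
from collections import defaultdict

def build_er_graph(page_entities):
    """Build an undirected graph from {page_id: iterable of entity names}.

    Assumption: two node types, ("page", id) and ("entity", name),
    with an edge whenever a page mentions an entity.
    """
    graph = defaultdict(set)
    for page, entities in page_entities.items():
        page_node = ("page", page)
        graph[page_node]  # make sure pages without entities still appear
        for entity in entities:
            entity_node = ("entity", entity)
            graph[page_node].add(entity_node)
            graph[entity_node].add(page_node)
    return graph

def shared_entities(graph, page_a, page_b):
    """Entities mentioned by both pages, i.e. length-2 connections."""
    return graph[("page", page_a)] & graph[("page", page_b)]

# Hypothetical extraction output for three retrieved pages.
graph = build_er_graph({
    "p1": ["MIT", "Alice"],
    "p2": ["MIT"],
    "p3": [],
})
print(shared_entities(graph, "p1", "p2"))
```

Pages that share an entity are connected through it, which is the kind of connection the later clustering step can exploit.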

Disambiguation algorithm (1/5)
  • CC (Correlation Clustering)
    • the focus is on developing and learning a new, accurate similarity function s(u,v)
  • Connection strength c(u,v)
    • c(u,v) can help in designing a better similarity function s(u,v)
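
Correlation clustering, in its standard formulation, looks for a partition that disagrees as little as possible with "+" (same person) and "-" (different person) labels on pairs of items. The sketch below only evaluates that objective for a given partition; it is not the paper's clustering procedure, and the function name and data layout are assumptions.

```python
def disagreements(labels, assignment):
    """Count edges whose label conflicts with the clustering.

    labels:     {(u, v): "+" or "-"}
    assignment: {node: cluster_id}
    A "+" edge across clusters or a "-" edge inside a cluster
    counts as one disagreement.
    """
    count = 0
    for (u, v), label in labels.items():
        same_cluster = assignment[u] == assignment[v]
        if (label == "+" and not same_cluster) or (label == "-" and same_cluster):
            count += 1
    return count

# Hypothetical labels on three pages a, b, c.
labels = {("a", "b"): "+", ("b", "c"): "-", ("a", "c"): "+"}
print(disagreements(labels, {"a": 0, "b": 0, "c": 1}))
print(disagreements(labels, {"a": 0, "b": 1, "c": 2}))
```

Because the labels in this toy example are inconsistent, no partition reaches zero disagreements; a clustering algorithm would pick one that minimizes the count.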

Disambiguation algorithm (2/5)
  • Similarity function s(u,v)

s(u,v) labels the data using the threshold τ and the δ-band approach
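
One way to read the labeling rule is sketched below. It assumes that the δ-band is a margin of width δ on each side of τ in which an edge receives no label; the slide does not spell this out, so both the reading and the name `label_edge` are assumptions.

```python
def label_edge(s_uv, tau, delta=0.0):
    """Label an edge from its similarity score s(u,v).

    Assumed reading of the delta-band: scores above tau + delta are
    labeled "+", scores below tau - delta are labeled "-", and scores
    inside the band [tau - delta, tau + delta] stay unlabeled (None).
    """
    if s_uv > tau + delta:
        return "+"
    if s_uv < tau - delta:
        return "-"
    return None

# Hypothetical scores with tau = 0.5 and delta = 0.1.
for score in (0.9, 0.55, 0.2):
    print(score, label_edge(score, tau=0.5, delta=0.1))
```

With `delta=0.0` the rule reduces to a plain threshold at τ.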

Disambiguation algorithm (3/5)
  • TF/IDF
    • used to compute the feature-based similarity f(u,v)
  • between two documents u and v
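
A feature-based similarity between two documents can be illustrated with cosine similarity over their TF/IDF vectors. Cosine similarity is a common choice for this purpose and is used here as an assumption; the slide does not give the exact formula for f(u,v).

```python
import math

def cosine_similarity(vec_u, vec_v):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(weight * vec_v.get(term, 0.0) for term, weight in vec_u.items())
    norm_u = math.sqrt(sum(w * w for w in vec_u.values()))
    norm_v = math.sqrt(sum(w * w for w in vec_v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_u * norm_v)

# Hypothetical TF/IDF vectors for two pages u and v.
u = {"database": 0.4, "irvine": 0.2}
v = {"database": 0.3, "guitar": 0.5}
print(cosine_similarity(u, v))
```

The result lies between 0 and 1 for non-negative weights, with higher values meaning the two pages use more of the same distinctive terms.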
Disambiguation algorithm (4/5)
  • For each (u,v) edge, we should require that

Adding slack

Disambiguation algorithm (5/5)
  • Choosing negative weight

The value of w-( ) is chosen to be 0 when its argument is less than a certain threshold and 1 when it is above this threshold. The threshold itself is learned from the data.
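
The step-shaped choice of the negative weight can be written as a small function. The transcript leaves the thresholded quantity blank, so `x` below is a hypothetical stand-in for it, and the behavior exactly at the threshold (returning 0) is an assumption.

```python
def negative_weight(x, threshold):
    """Step function: 0 below the threshold, 1 above it.

    x is a placeholder for the quantity the slide thresholds, which
    the transcript does not preserve. Returning 0 when x equals the
    threshold is an assumption.
    """
    return 1.0 if x > threshold else 0.0

# Hypothetical threshold; in the slides it is learned from the data.
threshold = 0.3
print([negative_weight(x, threshold) for x in (0.1, 0.3, 0.8)])
```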

Interpreting clustering results
  • Cluster rank
  • Cluster sketch
  • Web page rank

The remaining pages are displayed in order of their affinity to the selected cluster.
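
Ordering the remaining pages by affinity can be sketched as below. The slide does not define affinity, so this example assumes it is the average similarity between a page and the members of the selected cluster; the function name and the toy similarity table are hypothetical.

```python
def rank_by_affinity(remaining, cluster, similarity):
    """Sort remaining pages by decreasing affinity to the cluster.

    Assumption: affinity = mean similarity(page, member) over the
    members of the selected cluster.
    """
    def affinity(page):
        return sum(similarity(page, member) for member in cluster) / len(cluster)

    return sorted(remaining, key=affinity, reverse=True)

# Hypothetical pairwise similarities between pages.
scores = {
    ("p3", "p1"): 0.2, ("p3", "p2"): 0.4,
    ("p4", "p1"): 0.9, ("p4", "p2"): 0.7,
}
similarity = lambda a, b: scores.get((a, b), 0.0)
print(rank_by_affinity(["p3", "p4"], ["p1", "p2"], similarity))
```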

Related work (1/2)
  • Disambiguation
Related work (2/2)

Web people search

1. server-side setting

2. middleware approach (✓)

Experimental results
  • 1. Experimental setup
  • 2. Testing disambiguation quality
  • 3. Impact on search
  • 4. Efficiency
Experimental setup
  • Data sets (leave-one-out cross-validation):
    • 1. WWW 2005 data set
    • 2. WEPS data set
    • 3. Context data set
  • Quality evaluation measures
    • B-cubed, Fp
  • Baseline methods
    • Agglomerative Vector Space Clustering
  • Statistical significance test
    • t-test
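
To make the B-cubed measure concrete, the sketch below computes B-cubed precision and recall in their standard per-item form. The function name and the dictionary-based input format are assumptions for illustration, and the example clustering is made up rather than taken from the paper's data sets.

```python
def b_cubed(predicted, truth):
    """Return (precision, recall) averaged over all items.

    predicted, truth: {item: cluster_label}
    For each item, precision is the fraction of its predicted cluster
    that truly belongs with it; recall is the fraction of its true
    cluster that was placed with it.
    """
    items = list(truth)
    precision = recall = 0.0
    for i in items:
        same_pred = [j for j in items if predicted[j] == predicted[i]]
        same_true = [j for j in items if truth[j] == truth[i]]
        correct = sum(1 for j in same_pred if truth[j] == truth[i])
        precision += correct / len(same_pred)
        recall += correct / len(same_true)
    n = len(items)
    return precision / n, recall / n

# Hypothetical example: three pages, two real people, one predicted cluster.
truth = {"a": "person1", "b": "person1", "c": "person2"}
predicted = {"a": "x", "b": "x", "c": "x"}
print(b_cubed(predicted, truth))
```

Merging everything into one cluster keeps recall at 1 but lowers precision, which is why the two values are normally reported or combined together.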
Testing disambiguation quality: Experiment 1 (disambiguation quality: overall)

Testing disambiguation quality: Experiment 2 (disambiguation quality: group identification)

Testing disambiguation quality: Experiment 3 (disambiguation quality: queries with context)
Impact on search: measures (Experiment 5)
  • First-dominant cluster
  • Regular cluster
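Efficiency (Experiment 6)

1. Because NEs are extracted with a third-party NE extractor (GATE), the initial download and preprocessing take 3.82 seconds per web page.

2. With a server-side approach, preprocessing can be done offline in advance.

3. The clustering algorithm itself takes 4.7 seconds per name on average.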
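
The two timings above can be combined into a rough per-query estimate. The sketch assumes that pages are processed one after another and that the costs simply add up; the page count used in the example is hypothetical, since the slides only say "top K".

```python
PREPROCESS_SECONDS_PER_PAGE = 3.82  # download + preprocessing, from the slide
CLUSTERING_SECONDS_PER_NAME = 4.7   # average clustering time, from the slide

def estimated_query_seconds(num_pages, preprocessed_offline=False):
    """Rough time for one name query.

    Assumptions: sequential processing and additive costs. With a
    server-side setup, preprocessing is done offline and drops out.
    """
    if preprocessed_offline:
        preprocessing = 0.0
    else:
        preprocessing = num_pages * PREPROCESS_SECONDS_PER_PAGE
    return preprocessing + CLUSTERING_SECONDS_PER_NAME

# Hypothetical K = 100 retrieved pages.
print(estimated_query_seconds(100))
print(estimated_query_seconds(100, preprocessed_offline=True))
```

Under these assumptions, online preprocessing accounts for almost all of the estimated time, which is what makes offline preprocessing attractive.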

Future work
  • Employ external data sources for disambiguation as well
  • Use more advanced extraction capabilities
  • Interpret extracted entities better by taking into account the roles they play with respect to each other
  • Develop disambiguation algorithms for other people search problems that have different settings
  • Develop algorithms for generic entity search