Web People Search

Web People Search via Connection Analysis. Dmitri V. Kalashnikov, Zhaoqi (Stella) Chen, Sharad Mehrotra, Member, IEEE, and Rabia Nuray-Turan. IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 11, November 2008. Advisors: Prof. 陳彥良, Prof. 許秉瑜. Presenters: 楊詠喬, 龍晶珠.

Presentation Transcript

Web People Search via Connection Analysis
Dmitri V. Kalashnikov, Zhaoqi (Stella) Chen, Sharad Mehrotra, Member, IEEE, and Rabia Nuray-Turan
IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 11, November 2008

Advisors: Prof. 陳彥良, Prof. 許秉瑜

Presenters: 楊詠喬, 龍晶珠


Introduction (1/2)

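  • People search accounts for more than 5% of today's Web searches.

  • When a person's name is used as the query keyword, search engines such as Google or Yahoo return a long list of Web pages belonging to different people who share that name.

  • Next-generation search engines will use clustering to make people search easier.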


Introduction (2/2)

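This paper presents:

1. A new Web people search approach that produces high-quality clustering results

2. A thorough empirical evaluation of the approach

3. The impact of the approach on search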


Outline

  • Overview of the approach

  • Generating a graph representation

  • Disambiguation algorithm

  • Interpreting clustering results

  • Related work

  • Experimental results

  • Conclusions and future work


Overview of the approach (1/3)

  • User input

  • Web page retrieval

    retrieves a fixed number (top K) of relevant pages

  • Preprocessing:

    -- compute TF/IDF

    -- extract named entities (NEs) and Web-related information


Overview of the approach (2/3)

  • Graph creation

    the entity-relationship (ER) graph

  • Clustering

  • Cluster processing

    (1) sketch

    (2) cluster ranking

    (3) Web page ranking
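
As a rough illustration of the graph-creation step, the sketch below builds a small entity-relationship graph in which the retrieved pages and the named entities extracted from them are nodes, and an edge links a page to each entity it mentions. The page identifiers, the entity names, and the helpers `build_er_graph` and `shared_entities` are hypothetical; the graph used in the paper may contain more node and edge types than this.

```python
from collections import defaultdict

def build_er_graph(page_entities):
    """Build an undirected page/entity graph stored as an adjacency map."""
    graph = defaultdict(set)
    for page, entities in page_entities.items():
        page_node = ("page", page)
        graph[page_node]  # touch the key so pages without entities still appear
        for entity in entities:
            entity_node = ("entity", entity)
            graph[page_node].add(entity_node)
            graph[entity_node].add(page_node)
    return graph

def shared_entities(graph, page_a, page_b):
    """Names of the entities that connect two pages by a path of length two."""
    common = graph[("page", page_a)] & graph[("page", page_b)]
    return {name for _kind, name in common}

# Hypothetical extraction output: page id -> named entities found on the page.
pages = {
    "p1": ["UC Irvine", "S. Mehrotra"],
    "p2": ["UC Irvine"],
    "p3": [],
}
graph = build_er_graph(pages)
print(shared_entities(graph, "p1", "p2"))
```

Pages that share entities end up connected through those entity nodes, which is the kind of connection the later disambiguation steps can analyze.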


Overview of the approach (3/3)




Disambiguation algorithm (1/5)

  • CC (Correlation Clustering)

    the focus is on developing and learning a new, accurate similarity function s(u,v)

  • Connection strength c(u,v)

    c(u,v) can help in designing a better similarity function s(u,v)
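
To make the role of correlation clustering concrete: in its standard formulation, each pair of items carries a "+" (same entity) or "-" (different entity) label, and a clustering is scored by how many labels it contradicts. The sketch below counts those disagreements for a given assignment. The function name `disagreements` and the toy labels are assumptions for illustration; it shows the generic unweighted objective, not the weighted variant the slides go on to describe.

```python
def disagreements(labels, assignment):
    """Count the edge labels that a clustering contradicts.

    labels:     dict mapping (u, v) pairs to "+" or "-"
    assignment: dict mapping each node to a cluster id
    """
    cost = 0
    for (u, v), sign in labels.items():
        same_cluster = assignment[u] == assignment[v]
        if sign == "+" and not same_cluster:
            cost += 1  # "+" edge cut by the clustering
        elif sign == "-" and same_cluster:
            cost += 1  # "-" edge kept inside one cluster
    return cost

# Toy labels: no clustering of a, b, c can satisfy all three at once.
labels = {("a", "b"): "+", ("a", "c"): "+", ("b", "c"): "-"}
print(disagreements(labels, {"a": 0, "b": 0, "c": 0}))
print(disagreements(labels, {"a": 0, "b": 1, "c": 2}))
```

Because the labels come from s(u,v), a more accurate similarity function directly reduces the number of misleading "+" and "-" edges the clustering has to reconcile.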


Disambiguation algorithm (2/5)

  • Similarity function s(u,v)

    s(u,v) labels the data using the threshold τ and the δ-band approach
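
The slide does not spell out the labeling rule, so the following is only one plausible reading: pairs whose similarity is clearly above τ are labeled "+", pairs clearly below are labeled "-", and pairs that fall inside a band of half-width δ around τ are left unlabeled. The function `label_edge` and the exact band boundaries are assumptions, not taken from the paper.

```python
def label_edge(s_uv, tau, delta):
    """Label a pair from its similarity, leaving a band around tau undecided.

    Assumed rule (not confirmed by the source):
      s_uv >= tau + delta -> "+"   (likely the same person)
      s_uv <= tau - delta -> "-"   (likely different people)
      otherwise           -> None  (too close to the threshold to call)
    """
    if s_uv >= tau + delta:
        return "+"
    if s_uv <= tau - delta:
        return "-"
    return None

for s in (0.9, 0.55, 0.2):
    print(s, label_edge(s, tau=0.5, delta=0.1))
```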


Disambiguation algorithm (3/5)

  • TF/IDF

    - used to compute the feature-based similarity f(u,v)

  • Between two documents u and v
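
A common way to obtain a feature-based similarity from TF/IDF is the cosine between the two documents' weighted term vectors. The sketch below assumes that choice, along with a particular weighting (relative term frequency times log(N/df)); the slide does not show the paper's exact formula, and the helper names and toy documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Toy token lists standing in for three retrieved pages.
docs = [
    ["database", "professor", "irvine"],
    ["database", "professor", "research"],
    ["football", "coach", "texas"],
]
u, v, w = tfidf_vectors(docs)
print(cosine(u, v), cosine(u, w))
```

Pages u and v share weighted terms and therefore score higher than u and w, which have no terms in common.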


Disambiguation algorithm (4/5)

  • For each (u,v) edge, we should require that

    Adding slack


Disambiguation algorithm (5/5)

  • Choosing negative weight

    The value of w-( ) is chosen to be 0 when [...] is less than a certain threshold, and 1 when it is above this threshold. The threshold itself is learned from the data.


Interpreting clustering results

  • Cluster rank

  • Cluster sketch

  • Web page rank

    The remaining pages are displayed in order of their affinity to the selected cluster.
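
The slides do not define affinity, so the sketch below assumes one simple possibility: a page's affinity to the selected cluster is its average similarity to the pages in that cluster. The function `rank_by_affinity`, the similarity table, and the page identifiers are all hypothetical.

```python
def rank_by_affinity(remaining, cluster, similarity):
    """Order pages outside the cluster by average similarity to its members.

    Assumes a non-empty cluster and a similarity(page, member) callable.
    """
    def affinity(page):
        return sum(similarity(page, member) for member in cluster) / len(cluster)
    return sorted(remaining, key=affinity, reverse=True)

# Hypothetical pairwise similarities between outside pages and cluster pages.
table = {
    ("p3", "p1"): 0.9, ("p3", "p2"): 0.7,
    ("p4", "p1"): 0.2, ("p4", "p2"): 0.1,
    ("p5", "p1"): 0.5, ("p5", "p2"): 0.6,
}
ranked = rank_by_affinity(["p4", "p5", "p3"], ["p1", "p2"],
                          lambda page, member: table[(page, member)])
print(ranked)
```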


Related work (1/2)

  • Disambiguation


Related work (2/2)

Web people search

1. Server-side setting

2. Middleware approach (✓)


Experimental results

  • 1. Experimental setup

  • 2. Testing disambiguation quality

  • 3. Impact on search

  • 4. Efficiency


Experimental setup

  • Data sets (leave-one-out cross-validation):

    • 1. WWW 2005 data set

    • 2. WEPS data set

    • 3. Context data set

  • Quality evaluation measures

    • B-cubed, Fp

  • Baseline methods

    • Agglomerative Vector Space Clustering

  • Statistical significance test

    • t-test
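
For readers unfamiliar with the B-cubed measure listed above, the sketch below computes it in its usual form: for each page, precision is the fraction of pages in its predicted cluster that truly refer to the same person, and recall is the fraction of that person's pages that landed in the predicted cluster; both are averaged over all pages. The function name `b_cubed` and the toy data are assumptions, and the Fp measure is not covered here.

```python
def b_cubed(predicted, truth):
    """Return (precision, recall, F1) under the B-cubed definition.

    predicted: dict mapping each page to its predicted cluster id
    truth:     dict mapping each page to its true person id
    """
    pages = list(truth)
    precision_sum = 0.0
    recall_sum = 0.0
    for page in pages:
        same_cluster = {p for p in pages if predicted[p] == predicted[page]}
        same_person = {p for p in pages if truth[p] == truth[page]}
        overlap = len(same_cluster & same_person)  # always >= 1 (the page itself)
        precision_sum += overlap / len(same_cluster)
        recall_sum += overlap / len(same_person)
    precision = precision_sum / len(pages)
    recall = recall_sum / len(pages)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy example: two people with two pages each, all merged into one cluster.
truth = {"p1": "A", "p2": "A", "p3": "B", "p4": "B"}
merged = {"p1": 0, "p2": 0, "p3": 0, "p4": 0}
print(b_cubed(merged, truth))
```

Merging everything into one cluster keeps recall perfect but halves precision in this toy case, which is why both components are reported together.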


Testing disambiguation quality: Experiment 1 (disambiguation quality: overall)


Testing disambiguation quality: Experiment 2 (disambiguation quality: group identification)


Testing disambiguation quality: Experiment 3 (disambiguation quality: queries with context)


Testing disambiguation quality: Experiment 4 (quality of generating cluster sketches)


Impact on search: measures (Experiment 5)

  • First-dominant cluster Regular cluster


Impact on search: measures (Experiment 5)

  • average


Impact on search: with context


Efficiency (Experiment 6)

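1. Because NEs are extracted with a third-party tool (the NE extractor GATE), the initial download and preprocessing take 3.82 seconds per Web page.

2. With a server-side approach, the preprocessing could be done offline in advance.

3. The clustering algorithm itself takes 4.7 seconds per name on average.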
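
To put the timings above in perspective, the sketch below estimates the end-to-end time for one query. It assumes pages are downloaded and preprocessed one after another, with no parallelism, and that the per-page and per-name averages simply add up. The function name and the page count used in the example are hypothetical, since the slides only say that a fixed top-K is retrieved.

```python
PREPROCESS_SECONDS_PER_PAGE = 3.82  # download + preprocessing, from the slide
CLUSTERING_SECONDS_PER_NAME = 4.7   # average clustering time, from the slide

def estimated_query_seconds(num_pages, preprocessed_offline=False):
    """Rough per-query time under the sequential-processing assumption."""
    if preprocessed_offline:
        preprocessing = 0.0  # server-side setting: work done ahead of time
    else:
        preprocessing = PREPROCESS_SECONDS_PER_PAGE * num_pages
    return preprocessing + CLUSTERING_SECONDS_PER_NAME

# Hypothetical K = 100 pages per queried name.
print(estimated_query_seconds(100))
print(estimated_query_seconds(100, preprocessed_offline=True))
```

Under these assumptions, preprocessing dominates the total, which is why moving it offline in a server-side setting matters so much.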


Future work

  • Employ external data sources for disambiguation as well

  • Use more advanced extraction capabilities

  • Obtain a better interpretation of extracted entities by taking into account the roles they play with respect to each other

  • Develop disambiguation algorithms for other people search problems that have different settings

  • Develop algorithms for generic entity search