Web People Search via Connection Analysis. Dmitri V. Kalashnikov, Zhaoqi (Stella) Chen, Sharad Mehrotra, Member, IEEE, and Rabia Nuray-Turan. IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 11, November 2008. Advisors: Prof. 陳彥良 and Prof. 許秉瑜. Presenters: 楊詠喬 and 龍晶珠.

    Presentation Transcript
    1. Web People Search via Connection Analysis. Dmitri V. Kalashnikov, Zhaoqi (Stella) Chen, Sharad Mehrotra, Member, IEEE, and Rabia Nuray-Turan. IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 11, November 2008. Advisors: Prof. 陳彥良 and Prof. 許秉瑜. Presenters: 楊詠喬 and 龍晶珠.

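    2. Introduction (1/2) • People search accounts for more than 5% of today's web searches. • When a person's name is used as the query keyword, search engines such as Google or Yahoo return a long list of web pages about different people who share that name. • Next-generation search engines will use clustering to make people search easier.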

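    3. Introduction (2/2) This paper presents: 1. A new web people search approach that produces high-quality clustering results 2. A thorough empirical evaluation of the approach 3. The impact of the proposed approach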

    4. Outline • Overview of the approach • Generating a graph representation • Disambiguation algorithm • Interpreting clustering results • Related work • Experimental results • Conclusions and future work

    5. Overview of the approach(1/3) • User input • Web page retrieval: retrieves a fixed number (top K) of relevant pages • Preprocessing: -- compute TF/IDF -- extract named entities (NEs) and Web-related information

    6. Overview of the approach(2/3) • Graph creation: the entity-relationship (ER) graph • Clustering • Cluster processing: (1) sketch (2) cluster ranking (3) web page ranking
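
The graph-creation step could be sketched as follows. This is only an illustration under assumptions the slides do not state: each retrieved page and each extracted named entity becomes a node, and an edge links a page to every entity it mentions. The function name `build_er_graph` and the sample data are hypothetical.

```python
from collections import defaultdict

def build_er_graph(page_entities):
    """Build an undirected graph linking each page to the entities it mentions.

    page_entities: dict mapping a page id to an iterable of entity strings.
    Returns an adjacency dict: node -> set of neighbouring nodes.
    Nodes are tagged tuples so a page id can never collide with an entity name.
    """
    graph = defaultdict(set)
    for page, entities in page_entities.items():
        page_node = ("page", page)
        graph[page_node]  # touch the key so pages without entities still appear
        for entity in entities:
            entity_node = ("entity", entity)
            graph[page_node].add(entity_node)
            graph[entity_node].add(page_node)
    return dict(graph)

# Hypothetical input: three retrieved pages and the entities found on them.
pages = {
    "p1": ["Org A", "Org B"],
    "p2": ["Org A"],
    "p3": [],
}
graph = build_er_graph(pages)
```

Pages that share an entity node (here `p1` and `p2` through `Org A`) end up two hops apart, which is the kind of connection a later analysis step can exploit.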

    7. Overview of the approach(3/3)

    8. Generating a graph representation(1/2)

    9. Generating a graph representation(2/2)

    10. Disambiguation algorithm(1/5) • CC (Correlation Clustering): the focus is on developing and learning a new, accurate similarity function s(u,v) • Connection strength c(u,v): c(u,v) can help design a better similarity function s(u,v)
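
The slides do not define c(u,v), so the following is only a rough sketch under a simplifying assumption: the connection strength between two pages is a weighted count of the entities they share (two-hop page-entity-page connections), with one weight per entity type. The function name, the entity types, and the weights are all hypothetical; in the approach described here such weights would be learned rather than fixed by hand.

```python
def connection_strength(entities_u, entities_v, type_weights, default_weight=1.0):
    """Weighted count of two-hop connections (page - entity - page).

    entities_u, entities_v: sets of (entity_type, entity_name) tuples.
    type_weights: dict mapping entity_type -> weight of one shared entity.
    Entity types missing from type_weights fall back to default_weight.
    """
    shared = entities_u & entities_v
    return sum(type_weights.get(etype, default_weight) for etype, _ in shared)

# Hypothetical entities extracted from two pages u and v.
u = {("person", "A. Smith"), ("org", "Org A"), ("org", "Org B")}
v = {("person", "A. Smith"), ("org", "Org A"), ("location", "City C")}
weights = {"person": 2.0, "org": 0.5}  # assumed values, for illustration only

c_uv = connection_strength(u, v, weights)  # one shared person + one shared org
```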

    11. Disambiguation algorithm(2/5) • Similarity function s(u,v): s(u,v) labels the data using the threshold τ and the δ-band approach
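
One plausible reading of the threshold-plus-band labeling is sketched below. The exact rule is not given on the slide, so this assumes that an edge is labeled positive when s(u,v) is clearly above τ, negative when it is clearly below, and left unlabeled inside the band of half-width δ around τ. The function name and the numeric values are hypothetical.

```python
def label_edge(s_uv, tau, delta):
    """Label an edge from its similarity score.

    Returns "+" (same person), "-" (different people), or None when the
    score falls inside the uncertain band (tau - delta, tau + delta).
    """
    if s_uv >= tau + delta:
        return "+"
    if s_uv <= tau - delta:
        return "-"
    return None

# Hypothetical scores with tau = 0.5 and delta = 0.1.
labels = [label_edge(s, tau=0.5, delta=0.1) for s in (0.9, 0.55, 0.2)]
```

Leaving the band unlabeled keeps borderline pairs from pushing the clustering either way; whether the original method treats the band exactly like this is an assumption.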

    12. Disambiguation algorithm(3/5) • TF/IDF: used to compute the feature-based similarity f(u,v) between two documents u and v
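
A feature-based similarity of this kind is commonly computed as the cosine between TF/IDF vectors. The sketch below assumes that formulation (relative term frequency times log inverse document frequency); the slide does not say which TF/IDF variant is used, and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical tokenized pages returned for one name query.
docs = [
    ["database", "research", "irvine"],
    ["database", "systems", "irvine"],
    ["football", "coach", "league"],
]
v0, v1, v2 = tfidf_vectors(docs)
f_01 = cosine(v0, v1)  # pages sharing vocabulary score above zero
f_02 = cosine(v0, v2)  # pages with no shared terms score zero
```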

    13. Disambiguation algorithm(4/5) • For each (u,v) edge, we should require that Adding slack

    14. Disambiguation algorithm(5/5) • Choosing negative weight The value of w-( ) is chosen to be zero when is less than a certain threshold, and it is chosen to be 1 when it is above this threshold. The value for this threshold itself is learned from the data.

    15. Interpreting clustering results • Cluster rank • Cluster sketch • Web page rank: the remaining pages are displayed in order of their affinity to the selected cluster.
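
The web page ranking step can be illustrated as below. The slide does not define affinity, so this sketch assumes it is the average pairwise similarity between a page and the members of the selected cluster. The function name, the page ids, and the similarity values are hypothetical.

```python
def rank_remaining_pages(selected_cluster, other_pages, similarity):
    """Order pages outside the selected cluster by decreasing affinity to it.

    Affinity is assumed here to be the mean similarity between the page and
    every member of the selected cluster.
    """
    def affinity(page):
        total = sum(similarity(page, member) for member in selected_cluster)
        return total / len(selected_cluster)

    return sorted(other_pages, key=affinity, reverse=True)

# Hypothetical symmetric similarity scores between pages.
scores = {
    frozenset(("p3", "p1")): 0.2, frozenset(("p3", "p2")): 0.4,
    frozenset(("p4", "p1")): 0.9, frozenset(("p4", "p2")): 0.7,
}

def similarity(a, b):
    return scores.get(frozenset((a, b)), 0.0)

ranked = rank_remaining_pages(["p1", "p2"], ["p3", "p4", "p5"], similarity)
```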

    16. Related work(1/2) • Disambiguation

    17. Related work(2/2) Web people search 1. server-side setting 2. middleware approach (✓)

    18. Experimental results • 1. Experimental setup • 2. Testing disambiguation quality • 3. Impact on search • 4. Efficiency

    19. Experimental setup • Data sets (leave-one-out cross validation): • 1. WWW 2005 data set • 2. WEPS data set • 3. Context data set • Quality evaluation measures • B-cubed, Fp • Baseline methods • Agglomerative Vector Space Clustering • Statistical significance test • t-test
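
For readers unfamiliar with the B-cubed measure, the sketch below follows its standard definition: per-item precision and recall are averaged over all items and then combined into an F-measure. The function name and the toy clustering are hypothetical and do not reproduce the evaluation code behind the reported experiments.

```python
def b_cubed(predicted, gold):
    """B-cubed precision, recall and F-measure.

    predicted, gold: dicts mapping each item to its cluster label
    (system output and ground truth). Both must cover the same items.
    """
    items = list(gold)
    precision = recall = 0.0
    for i in items:
        same_pred = [j for j in items if predicted[j] == predicted[i]]
        same_gold = [j for j in items if gold[j] == gold[i]]
        # Items sharing i's predicted cluster AND i's true cluster.
        correct = sum(1 for j in same_pred if gold[j] == gold[i])
        precision += correct / len(same_pred)
        recall += correct / len(same_gold)
    precision /= len(items)
    recall /= len(items)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Toy example: four pages, two real people, one page put in the wrong cluster.
gold = {"a": 1, "b": 1, "c": 1, "d": 2}
predicted = {"a": "x", "b": "x", "c": "y", "d": "y"}
p, r, f = b_cubed(predicted, gold)
```

In this toy case the precision is 3/4, the recall is 2/3, and the F-measure is 12/17.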

    20. Testing disambiguation quality—Experiment 1 (disambiguation quality : overall)

    21. Testing disambiguation quality— Experiment 2 (disambiguation quality :group identification)

    22. Testing disambiguation quality— Experiment 3 (disambiguation quality :queries with context)

    23. Testing disambiguation quality— Experiment 4 ( quality of generating cluster sketches)

    24. Impact on search—measures(experiment 5 ) • First-dominant cluster Regular cluster

    25. Impact on search—measures(experiment 5 ) • average

    26. Impact on search—with context

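    27. Efficiency experiment 6 1. Because NEs are extracted with third-party software (NE extractor, GATE), the initial downloading and preprocessing take 3.82 seconds per web page. 2. With a server-side approach, the preprocessing can be done offline in advance. 3. The clustering algorithm itself takes 4.7 seconds per name on average.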
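
These figures can be combined into a back-of-the-envelope estimate of the time per query. The sketch assumes that pages are downloaded and preprocessed one after another and that the 4.7 seconds of clustering excludes preprocessing. The page count of 100 in the example is a hypothetical value of K, not a number taken from the slides.

```python
def estimated_seconds_per_query(k_pages, per_page=3.82, clustering=4.7,
                                preprocessed_offline=False):
    """Rough time for one name query.

    k_pages: number of retrieved pages (the top K).
    per_page: download + preprocessing seconds per page (from the slide).
    clustering: average clustering seconds per name (from the slide).
    preprocessed_offline: True models the server-side setting, where the
    per-page preprocessing cost is paid ahead of time.
    """
    preprocessing = 0.0 if preprocessed_offline else k_pages * per_page
    return preprocessing + clustering

middleware_time = estimated_seconds_per_query(100)  # hypothetical K = 100
server_side_time = estimated_seconds_per_query(100, preprocessed_offline=True)
```

Under these assumptions the per-page preprocessing dominates in the middleware setting, while only the clustering cost remains in the server-side setting.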

    28. Future work • Employ external data sources for disambiguation as well • Use more advanced extraction capabilities • Interpret extracted entities better by taking into account the roles they play with respect to each other • Develop disambiguation algorithms for other people search problems that have different settings • Develop algorithms for generic entity search

    29. Thank you for listening