Web search clustering and labeling with hidden topics

Web Search Clustering and Labeling with Hidden Topics. Presenter: Chien-Hsing Chen. Authors: Cam-Tu Nguyen, Xuan-Hieu Phan, Susumu Horiguchi, Thu-Trang Nguyen, Quang-Thuy Ha. 2009.TALIP.40. Outline: Motivation, Objective, Method, Experiments, Conclusion.


Web Search Clustering and Labeling with Hidden Topics

Presenter: Chien-Hsing Chen

Authors: Cam-Tu Nguyen, Xuan-Hieu Phan, Susumu Horiguchi, Thu-Trang Nguyen, Quang-Thuy Ha

2009.TALIP.40.


Outline

  • Motivation

  • Objective

  • Method

  • Experiments

  • Conclusion

  • Comment


Motivation

  • d1:

  • ezPeer+ music downloads, music previews, lyrics, MP3, music site - 蔡依林 (Jolin Tsai) - albums by year

  • ezPeer+ - 蔡依林 (Jolin Tsai) - J1 Live Concert complete concert video record, J-game, 看我72變, 城堡, J9 Party 派對精選, Jolin J-Top 冠軍精選, 舞孃, 蔡依林唯舞獨尊演唱會鮮聽版 & remix album & 花... web.ezpeer.com/singer/s120.html - Cached - Similar

  • d2:

  • ezPeer+ music downloads, music prev…

  • 花蝴蝶 sounds great…

  • web.ezpeer.com/singer/s120.html - Cached - Similar

  • Snippets are usually noisier, less topic-focused, and much shorter than ordinary documents

    • 花?? (the text after 花 is cut off in d1)

  • As a result, similarity evaluation between snippets may fail

d3: {He is an author}

d4: {The writer is standing behind you}
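
The d3/d4 pair can be checked with a small sketch. It assumes a plain bag-of-words cosine similarity; the stop-word list and function names are hypothetical and chosen only for this illustration. The two sentences are about the same kind of person ("author", "writer") but share no content word, so term overlap alone rates them as unrelated.

```python
import math
from collections import Counter

# Hypothetical stop-word list, chosen only for this illustration.
STOP_WORDS = frozenset({"he", "is", "an", "the", "you", "behind"})


def term_vector(text, stop_words=frozenset()):
    """Lower-case, split on whitespace, and count the remaining tokens."""
    tokens = [t for t in text.lower().split() if t not in stop_words]
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


d3 = "He is an author"
d4 = "The writer is standing behind you"

# With all tokens kept, the only shared term is the function word "is".
raw_sim = cosine(term_vector(d3), term_vector(d4))

# With function words removed, no term is shared at all.
content_sim = cosine(term_vector(d3, STOP_WORDS), term_vector(d4, STOP_WORDS))

print(round(raw_sim, 3), content_sim)
```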


Objective

  • Similarity between snippets is evaluated with reference to a set of hidden topics

  • di: {He is an author}

  • dj: {The writer is standing behind you}

    • (a document may be related to multiple topics)


Framework

[Figure: framework diagram with topic labels music, movie, and radio player. Topic assignments: di > topic10; dj > topic10; followed by label candidate generation.]


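LDA

  • Example topics: cul., hel., politics, edu., entertainment

  • In the training step, a keyword becomes associated with a topic when it often occurs in documents about that topic (e.g., show business)

  • Indices: k = topic, m = document, n = word position

  • z_{m,n} (z1, z2, z3): the topic assigned to word n of document m

  • w_{m,n} (w1, w2, w3): word n of document m, drawn from the vocabulary

  • K = 60 topics; k = 10 is the example topic (show business)

  • The word “music” in topic 10 can explain the occurrence of the words in documents m = 1, 2, 3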
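
The roles of z_{m,n} and w_{m,n} can be illustrated with a minimal sketch of how LDA assumes a document is generated: for each word position, a topic is drawn from the document's topic mixture, and then a word is drawn from that topic's word distribution. The tiny vocabulary, the two topics, and all probabilities below are made-up assumptions for illustration (the slides use K = 60).

```python
import random


def sample_document(theta_m, phi, vocabulary, n_words, rng):
    """Generate one document under the LDA generative assumption.

    theta_m: topic mixture of document m (one probability per topic).
    phi:     phi[k][t] is the probability of vocabulary term t under topic k.
    Returns the topic assignments z_{m,n} and the words w_{m,n}.
    """
    topics = list(range(len(theta_m)))
    z, w = [], []
    for _ in range(n_words):
        k = rng.choices(topics, weights=theta_m)[0]        # draw z_{m,n}
        word = rng.choices(vocabulary, weights=phi[k])[0]  # draw w_{m,n}
        z.append(k)
        w.append(word)
    return z, w


# Illustrative values only; none of them come from the slides.
vocabulary = ["music", "movie", "singer", "election", "school"]
phi = [
    [0.4, 0.3, 0.3, 0.0, 0.0],  # topic 0: entertainment-like words
    [0.0, 0.0, 0.0, 0.6, 0.4],  # topic 1: politics/education-like words
]
theta_m = [0.9, 0.1]  # this document is mostly about topic 0

rng = random.Random(0)
z, w = sample_document(theta_m, phi, vocabulary, n_words=8, rng=rng)
print(list(zip(z, w)))
```

Training runs this story in reverse: given only the words, it estimates which topic assignments and distributions would best explain them.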


LDA

[Figure: three build steps of the LDA diagram for document dm, with indices k (topic), m (document), n (word), topic assignment z_{m,n} (z1), word w_{m,n} (w1), k = topic 10, and K = 60. The later steps add the annotations p(.|.) = ? and p(.|.) = 1/60.]
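
The slides do not say how the unknown p(.|.) values are estimated. One common choice is collapsed Gibbs sampling, sketched below under that assumption. Reading the 1/60 annotation as "every topic is equally likely at the start" is also an assumption; the hyperparameters `alpha` and `beta`, the function name, and the toy corpus are hypothetical.

```python
import random


def gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
              n_iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA on documents given as term-id lists."""
    rng = random.Random(seed)
    n_mk = [[0] * n_topics for _ in docs]               # topic counts per document
    n_kt = [[0] * vocab_size for _ in range(n_topics)]  # term counts per topic
    n_k = [0] * n_topics                                # total words per topic

    # Initialisation: each word gets a topic with probability 1/K.
    z = []
    for m, doc in enumerate(docs):
        z_m = []
        for t in doc:
            k = rng.randrange(n_topics)
            z_m.append(k)
            n_mk[m][k] += 1
            n_kt[k][t] += 1
            n_k[k] += 1
        z.append(z_m)

    for _ in range(n_iters):
        for m, doc in enumerate(docs):
            for n, t in enumerate(doc):
                # Remove the current assignment of word n in document m.
                k = z[m][n]
                n_mk[m][k] -= 1
                n_kt[k][t] -= 1
                n_k[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (n_mk[m][j] + alpha) * (n_kt[j][t] + beta)
                    / (n_k[j] + vocab_size * beta)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[m][n] = k
                n_mk[m][k] += 1
                n_kt[k][t] += 1
                n_k[k] += 1

    # Point estimates: document-topic (theta) and topic-term (phi) distributions.
    theta = [[(c + alpha) / (len(doc) + n_topics * alpha) for c in n_mk[m]]
             for m, doc in enumerate(docs)]
    phi = [[(c + beta) / (n_k[k] + vocab_size * beta) for c in n_kt[k]]
           for k in range(n_topics)]
    return theta, phi


# Toy corpus of term ids: documents 0 and 2 use terms {0, 1}, document 1 uses {2, 3}.
docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 1, 0]]
theta, phi = gibbs_lda(docs, n_topics=2, vocab_size=4)
print(theta)
```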



Similarity between di and dj

  • the tth term in the vocabulary V

  • the kth topic
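
A minimal sketch of one way to use both indices, assuming each snippet carries a topic-weight vector (one entry per topic k) and a term-weight vector (one entry per term t in V), and that the two cosine similarities are mixed linearly. The mixing weight `lam`, the function names, and the toy vectors are illustrative assumptions, not taken from the slides.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_similarity(topics_i, terms_i, topics_j, terms_j, lam=0.5):
    """Mix topic-level and term-level similarity; lam is a hypothetical weight."""
    return (lam * cosine(topics_i, topics_j)
            + (1.0 - lam) * cosine(terms_i, terms_j))


# Toy snippets: K = 3 topics, |V| = 4 terms. The snippets share no term,
# but both put most of their weight on the first topic.
topics_i, terms_i = [0.8, 0.1, 0.1], [1, 0, 0, 0]
topics_j, terms_j = [0.7, 0.2, 0.1], [0, 1, 1, 0]

term_only = combined_similarity(topics_i, terms_i, topics_j, terms_j, lam=0.0)
mixed = combined_similarity(topics_i, terms_i, topics_j, terms_j, lam=0.5)
print(term_only, round(mixed, 3))
```

With `lam=0.0` the score falls back to pure term overlap, which is zero here; any positive `lam` lets the shared topic raise the similarity.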


Framework

similarity matrix between snippets
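
The slides do not name the clustering algorithm applied to the similarity matrix. As a sketch under that caveat, the example below assumes average-link agglomerative clustering that stops when no pair of clusters is at least `threshold` similar; the matrix values and the threshold are made up.

```python
def average_link_clusters(sim, threshold):
    """Greedy average-link agglomerative clustering over a similarity matrix.

    sim is a symmetric matrix with sim[i][j] the similarity of snippets i and j.
    Merging stops when the best pair of clusters falls below threshold.
    """
    clusters = [[i] for i in range(len(sim))]

    def avg_sim(a, b):
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, best_pair = float("-inf"), None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = avg_sim(clusters[x], clusters[y])
                if s > best:
                    best, best_pair = s, (x, y)
        if best < threshold:
            break
        x, y = best_pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Illustrative similarity matrix for four snippets.
sim = [
    [1.00, 0.90, 0.10, 0.20],
    [0.90, 1.00, 0.15, 0.10],
    [0.10, 0.15, 1.00, 0.80],
    [0.20, 0.10, 0.80, 1.00],
]
clusters = average_link_clusters(sim, threshold=0.5)
print(clusters)
```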




Framework

[Figure: framework diagram with topic labels music, movie, and radio player. Topic assignments: di > topic10; dj > topic4, topic10; followed by label candidate generation.]
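
The assignments "di > topic10" and "dj > topic4, topic10" suggest that a snippet can be attached to every topic whose weight is high enough. The sketch below assumes a simple probability threshold and takes the top words of the most frequent topic in a cluster as label candidates. The threshold, the three-topic setup, and the word lists are hypothetical; the slides do not describe the actual candidate-generation rule.

```python
from collections import Counter


def topics_above_threshold(theta_d, threshold):
    """Return the topics whose weight in the snippet reaches the threshold."""
    return [k for k, p in enumerate(theta_d) if p >= threshold]


def label_candidates(cluster_thetas, topic_top_words, threshold=0.3,
                     n_topics_used=1):
    """Collect the top words of the topics most often assigned in a cluster."""
    counts = Counter()
    for theta_d in cluster_thetas:
        counts.update(topics_above_threshold(theta_d, threshold))
    candidates = []
    for k, _ in counts.most_common(n_topics_used):
        candidates.extend(topic_top_words.get(k, []))
    return candidates


# Toy setup with three topics (0: politics, 1: movie, 2: music).
topic_top_words = {1: ["movie", "film"], 2: ["music", "song"]}
theta_di = [0.05, 0.10, 0.85]  # di is assigned to topic 2 only
theta_dj = [0.05, 0.45, 0.50]  # dj is assigned to topics 1 and 2

assigned_di = topics_above_threshold(theta_di, 0.3)
assigned_dj = topics_above_threshold(theta_dj, 0.3)
candidates = label_candidates([theta_di, theta_dj], topic_top_words)
print(assigned_di, assigned_dj, candidates)
```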


Experiment

Wikipedia dataset

VnExpress dataset


Experimental dataset

The Web dataset consists of 2,357 snippets in 9 categories

20 queries were submitted to Google, yielding about 150 distinct snippets


Experiments

  • F-measure
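
For reference, a minimal sketch of the standard F-measure, computed for one cluster against one gold category. The snippet ids are made up, and how the slides aggregate the score over clusters is not stated, so only the per-pair computation is shown.

```python
def precision_recall(cluster, gold_class):
    """Precision and recall of a cluster against a gold-standard category."""
    cluster, gold = set(cluster), set(gold_class)
    overlap = len(cluster & gold)
    precision = overlap / len(cluster) if cluster else 0.0
    recall = overlap / len(gold) if gold else 0.0
    return precision, recall


def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Made-up snippet ids: 3 of the 4 clustered snippets belong to the category,
# which has 5 members in total.
p, r = precision_recall([1, 2, 3, 4], [2, 3, 4, 5, 6])
f1 = f_measure(p, r)
print(p, r, round(f1, 3))
```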








Conclusion

  • clustering snippets with hidden topics

  • labeling clusters using hidden topic analysis


My Comment

  • Advantage

    • labeling clusters with the help of hidden topics

    • the snippet collections are small

      • Two datasets: 2,357 and 150 snippets

      • (in our work: more than 2 million snippets)

  • Disadvantage

    • depends less on the snippets themselves

  • Application

    • snippets help make sense of search results

