Web Search Clustering and Labeling with Hidden Topics

Web Search Clustering and Labeling with Hidden Topics. Presenter: Chien-Hsing Chen. Authors: Cam-Tu Nguyen, Xuan-Hieu Phan, Susumu Horiguchi, Thu-Trang Nguyen, Quang-Thuy Ha. 2009.TALIP.40. Outline: Motivation, Objective, Method, Experiments, Conclusion.

Presentation Transcript
Web Search Clustering and Labeling with Hidden Topics

Presenter: Chien-Hsing Chen

Author: Cam-Tu Nguyen

Xuan-Hieu Phan

Susumu Horiguchi

Thu-Trang Nguyen

Quang-Thuy Ha

2009.TALIP.40.

slide2

Outline

  • Motivation
  • Objective
  • Method
  • Experiments
  • Conclusion
  • Comment
slide3

Motivation

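  • d1:
  • ezPeer+ music downloads, music previews, lyrics, MP3, music site - Jolin Tsai (蔡依林) - albums over the years
  • ezPeer+ – Jolin Tsai - J1 Live Concert (full concert video record), J-game, 看我72變, 城堡, J9 Party (party selection), Jolin J-Top (champion selection), 舞孃, Jolin Tsai 唯舞獨尊 concert (live preview edition) & remix album & 花... web.ezpeer.com/singer/s120.html - Cached - Similar
  • d2:
  • ezPeer+ music downloads, music pre… (truncated)
  • 花蝴蝶 sounds great…
  • web.ezpeer.com/singer/s120.html - Cached - Similar
  • Snippets are usually noisier, less topic-focused, and much shorter than ordinary documents
    • "花" ("flower", cut off in d1) ??
  • similarity evaluation between snippets may not be successful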

d3: {He is an author}

d4: {The writer is standing behind you}
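To illustrate why term-based similarity can fail on d3 and d4, here is a minimal sketch using bag-of-words cosine similarity. The stop-word list and the helper names `bag_of_words` and `cosine` are assumptions made for this example; the slides do not say which term-based measure the authors had in mind.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, chosen only to cover this toy example.
STOP_WORDS = frozenset({"he", "is", "an", "the", "behind", "you"})

def bag_of_words(text, stop_words=frozenset()):
    """Lowercase, tokenize on letters, and count the remaining terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)

def cosine(a, b):
    """Cosine similarity between two term-count mappings."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

d3 = "He is an author"
d4 = "The writer is standing behind you"

# Without stop-word removal the only shared term is "is".
raw = cosine(bag_of_words(d3), bag_of_words(d4))
# With stop-word removal no term is shared, although "author" and
# "writer" are close in meaning.
filtered = cosine(bag_of_words(d3, STOP_WORDS), bag_of_words(d4, STOP_WORDS))

print(round(raw, 3), filtered)
```

The two snippets share no content word, so a purely term-based measure treats them as unrelated once stop words are dropped.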

slide4

Objective

  • Similarity between snippets is evaluated with reference to a set of hidden topics
  • di: {He is an author}
  • dj: {The writer is standing behind you}
    • (a document may be related to multi-topics)
slide5

Framework

[Figure: framework diagram. Topic labels shown: music, movie, radio player. Snippets di and dj are assigned to hidden topics: di > topic10, dj > topic10. Final stage: label candidate generation.]

slide6

LDA

  • In the training step: a keyword is related to a topic when it often occurs in documents about that topic
  • Indices: k = topic, m = document, n = word; K = 60 topics
  • zm,n refers to a topic k (z1, z2, z3 in the figure)
  • wm,n refers to the vocabulary (w1, w2, w3 in the figure)
  • Example: k = 10 (show business); the word “music” in topic 10 can explain the occurrence of the words in documents m = 1, 2, 3

[Figure: LDA diagram. Topic labels shown: cul., hel., politics, edu., entertainment, show business.]
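The slide does not say how the topic assignments zm,n are estimated. One common approach is collapsed Gibbs sampling, sketched below under stated assumptions: the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the iteration count, and the toy corpus are all hypothetical and are not taken from the presentation.

```python
import random

def lda_gibbs(docs, num_topics, vocab_size,
              alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA on docs given as lists of word ids."""
    rng = random.Random(seed)
    n_mk = [[0] * num_topics for _ in docs]               # document-topic counts
    n_kt = [[0] * vocab_size for _ in range(num_topics)]  # topic-word counts
    n_k = [0] * num_topics                                # tokens per topic
    z = []

    # Initialization: every token gets a topic drawn uniformly (probability 1/K).
    for m, doc in enumerate(docs):
        z_m = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_m.append(k)
            n_mk[m][k] += 1
            n_kt[k][w] += 1
            n_k[k] += 1
        z.append(z_m)

    for _ in range(iters):
        for m, doc in enumerate(docs):
            for n, w in enumerate(doc):
                # Remove the current assignment of token (m, n) from the counts.
                k = z[m][n]
                n_mk[m][k] -= 1
                n_kt[k][w] -= 1
                n_k[k] -= 1
                # Unnormalized p(z = j | all other assignments, words).
                weights = [
                    (n_mk[m][j] + alpha)
                    * (n_kt[j][w] + beta) / (n_k[j] + vocab_size * beta)
                    for j in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[m][n] = k
                n_mk[m][k] += 1
                n_kt[k][w] += 1
                n_k[k] += 1

    # Smoothed per-document topic distribution.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in n_mk[m]]
        for m, doc in enumerate(docs)
    ]
    return theta, z

# Toy corpus (made up): ids index into this vocabulary.
vocab = ["music", "song", "album", "film", "actor", "movie"]
docs = [[0, 1, 2, 0], [3, 4, 5, 5], [0, 2, 1, 1], [5, 3, 4, 3]]
theta, z = lda_gibbs(docs, num_topics=2, vocab_size=len(vocab))
for row in theta:
    print([round(p, 2) for p in row])
```

Each row of `theta` is one document's distribution over topics, which is the kind of per-snippet topic information the rest of the framework relies on.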

slide7

LDA

[Figure: LDA diagram. Indices: k = topic, m = document, n = word. Nodes: zm,n (z1), wm,n (w1). k = topic 10, K = 60.]

slide8

LDA

[Figure: LDA diagram for document dm. Indices: k = topic, m = document, n = word. Nodes: zm,n (z1), wm,n (w1). k = topic 10, K = 60. Annotation: p(.|.) = ?]

slide9

LDA

[Figure: LDA diagram for document dm. Indices: k = topic, m = document, n = word. Nodes: zm,n (z1), wm,n (w1). k = topic 10, K = 60. Annotations: p(.|.) = 1/60 and p(.|.) = ?]

slide11

Similarity between di and dj

  • the tth term in the vocabulary V
  • the kth topic
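The formula on this slide did not survive in the transcript. As a rough sketch only, assume each snippet is represented by a topic-probability vector (one entry per topic k) and a term-weight vector (one entry per vocabulary term t), and that the two cosine similarities are blended by a mixing weight. The function `combined_similarity`, the weight `lam`, and the toy vectors are assumptions for illustration and may differ from the authors' definition.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def combined_similarity(topics_i, terms_i, topics_j, terms_j, lam=0.5):
    """Blend topic-level and term-level cosine similarity (lam is hypothetical)."""
    return (lam * cosine(topics_i, topics_j)
            + (1 - lam) * cosine(terms_i, terms_j))

# Toy term vectors over the vocabulary [author, writer, standing].
terms_i = [1, 0, 0]   # di: {He is an author}
terms_j = [0, 1, 1]   # dj: {The writer is standing behind you}

# Made-up topic distributions over K = 3 topics; both lean on topic 0.
topics_i = [0.8, 0.1, 0.1]
topics_j = [0.7, 0.2, 0.1]

term_only = cosine(terms_i, terms_j)
sim = combined_similarity(topics_i, terms_i, topics_j, terms_j)
print(term_only, round(sim, 3))
```

With these made-up vectors the term-level similarity is zero, while the blended score is clearly positive because both snippets load on the same topic.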
slide12

Framework

similarity matrix between snippets

slide15

Framework

[Figure: framework diagram. Topic labels shown: music, movie, radio player. Snippets di and dj are assigned to hidden topics: di > topic10, dj > topic4, topic10. Final stage: label candidate generation.]

slide16

Experiment

Wikipedia dataset

Vnexpress dataset

slide17

Experimental dataset

The Web dataset consists of 2,357 snippets in 9 categories

20 queries were submitted to Google, obtaining about 150 distinct snippets

slide18

Experiments

  • F-measure
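The slide names the F-measure without giving a formula. In its standard form it is the weighted harmonic mean of precision and recall; the sketch below shows that general definition. How precision and recall are computed over clusters in the paper is not shown in the transcript, and the input values here are hypothetical.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (beta = 1 gives F1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical precision and recall values, for illustration only.
score = f_measure(0.8, 0.6)
print(round(score, 4))
```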
slide25

Conclusion

  • clustering snippets with hidden topics
  • labeling clusters using hidden topic analysis
slide26

My Comment

  • Advantage
    • labeling clusters with the help of hidden topics
    • the number of snippets is small
      • Two datasets: 2,357 and 150
      • (in our work: more than 2 million snippets)
  • Disadvantage
    • depends less on the snippets themselves
  • Application
    • snippets are useful to make sense