Web Search Clustering and Labeling with Hidden Topics

Web Search Clustering and Labeling with Hidden Topics. Presenter: Chien-Hsing Chen. Authors: Cam-Tu Nguyen, Xuan-Hieu Phan, Susumu Horiguchi, Thu-Trang Nguyen, Quang-Thuy Ha. 2009.TALIP.40. Outline: Motivation, Objective, Method, Experiments, Conclusion.

Presentation Transcript

Web Search Clustering and Labeling with Hidden Topics

Presenter: Chien-Hsing Chen

Authors: Cam-Tu Nguyen, Xuan-Hieu Phan, Susumu Horiguchi, Thu-Trang Nguyen, Quang-Thuy Ha

2009.TALIP.40.

slide2

Outline

  • Motivation
  • Objective
  • Method
  • Experiments
  • Conclusion
  • Comment
slide3

Motivation

  • d1:
  • ezPeer+ music downloads, music trial listening, lyrics, MP3, music site - 蔡依林 (Jolin Tsai) - albums over the years
  • ezPeer+ – 蔡依林 - J1 Live Concert complete concert video record, J-game, 看我72變, 城堡, J9 Party selection, Jolin J-Top 冠軍精選, 舞孃, 蔡依林 唯舞獨尊 concert preview edition & remix album & 花... web.ezpeer.com/singer/s120.html - Cached - Similar
  • d2:
  • ezPeer+ music downloads, music trial
  • 花蝴蝶 sounds great…
  • web.ezpeer.com/singer/s120.html - Cached - Similar
  • Snippets are usually noisier, less topic-focused, and much shorter than ordinary documents
    • 花??
  • so similarity evaluation between snippets may not be successful

d3: {He is an author}

d4: {The writer is standing behind you}
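The d3/d4 pair can be checked with a small sketch. It assumes a plain bag-of-words cosine similarity and a tiny hand-picked stopword list; both are illustrative choices, not something the slides specify. Because "author" and "writer" are different terms, the two snippets share no content word and score zero even though they are about the same thing.

```python
import math
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the slides).
STOPWORDS = {"he", "is", "an", "the", "behind", "you"}

def bag(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return Counter(w for w in text.lower().split() if w not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

d3 = bag("He is an author")
d4 = bag("The writer is standing behind you")
print(cosine(d3, d4))  # 0.0: no shared content terms
```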

slide4

Objective

  • Similarity is evaluated with reference to a set of hidden topics
  • di: {He is an author}
  • dj: {The writer is standing behind you}
    • (a document may be related to multi-topics)
slide5

Framework

[Figure: framework diagram. Topic labels shown: music, movie, radio player. Snippets di and dj; di > topic10; dj > topic10; (label candidate generation)]

slide6

LDA

[Figure: LDA topic model. Example topics shown: cul., hel., politics, edu., entertainment, show business]

In the training step: a keyword is related to a topic when it often occurs in documents on that topic.

  • $z_{m,n}$ refers to topic k (shown as z1, z2, z3)
  • $w_{m,n}$ refers to the vocabulary (shown as w1, w2, w3)
  • k: topic, m: document, n: word
  • K = 60; k = 10 (show business)

The word “music” in topic 10 can explain the occurrence of the words in documents m = 1, 2, 3.
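To make the roles of $z_{m,n}$ and $w_{m,n}$ concrete, here is a minimal sketch of LDA estimation. The slides do not say how the model is trained, so collapsed Gibbs sampling is assumed here, and the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy corpus are all hypothetical. Documents are lists of integer word ids.

```python
import random

def lda_gibbs(docs, K, V, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of word ids in [0, V)."""
    rng = random.Random(seed)
    n_mk = [[0] * K for _ in docs]      # topic counts per document
    n_kt = [[0] * V for _ in range(K)]  # word counts per topic
    n_k = [0] * K                       # total words assigned to each topic
    z = []                              # z[m][n]: topic of word n in document m

    def update(m, k, w, delta):
        n_mk[m][k] += delta
        n_kt[k][w] += delta
        n_k[k] += delta

    for m, doc in enumerate(docs):      # random initial assignments
        z.append([rng.randrange(K) for _ in doc])
        for k, w in zip(z[m], doc):
            update(m, k, w, +1)

    for _ in range(iters):
        for m, doc in enumerate(docs):
            for n, w in enumerate(doc):
                update(m, z[m][n], w, -1)   # remove the current assignment
                weights = [(n_mk[m][j] + alpha) * (n_kt[j][w] + beta)
                           / (n_k[j] + V * beta) for j in range(K)]
                z[m][n] = rng.choices(range(K), weights=weights)[0]
                update(m, z[m][n], w, +1)   # add the resampled assignment

    # theta[m][k]: topic mixture of document m; phi[k][t]: word distribution of topic k
    theta = [[(n_mk[m][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for m, doc in enumerate(docs)]
    phi = [[(n_kt[k][t] + beta) / (n_k[k] + V * beta) for t in range(V)]
           for k in range(K)]
    return theta, phi

# Toy corpus: word ids 0-1 and 2-3 tend to appear in different documents.
docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 1, 0]]
theta, phi = lda_gibbs(docs, K=2, V=4, iters=50)
print(theta)
```

Each row of `theta` is a document's distribution over topics and each row of `phi` is a topic's distribution over the vocabulary, which is the sense in which a word such as "music" in one topic can account for word occurrences across several documents.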

slide7

LDA

[Figure: LDA diagram for a single word: topic assignment $z_{m,n}$ (z1) and word $w_{m,n}$ (w1); k: topic, m: document, n: word; k = topic 10, K = 60]

slide8

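LDA

[Figure: LDA diagram for document dm: topic assignment $z_{m,n}$ (z1) and word $w_{m,n}$ (w1), with the probability p(.|.) = ? still to be determined; k: topic, m: document, n: word; k = topic 10, K = 60]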

slide9

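LDA

[Figure: LDA diagram for document dm: one probability is shown as p(.|.) = 1/60 and another as p(.|.) = ?; topic assignment $z_{m,n}$ (z1) and word $w_{m,n}$ (w1); k: topic, m: document, n: word; k = topic 10, K = 60]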

slide11

Similarity between di and dj

  • the t-th term in the vocabulary V
  • the k-th topic
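The similarity formula itself did not survive in the transcript; only its two ingredients (term weights over the vocabulary V and topic weights over the K topics) are named. The sketch below is one plausible reading, not the authors' stated formula: it assumes each part is compared with cosine similarity and the two scores are blended with a hypothetical mixing weight `lam`. The vectors are made-up values for the "author" / "writer" example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def snippet_similarity(topics_i, words_i, topics_j, words_j, lam=0.5):
    """Blend topic-part and word-part similarity; lam is a hypothetical weight."""
    return lam * cosine(topics_i, topics_j) + (1 - lam) * cosine(words_i, words_j)

# Word parts over a toy vocabulary [author, writer, standing]: no overlap.
words_i, words_j = [1, 0, 0], [0, 1, 1]
# Topic parts over two toy topics: both snippets lean on the same topic.
topics_i, topics_j = [0.9, 0.1], [0.8, 0.2]

sim = snippet_similarity(topics_i, words_i, topics_j, words_j)
print(round(sim, 3))
```

With `lam=0.0` the score falls back to pure term overlap, which is zero for this pair; any positive `lam` lets the shared topic contribute.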
slide12

Framework

similarity matrix between snippets
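A similarity matrix is typically the input to a clustering step. The transcript does not say which algorithm is used at this point, so the sketch below assumes average-link agglomerative clustering with a hypothetical stopping `threshold`; the four toy vectors stand in for snippet representations.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity_matrix(vectors, sim=cosine):
    """Pairwise similarity between all snippet vectors."""
    n = len(vectors)
    return [[sim(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]

def agglomerative(S, threshold):
    """Average-link merging; stop when the best merge scores below threshold."""
    clusters = [[i] for i in range(len(S))]

    def link(a, b):
        return sum(S[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, (x, y) = max(
            (link(clusters[x], clusters[y]), (x, y))
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters)))
        if best < threshold:
            break
        clusters[x] = clusters[x] + clusters[y]  # merge y into x
        del clusters[y]
    return clusters

vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
S = similarity_matrix(vectors)
clusters = agglomerative(S, threshold=0.5)
print(sorted(sorted(c) for c in clusters))  # [[0, 1], [2, 3]]
```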

slide15

Framework

[Figure: framework diagram. Topic labels shown: music, movie, radio player. Snippets di and dj; di > topic10; dj > topic4, topic10; (label candidate generation)]
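One way to read "di > topic10" and "dj > topic4, topic10" is that a snippet is attached to every topic whose probability exceeds some cutoff, and that the topics shared by a cluster's snippets then supply label candidates. The sketch below follows that reading only as an assumption: the cutoff, the probabilities, and the top-word lists (including "movie" for topic 4) are hypothetical.

```python
def topics_above(theta_d, cutoff):
    """Topic ids whose probability in this snippet exceeds the cutoff."""
    return sorted(k for k, p in theta_d.items() if p > cutoff)

def label_candidates(cluster_thetas, topic_top_words, cutoff=0.2):
    """Rank topics by how many snippets in the cluster they cover."""
    counts = {}
    for theta_d in cluster_thetas:
        for k in topics_above(theta_d, cutoff):
            counts[k] = counts.get(k, 0) + 1
    ranked = sorted(counts, key=lambda k: (-counts[k], k))
    return [(k, topic_top_words[k]) for k in ranked]

# Hypothetical topic probabilities for two snippets in one cluster.
theta_di = {10: 0.7, 4: 0.1}
theta_dj = {4: 0.4, 10: 0.5}
# Hypothetical top words per topic.
topic_top_words = {4: ["movie"], 10: ["music", "radio player"]}

print(topics_above(theta_di, 0.2))  # [10]
print(topics_above(theta_dj, 0.2))  # [4, 10]
print(label_candidates([theta_di, theta_dj], topic_top_words))
```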

slide16

Experiment

Wikipedia dataset

Vnexpress dataset

slide17

Experimental dataset

Web dataset consists of 2,357 snippets in 9 categories

20 queries were submitted to Google, obtaining about 150 distinct snippets

slide18

Experiments

  • F-measure
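The slide names only "F-measure", so the exact variant is not known. As a reference point, the sketch below assumes the balanced F-measure (harmonic mean of precision and recall) computed for one produced cluster against one gold category; the snippet ids are made up.

```python
def f_measure(cluster, gold_class):
    """Balanced F-measure of a cluster against a gold category (sets of ids)."""
    cluster, gold_class = set(cluster), set(gold_class)
    overlap = len(cluster & gold_class)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cluster)    # share of the cluster that is correct
    recall = overlap / len(gold_class)    # share of the category that was found
    return 2 * precision * recall / (precision + recall)

# 3 of 4 clustered snippets are correct; 3 of 5 gold snippets were found.
score = f_measure({1, 2, 3, 4}, {2, 3, 4, 5, 6})
print(round(score, 3))  # 0.667
```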
slide25

Conclusion

  • clustering snippets with hidden topics
  • labeling clusters using hidden topic analysis
slide26

My Comment

  • Advantage
    • labeling clusters with the help of hidden topics
    • the number of snippets is small
      • Two datasets: 2,357 and 150
      • (in our work: more than 2 million snippets)
  • Disadvantage
    • depends less on the snippets
  • Application
    • snippets are useful to make sense