Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis




Shuai Yuan @ Emory


Semantic relatedness

Association between (two) texts according to background knowledge.

  • Example:

    • “Cat” <-> “mouse”

    • “Preparing a manuscript” <-> “writing an article”

    • “Chair” <-> “Sun”


Why does it matter?

Association of ideas is important for artificial intelligence.

It helps to extract meaningful information from texts.


What is ESA?

Explicit Semantic Analysis.

  • Explicitly represent the meaning of any text in terms of Wikipedia-based concepts

  • What about an implicit approach? It would rely on “latent concepts” (as in Latent Semantic Analysis).
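
As a rough illustration of what an explicit representation looks like, the sketch below stores each text as a sparse vector of weights over named concepts and compares two texts with cosine similarity. The concept names and weights are invented for illustration, and cosine similarity is assumed here as the comparison measure; the slides do not spell it out.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[c] for c, w in u.items() if c in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical concept weights, made up for this example only.
cat = {"Cat": 0.9, "Pet": 0.6, "Rodent": 0.2}
mouse = {"Rodent": 0.8, "Cat": 0.4, "Pet": 0.3}
chair = {"Furniture": 0.9, "Seat": 0.5}

print(cosine(cat, mouse))  # shared concepts, so the score is positive
print(cosine(cat, chair))  # no shared concepts, so the score is 0.0
```

Because every dimension is a named concept, the overlap between two vectors can be read off directly, which is the practical difference from a latent representation.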


Indices

  • Direct index

    • Concept -> Words

  • Inverted index

    • Word -> Concepts
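
A minimal sketch of the two indices, assuming a toy direct index held in a dictionary. The concept names, word lists, and the helper `invert` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy direct index: concept -> words appearing in its article.
direct_index = {
    "Cat": ["cat", "pet", "mouse"],
    "Mouse": ["mouse", "rodent", "cat"],
    "Chair": ["chair", "furniture"],
}

def invert(direct):
    """Build the inverted index: word -> set of concepts containing it."""
    inverted = defaultdict(set)
    for concept, words in direct.items():
        for word in words:
            inverted[word].add(concept)
    return dict(inverted)

inverted_index = invert(direct_index)
print(sorted(inverted_index["cat"]))  # ['Cat', 'Mouse']
```

The inverted index is the one needed at query time, since interpreting a text starts from its words and looks up the concepts they point to.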



Building Semantic Interpreter (2)

  • For a single word

    • Use TF-IDF to decide the weight of every concept.

    • Discard insignificant associations.
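
The single-word step can be sketched as follows. The TF-IDF variant (raw count times log(N/df)) and the fixed cut-off used for pruning are assumptions chosen for brevity, as the slide does not state which formula or pruning rule was used. The articles are toy data.

```python
import math
from collections import Counter

# Hypothetical toy articles: concept -> article text.
articles = {
    "Cat": "cat cat pet mouse",
    "Mouse": "mouse rodent cat",
    "Chair": "chair furniture seat",
}

def tfidf_for_word(word, articles):
    """Weight of `word` in every concept whose article contains it."""
    counts = {c: Counter(text.split()) for c, text in articles.items()}
    df = sum(1 for cnt in counts.values() if cnt[word] > 0)
    if df == 0:
        return {}
    idf = math.log(len(articles) / df)
    return {c: cnt[word] * idf for c, cnt in counts.items() if cnt[word] > 0}

def prune(weights, min_weight):
    """Discard associations below a cut-off (assumed pruning rule)."""
    return {c: w for c, w in weights.items() if w >= min_weight}

weights = tfidf_for_word("cat", articles)
print(weights)
print(prune(weights, 0.5))  # only the strongest association survives
```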




Building Semantic Interpreter (4)

  • For a text fragment

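    • Each word Wi has a TF-IDF weight Vi and a concept vector:

      • W1 (V1): C1:K11, C2:K12, C3:K13

      • W2 (V2): C1:K21, C2:K22, C3:K23

      • W3 (V3): C1:K31, C2:K32, C3:K33

    • Resulting vector for the fragment: C1:K1, C2:K2, C3:K3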
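
One plausible reading of the diagram is that the fragment's weight for concept Cj is the sum over words of Vi * Kij. The sketch below implements that reading; the function name and the numbers are hypothetical.

```python
from collections import defaultdict

def interpret_fragment(word_weights, word_concepts):
    """Combine per-word concept vectors into one fragment vector.

    word_weights:  word -> V_i (weight of the word in the fragment)
    word_concepts: word -> {concept: K_ij}
    Returns concept -> sum_i V_i * K_ij.
    """
    fragment = defaultdict(float)
    for word, v in word_weights.items():
        for concept, k in word_concepts.get(word, {}).items():
            fragment[concept] += v * k
    return dict(fragment)

# Hypothetical numbers standing in for V1..V3 and K11..K33.
word_weights = {"w1": 0.5, "w2": 0.25, "w3": 0.25}
word_concepts = {
    "w1": {"C1": 1.0, "C2": 0.0, "C3": 2.0},
    "w2": {"C1": 0.0, "C2": 4.0, "C3": 0.0},
    "w3": {"C1": 2.0, "C2": 0.0, "C3": 4.0},
}

print(interpret_fragment(word_weights, word_concepts))
# {'C1': 1.0, 'C2': 1.0, 'C3': 2.0}
```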





Implementation (1)

  • Using Wikipedia

    • Wikipedia snapshot as of March 26, 2006.

    • Parsing the Wikipedia XML dump: 2.9 GB of text in 1,187,839 articles.

    • Removing small and overly specific concepts: 241,393 articles.

    • Removing stop words and rare words: 389,202 distinct terms.
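
The filtering steps can be sketched on toy data as below. The stop-word list and both thresholds (`min_article_words`, `min_doc_freq`) are hypothetical values picked for the example; the slide does not give the actual cut-offs.

```python
from collections import Counter

STOP_WORDS = {"the", "a", "of", "is"}  # tiny hypothetical stop-word list

def build_vocabulary(articles, min_article_words=3, min_doc_freq=2):
    """Drop short articles, then drop stop words and rare words."""
    # Step 1: keep only articles with enough words (assumed size criterion).
    kept = {
        title: text.lower().split()
        for title, text in articles.items()
        if len(text.split()) >= min_article_words
    }
    # Step 2: count in how many kept articles each non-stop word occurs.
    doc_freq = Counter()
    for words in kept.values():
        doc_freq.update(set(words) - STOP_WORDS)
    # Step 3: keep words that occur in enough articles.
    vocab = {w for w, df in doc_freq.items() if df >= min_doc_freq}
    return kept, vocab

articles = {
    "Cat": "the cat is a pet",
    "Mouse": "a mouse is a small pet",
    "Stub": "cat",
}
kept, vocab = build_vocabulary(articles)
print(sorted(kept))   # ['Cat', 'Mouse']
print(sorted(vocab))  # ['pet']
```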


Implementation (2)

  • Using Open Directory Project (ODP, http://www.dmoz.org)

    • ODP snapshot as of April 2004.

    • Pruning non-English material: 436 MB; 400k concepts and 2.8M URLs.

    • Crawling all of its URLs: 70 GB of additional textual data.

    • Removing stop words and rare words: 20,700,000 distinct terms.


Evaluation

  • The “gold standard” -- Human judgements


Human evaluating word relatedness

  • WordSimilarity-353 collection, containing 353 word pairs.

  • Hire people (13-16 for each word pair).

  • Average to a single relatedness score for each pair.
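
Averaging the individual judgements into one gold-standard score per pair can be sketched as below. The word pairs, the ratings, and the rating scale are invented for illustration.

```python
from statistics import mean

# Hypothetical ratings from three judges per word pair.
ratings = {
    ("cat", "mouse"): [8, 7, 9],
    ("chair", "sun"): [1, 0, 2],
}

# One averaged relatedness score per pair.
gold = {pair: mean(scores) for pair, scores in ratings.items()}
print(gold[("cat", "mouse")])  # 8
print(gold[("chair", "sun")])  # 1
```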


Human evaluating doc similarity

  • 50 documents from the ABC’s news mail service.

  • Paired docs in all possible ways (how many pairs?)

  • Hire people (8-12 for each doc pair).

  • Average to a single relatedness score for each pair.

  • 50*49/2 = 1225 pairs
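
The pair count can be checked by enumerating all unordered pairs of 50 documents. The document identifiers are placeholders.

```python
from itertools import combinations
from math import comb

# Placeholder identifiers for the 50 documents.
documents = [f"doc{i}" for i in range(50)]

# Every unordered pair of distinct documents.
pairs = list(combinations(documents, 2))
print(len(pairs))   # 1225
print(comb(50, 2))  # 1225, the same count from the binomial coefficient
```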




Conclusion

  • Explicit Semantic Analysis is a novel approach to computing semantic relatedness of natural language texts with the aid of large scale knowledge repositories (Wikipedia and the ODP).

  • Results are good!


Q&A

Thank you!

