Using Ontological Relationships to Provide Indexing of Plain Text Searches

Research by Fletcher Liverance

November 14th, 2011


How Does a Search Engine Work?

1. User submits a keyword-based query to the search engine

2. The indexer locates all relevant pages containing those keywords

3. The database returns all pages found in the index

4. Pages are ranked and returned to the user
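
The four steps above can be sketched with a small inverted index. This is a minimal illustration, not the presenter's implementation: the three-page corpus, the function names, and the ranking rule (count of matching keywords) are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical three-page corpus, for illustration only.
PAGES = {
    "page1": "winnie the pooh is a yellow bear",
    "page2": "piglet is a small pig",
    "page3": "the bear wears a red shirt",
}


def build_index(pages):
    """Map each keyword to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


def search(query, index):
    """Return page ids ranked by how many query keywords they contain."""
    hits = defaultdict(int)
    for word in query.lower().split():        # step 1: keyword-based query
        for page_id in index.get(word, ()):   # steps 2-3: look up pages in the index
            hits[page_id] += 1
    # step 4: rank by match count (assumed rule), ties broken by page id
    return sorted(hits, key=lambda p: (-hits[p], p))


INDEX = build_index(PAGES)
print(search("yellow bear", INDEX))  # ['page1', 'page3']
```

The lookup is pure pattern matching: a query whose words appear on no page, such as "cartoon", returns nothing, even though a person would see the connection.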


How Does a Search Engine Work?

Benefits

  • Fast

  • Machine learnable

  • Straightforward

Drawbacks

  • Pattern matching

  • Keyword based

  • Garbage in, garbage out


Garbage in, Garbage out

Scenario

You saw this television series and you’d like to find out more about it, but you don’t know the name of the series or of any of its characters.

What do you do?

http://www.dan-dare.org/FreeFun/Images/CartoonsMoviesTV/WinnieThePoohWallpaper1024.jpg


Garbage in, Garbage out

POOR RESULTS!


Garbage in, Garbage out

GOOD RESULTS!


Semantic Relationships

  • Ontology

    “An ontology is a description (like a formal specification of a program) of the concepts and relationships that can exist for an agent or a community of agents.”

    http://www-ksl.stanford.edu/kst/what-is-an-ontology.html

  • Resource Description Framework (RDF)

    “RDF extends the linking structure of the Web to use URIs to name the relationship between things as well as the two ends of the link. Using this simple model, it allows structured and semi-structured data to be mixed, exposed, and shared across different applications.”

    http://www.w3.org/RDF/

[Diagram: RDF graph for the Winnie the Pooh example. Nodes: Disney, Winnie the Pooh, Bear, Piglet, Shirt, Yellow, Pig, Red. Relationship labels: isMadeBy, isA, hasFriend, hasClothing, hasColor.]
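
An RDF-style graph such as the one on this slide can be written as (subject, predicate, object) triples. The sketch below uses plain Python tuples rather than an RDF library. The node and relationship names come from the slide, but which node each relationship connects is an assumed reading of the diagram, and the helper names are hypothetical.

```python
# (subject, predicate, object) triples.
# Assumption: the subject/object pairings are inferred from the slide's
# diagram, which survives only as a list of node and edge labels.
TRIPLES = [
    ("Winnie the Pooh", "isMadeBy", "Disney"),
    ("Winnie the Pooh", "isA", "Bear"),
    ("Winnie the Pooh", "hasFriend", "Piglet"),
    ("Winnie the Pooh", "hasClothing", "Shirt"),
    ("Winnie the Pooh", "hasColor", "Yellow"),
    ("Shirt", "hasColor", "Red"),
    ("Piglet", "isA", "Pig"),
]


def objects_of(subject, predicate, triples=TRIPLES):
    """Follow a named relationship forward from a subject."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects_with(predicate, obj, triples=TRIPLES):
    """Follow a named relationship backward from an object."""
    return [s for s, p, o in triples if p == predicate and o == obj]


print(objects_of("Winnie the Pooh", "hasColor"))
print(subjects_with("isA", "Bear"))
```

Because each relationship is named, the same structure answers both "what color is Winnie the Pooh?" and "which characters are bears?", which a keyword index cannot do.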


Semantic Relationships

How can we locate useful semantic relationships?

  • Link Distance

  • Link Direction

  • Link Relationship

[Diagram: extended RDF graph for the Winnie the Pooh example. Nodes: Bear, Disney, Company, Brown, Winnie the Pooh, Mammal, Piglet, Shirt, Yellow, Pig, Red, 0xFFFF00. Relationship labels: hasColor, isA, isMadeBy, hasFriend, hasClothing, hasRGB.]
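
The three criteria named on this slide (link distance, link direction, link relationship) map naturally onto a filtered breadth-first search. The following is a sketch under assumptions: the edge endpoints are an inferred reading of the diagram, and `related_terms` with its parameters is a hypothetical helper, not the presenter's method.

```python
from collections import deque

# Assumed reading of the slide's extended graph as (subject, predicate, object).
TRIPLES = [
    ("Bear", "hasColor", "Brown"),
    ("Bear", "isA", "Mammal"),
    ("Disney", "isA", "Company"),
    ("Winnie the Pooh", "isMadeBy", "Disney"),
    ("Winnie the Pooh", "isA", "Bear"),
    ("Winnie the Pooh", "hasFriend", "Piglet"),
    ("Winnie the Pooh", "hasClothing", "Shirt"),
    ("Winnie the Pooh", "hasColor", "Yellow"),
    ("Shirt", "hasColor", "Red"),
    ("Piglet", "isA", "Pig"),
    ("Yellow", "hasRGB", "0xFFFF00"),
]


def related_terms(start, max_distance, follow_reverse=False,
                  allowed=None, triples=TRIPLES):
    """Breadth-first search returning {term: link distance} from start.

    max_distance   -- link distance: how many hops to follow
    follow_reverse -- link direction: also walk edges object -> subject
    allowed        -- link relationship: set of predicates to follow (None = all)
    """
    neighbours = {}
    for s, p, o in triples:
        if allowed is not None and p not in allowed:
            continue
        neighbours.setdefault(s, []).append(o)
        if follow_reverse:
            neighbours.setdefault(o, []).append(s)

    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_distance:
            continue  # do not expand beyond the distance limit
        for nxt in neighbours.get(node, []):
            if nxt not in distances:
                distances[nxt] = distances[node] + 1
                queue.append(nxt)
    del distances[start]
    return distances


print(related_terms("Winnie the Pooh", 2, allowed={"isA"}))
```

Restricting the walk to `isA` links from "Winnie the Pooh" reaches only "Bear" and then "Mammal", while an unrestricted walk of the same length also pulls in terms such as "0xFFFF00" that are unlikely to help a search.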


Modified Search Indexing

1. User submits a keyword-based query to the search engine

2. Search analyzer creates additional searches based on ontological information

3. Search engine performs parallel searches of top search terms

4. Searches are ranked and returned to the user as additional search suggestions
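
Steps 2 and 4 of the modified pipeline can be sketched as follows: given descriptive keywords, score each ontology concept by how many keywords lie within two links of it, then return the best-scoring concepts as suggested searches. The triples, the two-link neighbourhood, the scoring rule, and the name `suggest_searches` are all assumptions for illustration. Step 3 (running the suggested searches in parallel) is left out.

```python
# Assumed reading of the example graph as (subject, predicate, object).
TRIPLES = [
    ("Winnie the Pooh", "isMadeBy", "Disney"),
    ("Winnie the Pooh", "isA", "Bear"),
    ("Winnie the Pooh", "hasFriend", "Piglet"),
    ("Winnie the Pooh", "hasClothing", "Shirt"),
    ("Winnie the Pooh", "hasColor", "Yellow"),
    ("Shirt", "hasColor", "Red"),
    ("Piglet", "isA", "Pig"),
]


def suggest_searches(query, triples=TRIPLES, top_n=3):
    """Return (concept, score) pairs for concepts linked to the query keywords."""
    keywords = {word.lower() for word in query.split()}

    # Direct objects of every subject.
    objects = {}
    for s, _p, o in triples:
        objects.setdefault(s, set()).add(o)

    scores = {}
    for subject, direct in objects.items():
        reachable = set(direct)
        for o in direct:
            reachable |= objects.get(o, set())  # extend to a two-link neighbourhood
        # Only single-word terms can match a keyword in this simple sketch.
        matched = {term.lower() for term in reachable} & keywords
        if matched:
            scores[subject] = len(matched)

    # Rank by number of matched keywords, ties broken alphabetically.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


print(suggest_searches("yellow bear red shirt"))
```

For the scenario from the earlier slide, the description "yellow bear red shirt" scores "Winnie the Pooh" highest, which gives the user a name to search for even though they never typed it.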


Current Work

  • NASA SWEET Ontologies

    • 6000 concepts

    • 200 ontologies

    • Scientific

    • Loose relationships

  • National Oceanic and Atmospheric Administration (NOAA)

    • 30+ years of scientific research

    • Text based

    • Unsorted

    • 2+ gigabytes

    • Domain specific terminology


Challenges & Future Work

  • How to rank plain text

    • No links or history

    • No ‘page views’

  • Limited ontology coverage

    • 6000 concepts in NASA SWEET ontologies

    • ~170,000 words in the English language

    • Many more unique names and scientific terms

    • How can ontologies be automatically generated?

  • Graph matching

    • Identifying related terms in a large graph is difficult

    • Multiple links per node, must identify appropriate links


Q & A