UIUC People Finder


Info

University of Illinois at Urbana-Champaign

Advanced Database Management Systems (CS511)

Instructor: ChengXiang Zhai

Sena Lee (senalee2@uiuc.edu)

Heewon Jung (hjung20@uiuc.edu)

Seung Pyo Lee (slee232@uiuc.edu)

Ricardo Redder (rredder2@uiuc.edu)

John Laipple (laipple@uiuc.edu)

Agenda
  • Problem
    • Motivation
    • Common problem
    • Definition
    • Challenges
    • Solution
  • Implementation
    • Retrieval
    • Interpretation
    • Decision
  • Demo
  • Future work
Motivation
  • For a given person
    • The information about the person stored in relational databases is very limited, e.g.: name, age, address, etc.
    • There is a lot of information about him or her on the internet, e.g.: web pages, papers, blogs, pictures
  • Use the best of both worlds
Common problem

[Figure: web search form with the query "ChengXiang Zhai" and a Search button]

Phonebook

[Figure: Phonebook search form with the query "ChengXiang Zhai" and a Search button]

Entity retrieval
  • Given:
    • a set of entities E
    • a relational table where each tuple describes some aspects of an entity
    • a set of documents
  • A user who is interested in an entity ei poses a query Q and expects the tuple that represents ei, together with the documents associated with ei.
Our example
  • Query = keywords (usually name)
  • Table = Phonebook
  • Documents = Results from search engines
Challenges
  • Semantic problem
    • It is different from finding a document that is mathematically similar to the query
    • It is subjective: the final target is in our mind and is not expressed by a function
Solving
  • Use the information from the relational database to improve the document search
  • The information from the phonebook is reliable and very accurate
  • The search engines are more generic, so a simple search for a name might not be useful.
Our example again

[Figure: search form with the query "ChengXiang Zhai" and a Search button]

Sequence
  • User types a query
  • User clicks the Search button
  • Application searches in the Phonebook
  • Application retrieves the information from the Phonebook
  • Application searches in the search engines, using the previous information
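
The sequence above can be sketched roughly as follows. This is only an illustration, not the project's actual code: the in-memory `PHONEBOOK` list, its field names, the sample record, and the functions `lookup_phonebook` and `build_search_queries` are all hypothetical.

```python
from typing import List, Optional

# Hypothetical in-memory stand-in for the Phonebook table (sample data only).
PHONEBOOK = [
    {"name": "Jane Doe", "department": "Computer Science", "email": "jdoe@example.edu"},
]

def lookup_phonebook(query: str) -> Optional[dict]:
    """Return the first phonebook record whose name contains the query."""
    needle = query.strip().lower()
    for record in PHONEBOOK:
        if needle in record["name"].lower():
            return record
    return None

def build_search_queries(query: str) -> List[str]:
    """Combine the user's query with phonebook fields to narrow the web search."""
    record = lookup_phonebook(query)
    if record is None:
        return [query]  # no phonebook match: fall back to the raw query
    name = record["name"]
    return [
        f'"{name}"',
        f'"{name}" {record["department"]}',
        f'"{name}" {record["email"]}',
    ]
```

Each returned string would then be sent to a search engine, so that the reliable phonebook fields steer the more generic web search.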
Implementing the idea
  • How to retrieve the information and documents from the web?
  • How to interpret the results?
  • How to decide whether a given document relates to the entity or not?
Web-sites as functions
  • Search engines
    • User types the text
    • Click on the button
    • Read the results
    • Click on the results
  • UIUC People Finder
    • Application sends the text to the search engine (steps 1 and 2 above)
    • Application stores the results (steps 3 and 4 above)
Using exposed HTTP interface
  • Search engines
    • Use GET or POST methods to receive information
    • Send the results in HTML
  • Application
    • Convert the query to a GET or POST request, and send it
    • Read the HTML
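
As a minimal sketch of the difference between the two methods, the same parameters can be placed in the URL (GET) or in the request body (POST) using Python's standard library. The helper names `as_get` and `as_post` and the URL `http://search.example.com/search` are assumptions made for illustration; the parameter names a real search engine expects would have to be read from its search form.

```python
from urllib.parse import urlencode
from urllib.request import Request

def as_get(base_url: str, params: dict) -> Request:
    """Encode the parameters in the URL's query string (GET)."""
    return Request(f"{base_url}?{urlencode(params)}", method="GET")

def as_post(base_url: str, params: dict) -> Request:
    """Encode the parameters in the request body (POST)."""
    body = urlencode(params).encode("ascii")
    return Request(base_url, data=body, method="POST")

# Hypothetical endpoint and parameter name, for illustration only.
get_req = as_get("http://search.example.com/search", {"q": "chengxiang zhai"})
post_req = as_post("http://search.example.com/search", {"q": "chengxiang zhai"})
```

Passing either request to `urllib.request.urlopen` and reading the response would return the HTML that the application then has to interpret.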
Wrappers
  • Receive the text
  • Build the appropriate URL
  • Connect to the URL
  • Read the response

Query text → Wrapper → HTML

Example:

http://www.google.com/search?hl=en&q=chengxiang+zhai&btnG=Google+Search
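
A wrapper along these lines might look like the sketch below. The function names `build_google_url` and `google_wrapper` are hypothetical, the parameter set is copied from the example URL above, and a present-day search engine may reject or rate-limit such automated requests.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

def build_google_url(text: str) -> str:
    """Build the search URL for the given text, mirroring the example URL."""
    params = {"hl": "en", "q": text.lower(), "btnG": "Google Search"}
    return "http://www.google.com/search?" + urlencode(params)

def google_wrapper(text: str, timeout: float = 10.0) -> str:
    """Receive the text, connect to the URL, and return the response HTML."""
    with urlopen(build_google_url(text), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Under these assumptions, `build_google_url("ChengXiang Zhai")` reproduces the example URL. Keeping URL construction separate from the network call makes each engine's wrapper easy to check without going online.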

How do we interpret?
  • Visual language
    • Different styles → different meanings
    • Underline → links
    • Useful information → center
Extraction from HTML
  • HTML is Tag based < >
  • Different styles
    • <font size =…>
    • <h2>
    • <bgcolor =…>
  • Links
    • <a href = …>
  • Center
    • <body>
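
As one possible illustration of tag-based extraction, the sketch below collects the target and visible text of every `<a href=…>` link using Python's standard `html.parser` module. The class `LinkExtractor` and the function `extract_links` are hypothetical names, and the slides do not say which parser the project actually used.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs from <a href=...> tags."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the link currently being read
        self._text = []      # text fragments inside the current link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")  # None for anchors without href
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

def extract_links(html: str) -> list:
    """Parse an HTML string and return all (href, text) pairs found."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links
```

The same pattern extends to style tags such as `<h2>`: tracking them in `handle_starttag` would let the application weigh text by how it is presented.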
How do we decide?
  • Look for related information
    • Context
    • Names
    • Other information
Application
  • Search for keywords found in the Phonebook.
    • Search for the name
    • Search for the department
    • Search for the address
    • etc.
  • Rank the pages
    • Name → +100 points
    • Department → +50 points
    • Email → +250 points
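
The point scheme above could be applied as in the following sketch. It assumes that each field counts at most once per page and that matching is a case-insensitive substring test; neither detail is stated in the slides, and the names `WEIGHTS`, `score_page`, and `rank_pages` are hypothetical.

```python
# Points per phonebook field, taken from the list above.
WEIGHTS = {"name": 100, "department": 50, "email": 250}

def score_page(page_text: str, record: dict) -> int:
    """Add a field's points once if its phonebook value appears in the page."""
    text = page_text.lower()
    score = 0
    for field, points in WEIGHTS.items():
        value = record.get(field)
        if value and value.lower() in text:
            score += points
    return score

def rank_pages(pages: dict, record: dict) -> list:
    """Return page URLs ordered from highest to lowest score."""
    return sorted(pages, key=lambda url: score_page(pages[url], record), reverse=True)
```

Under these assumptions, a page that contains both the name and the email address scores 350 and ranks above a page that contains the name and the department, which scores 150.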
Problem
  • Performance
    • Problem: Search engines return thousands or even millions of results
    • Solution: Limit the number of retrieved web pages
    • Problem: Even with a limit on the number of analyzed web pages, many pages are still accessed
    • Solution: Cache
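
Both solutions can be combined in a small sketch: only the first few result URLs are fetched, and each fetched page is kept in memory so it is downloaded at most once. The class `PageCache`, the function `fetch_top_results`, and the default limit of 10 are assumptions for illustration; the slides do not describe how the real cache was built.

```python
from typing import Callable, Dict, List

class PageCache:
    """Keep fetched pages in memory so each URL is downloaded at most once."""

    def __init__(self, fetch: Callable[[str], str]):
        self._fetch = fetch                 # function that downloads one URL
        self._pages: Dict[str, str] = {}    # url -> page content

    def get(self, url: str) -> str:
        if url not in self._pages:
            self._pages[url] = self._fetch(url)  # cache miss: download once
        return self._pages[url]

def fetch_top_results(urls: List[str], cache: PageCache, limit: int = 10) -> List[str]:
    """Fetch only the first `limit` result pages, going through the cache."""
    return [cache.get(url) for url in urls[:limit]]
```

The `fetch` argument can be any function that takes a URL and returns its HTML, which also makes the cache easy to exercise with a stub instead of real network access.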
Final architecture

[Architecture diagram. Components: www (Google, Yahoo), Phonebook, Searchers, cache. Input: query text. Outputs: Information, Picture, Documents. Regions labeled "online" and "offline".]

Future work
  • Extend to other domains
    • MySpace, ACM, Papers, Blogs, etc…
  • Automatic link extraction
  • Better ranking function
  • User feedback
  • Owner feedback