
Lecture #32

WWW Search


Review: Data Organization

  • Kinds of things to organize

    • Menu items

    • Text

    • Images

    • Sound

    • Videos

    • Records (e.g., a person’s name, address, & phone number, or a car’s year, make, & model)


Review: Data Organization

  • Three ways to find things:

    • Lists (in-order search, binary search)

    • Trees (balance number of branches with time to decide which is correct branch)

    • Search



Search issues

  • How do we say what we want?

    • I want a story about pigs

    • I want a picture of a rooster

    • How many televisions were sold in Vietnam during 2000?

    • Find a movie like this one

  • How does the computer find what we said?


Things to search for

  • Records

  • Text

  • Images

  • Audio

  • Video


Records

  • Car

    • Price

    • Miles

    • Year

    • Make

    • Doors

  • Queries

    • Price < 6000 & Miles < 100000

    • Make == Toyota & Year > 1993
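A minimal sketch of how such queries could be run over a list of car records. The `Car` class, the `query` helper, and the sample cars are all invented for illustration; a real database would use its own query language rather than Python lambdas.

```python
from dataclasses import dataclass


@dataclass
class Car:
    make: str
    year: int
    price: int
    miles: int
    doors: int


# Hypothetical sample records, made up for this example.
CARS = [
    Car("Toyota", 1995, 5500, 90000, 4),
    Car("Toyota", 1991, 2500, 150000, 2),
    Car("Ford", 1998, 7000, 60000, 4),
    Car("Honda", 1990, 2000, 120000, 2),
]


def query(records, predicate):
    """Return every record for which the predicate is true."""
    return [r for r in records if predicate(r)]


# Price < 6000 & Miles < 100000
cheap_low_miles = query(CARS, lambda c: c.price < 6000 and c.miles < 100000)

# Make == Toyota & Year > 1993
recent_toyotas = query(CARS, lambda c: c.make == "Toyota" and c.year > 1993)
```

With this sample data both queries select only the 1995 Toyota. Each query is a test applied to every record in turn.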


Queries

  • Make == Toyota & Year > 1993


Queries

  • Year > 1993 or Price < $3,000


Databases

  • Large collections of records

  • Accessed by queries


Things to search for

  • Records

  • Text

  • Images

  • Audio

  • Video


Text searching

  • How do I say what I want?

    • Type some phrase

      • I want a story about pigs

  • How will the computer match this?

    • What is text?

      • An array of characters

    • What can a computer do with text?

      • Match characters


Text searching

  • People think in words not characters

  • How do I convert an array of characters into an array of words?

    • Collect together sequences of letters

    • How do I know if character C is a letter?

      • (C >= “a” & C <= “z”) | (C >= “A” & C <= “Z”)
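One way to turn an array of characters into an array of words, sketched under the assumption that a word is a maximal run of ASCII letters. The function names `is_letter` and `to_words` are chosen here for illustration.

```python
def is_letter(c):
    """True if c is an ASCII letter, using the same range test as the slide."""
    return ("a" <= c <= "z") or ("A" <= c <= "Z")


def to_words(text):
    """Collect maximal runs of letters into a list of words."""
    words = []
    current = []  # letters of the word being built
    for c in text:
        if is_letter(c):
            current.append(c)
        elif current:
            # A non-letter ends the current word.
            words.append("".join(current))
            current = []
    if current:
        # Flush the last word if the text ended on a letter.
        words.append("".join(current))
    return words
```

For example, `to_words("I want a story about pigs.")` gives `["I", "want", "a", "story", "about", "pigs"]`. Digits, punctuation, and accented letters all act as separators in this simple version.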


Convert to words

  • Because people think in words


Every document is an array of words

  • I want a story about pigs

  • How will I find the right documents?

    • Find all documents that have the word “pigs”


Searching text

  • How will I find pigs fast?

    • Create an index of all words

      • With each word store the name or address of each document that contains that word

    • Search the index for “pigs”

      • Return the list of documents

      • Use a binary search on the word list (50,000 words)
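The index idea above can be sketched as follows. The document names and texts are invented, and the two-list layout (`vocabulary`, `postings`) is just one possible way to store the index so that the word list can be binary searched.

```python
from bisect import bisect_left


def build_index(documents):
    """Return (sorted word list, list of sorted document-name lists)."""
    index = {}
    for name, text in documents.items():
        for word in set(text.split()):
            index.setdefault(word, set()).add(name)
    # Keep the words sorted so the list can be binary searched.
    vocabulary = sorted(index)
    postings = [sorted(index[w]) for w in vocabulary]
    return vocabulary, postings


def lookup(vocabulary, postings, word):
    """Binary search the word list; return the documents containing word."""
    i = bisect_left(vocabulary, word)
    if i < len(vocabulary) and vocabulary[i] == word:
        return postings[i]
    return []


# Hypothetical documents, made up for this example.
DOCS = {
    "three_pigs.txt": "three little pigs built houses",
    "farm.txt": "the farm has pigs and a rooster",
    "cars.txt": "used cars for sale",
}
VOCAB, POSTINGS = build_index(DOCS)
```

Here `lookup(VOCAB, POSTINGS, "pigs")` returns the two documents that mention pigs. A binary search halves the remaining range at each step, so a 50,000-word list needs at most about 16 comparisons.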


Problems

  • What if a document has the word “Pig” but not “pigs”?

  • Normalize

    • Case - make all words lower case

      • Pig -> pig

    • Stemming - remove all suffixes and prefixes before putting a word into the index

      • pigs -> pig

      • piggy -> pig
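A deliberately crude normalizer, only to make the two steps concrete. The suffix rules below are invented for this example and handle just the words on the slide; real stemmers use much larger rule sets and this one mangles many ordinary words.

```python
def normalize(word):
    """Lower-case a word, then strip a trailing "s" or "y" (toy stemming)."""
    word = word.lower()
    for suffix in ("s", "y"):
        # Only strip from words longer than three letters.
        if word.endswith(suffix) and len(word) > 3:
            word = word[: -len(suffix)]
            break
    # Collapse a doubled final letter left behind, e.g. "pigg" -> "pig".
    if len(word) > 2 and word[-1] == word[-2]:
        word = word[:-1]
    return word
```

With these rules "Pig", "pigs", and "piggy" all become "pig", so they share one index entry. The same function must be applied both when building the index and when processing a query.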


Problems

  • I want a story about pigs

    • How does the computer know to search for pigs?

      • It doesn’t

    • How does the computer know what a story is?

      • It doesn’t


Searching

  • I want a story about pigs

  • Pick out the important words and search for them

    • Which words are important?

    • D = number of times the word appears in this document

    • A = average number of times the word appears per document, across all documents

    • Importance = D/A

      • Why?
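A small sketch of the D/A score, assuming each document is already a list of normalized words. The three sample documents are made up. One way to read the "Why?": a word that appears much more often in this document than it does on average is probably what the document is about, while a word such as "the" appears about equally often everywhere and scores close to 1.

```python
def importance(word, doc_words, all_docs):
    """Return D / A for word in one document, or 0.0 if it never occurs."""
    d = doc_words.count(word)  # D: occurrences in this document
    a = sum(doc.count(word) for doc in all_docs) / len(all_docs)  # A: average
    return d / a if a > 0 else 0.0


# Hypothetical documents, each already split into words.
SAMPLE_DOCS = [
    ["the", "pigs", "ate", "the", "corn"],
    ["the", "car", "is", "the", "best"],
    ["the", "rooster", "saw", "the", "pigs", "and", "the", "pigs"],
]

pigs_score = importance("pigs", SAMPLE_DOCS[2], SAMPLE_DOCS)
the_score = importance("the", SAMPLE_DOCS[2], SAMPLE_DOCS)
```

In the third document "pigs" scores higher than "the", even though "the" occurs more often there, because "the" is just as common in the other documents.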


How do we create an index of all documents on the Web?

  • Try = a list of URLs

  • Seen = all URLs you have seen

    While (Try is not empty)
    {
        Page = take (and remove) a URL from Try
        Words = all the “important” words in Page
        add Page to the index using all of Words
        Links = all URLs in Page
        for every Link that is not in Seen
            add Link to Try and to Seen
    }
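The loop above can be sketched in Python. To stay self-contained this version crawls a tiny in-memory "web" instead of fetching real pages; the `WEB` dictionary, its URLs, and the `fetch` parameter are all assumptions made for the example, and every word is indexed rather than only the "important" ones.

```python
from collections import deque

# Hypothetical web: URL -> (page text, list of outgoing links).
WEB = {
    "a.example": ("pigs and roosters", ["b.example", "c.example"]),
    "b.example": ("a story about pigs", ["a.example"]),
    "c.example": ("used cars", ["b.example", "d.example"]),
}


def crawl(start_urls, fetch):
    """Visit every reachable page once and return word -> set of URLs."""
    try_queue = deque(start_urls)  # "Try" on the slide
    seen = set(start_urls)         # "Seen" on the slide
    index = {}
    while try_queue:
        url = try_queue.popleft()
        page = fetch(url)
        if page is None:
            continue  # page could not be fetched; skip it
        text, links = page
        for word in set(text.split()):
            index.setdefault(word, set()).add(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                try_queue.append(link)
    return index


INDEX = crawl(["a.example"], WEB.get)
```

The `seen` set is what stops the crawler from looping forever when pages link back to each other, as `a.example` and `b.example` do here.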


Other ways to find important words and important documents

  • A Document is important if many other documents point to it

  • A word is important in document D if that word occurs frequently in documents that link to document D.
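Both ideas can be approximated by counting. This sketch assumes the crawler has recorded each link as a (source, target, link text) triple, and it uses the link text as a stand-in for the words of the linking document; that simplification, and all the sample links, are invented for illustration.

```python
from collections import Counter

# Hypothetical links: (source page, target page, text of the link).
LINKS = [
    ("a.example", "pigs.example", "story about pigs"),
    ("b.example", "pigs.example", "three little pigs"),
    ("c.example", "pigs.example", "pigs"),
    ("a.example", "cars.example", "used cars"),
]


def inbound_counts(links):
    """Count how many links point at each page."""
    return Counter(target for _source, target, _text in links)


def linking_words(links, page):
    """Count the words used by links that point at page."""
    words = Counter()
    for _source, target, text in links:
        if target == page:
            words.update(text.lower().split())
    return words
```

Here `pigs.example` has three inbound links and "pigs" is the most frequent word among them, so under this scheme the page ranks well for a search on "pigs".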


Images

  • What will I say when searching for an image?

    • I want a rooster picture

    • Draw a picture of a rooster?


Search by picture?

Is this possible? If so, how?



What’s in a picture?

  • Computers don’t understand the contents of images

  • To a computer an image is a bunch of colored pixels


I want a picture of a rooster

  • Label all of the pictures

  • How does Google Images do it?

    • File name of the picture “rooster-crossingSt.jpg”

    • Words around the picture in the HTML

  • Use “Safe Search” and set filters appropriately

    (http://www.youtube.com/watch?v=maWx-ApkBCs)
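A sketch of labeling a picture from its file name and the words near it. This is only an illustration of the two clues listed above, not a description of how any real image search engine works; the function name `label_words` and its parameters are made up here.

```python
import re


def label_words(file_name, nearby_text=""):
    """Return sorted lower-case label words for an image."""
    # Drop the extension: "rooster-crossingSt.jpg" -> "rooster-crossingSt".
    stem = file_name.rsplit(".", 1)[0]
    # Insert a space at lower-to-upper case changes: "crossingSt" -> "crossing St".
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", stem)
    # Keep runs of letters from the file name and the surrounding text.
    words = re.findall(r"[A-Za-z]+", spaced + " " + nearby_text)
    return sorted({w.lower() for w in words})
```

For the file name on the slide, `label_words("rooster-crossingSt.jpg")` gives `["crossing", "rooster", "st"]`. Once an image has words attached, it can go into the same kind of word index used for text documents.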


Audio

  • Talking

    • Use speech recognition to convert audio to text

    • With each recognized word keep track of where in the audio it was recognized.

  • Build an index using the recognized text

    • Normalize based on how words sound rather than are spelled.
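A sketch of an audio index that remembers where each word was heard. The recognizer output below is invented; a real speech recognizer would supply the (word, time) pairs. Only case is normalized here, and normalizing by sound is not shown.

```python
# Hypothetical recognizer output: (recognized word, start time in seconds).
RECOGNIZED = [("the", 0.0), ("pigs", 0.4), ("ran", 0.9), ("Pigs", 5.2)]


def build_audio_index(recognized):
    """Map each lower-cased word to the list of times it was recognized."""
    index = {}
    for word, seconds in recognized:
        index.setdefault(word.lower(), []).append(seconds)
    return index


AUDIO_INDEX = build_audio_index(RECOGNIZED)
```

Looking up "pigs" returns `[0.4, 5.2]`, so a player could jump straight to either moment in the recording.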


Video

  • Where in “Casablanca” does Bogart say “Play it again Sam” ?

    • He never does; he just says “Play it”

  • How can the computer find that?

    • Transcribe the audio

    • Speech recognition on the audio


Video

  • Does Woody ever kiss Bo Peep?

  • Exactly what color is a kiss?


Video

  • Does Woody ever kiss Bo Peep?

  • Annotate every frame with who is in the frame and search for frames with both Woody and Bo Peep.
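A sketch of searching frame annotations, assuming someone (or some program) has already labeled each frame with the characters in it. The frame numbers and labels are invented for the example.

```python
# Hypothetical annotations: frame number -> set of characters in the frame.
FRAMES = {
    0: {"Woody"},
    1: {"Woody", "Buzz"},
    2: {"Woody", "Bo Peep"},
    3: {"Bo Peep"},
    4: {"Bo Peep", "Woody", "Buzz"},
}


def frames_with_all(frames, characters):
    """Return the sorted frame numbers that contain every listed character."""
    wanted = set(characters)
    return sorted(n for n, present in frames.items() if wanted <= present)


candidates = frames_with_all(FRAMES, ["Woody", "Bo Peep"])
```

The result is `[2, 4]`. This only narrows the search to frames where both characters appear; a person still has to look at those frames to see whether a kiss happens.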


So what’s with this?




Search

  • Records

    • Queries

      • < > = And Or

  • Text

    • Normalized words (case, stemming, thesaurus)

  • Images

    • Add words

  • Audio

    • Transcribe or recognize as words

  • Video

    • Transcribe

    • Annotate


“Re-Search” Directions in Image Recognition, Search and Retrieval


Face Detection in Commercial Digital Cameras

  • Train on

    • 1000’s of faces

    • Millions of non-faces

Face Detection – Viola & Jones


Face Recognition (Eigenfaces [Turk and Pentland 1991])

Project image into higher-dimensional space: an N × N image becomes a point in N² dimensions, one coordinate per pixel value (e.g. 0, 71, 250, 68, 210, 44, 128, 53, ...)

“Recognize” by grouping unknown image with closest training example
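A sketch of the "closest training example" step only. Each image is assumed to be already flattened into a list of pixel values, and the eigenface projection itself is skipped. The names and the tiny four-pixel "images" are invented for illustration.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length pixel vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(unknown, training):
    """Return the label of the training vector closest to unknown."""
    label, _vector = min(training, key=lambda item: distance(unknown, item[1]))
    return label


# Hypothetical training set: (person, flattened pixel values).
TRAINING = [
    ("alice", [0, 71, 250, 68]),
    ("bob", [210, 44, 128, 53]),
]

who = recognize([5, 70, 240, 60], TRAINING)
```

The unknown vector is much nearer to the first training vector, so `who` is `"alice"`. A nearest-neighbor rule like this always answers with some training label, even for a face it has never seen.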


Face Recognition (Picasa - Google)

  • Image search/organization

  • Automatically finds, crops and groups images of the same person from a collection of photos

  • Allows user feedback (trainable) - user can indicate if it found the wrong person.


Face/Object Recognition/Search: Feature-Based Technology

Bag of “words”*

  • Object -> Extract Features

  • Create visual “words” from image features.

*Li Fei-Fei (Princeton)


Face/Object Recognition/Search: Feature-Based Technology

*Li Fei-Fei (Princeton)

Do this for multiple objects


Face/Object Recognition/Search: Bag of Words

How to get matching images/documents?

  • Use “word” frequencies = $n_{id} / n_d$, where

    • $n_{id}$ = # times word i occurs in document d

    • $n_d$ = total # words in document d

  • Then combine word frequency with inverse document frequency weighting to downweight words that occur frequently

    (D = # of occurrences; A = average # of occurrences)
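A sketch of combining word frequency with an inverse document frequency weight. The slide does not give the exact weighting, so this uses the common textbook form log(N / N_i), where N is the number of documents and N_i the number that contain word i; treat that choice as an assumption. Visual words are represented by made-up integer IDs.

```python
import math


def tf_idf(word, doc, docs):
    """Word frequency in doc times log(N / number of docs containing word)."""
    tf = doc.count(word) / len(doc)  # n_id / n_d
    containing = sum(1 for d in docs if word in d)
    if containing == 0:
        return 0.0
    idf = math.log(len(docs) / containing)
    return tf * idf


# Hypothetical images, each a list of visual-word IDs.
IMAGES = [
    [1, 1, 2, 5],
    [1, 3, 3, 4],
    [1, 2, 2, 2],
]

common_weight = tf_idf(1, IMAGES[2], IMAGES)
distinctive_weight = tf_idf(2, IMAGES[2], IMAGES)
```

Word 1 occurs in every image, so its weight is 0 and it cannot help tell the images apart. Word 2 occurs in only two of the three, so it keeps a positive weight in the third image.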


Face/Object Recognition/Search: Feature-Based Technology

*Li Fei-Fei (Princeton)

Drop word features through a “vocabulary tree” to classify
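A toy sketch of dropping a feature through a vocabulary tree. Features here are single numbers to keep the example short, whereas real image features are vectors with many dimensions; the tree layout, its centers, and the word IDs are all invented for illustration.

```python
# Hypothetical two-level tree; each leaf carries a visual-word ID.
TREE = {
    "center": None,
    "children": [
        {"center": 10.0, "children": [
            {"center": 5.0, "children": [], "word": 0},
            {"center": 15.0, "children": [], "word": 1},
        ]},
        {"center": 50.0, "children": [
            {"center": 40.0, "children": [], "word": 2},
            {"center": 60.0, "children": [], "word": 3},
        ]},
    ],
}


def quantize(feature, node):
    """Descend to a leaf, taking the child with the nearest center each time."""
    while node["children"]:
        node = min(node["children"], key=lambda c: abs(feature - c["center"]))
    return node["word"]
```

For example, `quantize(12.0, TREE)` goes to the branch centered at 10 and then to the leaf centered at 15, giving word 1. Each feature is compared with only a few centers per level instead of with every word in the vocabulary.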

