PowerPoint Slideshow about 'Disambiguation' - benjamin



Disambiguation

March 7, 2003

Problem
  • Many people have the same name.
    • Example: Michael Jordan, basketball star or professor?
  • Prior knowledge is not feasible.
  • Disambiguation based on context.
    • Example: Scottie Pippen, Dennis Rodman, Phil Jackson
    • Example: U.C. Berkeley, David Cohn
Graph

[Figure: graph with nodes David Cohn, Scottie Pippen, Michael Jordan, Phil Jackson, U.C. Berkeley, Dennis Rodman.]

Graph

[Figure: graph in which Michael Jordan appears as two nodes. Node labels, in order: David Cohn, Michael Jordan, U.C. Berkeley; Scottie Pippen, Michael Jordan, Phil Jackson, Dennis Rodman.]

Algorithm
  • Choose the people most relevant to Michael Jordan.
  • Relevance is measured by P(MJ | p) for each person p.
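The slides do not say how P(MJ | p) is estimated. A minimal sketch, assuming it is the fraction of pages mentioning p that also mention MJ; the function name `cooccurrence_relevance` and the toy `pages` data are hypothetical:

```python
from collections import Counter

def cooccurrence_relevance(pages, target):
    """Estimate P(target | p) as (#pages with both) / (#pages with p).

    Assumption: relevance is a simple co-occurrence ratio; the slides
    do not specify the estimator.
    """
    appears = Counter()   # number of pages on which each name appears
    together = Counter()  # number of pages on which the name appears with target
    for names in pages:
        names = set(names)
        for p in names:
            appears[p] += 1
            if target in names and p != target:
                together[p] += 1
    return {p: together[p] / appears[p] for p in appears if p != target}

# Toy pages (hypothetical), each a set of names found on one page.
pages = [
    {"Michael Jordan", "Scottie Pippen", "Phil Jackson"},
    {"Michael Jordan", "Dennis Rodman", "Scottie Pippen"},
    {"Michael Jordan", "U.C. Berkeley", "David Cohn"},
    {"David Cohn", "U.C. Berkeley"},
]
relevance = cooccurrence_relevance(pages, "Michael Jordan")
```

With this toy data, a name that only ever appears alongside MJ gets relevance 1, while a name that appears on other pages as well gets a lower value.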
Choosing Seed Values
  • We need a starting point.
  • Seeds are people who correspond to the senses of MJ.
  • How well do the seeds separate people into camps?
  • Exhaustive search through all pairs of people.
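The exhaustive search over pairs can be sketched as a loop over all unordered pairs with a pluggable scoring function. The scorer used here (`toy_score`) and the co-occurrence table are hypothetical stand-ins, not the scoring rule from the slides:

```python
from itertools import combinations

def best_seed_pair(people, score_pair):
    """Try every unordered pair of people as (seed0, seed1); keep the best."""
    best_pair, best_score = None, float("-inf")
    for seed0, seed1 in combinations(people, 2):
        score = score_pair(seed0, seed1)
        if score > best_score:
            best_pair, best_score = (seed0, seed1), score
    return best_pair, best_score

# Toy co-occurrence table (hypothetical): who appears with whom.
cooccur = {
    "Scottie Pippen": {"Phil Jackson", "Dennis Rodman"},
    "Phil Jackson": {"Scottie Pippen", "Dennis Rodman"},
    "Dennis Rodman": {"Scottie Pippen", "Phil Jackson"},
    "U.C. Berkeley": {"David Cohn"},
    "David Cohn": {"U.C. Berkeley"},
}

def toy_score(seed0, seed1):
    """Placeholder scorer: size of the smaller camp of exclusive neighbours."""
    others = [p for p in cooccur if p not in (seed0, seed1)]
    camp0 = sum(1 for p in others if p in cooccur[seed0] and p not in cooccur[seed1])
    camp1 = sum(1 for p in others if p in cooccur[seed1] and p not in cooccur[seed0])
    return min(camp0, camp1)

pair, score = best_seed_pair(list(cooccur), toy_score)
```

Unordered pairs suffice because the two senses are interchangeable. The search evaluates n(n-1)/2 pairs, so it is only practical for small neighbourhoods.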
Good Seeds

[Figure: graph with nodes David Cohn, Scottie Pippen, Phil Jackson, U.C. Berkeley, Dennis Rodman.]

Bad Seeds

[Figure: graph with nodes David Cohn, Scottie Pippen, Phil Jackson, U.C. Berkeley, Dennis Rodman.]

Choosing Seeds I
  • Let Sj be the jth sense. Denote S0 as basketball star and S1 as professor (the labels are interchangeable because there is no prior knowledge).
  • In the exhaustive search, we arbitrarily pick one person to be seed0 and another to be seed1, where seed0 corresponds to S0 and seed1 to S1.
  • Let P(MJ = S1 | MJ, seed1) = 1 and P(MJ = S0 | MJ, seed1) = 0, and vice versa for seed0. This assignment could be wrong, but it is only an arbitrary starting point.
Choosing Seeds II
  • For each person p and sense Sj:
  • P(MJ = Sj | MJ, p) = n(seedj, p) P(MJ | seedj)
  • A person belongs to camp Sj only if P(MJ = Sj | MJ, p) > 0.95.
  • Use the harmonic mean to score how well seed0 and seed1 assign people to camps.
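The camp assignment and scoring can be sketched as follows. The slides do not define n(seedj, p) or say which quantities the harmonic mean combines, so this sketch takes the per-sense probabilities as given input and assumes the harmonic mean is taken over the two camp sizes (which rewards seeds that fill both camps). The names `assign_camps`, `seed_score`, and the toy probabilities are hypothetical:

```python
from statistics import harmonic_mean

THRESHOLD = 0.95  # from the slides: a person joins camp Sj only above this value

def assign_camps(sense_prob, threshold=THRESHOLD):
    """sense_prob maps person -> (P(MJ=S0 | MJ, p), P(MJ=S1 | MJ, p))."""
    camps = {0: [], 1: []}
    for person, probs in sense_prob.items():
        for j, prob in enumerate(probs):
            if prob > threshold:
                camps[j].append(person)
    return camps

def seed_score(camps):
    """Assumption: score a seed pair by the harmonic mean of the camp sizes."""
    sizes = [len(camps[0]), len(camps[1])]
    if min(sizes) == 0:
        return 0.0  # an empty camp means the seeds separate nobody
    return harmonic_mean(sizes)

# Toy probabilities (hypothetical).
sense_prob = {
    "Phil Jackson": (0.99, 0.00),
    "Dennis Rodman": (0.97, 0.01),
    "David Cohn": (0.00, 0.98),
    "Undecided Person": (0.60, 0.40),  # below threshold for both senses
}
camps = assign_camps(sense_prob)
score = seed_score(camps)
```

Under this assumption, a seed pair that places everyone in a single camp scores 0, and a pair that splits people evenly scores highest.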
Iteration I
  • Now that we have the best seeds, we assign P(MJ = Sj | p) to each person p.
  • Step 1: Begin with every person except the seeds in the unknown set.
  • Step 2: For each person in the unknown set and each sense, calculate P(MJ = Sj | p) = P(MJ | p) P(MJ = Sj | MJ, p).
Iteration II
  • Step 3: For each sense, take the person p with the highest P(MJ = Sj | p) and remove p from the unknown set.
  • Step 4: Repeat steps 2 and 3 until the unknown set is empty.
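Steps 1 to 4 can be sketched as a greedy loop. The slides do not say how P(MJ = Sj | MJ, p) is recomputed between rounds, so this sketch takes it as a caller-supplied function `sense_given_mj(p, j, assigned)`; that callback, the function name `iterate_assignment`, and the toy numbers are all hypothetical:

```python
def iterate_assignment(people, seeds, relevance, sense_given_mj):
    """Greedy assignment of P(MJ = Sj | p) to every person.

    relevance[p]                   -> P(MJ | p)
    sense_given_mj(p, j, assigned) -> P(MJ = Sj | MJ, p)  (assumed callback)
    Returns {person: (P(MJ=S0 | p), P(MJ=S1 | p))}.
    """
    seed0, seed1 = seeds
    # Seeds are fixed by the arbitrary assignment: seed_j is certain of S_j.
    assigned = {
        seed0: (relevance[seed0], 0.0),
        seed1: (0.0, relevance[seed1]),
    }
    unknown = set(people) - {seed0, seed1}           # Step 1
    while unknown:                                   # Step 4
        scores = {                                   # Step 2
            p: tuple(relevance[p] * sense_given_mj(p, j, assigned) for j in (0, 1))
            for p in unknown
        }
        for j in (0, 1):                             # Step 3
            if not unknown:
                break
            # sorted() makes tie-breaking deterministic
            best = max(sorted(unknown), key=lambda p: scores[p][j])
            assigned[best] = scores[best]
            unknown.remove(best)
    return assigned

# Toy inputs (hypothetical); this callback ignores `assigned` for simplicity.
fixed = {"Phil Jackson": (0.9, 0.1), "Dennis Rodman": (0.8, 0.2), "U.C. Berkeley": (0.1, 0.9)}
relevance = {"Scottie Pippen": 1.0, "David Cohn": 0.5, "Phil Jackson": 1.0,
             "Dennis Rodman": 1.0, "U.C. Berkeley": 0.5}
people = list(relevance)
result = iterate_assignment(people, ("Scottie Pippen", "David Cohn"), relevance,
                            lambda p, j, assigned: fixed[p][j])
```

Each round removes at most one person per sense, so the loop finishes after about half as many rounds as there are non-seed people.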
Prediction
  • Given a link, simply add up the probabilities of all the names for each sense.
  • So MJ in the link is labeled S0 or S1. We still know nothing about basketball stars or professors.
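The prediction rule can be sketched as a per-sense sum followed by picking the larger total. The function name `predict_sense`, the treatment of unseen names as contributing zero, and the tie-break toward sense 0 are assumptions of this sketch:

```python
def predict_sense(link_names, sense_prob):
    """Sum P(MJ = Sj | p) over the names on a link and pick the larger sense.

    sense_prob maps person -> (P(MJ=S0 | p), P(MJ=S1 | p)).
    Names with no entry contribute nothing (assumption).
    """
    totals = [0.0, 0.0]
    for name in link_names:
        p0, p1 = sense_prob.get(name, (0.0, 0.0))
        totals[0] += p0
        totals[1] += p1
    sense = 0 if totals[0] >= totals[1] else 1  # ties go to sense 0 (arbitrary)
    return sense, totals

# Toy probabilities (hypothetical).
sense_prob = {
    "Scottie Pippen": (1.0, 0.0),
    "Phil Jackson": (0.9, 0.1),
    "David Cohn": (0.0, 0.5),
}
sense_a, totals_a = predict_sense(["Scottie Pippen", "Phil Jackson"], sense_prob)
sense_b, totals_b = predict_sense(["David Cohn", "U.C. Berkeley"], sense_prob)
```

The output is only a sense index; attaching a meaning such as "basketball star" to that index would need outside knowledge.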
Dataset
  • Movie database from IMDB
  • 230,000 actors
  • 40,000 movies
  • Randomly pick actors from those who appeared in 15 or more movies (4,000 actors).
  • Treat them as the same person, run the algorithm, and see which sense each movie is assigned to.
  • Repeat 100 times.
  • Average accuracy: 75%
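One trial of this experiment can be sketched as below, assuming two actors are merged per trial and that movies featuring both actors are left out of the ground truth. The function names and the toy filmography are hypothetical, and no real IMDB data format is implied:

```python
import random

def merge_two_actors(filmography, min_movies=15, rng=random):
    """Pick two eligible actors and build ground-truth senses for their movies.

    filmography maps actor -> list of movie titles.
    Returns ((actor0, actor1), {movie: 0 or 1}).
    """
    eligible = sorted(a for a, movies in filmography.items() if len(movies) >= min_movies)
    actor0, actor1 = rng.sample(eligible, 2)
    shared = set(filmography[actor0]) & set(filmography[actor1])
    truth = {m: 0 for m in filmography[actor0] if m not in shared}
    truth.update({m: 1 for m in filmography[actor1] if m not in shared})
    return (actor0, actor1), truth

def accuracy(truth, predicted):
    """Fraction of movies whose predicted sense matches the true sense."""
    correct = sum(1 for movie, sense in truth.items() if predicted.get(movie) == sense)
    return correct / len(truth)

# Toy filmography (hypothetical actors and titles).
filmography = {
    "actor_a": [f"a{i}" for i in range(15)],
    "actor_b": [f"b{i}" for i in range(20)],
    "actor_c": ["c0", "c1", "c2"],  # too few movies to be eligible
}
pair, truth = merge_two_actors(filmography, rng=random.Random(0))
perfect = accuracy(truth, truth)
```

Repeating such a trial 100 times and averaging `accuracy` would mirror the procedure described in the slide.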
Good Example
  • Blandick__Clara (38) vs Gibson__Henry (19): final score = 0.982456
    • 38 out of 38 correct; Blandick__Clara has seed Phelps__Lee
    • 18 out of 19 correct; Gibson__Henry has seed Davies__John__IV_
  • Clara Blandick from 1910s to 1950s
  • Lee Phelps also from that era, appeared in 6 movies with Clara
  • Henry Gibson from 1960s to 2000s
  • John Davies IV also from that era, appeared in 2 movies with Henry
Bad Example
  • Marsh__Mae (25) vs Moorehead__Agnes (19): final score = 0.500000
    • 16 out of 25 correct; Marsh__Mae has seed Morin__Alberto__I_
    • 6 out of 19 correct; Moorehead__Agnes has seed Wolfe__Ian
  • Mae Marsh, Agnes Moorehead, Alberto Morin, and Ian Wolfe all appeared in movies from 1940s to 1970s.
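The slides do not define "final score". Reading the counts as 38 of 38 and 18 of 19 in the good example, and 16 of 25 and 6 of 19 in the bad example, the reported values are consistent with pooled accuracy over both actors' movies. A small check, where `pooled_accuracy` is a hypothetical helper:

```python
def pooled_accuracy(results):
    """results is a list of (correct, total) pairs, one per merged actor."""
    correct = sum(c for c, _ in results)
    total = sum(t for _, t in results)
    return correct / total

good = pooled_accuracy([(38, 38), (18, 19)])  # 56 / 57
bad = pooled_accuracy([(16, 25), (6, 19)])    # 22 / 44
```

A score of 0.5 on a two-sense problem is what random assignment would achieve, which fits the observation that all four people in the bad example worked in the same era.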