Fact Extraction



  1. Fact Extraction Wikipedia Knowledge Extraction

  2. Overview • Pronoun Resolution module • Infobox extraction • SRL parsing • Improved refinement • Clustering • Hadoop compatibility

  3. Pronoun Resolution Module • “His mother wanted him to get a good education so she sent him to live with his grandparents in Honolulu, HI” (Barack Obama)

  4. Pronoun Resolution Module • “His mother wanted him to get a good education so she sent him to live with his grandparents in Honolulu, HI” (Barack Obama) • Current solution: replace pronouns with article title (very primitive) • Target solution: • Nobody in the world has solved this yet • Use an existing system that is usually correct? • Simple rules for common patterns?
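
The "current solution" on this slide can be sketched as below. This is only a minimal illustration, not the project's actual code: the function name `replace_pronouns` and the two pronoun sets are assumptions, and "her" is treated as possessive even though it can also be an object pronoun.

```python
import re

# Hypothetical pronoun lists; "her" is ambiguous and is treated as possessive here.
POSSESSIVE = {"his", "her", "its", "their"}
PERSONAL = {"he", "she", "him", "it", "they", "them"}

def replace_pronouns(sentence: str, title: str) -> str:
    """Replace every pronoun with the article title, regardless of its real referent."""
    def substitute(match: re.Match) -> str:
        word = match.group(0).lower()
        if word in POSSESSIVE:
            return title + "'s"
        if word in PERSONAL:
            return title
        return match.group(0)  # not a pronoun: keep the word unchanged

    # Match whole alphabetic words so "it" inside "with" is never touched.
    return re.sub(r"[A-Za-z]+", substitute, sentence)

sentence = ("His mother wanted him to get a good education so she sent him "
            "to live with his grandparents in Honolulu, HI")
print(replace_pronouns(sentence, "Barack Obama"))
```

On the example sentence this also rewrites "she" to "Barack Obama", although "she" refers to his mother, which is the kind of error that makes the approach primitive.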

  5. Infobox extraction • Convert information into simple sentences: • Joe Biden is Barack Obama’s Vice President • Barack Obama is preceded by George W. Bush • Use type of phrase (Noun Phrase, Verb Phrase) to determine sentence to form. • Read papers from Turing Center (University of Washington)
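
A rough sketch of choosing the sentence form from the phrase type of the infobox key follows. It assumes the infobox is already parsed into key/value pairs, and it stands in for a real phrase-type tagger with a hypothetical heuristic (a key ending in a preposition is treated as a verb phrase); `infobox_sentence` and `PREPOSITIONS` are illustrative names.

```python
# Hypothetical hint list: a key ending in one of these is treated as a verb phrase.
PREPOSITIONS = {"by", "in", "at", "to", "from", "of"}

def infobox_sentence(title: str, key: str, value: str) -> str:
    """Turn one infobox entry into a simple sentence.

    Verb-phrase keys ("preceded by") give "<title> is <key> <value>".
    Noun-phrase keys ("Vice President") give "<value> is <title>'s <key>".
    The key must be non-empty.
    """
    last_word = key.split()[-1].lower()
    if last_word in PREPOSITIONS:
        return f"{title} is {key} {value}"
    return f"{value} is {title}'s {key}"

print(infobox_sentence("Barack Obama", "Vice President", "Joe Biden"))
print(infobox_sentence("Barack Obama", "preceded by", "George W. Bush"))
```

A real system would likely replace the preposition check with the output of a chunker or parser, as the slide suggests.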

  6. SRL Parsing • Performs a deep analysis on each sentence. • E.g. “Yoshi has a long tongue which he uses to grab enemies and eat them.” • has (A0: Yoshi, A1: long tongue) • use (A0: Yoshi, A1: long tongue, A2: grab enemies and eat them) • Use SRL parsing to improve quality and representation of knowledge. • Problem: speed and complexity
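
One possible way to hold the frames from the Yoshi example is sketched below. No SRL parser is run here: the frames are typed in by hand from the slide, and the `Frame` class and `to_triple` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    """One predicate with its labeled arguments (A0, A1, A2, ...)."""
    predicate: str
    args: dict = field(default_factory=dict)

def to_triple(frame: Frame) -> tuple:
    """Project a frame onto a (subject, verb, object) tuple using A0 and A1."""
    return (frame.args.get("A0"), frame.predicate, frame.args.get("A1"))

# Frames copied by hand from the slide, not produced by a parser.
frames = [
    Frame("has", {"A0": "Yoshi", "A1": "long tongue"}),
    Frame("use", {"A0": "Yoshi", "A1": "long tongue",
                  "A2": "grab enemies and eat them"}),
]
triples = [to_triple(f) for f in frames]
print(triples)
```

Keeping the full argument dictionary alongside the triple preserves extra roles such as A2, which a plain tuple would drop.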

  7. Improved refinement • Current system has Subject, Object, Verb tuples • Problem: hard to decide which words to include in each phrase • E.g. “'The dog ( Canis lupus familiaris )' 'is' 'a mammal from the family Canidae'” • The dog? dog? The dog ( Canis lupus familiaris )? • a mammal? a mammal from the family Canidae? • Possible solutions: • Different levels of information? • Simple rules based on part of speech tags?
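
The two proposed solutions can be combined in a small sketch: simple part-of-speech rules that yield several levels of information for one phrase. It assumes the phrase arrives already tokenized and tagged with Penn Treebank style tags (supplied by hand here); the function name `phrase_levels` and the three level names are hypothetical.

```python
def join_tokens(pairs):
    """Join the tokens of (token, tag) pairs with single spaces."""
    return " ".join(token for token, _ in pairs)

def phrase_levels(tagged):
    """Return three versions of a tagged phrase, from most to least detailed."""
    # Rule 1: drop anything inside parentheses.
    kept, depth = [], 0
    for token, tag in tagged:
        if tag == "-LRB-":
            depth += 1
        elif tag == "-RRB-":
            depth = max(depth - 1, 0)
        elif depth == 0:
            kept.append((token, tag))
    # Rule 2: stop at the first preposition (tag IN).
    core = []
    for token, tag in kept:
        if tag == "IN":
            break
        core.append((token, tag))
    # Rule 3: drop determiners (tag DT) for the shortest form.
    head = [(token, tag) for token, tag in core if tag != "DT"]
    return {"full": join_tokens(tagged),
            "core": join_tokens(core),
            "head": join_tokens(head)}

dog = [("The", "DT"), ("dog", "NN"), ("(", "-LRB-"), ("Canis", "NNP"),
       ("lupus", "NNP"), ("familiaris", "NNP"), (")", "-RRB-")]
print(phrase_levels(dog))
```

For the dog example the three levels correspond to the three candidates listed on the slide: the full phrase, "The dog", and "dog".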

  8. Clustering • Idea: Determine whether two separate mentions point to the same concept • ‘The dog’, ‘a dog’, ‘dogs’ • ‘Cats’, ‘C.A.T.S’, ‘CAT Scan’ • ‘President Obama’, ‘President Barack Obama’ • Possible solutions: • Feature-based classification • Self-organizing map • Terms associated
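
As a very small stand-in for feature-based classification, the sketch below compares two mentions on a single hand-written feature, token overlap after light normalization, and applies a threshold. It is not a trained classifier; the names `features`, `jaccard`, `same_concept` and the threshold of 0.6 are assumptions chosen for illustration.

```python
import re

ARTICLES = {"the", "a", "an"}

def features(mention: str) -> set:
    """Lowercase, drop articles, and strip a naive plural 's' from longer words."""
    tokens = set()
    for token in re.findall(r"[a-z]+", mention.lower()):
        if token in ARTICLES:
            continue
        if len(token) > 3 and token.endswith("s"):
            token = token[:-1]
        tokens.add(token)
    return tokens

def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def same_concept(m1: str, m2: str, threshold: float = 0.6) -> bool:
    """Guess whether two mentions refer to the same concept."""
    return jaccard(features(m1), features(m2)) >= threshold

print(same_concept("The dog", "dogs"))
print(same_concept("President Obama", "President Barack Obama"))
print(same_concept("Cats", "CAT Scan"))
```

Surface overlap alone cannot tell when identical strings mean different things, so a real classifier would need context features as well.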

  9. Hadoop Compatibility • Need to ensure scaling is possible for move to regular Wikipedia • Hadoop is an open-source implementation of the MapReduce programming model • MapReduce parallelizes a job by splitting its input across several machines (map) and then combining the intermediate results by key (reduce)
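
The map, shuffle and reduce steps can be illustrated with a single-process sketch. This is plain Python, not the Hadoop API: on a real cluster the map calls would run on different machines and the framework would do the grouping. The function names and the word-count task are assumptions chosen for illustration.

```python
from collections import defaultdict

def map_article(text: str):
    """Map step: emit a (word, 1) pair for every word in one article."""
    return [(word.lower(), 1) for word in text.split()]

def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(key, values):
    """Reduce step: combine all values for one key."""
    return key, sum(values)

def run(articles):
    """Run the three steps in sequence over a list of article texts."""
    pairs = [pair for article in articles for pair in map_article(article)]
    return dict(reduce_counts(key, values)
                for key, values in shuffle(pairs).items())

print(run(["the dog", "The cat"]))
```

Because each `map_article` call depends only on its own article, the map step is the part that can be spread over machines without coordination.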

  10. Questions?
