PLUIE: Probability and Logic Unified for Information Extraction

  1. PLUIE: Probability and Logic Unified for Information Extraction
     Stuart Russell, Patrick Gallinari, Patrice Perny

  2. Project Goals
     • “Open” information extraction
     • Construct knowledge bases from the web
     • Learn new classes, relations, linguistic patterns
     • Learn new predictive regularities
     • Integrate facts, entities across multiple documents
     • Support question answering
     • Accuracy, consistency, integration, and utility; not scale for its own sake

  3. Approach
     • Probabilistic inference with the Web as evidence
     • Generative models when available
     [Diagram labeled “World” and “Web”]
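
One way to read “probabilistic inference with the Web as evidence” for a single candidate fact is to treat the fact's truth as a hidden variable and the pages that mention it as evidence generated from it. The following is a minimal sketch under a strong, assumed model: each page independently asserts or denies the fact, with a probability that depends only on whether the fact is true. The function name and its parameters are hypothetical, not taken from PLUIE or BLOG.

```python
import math


def posterior_fact_true(prior, n_assert, n_deny, p_assert_if_true, p_assert_if_false):
    """P(fact is true | pages) under a naive, assumed generative model.

    Each relevant page independently asserts the fact with probability
    p_assert_if_true when the fact holds, or p_assert_if_false when it
    does not; otherwise the page denies it. All probabilities must lie
    strictly between 0 and 1.
    """
    # Unnormalized log-posterior for the world where the fact is true...
    log_true = (math.log(prior)
                + n_assert * math.log(p_assert_if_true)
                + n_deny * math.log(1 - p_assert_if_true))
    # ...and for the world where it is false.
    log_false = (math.log(1 - prior)
                 + n_assert * math.log(p_assert_if_false)
                 + n_deny * math.log(1 - p_assert_if_false))
    # Normalize in log space (log-sum-exp) for numerical stability.
    m = max(log_true, log_false)
    return math.exp(log_true - m) / (math.exp(log_true - m) + math.exp(log_false - m))
```

For example, with a skeptical prior of 0.1, eight asserting pages, two denying pages, `p_assert_if_true=0.8` and `p_assert_if_false=0.3`, the posterior comes out at roughly 0.96. The independence assumption is the weak point: pages that copy one another are not independent evidence, and a model of who might choose to say what, as on the following slide, is one way to account for that.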

  4. Approach, contd.
     • Open-universe probability models (e.g., BLOG)
        • First-order expressive power (objects, relations, functions, quantifiers, equality, etc.)
        • Allow for uncertainty about existence, identity of objects
     • Generative model consists of
        • What might be true in the world
        • Who might choose to say what
        • How they might choose to say it
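
The three-part generative story can be sketched in plain Python (this is not BLOG syntax). Everything here is a hypothetical toy: the name, organization, and template lists, the Poisson rates, and the truthfulness probability are illustrative assumptions. The number of people is itself random, which mirrors uncertainty about existence, and names come from a small pool, so two distinct people can share a surface name, which mirrors uncertainty about identity.

```python
import math
import random
from dataclasses import dataclass

# Hypothetical toy vocabulary (illustrative only).
NAMES = ["Smith", "Jones", "Lee"]
ORGS = ["Goldman Sachs", "US Treasury", "CMU"]
TEMPLATES = ["{name} works at {org}", "{org} hired {name}", "{name} of {org}"]


@dataclass
class Person:
    pid: int       # hidden identity; never appears in the generated text
    name: str      # surface name; distinct people may share one
    employer: str  # the "true" fact about this person in the sampled world


def sample_poisson(rng: random.Random, lam: float) -> int:
    """Draw from Poisson(lam) by Knuth's multiplication method (fine for small lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_world_and_web(rng, lam_people=3.0, lam_docs=5.0, p_truthful=0.9):
    # 1. What might be true in the world: even the number of people is random.
    n_people = sample_poisson(rng, lam_people)
    people = [Person(i, rng.choice(NAMES), rng.choice(ORGS)) for i in range(n_people)]
    docs = []
    if not people:
        return people, docs  # an empty world generates no documents in this toy
    for _ in range(sample_poisson(rng, lam_docs)):
        # 2. Who might choose to say what: pick a subject; usually report the truth.
        subject = rng.choice(people)
        truthful = rng.random() < p_truthful
        org = subject.employer if truthful else rng.choice(ORGS)
        # 3. How they might choose to say it: pick a surface template.
        text = rng.choice(TEMPLATES).format(name=subject.name, org=org)
        docs.append((subject.pid, text))
    return people, docs
```

Running `sample_world_and_web(random.Random(0))` yields a hidden list of people and a list of `(pid, text)` pairs. The `pid` is kept only so a sample can be inspected; an extractor would see the text alone and would have to infer how many people exist and which mentions co-refer.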

  5. Approach, contd.
     • Rigorous ontological framework
        • Standard taxonomic hierarchy that supports distinctions needed for language
        • E.g., mass nouns (water) vs. count nouns (lake)
     • Proper treatment of events and time; avoid deficient “facts” such as
        • Man Utd beat Chelsea; Chelsea beat Man Utd (PowerSet)
        • Hank Paulson is the CEO of Goldman Sachs (NELL)
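
A minimal sketch, assuming a simple Python encoding rather than any particular ontology language, of how time-indexed fluents and event objects avoid the two deficient “facts” above. The class and function names are hypothetical, and the match identifiers and dates are invented placeholders, not real fixtures.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Fluent:
    """relation(subject, obj) holds from start_year through end_year (inclusive)."""
    relation: str
    subject: str
    obj: str
    start_year: int
    end_year: Optional[int] = None  # None: no known end

    def holds_in(self, year: int) -> bool:
        return self.start_year <= year and (self.end_year is None or year <= self.end_year)


@dataclass(frozen=True)
class Match:
    """A match is an event object; "X beat Y" is a property of one specific match."""
    match_id: str
    played_on: date
    winner: str
    loser: str


def wins(matches: List[Match], winner: str, loser: str) -> List[Match]:
    """All recorded matches in which `winner` beat `loser`."""
    return [m for m in matches if m.winner == winner and m.loser == loser]


# Hank Paulson led Goldman Sachs from 1999 until he became US Treasury Secretary in 2006.
paulson_ceo = Fluent("ceo_of", "Hank Paulson", "Goldman Sachs", 1999, 2006)

# Hypothetical placeholder matches: both "beat" statements are true of *different* events.
matches = [
    Match("m1", date(2008, 3, 1), "Man Utd", "Chelsea"),
    Match("m2", date(2008, 10, 1), "Chelsea", "Man Utd"),
]
```

With this encoding `paulson_ceo.holds_in(2003)` is true and `paulson_ceo.holds_in(2010)` is false, while `wins(matches, "Man Utd", "Chelsea")` and `wins(matches, "Chelsea", "Man Utd")` each return one distinct match, so the two statements no longer contradict each other.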

  6. Open questions
     • Efficient inference
     • What is extracted? A posterior over possible worlds?
     • How to identify new categories and relations
     • HCI: presenting infinite, heterogeneous posterior distributions: who wrote what when, when “who,” “what,” and “when” vary across worlds?
     • Making use of partially extracted or unextracted information – “data spaces” (Franklin, Halevy)
     • Adversarial data: game-theoretic analysis?

  7. Plan
     • Reading group
        • Weekly meeting (day and time?)
        • Participants take turns presenting
        • Reading list at www.cs.berkeley.edu/~russell/pluie/readings.html
     • Formal project (ANR) runs 1/1/13 to 8/31/14
     • Will continue indefinitely
     • Hiring two postdocs
     • Possible collaborations
        • Tom Mitchell’s NELL project (CMU)
        • Andrew McCallum (UMass)
        • Kevin Murphy (Google’s Knowledge Graph project)
