SOFIE: A Self-Organizing Framework for Information Extraction

Fabian M. Suchanek, Mauro Sozio, Gerhard Weikum (Max-Planck-Institute for Informatics, Saarbrücken, Germany)

Presentation Transcript


  1. SOFIE: A Self-Organizing Framework for Information Extraction
     Fabian M. Suchanek, Mauro Sozio, Gerhard Weikum (Max-Planck-Institute for Informatics, Saarbrücken, Germany)

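  2. Ontologies
     [Figure: an ontology in which Singer and Country are subclassOf Entity; an entity of type Singer is linked by bornInPlace to USA, of type Country. Ontologies such as DBpedia, YAGO, and KYLIN harvest such facts from Wikipedia (e.g. the infobox entry birth-place: USA). Open question: how to obtain facts from Internet text such as "Elvis died in England"?]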

  3. Information Extraction
     Goal: extract ontological information from natural-language documents, e.g. "Elvis died in England" → diedInPlace(Elvis, England)
     Previous approaches: Espresso, DIPRE, LEILA, Snowball, TextRunner, Alice, and many more. They
     - may deliver non-canonical relations: died in, perished in, was killed in, ...
     - may deliver non-canonical entities: England, UK, Great Britain, ...
     - may deliver inconsistent facts: diedInPlace(Elvis, England) and diedInPlace(Elvis, Germany)

  4. Pitfalls of Information Extraction
     Ontology: Louis XIV diedInPlace France. Web page: "Elvis died in England. Louis XIV died in France."
     If a pattern occurs with two entities that stand in a relation, then the pattern maps to the relation: "died in" = diedInPlace.

  5. Pitfalls of Information Extraction
     Web page: "Elvis died in England. Louis XIV died in France."
     If a pattern occurs with two entities that stand in a relation, then the pattern maps to the relation: "died in" = diedInPlace.
     If a meaningful pattern occurs with two entities, then the entities stand in the relation: diedInPlace("Elvis", "England").

  6. Pitfalls of Information Extraction
     Web page: "Elvis died in England. Louis XIV died in France." The ontology additionally knows that Elvis is a Taxidophobist, which puts a question mark on the extracted fact.
     If a pattern occurs with two entities that stand in a relation, then the pattern maps to the relation: "died in" = diedInPlace.
     If a meaningful pattern occurs with two entities, then the entities stand in the relation: diedInPlace("Elvis", "England").

  7. Pitfalls of Information Extraction
     Web page: "Elvis died in England. Louis XIV died in France." Ontology: Elvis is a Taxidophobist (Reasoning Problem).
     If a pattern occurs with two entities that stand in a relation, then the pattern maps to the relation: "died in" = diedInPlace.
     If a meaningful pattern occurs with two entities, then the entities stand in the relation: diedInPlace("Elvis", "England").

  8. Pitfalls of Information Extraction
     Web page: "Elvis died in England. Louis XIV died in France." Ontology: Elvis is a Taxidophobist (Reasoning Problem).
     If a pattern occurs with two entities that stand in a relation, then the pattern maps to the relation: "died in" = diedInPlace.
     If a meaningful pattern occurs with two entities, then the entities stand in the relation (Disambiguation Problem).

  9. Pitfalls of Information Extraction
     Web page: "Elvis died in England. Louis XIV died in France."
     - Pattern Matching Problem: does "died in" = diedInPlace?
     - Reasoning Problem: Elvis is a Taxidophobist
     - Disambiguation Problem

  10. Information Extraction as Formulas
      Reasoning Problem: type(Elvis, Taxidophobist).
      type(X, Taxidophobist) & bornInPlace(X,Y) & Y≠Z => ¬diedInPlace(X,Z) [0.8]

  11. Information Extraction as Formulas
      Reasoning Problem: type(Elvis, Taxidophobist). type(X, Taxidophobist) & bornInPlace(X,Y) & Y≠Z => ¬diedInPlace(X,Z)
      Pattern Matching Problem: "Elvis died in England. Louis XIV died in France." Does "died in" = diedInPlace?
      Disambiguation Problem

  12. Information Extraction as Formulas: the Disambiguation Problem
      Assumptions:
      - In one document, the same word always has the same meaning.
      - The ontology already knows all important meanings of proper names.
      possibleMeaning(Elvis@D15, ElvisPresley). [0.7]

  13. Information Extraction as Formulas
      Assumptions as on slide 12. possibleMeaning(Elvis@D15, ElvisPresley). [0.7]
      - Elvis@D15 is a word in context (wic); here: the word "Elvis" in document D15.
      - ElvisPresley is one possible meaning of "Elvis", as given by the ontology.
      - [0.7] is a prior estimate of the likelihood of this meaning: $\frac{|\mathrm{words}(D15) \cap \mathrm{rel}(\mathrm{ElvisPresley})|}{|\mathrm{words}(D15)|}$
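The prior on slide 13 is the fraction of a document's words that are related to the candidate entity. The following Python sketch computes it under stated assumptions: words(D) is taken to be the set of lowercased, punctuation-stripped tokens, and the `related_words` sets standing in for rel(E) are hypothetical, not SOFIE's actual data.

```python
def tokenize(text: str) -> set[str]:
    # Assumption: words(D) is the set of lowercased tokens of D, split on
    # whitespace with surrounding punctuation stripped.
    return {tok.strip('.,;:!?"\'()').lower() for tok in text.split()} - {""}


def prior(document: str, related: set[str]) -> float:
    """|words(D) ∩ rel(E)| / |words(D)|: prior weight of possibleMeaning(W@D, E)."""
    words = tokenize(document)
    if not words:
        return 0.0
    return len(words & related) / len(words)


# Hypothetical rel(E) sets; in SOFIE these would be derived from the ontology.
related_words = {
    "ElvisPresley": {"singer", "rock", "memphis"},
    "StElvis": {"saint", "church"},
}

d15 = "Elvis, the singer from Memphis, died in England."
for entity, rel in related_words.items():
    print(entity, prior(d15, rel))
# ElvisPresley 0.25
# StElvis 0.0
```

Two of the eight distinct words ("singer", "memphis") are related to ElvisPresley, giving 2/8 = 0.25.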

  14. Information Extraction as Formulas
      Assumptions as on slide 12. possibleMeaning(Elvis@D15, ElvisPresley). [0.7]
      possibleMeaning(X,Y) => means(X,Y)
      means(X,Y) & Y≠Z => ¬means(X,Z)
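Grounding the two rules on slide 14 for one word in context yields one unit clause means(X,Y) per candidate meaning, weighted by its prior, plus one exclusion clause ¬means(X,Y) ∨ ¬means(X,Z) per pair of distinct candidates. The sketch below builds these clauses; the (is_positive, atom) literal encoding, the `HARD` weight for the exclusion rule, and the 0.2 prior for StElvis are assumptions for illustration.

```python
from itertools import combinations

# Assumed weight that makes the exclusion rule effectively hard.
HARD = 1000.0


def ground_disambiguation(wic, priors):
    """Ground the slide-14 rules for one word in context.

    priors maps each candidate entity Y to the weight of possibleMeaning(wic, Y).
    A clause is (literals, weight); a literal is (is_positive, atom).
    """
    clauses = []
    # possibleMeaning(X,Y) => means(X,Y): possibleMeaning is a given fact,
    # so each grounding reduces to the unit clause means(X,Y) [prior].
    for entity, weight in priors.items():
        clauses.append(([(True, ("means", wic, entity))], weight))
    # means(X,Y) & Y≠Z => ¬means(X,Z) becomes ¬means(X,Y) ∨ ¬means(X,Z);
    # (Y,Z) and (Z,Y) give the same clause, so unordered pairs suffice.
    for y, z in combinations(sorted(priors), 2):
        clauses.append(([(False, ("means", wic, y)), (False, ("means", wic, z))], HARD))
    return clauses


# Hypothetical priors for two candidate meanings of "Elvis" in D15.
for clause in ground_disambiguation("Elvis@D15", {"ElvisPresley": 0.7, "StElvis": 0.2}):
    print(clause)
```

With k candidate meanings this produces k unit clauses and k(k-1)/2 exclusion clauses.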

  15. Information Extraction as Formulas
      Reasoning Problem: type(Elvis, Taxidophobist). type(X, Taxidophobist) & bornInPlace(X,Y) & Y≠Z => ¬diedInPlace(X,Z)
      Pattern Matching Problem: "Elvis died in England. Louis XIV died in France." Does "died in" = diedInPlace?
      Disambiguation Problem: possibleMeaning(Elvis@D15, ElvisPresley). [0.7]

  16. Information Extraction as Formulas: the Pattern Matching Problem
      Web page: "Elvis died in England. Louis XIV died in France." Does "died in" = diedInPlace?
      occurs("died in", Elvis@D15, England@D15). [14]
      occurs(P,Wic1,Wic2) & means(Wic1,X) & means(Wic2,Y) & R(X,Y) => mapsTo(P,R)
      occurs(P,Wic1,Wic2) & means(Wic1,X) & means(Wic2,Y) & mapsTo(P,R) => R(X,Y)
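Read as hard rules, the two formulas on slide 16 can be applied by simple forward chaining: known facts reveal which relation a pattern expresses, and that mapping then yields new facts. The sketch below deliberately simplifies SOFIE: it ignores weights and assumes each word in context already has a single chosen meaning. The `meaning` dictionary and the placement of both sentences in D15 are hypothetical.

```python
def apply_pattern_rules(occurrences, meaning, facts):
    """occurrences: (pattern, wic1, wic2); meaning: wic -> entity;
    facts: set of (relation, x, y). Returns (maps_to, new_facts)."""
    # Rule 1: occurs(P,W1,W2) & means(W1,X) & means(W2,Y) & R(X,Y) => mapsTo(P,R)
    maps_to = set()
    for pattern, w1, w2 in occurrences:
        x, y = meaning[w1], meaning[w2]
        for rel, a, b in facts:
            if (a, b) == (x, y):
                maps_to.add((pattern, rel))
    # Rule 2: occurs(P,W1,W2) & means(W1,X) & means(W2,Y) & mapsTo(P,R) => R(X,Y)
    new_facts = set()
    for pattern, w1, w2 in occurrences:
        x, y = meaning[w1], meaning[w2]
        for p, rel in maps_to:
            if p == pattern and (rel, x, y) not in facts:
                new_facts.add((rel, x, y))
    return maps_to, new_facts


occurrences = [
    ("died in", "Elvis@D15", "England@D15"),
    ("died in", "Louis XIV@D15", "France@D15"),
]
meaning = {"Elvis@D15": "ElvisPresley", "England@D15": "England",
           "Louis XIV@D15": "LouisXIV", "France@D15": "France"}
facts = {("diedInPlace", "LouisXIV", "France")}

maps_to, new_facts = apply_pattern_rules(occurrences, meaning, facts)
print(maps_to)    # {('died in', 'diedInPlace')}
print(new_facts)  # {('diedInPlace', 'ElvisPresley', 'England')}
```

Applied blindly, the rules reproduce the pitfall from the earlier slides: diedInPlace(ElvisPresley, England) is derived even though the Taxidophobist rule argues against it. Weighing such rules against each other is what the MAX SAT formulation is for.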

  17. Information Extraction as Formulas
      Reasoning Problem: type(Elvis, Taxidophobist). type(X, Taxidophobist) & bornInPlace(X,Y) & Y≠Z => ¬diedInPlace(X,Z)
      Pattern Matching Problem: occurs("died in", Elvis@D15, England@D15). [14]
      Disambiguation Problem: possibleMeaning(Elvis@D15, ElvisPresley). [0.7]
      Find truth assignments to the hypotheses so that the total weight of the satisfied formulas is maximized:
      means(Elvis@D15, ElvisPresley)? mapsTo("died in", diedInPlace)? diedInPlace(ElvisPresley, England)?

  18. Weighted MAX SAT Problem
      Find truth assignments to the hypotheses so that the total weight of the satisfied formulas is maximized.
      Problems:
      - The Weighted MAX SAT Problem is NP-hard.
      - Our instance of the problem is huge.
      - The most popular linear-time approximation algorithm (Johnson's) does not work well with our type of formulas. A functional constraint such as bornInPlace(X,Y) & Y≠Z => ¬bornInPlace(X,Z) grounds to clauses ¬A ∨ ¬B, ¬A ∨ ¬C, ¬B ∨ ¬C, and on such formulas Johnson's algorithm cannot approximate better than 2/3.
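To make the objective on slide 18 concrete, here is an exhaustive Weighted MAX SAT solver in Python. It enumerates all 2^n assignments, so it is usable only on toy instances, which is why a polynomial-time approximation is needed at SOFIE's scale. The three binary clauses ¬A ∨ ¬B, ¬A ∨ ¬C, ¬B ∨ ¬C encode the functional constraint on three competing bornInPlace hypotheses; the unit clauses and all weights are hypothetical.

```python
from itertools import product


def satisfied(clause, assignment):
    # A clause is a list of (variable, is_positive) literals; it holds if any literal is true.
    return any(assignment[var] == positive for var, positive in clause)


def max_sat_brute_force(variables, clauses):
    """Return (best_assignment, best_weight) by trying all 2^n assignments."""
    best, best_weight = None, float("-inf")
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        weight = sum(w for clause, w in clauses if satisfied(clause, assignment))
        if weight > best_weight:
            best, best_weight = assignment, weight
    return best, best_weight


# A, B, C: competing hypotheses bornInPlace(X, .) for one person X.
clauses = [
    ([("A", False), ("B", False)], 10.0),  # ¬A ∨ ¬B
    ([("A", False), ("C", False)], 10.0),  # ¬A ∨ ¬C
    ([("B", False), ("C", False)], 10.0),  # ¬B ∨ ¬C
    ([("A", True)], 3.0),                  # hypothetical evidence for A
    ([("B", True)], 2.0),                  # hypothetical evidence for B
    ([("C", True)], 1.0),                  # hypothetical evidence for C
]
best, weight = max_sat_brute_force(["A", "B", "C"], clauses)
print(best, weight)  # {'A': True, 'B': False, 'C': False} 33.0
```

Making exactly one hypothesis true satisfies all three exclusion clauses, so the optimum keeps only the best-supported birthplace A.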

  19. FMS Algorithm
      The Functional MAX SAT Algorithm considers only unit clauses.
      Example formulas over the hypotheses A, B, C: A ∨ B [w1], A ∨ B [w2], B ∨ C [w3], C [w4]; each hypothesis is set to true or false.
      The Functional MAX SAT Algorithm propagates dominating unit clauses. Example: given ¬A ∨ B [10], ¬A [10], and A [30], the unit clause A [30] dominates because 30 > 10 + 10, so A = true.
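The dominance test on slide 19 can be written directly: a unit clause for a literal dominates when its weight exceeds the total weight of all clauses containing the opposite literal, so flipping that literal to true always gains more than it can lose. The Python sketch below covers only this propagation step, not the full FMS algorithm, whose remaining steps the slides do not spell out. The (variable, is_positive) clause encoding and the function names are assumptions.

```python
def find_dominating_unit(clauses, assignment):
    """Return (variable, value) fixed by a dominating unit clause, or None.

    A clause is (literals, weight) with literals as (variable, is_positive).
    Under the partial assignment, satisfied clauses are ignored and false
    literals are removed, so longer clauses can shrink to unit clauses.
    """
    active = []
    for literals, weight in clauses:
        if any(v in assignment and assignment[v] == pos for v, pos in literals):
            continue  # already satisfied
        rest = [(v, pos) for v, pos in literals if v not in assignment]
        if rest:  # a clause with no literals left is violated; nothing to decide
            active.append((rest, weight))
    for literals, weight in active:
        if len(literals) != 1:
            continue
        var, pos = literals[0]
        # Total weight of active clauses that contain the opposite literal.
        opposing = sum(w for lits, w in active if (var, not pos) in lits)
        if weight > opposing:
            return var, pos
    return None


def propagate(clauses):
    """Fix variables through dominating unit clauses until none is left."""
    assignment = {}
    while (step := find_dominating_unit(clauses, assignment)) is not None:
        var, value = step
        assignment[var] = value
    return assignment


# The slide's example: ¬A ∨ B [10], ¬A [10], A [30].
example = [
    ([("A", False), ("B", True)], 10),
    ([("A", False)], 10),
    ([("A", True)], 30),
]
print(propagate(example))  # {'A': True, 'B': True}
```

On the slide's example, A [30] dominates (30 > 10 + 10), so A becomes true; the clause ¬A ∨ B then shrinks to the unit clause B [10], which nothing opposes, so B becomes true as well.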

  20. FMS Algorithm
      The FMS Algorithm (pictured as a black box: FOR i=1 TO 42 ... NEXT i) runs in polynomial time and comes with an approximation guarantee.
      Experiments show better performance in practice than Johnson's algorithm in our setting.

  21. FMS Algorithm
      Input: "Elvis died in England" and formulas such as r(X,Y) & s(Y) => t(X,Y), processed by the FMS Algorithm (FOR i=1 TO 42 ... NEXT i).

  22. FMS Algorithm
      Input: "Elvis died in England" and formulas such as r(X,Y) & s(Y) => t(X,Y), processed by the FMS Algorithm (FOR i=1 TO 42 ... NEXT i).
      Output assignment: type(Elvis,Taxidophobist)=1, diedIn(Elvis,England)=0, means(Elvis@D15,Elvis)=0, means(Elvis@D15,...)=1
      [Figure: St. Elvis linked to England by diedIn]

  23. FMS Algorithm
      Formulas such as r(X,Y) & s(Y) => t(X,Y), processed by the FMS Algorithm (FOR i=1 TO 42 ... NEXT i).
      [Figure: St. Elvis linked to England by diedIn]

  24. Other Experiments

  25. Conclusion
      SOFIE unifies the tasks of entity disambiguation, pattern extraction, and semantic constraint reasoning in a single framework, delivering canonicalized facts of high precision (experiments show 90% precision).
      [Image caption: "died in England... but is alive!"]

  26. SOFIE rules!
      R(X,Y) & R(X,Z) & type(R,function) => Y = Z
      occurs(P,WX,WY) & refersTo(WX,X) & refersTo(WY,Y) & R(X,Y) => expresses(P,R)
      occurs(P,WX,WY) & expresses(P,R) & refersTo(WX,X) & refersTo(WY,Y) & domain(R,D1) & range(R,D2) & type(X,D1) & type(Y,D2) => R(X,Y)
      disambiguationPrior(W,X) => refersTo(W,X)
      ¬R(X,Y)
      bornInYear(X,B) & diedInYear(X,D) => B < D
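Two of the rules on slide 26 are pure consistency constraints over facts: a functional relation admits only one object per subject, and a birth year must precede the death year. A small Python checker over (relation, subject, object) triples could look like the following; the triple encoding, the set of functional relations, and the example facts (including the entity X1) are assumptions for illustration.

```python
from collections import defaultdict


def functional_violations(facts, functional):
    """R(X,Y) & R(X,Z) & type(R,function) => Y = Z: subjects with more than one object."""
    objects = defaultdict(set)
    for rel, x, y in facts:
        if rel in functional:
            objects[(rel, x)].add(y)
    return {key: ys for key, ys in objects.items() if len(ys) > 1}


def lifespan_violations(facts):
    """bornInYear(X,B) & diedInYear(X,D) => B < D: entities violating the rule.

    Assumes at most one bornInYear and one diedInYear fact per entity.
    """
    born = {x: y for rel, x, y in facts if rel == "bornInYear"}
    died = {x: y for rel, x, y in facts if rel == "diedInYear"}
    return {x for x in born.keys() & died.keys() if not born[x] < died[x]}


facts = {
    ("diedInPlace", "Elvis", "England"),
    ("diedInPlace", "Elvis", "Germany"),  # the inconsistent pair from slide 3
    ("bornInYear", "Elvis", 1935),
    ("diedInYear", "Elvis", 1977),
    ("bornInYear", "X1", 1990),           # hypothetical erroneous extraction
    ("diedInYear", "X1", 1950),
}
print(functional_violations(facts, {"diedInPlace"}))
print(lifespan_violations(facts))
```

SOFIE does not just report such conflicts; it feeds the rules into the weighted MAX SAT problem so that the conflicting hypotheses compete for truth.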

  27. SOFIE: Experiments

  28. SOFIE: Large-Scale Experiment
      Corpus: 3700 biography documents downloaded from the Web
      Goal: extract bornIn, bornOnDate, diedIn, diedOnDate, politicianOf
      Results (precision in %) for bornIn, bornOnDate, diedIn, diedOnDate, politicianOf: 87, 87, 13, 98, 95, 90
      Runtime (summed over 5 batches):
        Parsing                  7:05 h
        Hypothesis generation    6:15 h
        Solving                  2:30 h
        Total                   15:50 h

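  29. SOFIE: Relation to Markov Logic
      Formulas: r(x,y) & s(x,z) => t(x,z) [w], ... Let X be a truth assignment to all hypotheses (e.g. bornIn(Nicholas, Patras) = false or true), w_i the weight of the i-th formula, and sat(i,X) the number of satisfied instances of the i-th formula under X. Then
      $$P(X) \propto \exp\Big(\sum_i \mathrm{sat}(i,X)\, w_i\Big)$$
      $$\arg\max_X P(X) = \arg\max_X \log \exp\Big(\sum_i \mathrm{sat}(i,X)\, w_i\Big) = \arg\max_X \sum_i \mathrm{sat}(i,X)\, w_i$$
      since the logarithm is monotonic and the normalizing constant does not depend on X. The last maximization is a Weighted MAX SAT problem.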
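The derivation on slide 29 can be checked numerically on a toy model: enumerate every truth assignment X, compute the Markov Logic probability P(X) = exp(Σ_i sat(i,X) w_i) / Z, and confirm that the most probable assignment is the one with the largest weighted count of satisfied formulas. The three ground formulas, their weights, and the second hypothesis `c` are hypothetical.

```python
import math
from itertools import product

HYPOTHESES = ["b", "c"]  # b = bornIn(Nicholas, Patras); c is a hypothetical second hypothesis
# Each ground formula i: (test on an assignment, weight w_i). Each formula has a
# single ground instance here, so sat(i, X) is 1 if the test holds, else 0.
FORMULAS = [
    (lambda x: x["b"], 1.5),                  # b
    (lambda x: (not x["b"]) or x["c"], 2.0),  # b => c
    (lambda x: not x["c"], 0.5),              # ¬c
]


def score(x):
    """Sum over formulas of sat(i, X) * w_i."""
    return sum(w for test, w in FORMULAS if test(x))


assignments = [dict(zip(HYPOTHESES, values))
               for values in product([False, True], repeat=len(HYPOTHESES))]
z = sum(math.exp(score(x)) for x in assignments)  # normalizing constant Z


def probability(x):
    return math.exp(score(x)) / z


most_probable = max(assignments, key=probability)
best_max_sat = max(assignments, key=score)
print(most_probable, best_max_sat)  # {'b': True, 'c': True} {'b': True, 'c': True}
```

Because exp is strictly increasing and Z is shared by all assignments, both maximizations always pick the same assignment; the weighted MAX SAT view simply skips the exponentiation and normalization.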

  30. Grounding
      Formula r(X,Y) & s(Y) => t(X,Y), as a clause: { ¬r(X,Y), ¬s(Y), t(X,Y) }
      r holds immutable, complete facts (e.g. pattern occurrences).
      Entities = {a,b}; candidate r-facts: r(a,a), r(a,b), r(b,a), r(b,b)
      Naive grounding produces one clause per combination:
      { ¬r(a,a), ¬s(a), t(a,a) }   { ¬r(a,b), ¬s(b), t(a,b) }   { ¬r(b,a), ¬s(a), t(b,a) }   { ¬r(b,b), ¬s(b), t(b,b) }

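  31. Grounding
      Formula r(X,Y) & s(Y) => t(X,Y), as a clause: { ¬r(X,Y), ¬s(Y), t(X,Y) }, where r holds immutable, complete facts (e.g. pattern occurrences).
      Only r(a,a) is a fact; r(a,b), r(b,a), and r(b,b) are not, so their clauses are already satisfied and can be dropped.
      Since r(a,a) is true, the literal ¬r(a,a) is false and is removed, leaving a single clause: { ¬s(a), t(a,a) } [w]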
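Slides 30 and 31 contrast naive grounding with grounding against immutable, complete facts. The Python sketch below grounds r(X,Y) & s(Y) => t(X,Y) both ways: naively over all entity pairs, and only for the known r-facts with the already-false literal ¬r(X,Y) removed. The (is_positive, atom) literal encoding and the function names are assumptions of this sketch.

```python
from itertools import product


def ground_naive(entities, weight):
    """One clause {¬r(X,Y), ¬s(Y), t(X,Y)} per pair (X, Y) of entities."""
    return [([(False, ("r", x, y)), (False, ("s", y)), (True, ("t", x, y))], weight)
            for x, y in product(entities, repeat=2)]


def ground_with_facts(r_facts, weight):
    """Ground only where r(X,Y) is a known fact.

    Because r is immutable and complete, r(X,Y) is false for every other pair,
    so those clauses are already satisfied and are skipped; for known facts
    the literal ¬r(X,Y) is false and is dropped from the clause.
    """
    return [([(False, ("s", y)), (True, ("t", x, y))], weight)
            for (_, x, y) in sorted(r_facts)]


entities = ["a", "b"]
r_facts = {("r", "a", "a")}
naive = ground_naive(entities, 1.0)        # 4 clauses, as on slide 30
compact = ground_with_facts(r_facts, 1.0)  # 1 clause {¬s(a), t(a,a)}, as on slide 31
print(len(naive), compact)
```

With n entities, naive grounding of this formula produces n² clauses, while the fact-driven version produces one clause per known r-fact.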

  32. Grounding
      Resulting weighted clauses: { ¬s(a), t(a,a) } [w1], { p(c,d), ¬q(e), ... } [w2], ...
      Find truth assignments to the hypotheses so that the total weight of the satisfied formulas is maximized:
      means(Elvis@D15, ElvisPresley) = true? mapsTo("died in", diedInPlace) = true? diedInPlace(ElvisPresley, England) = true?