SOFIE: A Self-Organizing Framework for Information Extraction

Fabian M. Suchanek, Mauro Sozio, Gerhard Weikum
(Max-Planck-Institute for Informatics, Saarbrücken, Germany)

Ontologies

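[Diagram: an ontology fragment. Singer and Country are connected to Entity by subclassOf edges, individual entities are attached to these classes by type edges, and one entity is linked to USA by bornInPlace. Ontologies such as DBpedia, YAGO, KYLIN, ... draw on Wikipedia (e.g. "birth-place: USA"). Open question (?): what about text from the Internet, such as "Elvis died in England"?]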

Information Extraction

Goal:
Extract ontological information from natural language documents

"Elvis died in England"  =>  diedInPlace(Elvis, England)

Previous approaches:
Espresso, DIPRE, LEILA, Snowball, TextRunner, Alice, and many more

• May deliver non-canonical relations
  died in, perished in, was killed in, ...
• May deliver non-canonical entities
  England, UK, Great Britain, ...
• May deliver inconsistent facts
  diedInPlace(Elvis,England)
  diedInPlace(Elvis,Germany)

Pitfalls of Information Extraction

Ontology: Louis XIV diedInPlace France; Elvis is a Taxidophobist
Web page: "Elvis died in England." "Louis XIV died in France."

If a pattern occurs with two entities that stand in a relation, then the pattern maps to the relation.

"died in" = diedInPlace

If a meaningful pattern occurs with two entities, then the entities stand in the relation.

diedInPlace("Elvis", "England") ?

This raises three problems:
• Pattern Matching Problem: "died in" = diedInPlace ?
• Disambiguation Problem: which entities do the words "Elvis" and "England" refer to?
• Reasoning Problem: the ontology already knows something about Elvis (Taxidophobist)

Information Extraction as Formulas

Reasoning Problem:

type(Elvis,Taxidophobist).

type(X,Taxidophobist) & bornInPlace(X,Y) => ¬diedInPlace(X,Z) [0.8]

Information Extraction as Formulas

Disambiguation Problem:

Assumptions:
• In one document, the same word always has the same meaning
• The ontology already knows all important meanings of proper names

possibleMeaning(Elvis@D15, ElvisPresley). [0.7]

Elvis@D15: a word in context (wic). Here: the word "Elvis" in document D15.
ElvisPresley: one possible meaning of "Elvis" as given by the ontology.
[0.7]: prior estimation for the likelihood of this meaning:

    | words(D15) ∩ rel(ElvisPresley) | / | words(D15) |
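The prior above can be sketched as a simple overlap ratio. This is only an illustration: it assumes that words(D15) is the set of distinct words in the document and that rel(ElvisPresley) is a set of words the ontology associates with the entity. The function name `disambiguation_prior` and the toy word lists are hypothetical and are not taken from the slides.

```python
def disambiguation_prior(doc_words, related_words):
    """Return |words(D) ∩ rel(E)| / |words(D)| over distinct words."""
    doc = set(doc_words)  # assumption: words(D) is a set of distinct words
    if not doc:
        return 0.0  # avoid division by zero for an empty document
    return len(doc & set(related_words)) / len(doc)


# Hypothetical toy data, chosen so that the result matches the slide's 0.7.
words_d15 = ["elvis", "died", "in", "england", "singer",
             "memphis", "guitar", "graceland", "king", "rock"]
rel_elvis_presley = {"elvis", "singer", "memphis", "guitar",
                     "graceland", "king", "rock", "tupelo"}

prior = disambiguation_prior(words_d15, rel_elvis_presley)
print(f"possibleMeaning(Elvis@D15, ElvisPresley). [{prior:.1f}]")
```

Seven of the ten distinct document words are related to the entity in this toy setup, so the hypothesis receives the weight 0.7.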

Information Extraction as Formulas

Disambiguation Problem:

possibleMeaning(Elvis@D15, ElvisPresley). [0.7]

possibleMeaning(X,Y) => means(X,Y)

means(X,Y) & Y≠Z => ¬means(X,Z)

Information Extraction as Formulas

Pattern Matching Problem:

Elvis died in England.
Louis XIV died in France.
"died in" = diedInPlace ?

occurs("died in", Elvis@D15, England@D15). [14]

occurs(P,Wic1,Wic2) & means(Wic1,X) & means(Wic2,Y) & R(X,Y)
=> mapsTo(P,R)

occurs(P,Wic1,Wic2) & means(Wic1,X) & means(Wic2,Y) & mapsTo(P,R)
=> R(X,Y)
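The two pattern rules can be illustrated with a small sketch that applies them as hard rules to toy data. This is a simplification: in the slides the rules are weighted formulas whose truth values are decided jointly, not applied one after the other. The function names, the document identifier D7, and the data layout are hypothetical.

```python
def learn_patterns(occurrences, means, facts):
    """Rule 1: a pattern seen with entities that stand in R maps to R."""
    maps_to = set()
    for pattern, wic1, wic2 in occurrences:
        x, y = means.get(wic1), means.get(wic2)
        for relation, a, b in facts:
            if (a, b) == (x, y):
                maps_to.add((pattern, relation))
    return maps_to


def apply_patterns(occurrences, means, maps_to):
    """Rule 2: a mapped pattern seen with two entities yields R(X,Y)."""
    derived = set()
    for pattern, wic1, wic2 in occurrences:
        x, y = means.get(wic1), means.get(wic2)
        if x is None or y is None:
            continue  # word in context could not be disambiguated
        for known_pattern, relation in maps_to:
            if known_pattern == pattern:
                derived.add((relation, x, y))
    return derived


# Toy input: one known fact, two pattern occurrences, fixed disambiguation.
facts = {("diedInPlace", "LouisXIV", "France")}
means = {"Louis XIV@D7": "LouisXIV", "France@D7": "France",
         "Elvis@D15": "ElvisPresley", "England@D15": "England"}
occurrences = [("died in", "Louis XIV@D7", "France@D7"),
               ("died in", "Elvis@D15", "England@D15")]

maps_to = learn_patterns(occurrences, means, facts)
derived = apply_patterns(occurrences, means, maps_to)
print(maps_to)
print(derived)
```

Applied this way, the sketch derives diedInPlace(ElvisPresley, England) without any check, which is exactly the pitfall that the weighted, joint treatment is meant to avoid.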

Information Extraction as Formulas

Reasoning Problem:
type(Elvis,Taxidophobist).
type(X,Taxidophobist) & bornInPlace(X,Y) => ¬diedInPlace(X,Z)

Pattern Matching Problem:
occurs("died in", Elvis@D15, England@D15). [14]

Disambiguation Problem:
possibleMeaning(Elvis@D15, ElvisPresley). [0.7]

Find truth assignments to hypotheses so that the weight of satisfied formulas is maximized

means(Elvis@D15, ElvisPresley) ?
mapsTo("died in", diedInPlace) ?
diedInPlace(ElvisPresley, England) ?

Weighted MAX SAT Problem

Find truth assignments to hypotheses so that the weight of satisfied formulas is maximized

Problems:
• The Weighted MAX SAT Problem is NP-hard
• Our instance of the problem is huge
• The most popular linear approximation algorithm (Johnson's) does not work well with our type of formulas

bornInPlace(X,Y) => ¬bornInPlace(X,Z)

¬A v ¬B
¬A v ¬C
¬B v ¬C

Johnson's algorithm cannot approximate better than 2/3
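To make the problem statement concrete, here is a minimal brute-force sketch of Weighted MAX SAT on a toy instance. It is not the algorithm from the slides: it enumerates all truth assignments, which is only feasible for a handful of hypotheses and illustrates why a huge instance needs an approximation. The clause encoding (a weight plus a list of (variable, polarity) literals) and the weights are assumptions made for this example.

```python
from itertools import product


def weighted_max_sat(clauses):
    """Return (best weight, best assignment) by exhaustive search."""
    variables = sorted({var for _, literals in clauses for var, _ in literals})
    best_weight, best_assignment = float("-inf"), None
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        # A clause is satisfied if at least one literal has its polarity.
        weight = sum(
            w for w, literals in clauses
            if any(assignment[var] == polarity for var, polarity in literals)
        )
        if weight > best_weight:
            best_weight, best_assignment = weight, assignment
    return best_weight, best_assignment


# Toy instance: two competing hypotheses and one exclusion clause.
england = "diedInPlace(Elvis,England)"
germany = "diedInPlace(Elvis,Germany)"
clauses = [
    (3.0, [(england, True)]),                     # evidence for England
    (1.0, [(germany, True)]),                     # weaker evidence for Germany
    (10.0, [(england, False), (germany, False)]), # not both at once
]

best_weight, best_assignment = weighted_max_sat(clauses)
print(best_weight, best_assignment)
```

The search visits 2^n assignments for n hypotheses, so it serves only as a reference for checking faster heuristics on small inputs.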

FMS Algorithm

The Functional MAX SAT Algorithm considers only unit clauses.

Formulas:
  A v B [w1]
  A v B [w2]
  B v C [w3]
  C [w4]

Hypotheses:
  A, B, C  (values shown on the slide: = false, = false, = true)

The Functional MAX SAT Algorithm propagates Dominating Unit Clauses

¬A v B [10]
¬A [10]
A [30]

30 > 10+10  =>  A = true
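The propagation step can be sketched as follows. The sketch reads the example as ¬A v B [10], ¬A [10], A [30], and it assumes that a unit clause is dominating when its weight exceeds the total weight of all clauses that contain the opposite literal, which is consistent with the comparison 30 > 10+10. Both the reading of the polarities and the function names are assumptions, not a statement of the FMS algorithm itself.

```python
def dominating_unit(clauses):
    """Return (variable, value) of a dominating unit clause, or None."""
    for _, literals in clauses:
        if len(literals) != 1:
            continue
        var, polarity = literals[0]
        # Weight of unit clauses asserting this literal.
        support = sum(w for w, lits in clauses if lits == [(var, polarity)])
        # Weight of every clause that contains the opposite literal.
        opposition = sum(w for w, lits in clauses if (var, not polarity) in lits)
        if support > opposition:
            return var, polarity
    return None


def propagate(clauses, var, value):
    """Fix var=value: drop satisfied clauses, remove falsified literals."""
    remaining = []
    for weight, literals in clauses:
        if (var, value) in literals:
            continue  # clause is satisfied
        reduced = [lit for lit in literals if lit[0] != var]
        if reduced:  # an emptied clause can no longer be satisfied
            remaining.append((weight, reduced))
    return remaining


clauses = [
    (10, [("A", False), ("B", True)]),  # ¬A v B [10]
    (10, [("A", False)]),               # ¬A [10]
    (30, [("A", True)]),                # A [30]
]

choice = dominating_unit(clauses)
simplified = propagate(clauses, *choice)
print(choice, simplified)
```

After A is set to true, the clause ¬A v B shrinks to the unit clause B [10], which can then be considered in the next round.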

FMS Algorithm

FMS Algorithm:
  FOR i=1 TO 42
  ...
  NEXT i

• Polynomial time
• Approximation guarantee
• Experiments show better performance in practice than Johnson's algorithm in our setting.

FMS Algorithm

[Diagram: the document text "Elvis died in England" and rules of the form r(X,Y) & s(Y) => t(X,Y) are fed into the FMS Algorithm (FOR i=1 TO 42 ... NEXT i), which assigns truth values to the hypotheses.]

type(Elvis,Taxidophobist)=1
diedIn(Elvis,England)=0
means(Elvis@D15,Elvis)=0
means(Elvis@D15,...)=1

[Resulting fact in the ontology: St. Elvis diedIn England]

Other Experiments

Conclusion

SOFIE unifies the tasks of
• entity disambiguation
• pattern extraction
• semantic constraint reasoning
in a single framework, delivering
• canonicalized facts
• of high precision (experiments show 90% precision)

died in England... but is alive!

SOFIE rules!

R(X,Y) /\ R(X,Z) /\ type(R,function) => Y = Z

occurs(P,WX,WY) /\ refersTo(WX,X) /\ refersTo(WY,Y) /\ R(X,Y) => expresses(P,R)

occurs(P,WX,WY) /\ expresses(P,R) /\ refersTo(WX,X) /\ refersTo(WY,Y) /\ domain(R,D1) /\ range(R,D2) /\ type(X,D1) /\ type(Y,D2) => R(X,Y)

disambiguationPrior(W,X) => refersTo(W,X)

R(X,Y)

bornInYear(X,B) /\ diedInYear(X,D) => B<D
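Two of the rules above, the functionality rule and the rule B<D, can be read as consistency checks on a set of candidate facts. The following sketch shows that reading on toy data. It treats the rules as hard constraints, whereas the framework weighs them against other evidence. The function names and the (relation, subject, object) triple layout are hypothetical.

```python
def functional_violations(facts, functional_relations):
    """Find pairs R(X,Y), R(X,Z) with Y != Z for functional relations R."""
    first_value = {}
    violations = []
    for relation, x, y in facts:
        if relation not in functional_relations:
            continue
        known = first_value.setdefault((relation, x), y)
        if known != y:
            violations.append((relation, x, known, y))
    return violations


def lifespan_violations(facts):
    """Find entities whose bornInYear is not before their diedInYear."""
    born = {x: year for r, x, year in facts if r == "bornInYear"}
    died = {x: year for r, x, year in facts if r == "diedInYear"}
    return [x for x in born if x in died and not born[x] < died[x]]


# Toy candidate facts, including the inconsistent pair from the slides.
facts = [
    ("diedInPlace", "Elvis", "England"),
    ("diedInPlace", "Elvis", "Germany"),
    ("bornInYear", "Elvis", 1935),
    ("diedInYear", "Elvis", 1977),
]

print(functional_violations(facts, {"diedInPlace"}))
print(lifespan_violations(facts))
```

The first check reports the conflicting death places, and the second reports nothing because 1935 < 1977.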

SOFIE: Experiments

SOFIE: Large-Scale Experiment

Corpus:
3700 biography documents downloaded from the Web

Goal:
Extract bornIn, bornOnDate, diedIn, diedOnDate, politicianOf

Results: (precision in %)

bornIn  bornOnD  diedIn  diedOnD  polOf
87      87       13      98       95

90

Runtime: (summed over 5 batches)

Parsing                 7:05h
Hypothesis Generation   6:15h
Solving                 2:30h
Total                  15:50h

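SOFIE: Relation to Markov Logic

r(x,y) /\ s(x,z) => t(x,z) [w]
...

sat(i,X): number of satisfied instances of the ith formula
w_i: weight of the ith formula

[Illustration: a possible world X assigns false or true to each fact, e.g. bornIn(Nicholas, Patras)]

P(X) \propto \prod_i e^{sat(i,X) w_i}

\max_X \prod_i e^{sat(i,X) w_i}

\max_X \log( \prod_i e^{sat(i,X) w_i} )

\max_X \sum_i sat(i,X) w_i

Because the logarithm is monotonic, each step has the same maximizing X.

~~~~> Weighted MAX SAT problem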
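The step from the most probable world to the heaviest world can be checked numerically on a toy model. The sketch below assumes three ground formulas with made-up weights, so sat(i,X) is either 0 or 1 here. The names `sat`, `total_weight`, and `unnormalized_probability` are chosen for this illustration only.

```python
import math
from itertools import product

# Hypothetical weights for three ground formulas over two facts (e, g):
#   formula 0: e            formula 1: g            formula 2: not (e and g)
weights = [2.0, 1.5, 5.0]


def sat(i, world):
    """Number of satisfied instances of formula i in the world (0 or 1 here)."""
    e, g = world
    return [int(e), int(g), int(not (e and g))][i]


def total_weight(world):
    """Sum over i of sat(i, X) * w_i."""
    return sum(sat(i, world) * w for i, w in enumerate(weights))


def unnormalized_probability(world):
    """Product over i of exp(sat(i, X) * w_i)."""
    return math.prod(math.exp(sat(i, world) * w) for i, w in enumerate(weights))


worlds = list(product([False, True], repeat=2))
most_probable = max(worlds, key=unnormalized_probability)
heaviest = max(worlds, key=total_weight)
print(most_probable, heaviest)
```

Both maximizations select the same world, since the product of exponentials is the exponential of the weighted sum.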

Grounding

r(X,Y) & s(Y) => t(X,Y)

{ ¬r(X,Y), ¬s(Y), t(X,Y) }

Entities={a,b}

Immutable, complete facts (e.g. pattern occurrences):
r(a,a)
r(a,b)
r(b,a)
r(b,b)

{ ¬r(a,a), ¬s(a), t(a,a) }
{ ¬r(a,b), ¬s(b), t(a,b) }
{ ¬r(b,a), ¬s(a), t(b,a) }
{ ¬r(b,b), ¬s(b), t(b,b) }

Grounding

r(X,Y) & s(Y) => t(X,Y)

{ ¬r(X,Y), ¬s(Y), t(X,Y) }

Immutable, complete facts (e.g. pattern occurrences):
r(a,a) [w]
r(a,b)
r(b,a)
r(b,b)

Only r(a,a) holds, so only its grounding remains, with the literal ¬r(a,a) removed:

{ ¬s(a), t(a,a) } [w]
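The grounding step can be sketched for the rule r(X,Y) & s(Y) => t(X,Y), read as the clause { ¬r(X,Y), ¬s(Y), t(X,Y) }. The sketch assumes that the r facts are immutable and complete: a grounding whose r atom is not a known fact is already satisfied and is dropped, and otherwise the literal ¬r is removed. The function name and the literal encoding (predicate, arguments, polarity) are hypothetical.

```python
from itertools import product


def ground_rule(entities, true_r_facts):
    """Ground { ¬r(X,Y), ¬s(Y), t(X,Y) } over all entity pairs."""
    clauses = []
    for x, y in product(entities, repeat=2):
        if (x, y) not in true_r_facts:
            continue  # ¬r(x,y) is true, so this grounding is already satisfied
        # r(x,y) is a known fact: its negated literal is false and is removed.
        clauses.append([("s", (y,), False), ("t", (x, y), True)])
    return clauses


entities = ["a", "b"]
all_pairs = list(product(entities, repeat=2))   # four candidate groundings
grounded = ground_rule(entities, {("a", "a")})  # only r(a,a) is a fact
print(len(all_pairs), grounded)
```

With two entities there are four candidate groundings, and only one simplified clause survives, which keeps the resulting MAX SAT instance small.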

Grounding

{ ¬s(a), t(a,a) } [w1]
{ p(c,d), ¬q(e) } [w2]

Find truth assignments to hypotheses so that the weight of satisfied formulas is maximized

means(Elvis@D15, ElvisPresley) = true ?
mapsTo("died in", diedInPlace) = true ?
diedInPlace(ElvisPresley, England) = true ?