## SOFIE: A Self-Organizing Framework for Information Extraction

Presentation Transcript

A Self-Organizing Framework for Information Extraction

Fabian M. Suchanek, Mauro Sozio, Gerhard Weikum

(Max-Planck-Institute for Informatics, Saarbrücken, Germany)

Ontologies

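[Figure: an ontology graph. Singer and Country are each a subclassOf Entity; instances are attached to these classes by type edges, and a bornInPlace edge points to USA. DBpedia, YAGO, KYLIN, ... build such ontologies from Wikipedia, e.g. from the infobox entry "birth-place: USA". Open question (?): how to obtain facts from Internet text such as "Elvis died in England".]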

Information Extraction

Goal:

Extract ontological information from natural language documents

"Elvis died in England" yields diedInPlace(Elvis,England)

Previous approaches:

Espresso, DIPRE, LEILA, Snowball, TextRunner, Alice, and many more

- May deliver non-canonical relations: died in, perished in, was killed in, ...
- May deliver non-canonical entities: England, UK, Great Britain, ...
- May deliver inconsistent facts: diedInPlace(Elvis,England) and diedInPlace(Elvis,Germany)

Pitfalls of Information Extraction

Ontology: diedInPlace(Louis XIV, France)

Web page: "Elvis died in England." "Louis XIV died in France."

If a pattern occurs with two entities that stand in a relation, then the pattern maps to the relation.

"died in" = diedInPlace

If a meaningful pattern occurs with two entities, then the entities stand in the relation.

diedInPlace("Elvis", "England")

This naive scheme raises three problems:

- Pattern Matching Problem: "died in" = diedInPlace ?
- Disambiguation Problem: which entities do "Elvis" and "England" refer to?
- Reasoning Problem: the ontology lists Elvis as a Taxidophobist. Is the new fact consistent with that?
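The two bootstrapping rules of this slide can be sketched as follows. This is a minimal illustration, not SOFIE's implementation: the function names, the tuple encoding of facts, and the toy data are assumptions made for the example, and the sketch deliberately ignores disambiguation and reasoning, which is exactly where the pitfalls arise.

```python
from collections import defaultdict

# Hypothetical toy data: one ontology fact and two pattern occurrences.
known_facts = {("diedInPlace", "Louis XIV", "France")}
occurrences = [
    ("died in", "Louis XIV", "France"),
    ("died in", "Elvis", "England"),
]


def learn_patterns(facts, occs):
    """Rule 1: a pattern seen with the entities of a known fact maps to its relation."""
    mapping = defaultdict(set)
    for pattern, x, y in occs:
        for relation, fx, fy in facts:
            if (x, y) == (fx, fy):
                mapping[pattern].add(relation)
    return mapping


def apply_patterns(mapping, occs):
    """Rule 2: a mapped pattern seen with two entities yields a candidate fact."""
    return {
        (relation, x, y)
        for pattern, x, y in occs
        for relation in mapping.get(pattern, ())
    }


pattern_map = learn_patterns(known_facts, occurrences)
candidates = apply_patterns(pattern_map, occurrences) - known_facts
print(candidates)
```

Applied blindly, the sketch accepts every candidate it derives, whether or not the word "Elvis" refers to the intended entity and whether or not the fact contradicts the ontology.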

Information Extraction as Formulas

Reasoning Problem

type(Elvis,Taxidophobist).

type(X,Taxidophobist) & bornInPlace(X,Y) => diedInPlace(X,Z) [0.8]

Disambiguation Problem

Assumptions:

- Within one document, the same word always has the same meaning
- The ontology already knows all important meanings of proper names

possibleMeaning(Elvis@D15, ElvisPresley). [0.7]

- Elvis@D15 is a word in context (wic). Here: the word "Elvis" in document D15
- ElvisPresley is one possible meaning of "Elvis" as given by the ontology
- [0.7] is a prior estimate of the likelihood of this meaning:

$$\frac{|\,\mathrm{words}(D15) \cap \mathrm{rel}(\mathit{ElvisPresley})\,|}{|\,\mathrm{words}(D15)\,|}$$
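The prior can be computed as a simple set overlap. The sketch below assumes that words(D15) is the set of distinct words in the document and that rel(ElvisPresley) is a set of words the ontology associates with the entity; the slide does not define either precisely, and the word lists are invented so that the result matches the 0.7 of the example.

```python
def disambiguation_prior(document_words, related_words):
    """Return |words(D) ∩ rel(E)| / |words(D)|, or 0.0 for an empty document."""
    doc = set(document_words)
    if not doc:
        return 0.0
    return len(doc & set(related_words)) / len(doc)


# Hypothetical word sets, chosen only for illustration.
words_d15 = ["elvis", "died", "in", "england", "rock",
             "singer", "memphis", "guitar", "album", "king"]
rel_elvis_presley = ["elvis", "rock", "singer", "memphis",
                     "guitar", "album", "king", "graceland"]

prior = disambiguation_prior(words_d15, rel_elvis_presley)
print(round(prior, 2))
```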

possibleMeaning(X,Y) => means(X,Y)

means(X,Y) & Y≠Z => ¬means(X,Z)

Pattern Matching Problem

occurs("died in", Elvis@D15, England@D15). [14]

occurs(P,Wic1,Wic2) & means(Wic1,X) & means(Wic2,Y) & R(X,Y) => mapsTo(P,R)

occurs(P,Wic1,Wic2) & means(Wic1,X) & means(Wic2,Y) & mapsTo(P,R) => R(X,Y)

Hypotheses to be decided:

means(Elvis@D15, ElvisPresley) ?

mapsTo("died in", diedInPlace) ?

diedInPlace(ElvisPresley, England) ?

Weighted MAX SAT Problem

Find truth assignments to hypotheses so that the weight of satisfied formulas is maximized
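For a handful of hypotheses the problem can be solved by exhaustive search, which makes the objective concrete. The sketch below is an illustration only: the clause encoding, the function names, and all weights are assumptions, and the three variables stand in for the hypotheses means(Elvis@D15, ElvisPresley), mapsTo("died in", diedInPlace), and diedInPlace(ElvisPresley, England).

```python
from itertools import product

# A clause is (weight, literals); a literal is (variable, required truth value).
hypotheses = ["means", "mapsTo", "diedInPlace"]
clauses = [
    (7, [("means", True)]),                      # disambiguation prior
    (20, [("mapsTo", True)]),                    # evidence for the pattern
    (50, [("means", False), ("mapsTo", False),   # means & mapsTo => diedInPlace
          ("diedInPlace", True)]),
    (10, [("diedInPlace", False)]),              # doubt cast by a reasoning rule
]


def satisfied_weight(assignment, clauses):
    """Sum the weights of all clauses with at least one satisfied literal."""
    return sum(
        weight
        for weight, literals in clauses
        if any(assignment[var] == value for var, value in literals)
    )


def solve_brute_force(variables, clauses):
    """Try all 2^n assignments and return the best one with its weight."""
    assignments = (
        dict(zip(variables, values))
        for values in product([False, True], repeat=len(variables))
    )
    best = max(assignments, key=lambda a: satisfied_weight(a, clauses))
    return best, satisfied_weight(best, clauses)


best, best_weight = solve_brute_force(hypotheses, clauses)
print(best, best_weight)
```

With these invented weights the best assignment rejects the reading of "Elvis" as ElvisPresley while keeping the pattern mapping. The search visits 2^n assignments, so it is only usable for toy instances.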

Problems:

- The Weighted MAX SAT Problem is NP-hard
- Our instance of the problem is huge
- The most popular linear approximation algorithm (Johnson's) does not work well with our type of formulas

bornInPlace(X,Y) => ¬bornInPlace(X,Z)

¬A v ¬B

¬A v ¬C

¬B v ¬C

Johnson's cannot approximate better than 2/3
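The kind of formula meant here can be illustrated by grounding a functional constraint. The sketch assumes that the clauses on the slide are pairwise exclusions (the transcript does not preserve negation signs), so that for three candidate birth places A, B, C at most one may hold. Function name, encoding, and candidate names are hypothetical.

```python
from itertools import combinations


def exclusion_clauses(relation, subject, candidates, weight):
    """Ground 'relation is a function': no two candidate objects may both hold.

    Each clause is (weight, literals); a literal is (fact, required truth value),
    so (fact, False) stands for the negated fact.
    """
    return [
        (weight, [((relation, subject, y), False), ((relation, subject, z), False)])
        for y, z in combinations(candidates, 2)
    ]


# Hypothetical candidates A, B, C for the birth place of one person.
ground = exclusion_clauses("bornInPlace", "X", ["A", "B", "C"], 1.0)
for clause in ground:
    print(clause)
```

Three candidates yield the three pairwise clauses; k candidates yield k(k-1)/2 of them, which is one reason the ground instance grows quickly.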

FMS Algorithm

The Functional MAX SAT Algorithm considers only unit clauses.

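[Figure: the formulas A v B [w1], A v B [w2], B v C [w3], C [w4] on one side and the hypotheses A, B, C on the other; each hypothesis is assigned a truth value (false, false, and true appear on the slide).]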

The Functional MAX SAT Algorithm propagates Dominating Unit Clauses

¬A v B [10]

¬A [10]

A [30]

30 > 10+10, so A = true
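A minimal sketch of the dominance test, assuming that a unit clause dominates when its weight exceeds the total weight of all clauses containing the opposite literal, and assuming that the two weight-10 clauses of the example contain A negated (the transcript does not preserve negation signs). The function name and clause encoding are hypothetical; this is not the FMS implementation.

```python
from collections import defaultdict


def find_dominating_units(clauses):
    """Return {variable: value} for every dominating unit clause.

    A clause is (weight, literals); a literal is (variable, required truth value).
    """
    unit_weight = defaultdict(int)   # weight of unit clauses, per literal
    total_weight = defaultdict(int)  # weight of all clauses containing a literal
    for weight, literals in clauses:
        for literal in set(literals):
            total_weight[literal] += weight
        if len(literals) == 1:
            unit_weight[literals[0]] += weight

    forced = {}
    for (var, value), weight in unit_weight.items():
        # Setting var to value loses at most the clauses with the opposite literal.
        if weight > total_weight[(var, not value)]:
            forced[var] = value
    return forced


example = [
    (10, [("A", False), ("B", True)]),  # ¬A v B [10]
    (10, [("A", False)]),               # ¬A [10]
    (30, [("A", True)]),                # A [30]
]
forced = find_dominating_units(example)
print(forced)
```

In the example, fixing A = true can cost at most 10 + 10 in the clauses that contain ¬A, which is less than the 30 gained, so the assignment is safe to propagate.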

Polynomial time

FMS Algorithm

FOR i=1 TO 42

...

NEXT i

Approximation Guarantee

Experiments show better performance in practice than Johnson's algorithm in our setting.

FMS Algorithm

[Figure: the pipeline on the running example. The text "Elvis died in England" and rules of the form r(X,Y) & s(Y) => t(X,Y) are fed into the FMS Algorithm, which produces the truth assignment

type(Elvis,Taxidophobist)=1

diedIn(Elvis,England)=0

means(Elvis@D15,Elvis)=0

means(Elvis@D15,...)=1

and the resulting fact: St. Elvis diedIn England.]

Conclusion

SOFIE unifies the tasks of

- entity disambiguation
- pattern extraction
- semantic constraint reasoning

in a single framework, delivering

- canonicalized facts
- of high precision (experiments show 90% precision)

died in England...

but is alive!

SOFIE rules!

R(X,Y) /\ R(X,Z) /\ type(R,function) => Y = Z

occurs(P,WX,WY) /\ refersTo(WX,X) /\ refersTo(WY,Y) /\ R(X,Y) => expresses(P,R)

occurs(P,WX,WY) /\ expresses(P,R) /\ refersTo(WX,X) /\ refersTo(WY,Y) /\ domain(R,D1) /\ range(R,D2) /\ type(X,D1) /\ type(Y,D2) => R(X,Y)

disambiguationPrior(W,X) => refersTo(W,X)

bornInYear(X,B) /\ diedInYear(X,D) => B<D
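Two of these rules act as consistency constraints on candidate facts, and their effect can be shown with a small checker. This is a sketch only: in SOFIE the rules are weighted formulas resolved by the MAX SAT solver, not hard filters, and the function names, fact encoding, and the person and years below are invented for illustration.

```python
from itertools import combinations


def functional_conflicts(facts, functional_relations):
    """Pairs of facts R(X,Y), R(X,Z) with Y != Z for a functional relation R."""
    conflicts = []
    for (r1, x1, y1), (r2, x2, y2) in combinations(facts, 2):
        if r1 == r2 and r1 in functional_relations and x1 == x2 and y1 != y2:
            conflicts.append(((r1, x1, y1), (r2, x2, y2)))
    return conflicts


def lifespan_violations(facts):
    """Subjects whose bornInYear is not strictly before their diedInYear."""
    born = {x: y for r, x, y in facts if r == "bornInYear"}
    died = {x: y for r, x, y in facts if r == "diedInYear"}
    return [x for x in born if x in died and not born[x] < died[x]]


facts = [
    ("diedInPlace", "Elvis", "England"),
    ("diedInPlace", "Elvis", "Germany"),
    ("bornInYear", "PersonA", 1900),   # hypothetical person and years
    ("diedInYear", "PersonA", 1850),
]
conflicts = functional_conflicts(facts, {"diedInPlace"})
violations = lifespan_violations(facts)
print(conflicts, violations)
```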

Corpus: 3700 biography documents downloaded from the Web

Goal: Extract bornIn, bornOnDate, diedIn, diedOnDate, politicianOf

Results: (precision in %)

[Bar chart: bars labeled bornIn, bornOnD, diedIn, diedOnD, polOf; values shown on the chart: 87, 87, 13, 98, 95, and 90]

Runtime: (summed over 5 batches)

- Parsing: 7:05h
- Hypothesis Generation: 6:15h
- Solving: 2:30h
- Total: 15:50h

SOFIE: Relation to Markov Logic

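r(x,y) /\ s(x,z) => t(x,z) [w]

...

A possible world X assigns false or true to every ground atom, e.g. bornIn(Nicholas, Patras). Its probability is

$$P(X) \propto \prod_i e^{\mathrm{sat}(i,X)\, w_i}$$

where sat(i,X) is the number of satisfied instances of the ith formula and w_i is the weight of the ith formula. Finding the most likely world:

$$\max_X \prod_i e^{\mathrm{sat}(i,X)\, w_i}$$

$$\max_X \log\Big( \prod_i e^{\mathrm{sat}(i,X)\, w_i} \Big)$$

$$\max_X \sum_i \mathrm{sat}(i,X)\, w_i$$

~~~~> Weighted MAX SAT problem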
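The step from the product of exponentials to the weighted sum relies on the logarithm being monotone, so both objectives pick the same world. A small numeric check illustrates this; the three worlds, their satisfaction counts, and the weights are invented for the example.

```python
import math

# Hypothetical data: sat(i, X) for three formulas in three possible worlds.
weights = [0.5, 1.5, 2.0]
worlds = {
    "X1": [1, 0, 2],
    "X2": [0, 3, 1],
    "X3": [2, 2, 0],
}


def unnormalized_probability(sat_counts):
    """Product over formulas of exp(sat(i, X) * w_i)."""
    return math.prod(math.exp(s * w) for s, w in zip(sat_counts, weights))


def weighted_sum(sat_counts):
    """Sum over formulas of sat(i, X) * w_i."""
    return sum(s * w for s, w in zip(sat_counts, weights))


best_by_probability = max(worlds, key=lambda x: unnormalized_probability(worlds[x]))
best_by_sum = max(worlds, key=lambda x: weighted_sum(worlds[x]))
print(best_by_probability, best_by_sum)
```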

Grounding

r(X,Y) & s(Y) => t(X,Y)

In clause form: { ¬r(X,Y), ¬s(Y), t(X,Y) }

Here r holds immutable, complete facts (e.g. pattern occurrences).

Entities={a,b}

Ground instances of r: r(a,a), r(a,b), r(b,a), r(b,b)

Ground clauses:

{ ¬r(a,a), ¬s(a), t(a,a) }

{ ¬r(a,b), ¬s(b), t(a,b) }

{ ¬r(b,a), ¬s(a), t(b,a) }

{ ¬r(b,b), ¬s(b), t(b,b) }

If r(a,a) is the only known instance of r, the ground clauses for r(a,b), r(b,a), and r(b,b) are already satisfied and can be dropped. The rule r(X,Y) & s(Y) => t(X,Y) [w] then leaves a single ground clause:

{ ¬s(a), t(a,a) } [w]
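The grounding step for the rule r(X,Y) & s(Y) => t(X,Y) can be sketched as follows, assuming that r is complete, so any r(x,y) not listed as known is false and its clause is trivially satisfied. The function name and the clause encoding are hypothetical.

```python
from itertools import product

entities = ["a", "b"]
known_r = {("a", "a")}  # r is assumed complete: every other r(x, y) is false


def ground_rule(entities, known_r):
    """Ground r(X,Y) & s(Y) => t(X,Y), keeping only clauses that still matter.

    A literal is (atom, required truth value); (atom, False) is the negated atom.
    """
    clauses = []
    for x, y in product(entities, repeat=2):
        if (x, y) not in known_r:
            continue  # r(x, y) is false, so the implication holds trivially
        # r(x, y) is true, so its negated literal can be dropped from the clause.
        clauses.append([(f"s({y})", False), (f"t({x},{y})", True)])
    return clauses


ground_clauses = ground_rule(entities, known_r)
print(ground_clauses)
```

Of the four candidate groundings over {a,b}, only the one for r(a,a) survives, which keeps the MAX SAT instance far smaller than a naive grounding over all entity pairs.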

{ ¬s(a), t(a,a) } [w1]

{p(c,d), q(e), } [w2]

Find truth assignments to hypotheses so that the weight of satisfied formulas is maximized

means(Elvis@D15, ElvisPresley) = true ?

mapsTo("died in", diedInPlace) = true ?

diedInPlace(ElvisPresley, England) = true ?
