Information Inference

Mimicking human text-based reasoning

P.D. Bruza & D. Song

Information Ecology Project

Distributed Systems Technology Centre

Penguin Books U.K

Why Linus chose a penguin

Surfing the Himalayas

Introductory remarks

  • Information inference is a common and real phenomenon

  • It can be modelled by symbolic inference, but this isn’t satisfying

  • The inferences are often latent associations triggered by seeing a word (or words) in the context of other words, so inference is not deductive, but about producing implicit associations appropriate to the context

  • We need to look at the problem from a cognitive perspective…

Since last time…

  • (Philosophical) positioning of the work is clearer

  • Some encouraging experimental results using information inference to derive query models

  • Some initial ideas about how information inference fits into an abductive logic for text-based knowledge discovery

Dretske’s Information Content

To a person with prior knowledge K, r being F carries the information that s is G if and only if the conditional probability of s being G, given that r is F, is 1 (and less than 1 given K alone).

We can say that s being G is inferred (informationally) from r being F and K.
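
Written as a formula, the definition might read as follows. The notation P(· | ·) for conditional probability is an assumption of this note and does not appear on the slides; K is taken to be part of the conditioning in the first term.

```latex
\[
P(s \text{ is } G \mid r \text{ is } F,\; K) = 1
\quad\text{and}\quad
P(s \text{ is } G \mid K) < 1
\]
```

The first condition is the demanding one: any residual uncertainty about s being G, however small, means that no information is carried.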

T = “Why Linus chose a penguin”

So Dretske’s definition does not permit the inference “Linus” is “Linus Torvalds”, though a human being may proceed under this “hasty” judgment.

Dretske’s information content “sets too high a standard” (Barwise & Seligman)

Inferential information content (Barwise & Seligman)

To a person with prior knowledge K, r being F carries the information that s is G, if the person could legitimately infer that s is G from r being F together with K (but could not from K alone).

T = “Why Linus chose a penguin”

“Linus” occurring with “penguin” in T, together with K, carries the information that “Linus” is “Linus Torvalds”.

Barwise & Seligman (cont’d)

“… by relativizing information flow to human inference, this definition makes room for different standards in what sorts of inferences the person is able and willing to make”

- Psychologistic stance taken

- Onerous from an engineering standpoint: “different standards” implies “nonmonotonicity”. Consider,

“Linux Online: Why Linus chose a penguin” (willing)

“Why Linus chose a penguin” (not willing)

Consequences of psychologism

  • Representations of information need not be propositional

  • Semantics is not a model-theoretic issue, but a cognitive one - the “meanings” stored and manipulated by the system should accord with what we have in our heads.

Gärdenfors’ cognitive model

Conceptual spaces: the property “red”

Properties and concepts are dimensional (geometric) objects.

Dimensions may be integral: the value in one dimension (or dimensions) determines the value in another.

Gärdenfors’ cognitive model: how we realize it

Geometric representations of words via Hyperspace Analogue to Language (HAL)

reagan = < administration: 0.45, bill: 0.05, budget: 0.07, house: 0.06, president: 0.83, reagan: 0.21, trade: 0.05, veto: 0.06, … >

This example demonstrates how a word is represented as a weighted vector whose dimensions comprise other words.

The weights represent the strengths of association between “reagan” and other words seen in the same context(s).

How HAL vectors are constructed

… Kemp urges Reagan to oppose stock tax …

Slide a window of width n across the corpus.

Per word: compute the weight of association with the other words within the window; the weight is inversely proportional to distance.

HAL space: each word in the corpus is represented by a multi-dimensional vector, a weighted sum of the contexts the word appeared in.

(Burgess et al. refer to it as a “high dimensional context space”, or a “high dimensional semantic space”.)
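
The construction above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `build_hal` and `normalize` are hypothetical, the linear weighting `window - d + 1` is one common reading of "inversely proportional to distance", and co-occurrences before and after a word are summed into a single vector.

```python
from collections import defaultdict


def build_hal(tokens, window=5):
    """Build a toy HAL space: {word: {context_word: weight}}.

    Assumption: a word at distance d (1 <= d <= window) contributes
    window - d + 1, so adjacent words get the largest weight.
    """
    hal = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for d in range(1, window + 1):
            j = i - d
            if j < 0:
                break
            weight = float(window - d + 1)
            # Record the association in both directions, i.e. preceding and
            # following contexts end up in the same vector.
            hal[word][tokens[j]] += weight
            hal[tokens[j]][word] += weight
    return {w: dict(ctx) for w, ctx in hal.items()}


def normalize(vec):
    """Scale a sparse vector to unit Euclidean length."""
    norm = sum(v * v for v in vec.values()) ** 0.5
    return {k: v / norm for k, v in vec.items()} if norm else dict(vec)


tokens = "kemp urges reagan to oppose stock tax".split()
hal = build_hal(tokens, window=3)
reagan = normalize(hal["reagan"])
```

With a window of 3, "reagan" is associated most strongly with its immediate neighbours "urges" and "to", and not at all with "tax", which lies outside the window.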

Remarks about HAL

  • A HAL space is easy to construct

  • Cognitive compatibility with human information processing

    • “word representations learned by HAL account for a variety of semantic phenomena” (Burgess et al)

    • Therefore a good candidate for representing “meanings” in accord with our psychologistic stance

  • A HAL space is a real-valued state space, thus opening the door to driving information inference according to Barwise & Seligman’s definition

    • A HAL vector represents a word’s “state” in the context of the text corpus it was derived from

Differences with Burgess et al.

  • We (often) normalize the weights

  • Pre- and post- vectors are added into a single vector

  • HAL vectors derived from small text corpora (e.g., Reuters-21578) seem to be OK

  • HAL vectors are “summed” representations, similar in spirit to “prototypical concepts” (which are averaged representations)

Reagan traces

President Reagan was ignorant about much of the Iran arms scandal

Reagan says U.S. to offer missile treaty


Kemp urges Reagan to oppose stock tax

Prototypical concepts

Prototypical “Reagan” = average of vectors from traces

< president: 3.23, administration: 1.82, trade: 0.40, budget: 0.37, veto: 0.34, bill: 0.31, congress: 0.31, tax: 0.29, … >
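
Averaging the vectors from individual traces can be sketched as below. The function name `prototype` and the toy trace vectors are hypothetical; the slides do not say how each trace vector is computed, so the sketch only shows the averaging step.

```python
from collections import defaultdict


def prototype(trace_vectors):
    """Average a list of sparse vectors ({dimension: weight}).

    A dimension missing from a trace counts as weight 0 for that trace.
    """
    if not trace_vectors:
        return {}
    totals = defaultdict(float)
    for vec in trace_vectors:
        for dim, weight in vec.items():
            totals[dim] += weight
    n = len(trace_vectors)
    return {dim: total / n for dim, total in totals.items()}


# Made-up weights for two traces mentioning "Reagan" (illustration only).
traces = [
    {"president": 2.0, "tax": 1.0},
    {"president": 4.0},
]
proto = prototype(traces)
```

Dimensions that recur across traces, such as "president" here, keep a high weight in the average, while dimensions seen in only one trace are damped.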



Concept combination: “Pink Elephant”

Elephant = < , , …… >

Heuristic concept combination: “Star wars”

Observation: “star” dominates “wars”

star = <trek: 0.2, episode: 0.05, soviet: 0.3, bush: 0.4, missile: 0.25>

wars = <soviet: 0.1, missile:0.2, iran: 0.33, iraq: 0.28, gulf: 0.4>

starwars = < trek: 0.3, episode: 0.15, soviet: 0.6, bush: 0.53, missile: 0.65,

iran: 0.2, iraq: 0.18, gulf: 0.25>

How to weight dimensions appropriately according to context?

Weights are affected by how one concept appears in the light of another concept: intersecting dimensions are emphasized, and weights are adjusted according to degree of dominance. (NB: moving prototypical concepts in the HAL space is a cleaner way of dealing with context.)

Theoretical background: Information inference via HAL-based information flow computations

Barwise & Seligman: state-based “information flow”

HAL-based “information flow”



Degree of inclusion (flow) computation



Consider the “quality properties” above mean weight in the source concept. (Intuition: how much of the salient aspects of the source are contained in the target?)

Compute the ratio of intersecting dimensions between source and target concept to the dimensions in the source concept.
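
One way to read the computation described above is sketched here. The function name `degree_of_inclusion` is hypothetical, and weighting the ratio by the source weights is an assumption of this sketch (a plain count of shared dimensions would be another reading); the exact formula on the original slide did not survive.

```python
def degree_of_inclusion(source, target):
    """Share of the source's salient weight that also occurs in the target.

    source, target: sparse vectors {dimension: weight}.
    Salient ("quality") properties are the source dimensions whose weight
    is above the mean weight of the source vector.
    """
    if not source:
        return 0.0
    mean_weight = sum(source.values()) / len(source)
    salient = {d: w for d, w in source.items() if w > mean_weight}
    total = sum(salient.values())
    if total == 0:
        return 0.0
    # Assumption: a dimension counts as shared if the target has any
    # positive weight on it.
    shared = sum(w for d, w in salient.items() if target.get(d, 0.0) > 0.0)
    return shared / total


# The "star" and "wars" vectors from the concept combination slide.
star = {"trek": 0.2, "episode": 0.05, "soviet": 0.3, "bush": 0.4, "missile": 0.25}
wars = {"soviet": 0.1, "missile": 0.2, "iran": 0.33, "iraq": 0.28, "gulf": 0.4}
flow = degree_of_inclusion(star, wars)
```

For these two vectors the salient dimensions of "star" are "soviet", "bush" and "missile", of which "soviet" and "missile" also occur in "wars", so the degree is (0.3 + 0.25) / (0.3 + 0.4 + 0.25).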

Visualizing degree of inclusion between HAL vectors

Many of the above-average “quality properties” of the source concept are present in the target, so the degree of inclusion will be high.

Information Inference in practice: deriving query models

  • Construct HAL vectors for all vocabulary terms from the document collection

  • Given a query such as “space program”, compute the information flows from it and use these to expand the query, e.g.

Query expansion term derived via information flow computation

(We used the top 80 information flows for expansion without feedback, 65 with feedback)
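
The expansion step might look like the sketch below: score every vocabulary term by the flow from the query vector, then keep the top k. The names `degree_of_inclusion` and `expand_query`, the weight-based degree formula, and all toy weights are assumptions made for illustration; they are not the experimental setup.

```python
def degree_of_inclusion(source, target):
    """Share of the source's above-mean weight found on target dimensions."""
    if not source:
        return 0.0
    mean_weight = sum(source.values()) / len(source)
    salient = {d: w for d, w in source.items() if w > mean_weight}
    total = sum(salient.values())
    if total == 0:
        return 0.0
    shared = sum(w for d, w in salient.items() if target.get(d, 0.0) > 0.0)
    return shared / total


def expand_query(query_vec, hal_space, k):
    """Return the k terms with the strongest flow from the query vector."""
    scored = [
        (term, degree_of_inclusion(query_vec, vec))
        for term, vec in hal_space.items()
    ]
    # Highest degree first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Made-up vectors: a combined query concept and a three-term vocabulary.
query = {"nasa": 0.6, "shuttle": 0.5, "launch": 0.3, "budget": 0.1}
space = {
    "satellites": {"nasa": 0.4, "shuttle": 0.2, "orbit": 0.5},
    "moon": {"nasa": 0.3, "apollo": 0.6},
    "tax": {"budget": 0.7, "bill": 0.4},
}
expansion = expand_query(query, space, k=2)
```

In this toy space "satellites" shares both salient query dimensions and "moon" shares one, so they are selected ahead of "tax", which shares none.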

The experiments

  • Associated Press 88/89 collections

  • TREC topics 1-50, 100-150, 151-200 (titles only).

  • Models for comparison: Baseline, Composition, Relevance Model, Markov chain model

Baseline Model

  • BM-25 term weighting (terms were stemmed)

  • Replication of Lafferty & Zhai’s baseline (SIGIR 2001)

  • Dot product matching function

Composition model

  • Combine the HAL vectors of individual query terms by recursively applying the concept combination heuristic; query terms ranked according to idf (dominance ranking)

starwars = < trek: 0.3, episode: 0.15, soviet: 0.6, bush: 0.53, missile: 0.65,

iran: 0.2, iraq: 0.18, gulf: 0.25>

Results

The effect of information inference

26% of the 35% improvement in precision of the HAL-based information flow model is due to information inference.

For example, for the query “space program”, the information flow model infers query expansion terms such as “Reagan”, “satellites”, “scientists”, “pentagon”, “mars”, “moon”.

These are real inferences with respect to “space program”, as these terms do not appear as dimensions in HAL vectors of the concept combination:


Comparison with probabilistic query language models

  • MC: Markov chain model (Lafferty & Zhai, SIGIR 2001)

Scores are average precision

Comparison with probabilistic query language models (cont’d)

  • RM: Relevance model (Lavrenko & Croft, SIGIR 2001)

Scores are average precision

Text-based scientific discovery

[Figure labels: Fish Oil; Blood viscosity; Platelet Aggregation; Vascular Reactivity]

“.., he made the connection between these literatures and formulated the hypothesis that fish oil may be used for treating Raynaud’s disease..”

Weeber et al., “Using Concepts in Literature-Based Discovery”, JASIST 52(7):548-557

Logic of Abduction (Gabbay & Woods)

Abductive logic

Logic of discovery

Logic of


Hypothesis testing



HAL-based info flow

Raw material for abduction? Information flows from “Raynaud”

Raynaud: 1.0

myocardial: 0.56

coronary: 0.54

renal: 0.52

ventricular: 0.52




oil: 0.23


fish: 0.20






Some promise, but the lack of representation of integral dimensions is a problem

Index expressions

“Beneficial effects of fish oil on blood viscosity”









Power index expressions for representing integral dimensions

eff of fish oil

eff on blood viscosity






Information flows are single terms; power index expressions determine how they may be combined into higher order syntactic structures.

Initial results from using information flow computations as a logic of discovery

27 ventricular (0.52) infarction (0.46)
27 thromboplastin (0.17)
27 pulmonary (0.51) arteries (0.25)
27 placental (0.19) protein (0.42)
27 monoamine (0.17) oxidase (0.18)
27 lupus (0.37) nephritis (0.17)
27 instruments (0.17)
27 coagulant (0.21)
27 blood (0.63) coagulation (0.29)
26 umbilical (0.24) vein (0.32)
25 fish (0.20)
23 viscosity (0.21)
23 cigarette (0.26) smokers (0.22)
4 fish (0.20) oil (0.23)

Summary

  • Barwise & Seligman and Gärdenfors take a very similar stance with respect to the “human stance” (Gabbay & Woods also)… psychologism is alive…

  • An integration of a primitive approximation of a conceptual space with an information inference mechanism driven by information flow computations

  • An initial attempt towards realizing Gärdenfors’ conceptual spaces

    • A HAL space is only a primitive approximation

    • We are looking at Voronoi tessellations

  • A tiny contribution to Barwise & Seligman’s call for a “distinctively different model of human reasoning”

  • (We are looking beyond IR)