An Overview of Bayesian Network-based Retrieval Models
Juan Manuel Fernández Luna, Departamento de Informática, Universidad de Jaén, firstname.lastname@example.org
Department of Computing Science, University of Glasgow, October 21st, 2002
Layout: Introduction; Introduction to Belief Networks
A) a concept may have different representations; B) these concepts are not independent of one another.
Information Retrieval: an uncertain process
Probabilistic models tried to overcome these problems…
Researchers focused their attention on Belief networks in order to apply them to IR because:
They perform well on real-world problems characterised by uncertainty.
Belief Network (Bayesian Network)
(Conditional probability distributions)
Taking these (in)dependences into account, the joint probability distribution can be recovered from the network:
$p(X_1, \dots, X_n) = \prod_{i=1}^{n} p(X_i \mid Pa(X_i))$
Pa(Xi) being the set of parents of the variable Xi.
This expression implies a significant saving in storage space.
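To make the storage saving concrete, here is a minimal sketch that counts the probabilities needed for binary variables, first for the full joint distribution and then for the factored form. The five-node network and the helper names are hypothetical and are not taken from the slides.

```python
def full_joint_size(n_vars: int) -> int:
    """Free parameters of a full joint distribution over n binary variables."""
    return 2 ** n_vars - 1


def factored_size(parents: dict[str, list[str]]) -> int:
    """One probability per parent configuration of each binary node."""
    return sum(2 ** len(pa) for pa in parents.values())


# Hypothetical network (illustrative only): A -> B, A -> C, {B, C} -> D, D -> E.
parents = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["D"]}

full = full_joint_size(len(parents))   # 2^5 - 1
factored = factored_size(parents)      # 1 + 2 + 2 + 4 + 2
print(full, factored)
```

The saving grows with the number of variables as long as each node keeps few parents; a node with many parents still needs a table that is exponential in its number of parents.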
Inference Network Model
Instantiating each document, dj, and computing p(inn | dj).
Guidelines to build the BNR Model:
But... if a document has been indexed by 30 terms, we need to estimate and store 2^30 probabilities.
pa(Dj) being a configuration of the parents of Dj.
Large number of nodes and cycles in the graph
General-purpose propagation algorithms cannot be applied due to efficiency considerations.
Taking advantage of:
Propagation is replaced by the evaluation of a probability function in each document node.
Result: An efficient and exact propagation.
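The following sketch illustrates why evaluating a probability function can give the same result as exact propagation. It assumes (this form is not stated on the slides) that the probability function of a document node is the sum of the weights of its relevant parent terms, and that the terms are independent given the query. Under those assumptions, summing over all 2^k parent configurations equals a single linear pass. The weights and term probabilities are made-up values.

```python
from itertools import product


def brute_force(weights, p_term):
    """Marginalise over every configuration of the parent terms (2^k of them)."""
    total = 0.0
    for config in product([0, 1], repeat=len(weights)):
        # Probability of this configuration, assuming independent terms.
        p_config = 1.0
        for relevant, p in zip(config, p_term):
            p_config *= p if relevant else 1.0 - p
        # Assumed probability function: sum of the weights of relevant terms.
        p_doc = sum(w for relevant, w in zip(config, weights) if relevant)
        total += p_doc * p_config
    return total


def direct_evaluation(weights, p_term):
    """Linear-time evaluation: sum of w_ij * p(t_i | Q)."""
    return sum(w * p for w, p in zip(weights, p_term))


# Hypothetical document with three index terms; weights sum to at most 1.
weights = [0.5, 0.3, 0.2]
# Hypothetical posteriors p(t_i | Q): the first term appears in the query.
p_term = [1.0, 0.1, 0.1]

exact = brute_force(weights, p_term)
fast = direct_evaluation(weights, p_term)
print(exact, fast)
```

The two values agree because the assumed function is linear in the term indicators, so the expectation over configurations reduces to a weighted sum of the individual term probabilities.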
Including Query term frequencies:
Removing the term independence restriction:
Term subnetwork: polytree
There is a set of efficient learning and propagation algorithms available for this topology.
Marginal Distributions (root term nodes):
(M being the number of terms in the collection)
Conditional Distributions (term nodes with parents):
(based on Jaccard's coefficient)
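For reference, a minimal sketch of the plain Jaccard coefficient between two terms, computed over the sets of documents in which each term occurs. This shows only the coefficient itself; how the model turns it into conditional probabilities is not given on the slides, and the postings below are hypothetical.

```python
def jaccard(docs_a: set[int], docs_b: set[int]) -> float:
    """|A intersect B| / |A union B|, defined as 0.0 when both sets are empty."""
    union = docs_a | docs_b
    if not union:
        return 0.0
    return len(docs_a & docs_b) / len(union)


# Hypothetical postings: term -> ids of the documents containing it.
postings = {
    "bayesian": {1, 2, 3, 5},
    "network": {2, 3, 4, 5, 6},
    "retrieval": {7},
}

related = jaccard(postings["bayesian"], postings["network"])
unrelated = jaccard(postings["bayesian"], postings["retrieval"])
print(related, unrelated)
```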
Conditional Distributions (document nodes):
But... due to the complexity of the whole network, we cannot run an exact propagation algorithm.
PROPAGATION + EVALUATION
Running Pearl's exact propagation algorithm in the polytree (term subnetwork), p(ti|Q) is computed for every term Ti.
Evaluating a probability function in the document subnetwork, p(dj|Q) is computed for every document Dj, incorporating p(ti|Q).
Adding document relationships
Given a document, Dj:
Advantages of this topology:
1. Representing only the best term relationships
Retrieval effectiveness could be damaged
2. Modifying Pearl's algorithm.
In large polytrees, the beliefs of a large number of terms, those furthest from the query terms, will not be updated after propagating.
So... why is the propagation algorithm still running?
3. Changing the Term Subnetwork topology.
In certain cases, the polytree topology of the term subnetwork, even with the term selection approach, may not be very appropriate.
An alternative topology:
Two term layers
Thank you very much