
Application of Markov chains in an interactive information retrieval system



  1. Application of Markov chains in an interactive information retrieval system A brief introduction for people knowledgeable about Markov chains but not information retrieval, and vice versa.

  2. Scenario • You need information to perform a task, but are uncertain about the utility of the available documents ... • Guidance from experienced people would be helpful! • An Information Retrieval (IR) system presents lots of documents, but which suite of documents would be helpful? • From an IR perspective, we would have to design an IR system that retrieves relevant documents and gathers experienced users’ preferences over time to establish the probability that a suite of documents belongs together.

  3. Scenario • Today’s talk is about the feasibility of just such a system: is it feasible to model and build a probabilistic interactive retrieval system? • Audience: given that the audience either knows about Markov chains but not about IR, or about IR but not Markov chains, I will present only the basics of the project. It is useful to first define both IR and Markov chains and then discuss their intersection.

  4. The Problem In IR the goal is to retrieve the most relevant documents in response to a user’s query and then present them in a comprehensible way. The problem is that IR systems rely on a semantic, or surface-level, model of language without any contextualization, and ignore • how the language is actually used (its pragmatic entailments), • the domain of use by real people, and • how people think during information-seeking sessions.

  5. A potential solution • There are many avenues of research: • group awareness, query chains, interactive visualization • This sounds like a lot: to make things clearer for this audience, I’ll present the usual models of IR and then explain the project.

  6. Taxonomy of IR models • Retrieval (the user knows what he wants): • Classic models: Boolean, Vector, Probabilistic • Set theoretic: Fuzzy set theoretic, Extended Boolean • Algebraic: Generalized vector, Latent Semantic Indexing, Neural networks, Genetic algorithms, Genetic programming • Structured models: Non-overlapping lists, Proximal nodes • Probabilistic: Inference networks, Belief networks • Browsing (the user is not sure what he wants): Flat files, Structure guided, Hypertext

  7. Information Retrieval (IR) • Focuses on the • representation, • storage, • organization of, and • access to information sources. The emphasis in IR is on locating and presenting data that are useful to people, that is, information.

  8. IR • IR is mostly associated with “full-text retrieval” • It also covers the indexing and searching of records, users and interface design, and novel representations of data, provided they help the user interpret the retrieval set. • Since IR encompasses so many types of computer-based files, it views documents, users, and information abstractly, leaving researchers & developers to create their own implementations of the abstract model.

  9. IR defined An information retrieval model is a quadruple [D, Q, F, R(qi, dj)] where • D = the “logical view” of the documents in the collection, • Q = the “logical view” of the user’s queries, • F = a framework for matching D and Q, and • R(qi, dj) = a mathematical function that ranks the retrieved documents dj from the collection against the user’s query qi.
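To make the quadruple concrete for the Markov-chain side of the audience, here is a minimal Python sketch; the names and the shared-term scoring rule are illustrative choices of mine, not part of the model's definition:

```python
from typing import Dict, List

# Hypothetical rendering of the quadruple [D, Q, F, R]: documents and
# queries are bags of words, F is the matching framework, and R(qi, dj)
# scores a document against a query by shared-term frequency.

Doc = Dict[str, int]   # logical view of a document: term -> frequency
Query = List[str]      # logical view of a query: a list of terms

def rank_score(query: Query, doc: Doc) -> float:
    """R(qi, dj): a toy ranking function -- sum of shared-term frequencies."""
    return float(sum(doc.get(term, 0) for term in query))

def retrieve(query: Query, collection: List[Doc]) -> List[Doc]:
    """F: rank every document in D against the query via R."""
    return sorted(collection, key=lambda d: rank_score(query, d), reverse=True)
```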

  10. IR example • One example of IR is any Internet search engine. • For instance, have you ever wondered what happens to your search terms when you send them to an Internet search engine? • Why are some documents presented to you in the retrieval set and others are not? • Why are the retrieved documents listed (or ranked) as they are?

  11. The IR Model • D [document collection] can be full-text documents, documents with library cataloguing records, HTML documents … pretty much anything! • Q [query] is the “user’s expression of information need.” Since everyone expresses an information need differently, queries are the least stable part of IR.

  12. The IR Model • F [framework] is typically the computing environment • R [ranking] is how retrieved elements are associated with the user’s query and with each other

  13. File parsing • Before IR can occur, the document must be parsed. • Using the example of a full-text document: • The file is opened by the computer program • Each term, one by one, is examined by the program: • Is this term a “stop term” (i.e., a term to be skipped)? • Is this term very common (e.g., “the”, “an”)?

  14. File parsing • The term is stored in a database along with the frequency of the term’s occurrence, either within the individual document or within the entire document collection • Optionally, terms may be “stemmed” - that is, their grammatical endings are removed (e.g., “fishes” becomes “fish”; “goes” becomes “go”). In English, the usual stemming technique is the “Porter stemming algorithm.”
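A minimal sketch of this parsing step in Python, assuming the third-party nltk package for its Porter stemmer (the stop list here is illustrative, not a standard one):

```python
from collections import Counter
from nltk.stem import PorterStemmer  # third-party; assumes nltk is installed

STOP_TERMS = {"the", "an", "a", "of", "to", "and"}  # illustrative stop list
stemmer = PorterStemmer()

def parse(text: str) -> Counter:
    """Tokenize on whitespace, drop stop terms, stem, and count frequencies."""
    counts: Counter = Counter()
    for term in text.lower().split():
        if term in STOP_TERMS:            # skip stop terms
            continue
        counts[stemmer.stem(term)] += 1   # e.g., "fishes" -> "fish"
    return counts

print(parse("The fishes of the lake"))    # Counter({'fish': 1, 'lake': 1})
```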

  15. File parsing • When parsing is complete, the program calculates a “weight” for each term, which is stored in a “term/document matrix”. The matrix looks like a spreadsheet! • The weight may be based on normalized frequency (a comparable weight value calculated relative to the size of the document) or something else. • Usually, the weight is calculated with the famous “tf·idf” [term frequency · inverse document frequency] measure.
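As an illustration of the weighting step, the following sketch builds a term/document matrix with length-normalized tf and log(N/df) idf; the function name and the particular idf variant are my choices, since the slides do not fix one:

```python
import math
from collections import Counter

def tfidf_matrix(docs):
    """Build a term/document matrix: weight = (tf / doc length) * log(N / df)."""
    n_docs = len(docs)
    doc_freq = Counter()              # df: number of documents containing each term
    for doc in docs:
        doc_freq.update(set(doc))
    matrix = []
    for doc in docs:
        counts = Counter(doc)
        matrix.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return matrix

# "the" appears in every document, so its idf (and weight) is 0;
# rarer terms like "budget" get a higher weight.
docs = [["the", "budget", "report"],
        ["the", "payroll"],
        ["the", "faculty"]]
print(tfidf_matrix(docs)[0])
```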

  16. File parsing • The idea is that rarely occurring terms have more “informational value” and so should be weighted so that documents containing them rank highly in the retrieval set. • Terms that occur very frequently or extremely rarely rank lower. • User-oriented techniques for interacting with the IR system, whether graphical or term-based (e.g., using the Boolean operators “and”, “or”, “not” in the query), help the user manipulate the weights to get a useful retrieval set.

  17. Document/Term Frequency Matrix 1 RAW COUNTS: the actual number of times the term appears in each document.

  18. Document/Term Frequency Matrix 2 NORMALIZED FREQUENCIES: the number of times the term appears, normalized by document length (this overcomes differing document lengths). This is the first step towards the usual tf·idf weighting.

  19. Related work • Markov chains: Anderson (1991); Asmussen (1987); Jackson & Lafrere (1998) • AI: Zhang (2001) • Stochastic modeling of use and statistical inference: Paterman (1990); Cassandra (1998); Rajgopal & Mazumdar (2002); Chen & Cooper (2002) • Vector / tf·idf: Danilowicz & Balinski (2001)

  20. Related work • “If query terms have multiple senses, a mixture of these senses may be present in the expanded model. For semantic smoothing, a more content-dependent model that takes into account the relationship between query terms may be desirable. One way to accomplish this is through a pseudo-feedback mechanism ... In this way the expanded language model may be more ‘semantically coherent,’ capturing the topic implicit in the set of documents rather than representing words related to the query terms qi in general.” (Lafferty & Zhai, 2001, p. 15)

  21. In short ... We have: • semantic-level parsing of documents, • tf·idf for relevance-ranked retrieval sets, • hierarchical lists & graphic displays. We ask: • What type of visualization? • How can we incorporate group awareness & query chains? • How can we build on what the IR model offers?

  22. IR as a Markov Process • The randomness of query terms • Modeling the Markov process • Incorporating group/previous users’ input • Presenting the whole through an interactive information retrieval system’s interface

  23. Markov chain defined The behavior of an informationally closed and generative system, specified by transition probabilities between that system’s states. Named after A. A. Markov, who studied stochastic sequences of characters (symbols, letters, words). The probabilities of a Markov chain are entered in a transition matrix indicating which state or symbol follows which other state or symbol.
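A small Python sketch of such a transition matrix, estimated from observed sequences of states (here, hypothetical query-term sequences); the counting rule is the usual maximum-likelihood estimate, though the slides do not name an estimator:

```python
from collections import defaultdict

def transition_matrix(sequences):
    """Estimate p_ij = count(i -> j) / count(i -> anything) from observed
    state sequences; each row of the result sums to 1."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i, j in zip(seq, seq[1:]):     # successive pairs (X_n, X_{n+1})
            counts[i][j] += 1
    matrix = {}
    for i, row in counts.items():
        total = sum(row.values())
        matrix[i] = {j: c / total for j, c in row.items()}
    return matrix

# Two observed query-term sequences from (hypothetical) search sessions.
sessions = [["car", "auto", "ford"], ["car", "ford", "auto"]]
print(transition_matrix(sessions)["car"])  # {'auto': 0.5, 'ford': 0.5}
```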

  24. Modeling with Markov chains Consider the following example of a homogeneous stochastic process with discrete time and a finite state space. The physical model of IR permits a number of terms and allows users to move from one set of terms to another at arbitrary time points. For our purposes, we identify the set of possible terms with a finite set of states S = {1, ..., m}. Feedback from the end user causes the retrieval system to jump from one state into another and to recreate the associations among the retrieval set’s members. Furthermore, such transitions may take place only at certain instants [feedback inputs from the end user], i.e., at discrete time steps. ...

  25. Modeling with Markov chains Using tf·idf as a starting point, we’re able to estimate the (hypothetical) probabilities pij, i, j ∈ S, for a transition from state i to state j. These probabilities do not depend on or vary with the time n. For the complete specification of the stochastic process (Xn)n≥0, where Xn is the state of the system at time point n, we need to provide a distribution of the initial state X0, which is denoted by p0 = (p01, ..., p0m). Here p0i = P[X0 = i] denotes the probability of a start in state i, i ∈ S. There’s no risk of confusion between the initial probabilities p0i and the transition probabilities pij since we index with natural numbers.
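Given the initial distribution p0 and a row-stochastic transition matrix P as just defined, the chain can be simulated in a few lines. A sketch, with made-up numbers; random.choices performs the categorical sampling:

```python
import random

def simulate_chain(p0, P, steps, rng=random):
    """Sample a path X_0, ..., X_steps of a finite homogeneous Markov chain.

    p0 : initial distribution (p_01, ..., p_0m) over states 0..m-1
    P  : row-stochastic transition matrix, P[i][j] = p_ij
    """
    states = list(range(len(p0)))
    x = rng.choices(states, weights=p0)[0]          # draw X_0 ~ p0
    path = [x]
    for _ in range(steps):
        x = rng.choices(states, weights=P[x])[0]    # X_{n+1} ~ row X_n of P
        path.append(x)
    return path

p0 = [0.5, 0.3, 0.2]
P = [[0.1, 0.6, 0.3],
     [0.4, 0.4, 0.2],
     [0.3, 0.3, 0.4]]
print(simulate_chain(p0, P, steps=5))   # e.g., [0, 1, 1, 2, 0, 1]
```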

  26. Modeling with Markov chains The actual state Xn may not be useful in itself. So we consider a function f: S → R whose value f(Xn) expresses a property of the system which can be measured (the “observables”). As a natural extension we consider observables which are defined on the set of all possible s-tuples of successive states of the chain, s ∈ N. In the case of IR, it’s appropriate to work with observables which depend on pairs (i, j) of states (query/query representations), where i is the state of departure and j is the destination of the transition which takes place between time points n and n+1. We may, then, consider the query terms: this observable depends on the pairs (Xn, Xn+1) of successive states.

  27. Randomness of query terms • A query represents the seeker’s semantic representation of a concept, e.g., “Ford, car, auto, vehicle” • With a finite set of terms and no other considerations, all terms have an equal chance of being selected • We can know where a user is in the chain and predict the next choices ...

  28. After selecting a term and trying again, the probability of a term being selected changes ... Over time, we capture user choices to seed the probability of a term being selected. A state vector gives the probability of each state i; a transition matrix Q of query terms can be built from entries that reflect the transition probabilities mij. If x0 holds the probability of each term being selected the first time, then x1 = x0Q gives the probabilities for the second selection, x2 = x1Q for the third, and so on (initial, 2nd, 3rd, ...).
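A sketch of this propagation, using the row-vector convention x_{n+1} = x_n·Q (the slide does not fix a convention, so this choice is mine); the entries of Q are made up for illustration:

```python
def step(state_vector, Q):
    """One update of the distribution: x_{n+1}[j] = sum_i x_n[i] * Q[i][j]."""
    m = len(state_vector)
    return [sum(state_vector[i] * Q[i][j] for i in range(m)) for j in range(m)]

# Initial (uniform) probabilities over three terms, then two feedback steps.
x0 = [1/3, 1/3, 1/3]
Q = [[0.2, 0.5, 0.3],          # rows sum to 1 (row-stochastic)
     [0.1, 0.6, 0.3],
     [0.4, 0.4, 0.2]]
x1 = step(x0, Q)               # distribution after the first selection
x2 = step(x1, Q)               # ... after the second, and so on
```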

  29. If the potential terms are numbered 1, ..., m, the order of the queries is described by some permutation (i1, i2, ..., im). The probability of each term being selected: let pk be the probability of term k being selected. Example: if m = 2, there are only two possible orderings: c1 = (1, 2) and c2 = (2, 1). The probabilities are p11 = p21 = p1; p12 = p22 = p2.
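For illustration, the m! possible orderings can be enumerated directly; with m = 2 this reproduces the slide's c1 and c2:

```python
from itertools import permutations

terms = [1, 2]                    # m = 2 possible query terms
print(list(permutations(terms)))  # [(1, 2), (2, 1)] -- c1 and c2 above
```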

  30. Model description • System side: • In relevance-feedback systems, recalculate the relevance ranking • Can calculate the probability that control passes from node i to node j, at different times or states • Transition probabilities are used to reflect the IR system’s relevance ranking ...

  31. Group awareness and best choice • Recall that, from a closed set of terms, the individual information seeker may select any term with equal probability ... • IR systems usually add weights • Previous input by a group of experienced users may also add weights: probabilities of how experienced users move from tn to tm ... tx

  32. Compare • Individual terms: each has an equal probability of selection • Group input: provides a weighting factor

  33. Example A user interested in financial work uses the terms {budget, payroll, faculty} Doc 1 = {t1, t2, t3, ... tn} Doc 2 = {t1, t2, t3} Doc 3 = {t3, t5, t9} Without other input, each term has a 33% chance of selection. With additional group input, a weight such as t23 = .6 gives that term a 60% chance of being ranked higher after one user interaction.
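One plausible way to fold a group-derived weight into the otherwise uniform selection probabilities is to replace a term's uniform share with the group weight and renormalize; the slides do not spell out the combination rule, so this sketch is an assumption:

```python
def reweight(terms, group_weights):
    """Blend equal selection odds with group-derived weights, then renormalize.

    Without group input every term gets 1/len(terms); group_weights
    (observed frequencies from experienced users) shift the mass.
    """
    raw = {t: group_weights.get(t, 1 / len(terms)) for t in terms}
    total = sum(raw.values())
    return {t: w / total for t, w in raw.items()}

terms = ["budget", "payroll", "faculty"]      # each starts at ~33%
print(reweight(terms, {"payroll": 0.6}))      # group input boosts one term
```

The exact percentages a term ends up with depend on the combination rule chosen; the point is only that group input breaks the initial uniformity.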

  34. Interface over matrix

  35. Discussion • The Markov model parallels IR ideas: • relevance ranking • it uses what the system offers: semantic tokens • it uses transition probabilities much as relevance-feedback systems use “more like these” • it integrates group awareness as a weighting scheme • it provides data on group & individual heuristics

  36. Discussion • Potential uses • Can be used to recommend search paths • In group settings, may encourage confidence and certainty • The system can react to, or prevent, the seeker going too far astray

  37. Discussion • Potential uses • Provides data about probable relationships that can be incorporated into larger IR systems • Combined with a class-relationship approach, the probabilistic data compensate for missing data in the object definition (Asmussen, 2000) • Probabilistic distribution of the sum of classes (Conniffe & Spencer, 2000)

  38. Future research • Test end-user confidence • Compare tf·idf to term weights generated by group awareness • 2D and 3D interfaces
