DYNAMIC ELEMENT RETRIEVAL IN A STRUCTURED ENVIRONMENT

MAYURI UMRANIKAR

CONTENTS

Introduction

Retrieval Environment

- The Vector Space Model

- INEX Environment

- Flexible Retrieval System

Method Used for Retrieval

- Document Tree – Construction

- Ranking of Elements

- Output

Experiments

Conclusions

INTRODUCTION
  • Extensible Markup Language (XML) is the preferred format for representing documents; as the number of XML documents grows, the issue of element retrieval arises
  • Focus is on retrieving relevant elements rather than entire documents
  • INEX – INitiative for the Evaluation of XML Retrieval
  • Flexible Mechanisms
  • Different Approaches
  • Term Weighting
RETRIEVAL ENVIRONMENT
  • 2 Factors – the issues that arise when the focus moves from documents to components, and Salton's Vector Space Model
  • Vector Space Model – a term's weight reflects the number of times it occurs in the document
  • Fox's Extended Vector Space Model – incorporates objective identifiers
  • Document vector consists of subvectors
  • Each subvector contains text that is independently indexed, weighted, searched and retrieved
  • Term Weighting – weighting within the subjective vectors
  • Smart Experimental Retrieval System
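
The extended-vector idea above can be sketched in a few lines of Python. This is only an illustration, assuming each subvector is a plain term-to-weight mapping; the class name `ExtendedVector`, the subvector names (`body`, `author`) and the inner-product `similarity` method are hypothetical and are not taken from Smart.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ExtendedVector:
    """Hypothetical extended vector: one document, several named subvectors."""
    doc_id: str
    # subvector name -> {term or identifier: weight}
    subvectors: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def similarity(self, query: "ExtendedVector", subvector: str) -> float:
        """Inner product restricted to one subvector, so each subvector
        can be searched independently of the others."""
        mine = self.subvectors.get(subvector, {})
        theirs = query.subvectors.get(subvector, {})
        return sum(w * theirs[t] for t, w in mine.items() if t in theirs)


# Illustrative data only: a subjective subvector ("body") and an
# objective subvector ("author").
doc = ExtendedVector("d1", {"body": {"xml": 2.0, "retrieval": 1.0},
                            "author": {"smith": 1.0}})
query = ExtendedVector("q1", {"body": {"xml": 1.0}})
body_score = doc.similarity(query, "body")
```

Keeping the subvectors separate is what lets the objective part act as a filter while the subjective part carries the content match.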
INEX ENVIRONMENT
  • Content Only (CO) – ignores document structure; like typical queries, specifies only the content of the search
  • Content and Structure (CAS) – explicitly refers to structure; exhaustive and specific
  • CO results go directly to the user; CAS requires additional filtering and a search of the body portion
  • CAS returns a rank-ordered list of elements
  • INEX-EVAL – uses measures of recall and precision

(Figure: exhaustivity and specificity mapped to a single relevance value)

Results are ranked
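
As a reminder of what recall and precision measure on a ranked list, here is a minimal Python sketch. It assumes binary relevance and a cutoff `k`; it is not the INEX-EVAL procedure itself, which first maps exhaustivity and specificity to a single relevance value. The function name and the element identifiers are hypothetical.

```python
def precision_recall(ranked_ids, relevant_ids, k):
    """Precision and recall over the top-k entries of a ranked list,
    assuming binary relevance (an element is relevant or it is not)."""
    top_k = ranked_ids[:k]
    hits = sum(1 for element_id in top_k if element_id in relevant_ids)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall


# Illustrative identifiers only.
ranked = ["e1", "e2", "e3", "e4"]
relevant = {"e1", "e3", "e5"}
p_at_2, r_at_2 = precision_recall(ranked, relevant, k=2)
```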

FLEXIBLE RETRIEVAL SYSTEM
  • Smart Format – documents and topics are translated into Smart format and indexed as extended vectors
  • Subjective vectors – contain content-bearing terms
  • Objective vectors – serve as filters on the results returned by CAS queries
  • Extended vector – subjective vector whose body subvector holds the terms of a paragraph
  • Lnu-ltu weighting
  • Dynamic flexible retrieval – tree representation; rank-ordered list produced using Lnu weights
METHOD FOR FLEXIBLE RETRIEVAL
  • Input – query Q is run against the paragraph index to retrieve a rank-ordered list of paragraphs (terminal nodes)
  • The n top-ranked paragraphs are selected as input
  • This set of paragraphs is used to identify documents; elements are generated and returned as output
  • Document Tree – needs information about the document structure

Terminal nodes

Pre-order traversal

Terminal nodes found in paragraph index
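
A pre-order traversal that collects terminal nodes can be sketched as follows. This is a minimal illustration using Python's standard `xml.etree.ElementTree`; the function name `terminal_nodes`, the positional XPath-style labels and the sample tags (`article`, `bdy`, `sec`, `p`) are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def terminal_nodes(xml_text):
    """Return (path, text) for every childless element, in pre-order.

    Paths are simple positional XPath-like strings built during the walk."""
    root = ET.fromstring(xml_text)
    leaves = []

    def visit(node, path):
        children = list(node)
        if not children:
            # Terminal node: record its path and its text content.
            leaves.append((path, (node.text or "").strip()))
            return
        counts = {}
        for child in children:
            # Position of this child among siblings with the same tag.
            counts[child.tag] = counts.get(child.tag, 0) + 1
            visit(child, f"{path}/{child.tag}[{counts[child.tag]}]")

    visit(root, f"/{root.tag}[1]")
    return leaves


sample = ("<article><bdy><sec><p>a</p><p>b</p></sec>"
          "<sec><p>c</p></sec></bdy></article>")
leaves = terminal_nodes(sample)
```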

CONSTRUCTION OF DOCUMENT TREE
  • For query Q, the n top-ranked paragraphs are used to build trees
  • Leaf elements (terminal nodes) are paragraph nodes
  • Each leaf is represented by a term-frequency weighted vector
  • 1st – gather all leaf nodes; this completes the terminal nodes
  • 2nd – merge the children's vectors to form the parents' vectors
  • The document schema determines the merging
  • Parent – holds the unique terms of its children; the term-frequency weighted parent vector has the content of all its children
  • The process is applied recursively
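
The bottom-up merge described above can be sketched like this. It is a minimal illustration, assuming whitespace tokenization and raw term counts; the `Node` class and `build_vectors` function are hypothetical names, and a real system would take the tree shape from the document schema.

```python
from collections import Counter


class Node:
    """Hypothetical tree node: leaves carry text, parents carry children."""

    def __init__(self, name, text=None, children=None):
        self.name = name
        self.text = text
        self.children = children or []
        self.tf = Counter()  # term-frequency vector, filled by build_vectors


def build_vectors(node):
    """Recursively fill term-frequency vectors from the leaves upward."""
    if not node.children:
        # Leaf (paragraph): count its own terms.
        node.tf = Counter((node.text or "").lower().split())
    else:
        # Parent: sum of the children's vectors, so it holds every
        # unique term found below it with the combined frequency.
        node.tf = Counter()
        for child in node.children:
            node.tf += build_vectors(child)
    return node.tf


p1 = Node("p", text="xml retrieval")
p2 = Node("p", text="xml index")
sec = Node("sec", children=[p1, p2])
build_vectors(sec)
```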
RANKING OF ELEMENTS
  • The set of elements of the document tree has been generated
  • Problem – structured retrieval must produce a rank-ordered list of elements
  • Method used – all-element index (a separate representation for each element of each document, plus weighting information)
  • Lnu weights – suited to elements of variable length; do not require global frequency information
  • Normalization and length – failing to normalize for length gives biased values
  • Pivot – the document length at which probability of relevance = probability of retrieval
  • Slope – the amount of tilting
  • Pivoted normalization – reduces the difference between the two probabilities
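  • Lnu term weights:

\[
\text{Lnu} = \frac{\dfrac{1+\log(\mathit{term\_freq})}{1+\log(\mathit{avg\_term\_freq})}}{(1-\mathit{slope}) + \mathit{slope}\cdot\dfrac{\mathit{no\_unique\_terms}}{\mathit{pivot}}}
\]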
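
The Lnu weight can be computed directly from an element's term-frequency vector, which is why no collection-wide statistics are needed. The sketch below is illustrative only: the function name is hypothetical, and the `slope` and `pivot` values in the usage lines are placeholders, not values from the experiments.

```python
import math


def lnu_weights(term_freqs, slope, pivot):
    """Lnu weight for every term of one element.

    term_freqs maps term -> raw frequency (>= 1) within the element."""
    if not term_freqs:
        return {}
    n_unique = len(term_freqs)
    avg_tf = sum(term_freqs.values()) / n_unique
    # Pivoted unique normalization (the "u" component).
    norm = (1 - slope) + slope * (n_unique / pivot)
    return {
        term: ((1 + math.log(tf)) / (1 + math.log(avg_tf))) / norm
        for term, tf in term_freqs.items()
    }


# Placeholder parameters chosen only to make the arithmetic easy to follow.
weights = lnu_weights({"xml": 1, "tree": 1}, slope=0.25, pivot=2.0)
```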

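Ltu weighting – N is the collection size, nk is the number of elements containing the term

\[
\text{Ltu} = \frac{(1+\log(\mathit{term\_freq}))\cdot\log(N/n_k)}{(1-\mathit{slope}) + \mathit{slope}\cdot\dfrac{\mathit{no\_unique\_terms}}{\mathit{pivot}}}
\]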

  • N and nk are element dependent and must be known from indexing
  • As we move up the tree, N is the count of elements of each type
  • nk – from the inverted file entry in the paragraph index, together with the given mapping between identifiers and XPaths
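
A query-side Ltu weight can be sketched as below. The sketch uses the standard ltu form, in which the log(N/nk) factor multiplies the term-frequency factor and the product is divided by the pivoted normalization. The function name and every number in the usage lines are hypothetical; in practice N and nk would come from the index for the element type being scored.

```python
import math


def ltu_weights(query_tf, N, nk, slope, pivot):
    """Ltu weight for every query term.

    query_tf: term -> frequency in the query
    N:        number of elements of the type being scored
    nk:       term -> number of those elements containing the term"""
    n_unique = len(query_tf)
    norm = (1 - slope) + slope * (n_unique / pivot)
    weights = {}
    for term, tf in query_tf.items():
        df = nk.get(term, 0)
        if df == 0:
            continue  # term never occurs at this level: no contribution
        weights[term] = (1 + math.log(tf)) * math.log(N / df) / norm
    return weights


# Placeholder statistics, not taken from any collection.
q_weights = ltu_weights({"xml": 1, "zebra": 1}, N=100, nk={"xml": 10},
                        slope=0.25, pivot=2.0)
```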
OUTPUT OF FLEXIBLE RETRIEVAL
  • Select another leaf node, gather its siblings, construct the document tree, calculate Lnu term weights and correlate them with the Ltu-weighted query to produce another rank-ordered list
  • After the n top-ranked paragraphs are exhausted and the last list is produced, merge the lists
  • Result – a single set of elements, rank ordered by correlation with Q
  • Comparison – flexible retrieval & all-element index

Identical results – when the set of n paragraphs input to flexible retrieval covers all paragraphs and the same values are used for Lnu-ltu
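
The scoring and merging steps can be sketched as follows, assuming the correlation is an inner product between weighted vectors and that one ranked list is produced per document tree. The function names are hypothetical, and keeping the best score when an element appears in more than one list is an assumption of this sketch, not something stated in the slides.

```python
def inner_product(query_w, element_w):
    """Correlation of a weighted query with a weighted element vector."""
    return sum(w * element_w[t] for t, w in query_w.items() if t in element_w)


def merge_ranked(lists):
    """Merge per-tree ranked lists of (element_id, score) into one list,
    highest score first; ties are broken by element id for stability."""
    best = {}
    for ranked in lists:
        for element_id, score in ranked:
            if element_id not in best or score > best[element_id]:
                best[element_id] = score
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))


# Illustrative scores only.
tree_a = [("a/sec[1]", 0.9), ("a/p[1]", 0.4)]
tree_b = [("b/sec[1]", 0.7), ("a/sec[1]", 0.9)]
merged = merge_ranked([tree_a, tree_b])
score = inner_product({"xml": 2.0, "tree": 1.0}, {"xml": 0.5})
```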

EXPERIMENTS
  • Paragraph indexing – the result is a set of extended vectors representing the paragraphs
  • CO – subvectors represent the subjective portion; the body subvector is the important one, since what matters is the content of an element (not its type), and that content is contained in the body
  • Tree Representation
FACTORS OF INTEREST
  • Slope and pivot for Lnu-ltu
  • Effective structured retrieval
  • Can be determined empirically and applied from one collection to another; generic
  • n – number of paragraphs input; sets an upper bound on the number of trees per query
  • The actual number of trees depends on how many paragraphs belong to the same group or the same document
EXPERIMENTS DONE
  • All-element and dynamic/flexible retrieval experiments and results

- body-only retrieval

  • Correlation between element vector and query vector is computed on body elements only

Table 1

RESULTS
  • Tables
Results are equivalent
  • Flexible retrieval is more efficient in file space

Time required for indexing is half

  • Dynamic retrieval costs more on a per-query basis – the cost depends on n; the total number of trees required is not known exactly in advance
  • Another factor – the value of nk
DISCUSSIONS AND CONCLUSIONS
  • Flexible retrieval dynamically produces a rank-ordered list of elements from a single indexing at the level of the basic indexing node (paragraph)
  • Basic functions – SMART; extended vector model
  • Results – demonstrate flexible retrieval capabilities
  • Attempt to incorporate other subvectors and internal-node weighting
  • INEX – exhaustivity and specificity; results are exhaustive; research on specificity is ongoing; the results are a reflection of this
  • It is a better way of retrieval than all-element indexing