Dynamic element retrieval in a structured environment

MAYURI UMRANIKAR

CONTENTS

Introduction

Retrieval Environment

- The Vector Space Model

- INEX Environment

- Flexible Retrieval System

Method Used for Retrieval

- Document Tree – Construction

- Ranking of Elements

- Output

Experiments

Conclusions


INTRODUCTION

  • XML (Extensible Markup Language) is increasingly used to represent documents; as the number of XML documents grows, element retrieval becomes an issue

  • Focus on retrieving relevant elements rather than entire documents

  • INEX – INitiative for Evaluation of XML Retrieval

  • Flexible Mechanisms

  • Different Approaches

  • Term Weighting


RETRIEVAL ENVIRONMENT

  • Two factors – the issues that arise when the focus moves from documents to their components, and Salton’s Vector Space Model

  • Vector Space Model – terms are weighted by the number of times they occur in the document

  • Fox’s Extended Vector Space Model – incorporates objective identifiers

  • The document vector consists of subvectors

  • Each subvector contains text that is independently indexed, weighted, searched and retrieved

  • Term weighting – applied within the subjective subvectors

  • SMART experimental retrieval system
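A minimal Python sketch of the extended vector idea, assuming each subvector is stored as a term-to-weight dictionary and that overall similarity is a weighted sum of per-subvector inner products. The subvector names (`title`, `body`), the coefficients and the helper names are hypothetical, and raw term counts stand in for real term weights.

```python
from collections import Counter

# Hypothetical representation: subvector name -> {term: weight}
ExtendedVector = dict[str, dict[str, float]]


def make_subvector(text: str) -> dict[str, float]:
    """Raw term counts for one subvector (a stand-in for real term weighting)."""
    return {term: float(n) for term, n in Counter(text.lower().split()).items()}


def subvector_similarity(q: ExtendedVector, d: ExtendedVector, name: str) -> float:
    """Inner product of query and document restricted to one subvector."""
    qv, dv = q.get(name, {}), d.get(name, {})
    return sum(w * dv.get(term, 0.0) for term, w in qv.items())


def combined_similarity(q: ExtendedVector, d: ExtendedVector,
                        coeffs: dict[str, float]) -> float:
    """Weighted sum of per-subvector similarities (coefficients are assumptions)."""
    return sum(c * subvector_similarity(q, d, name) for name, c in coeffs.items())


doc = {"title": make_subvector("XML retrieval"),
       "body": make_subvector("element retrieval in XML documents")}
query = {"body": make_subvector("XML element")}
print(combined_similarity(query, doc, {"title": 0.5, "body": 1.0}))  # 2.0
```

Because each subvector is matched separately, a query can target the body alone (as here) while other subvectors contribute nothing.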


INEX ENVIRONMENT

  • Content Only (CO) – ignore document structure; like typical queries, they specify only the content of the search

  • Content and Structure (CAS) – explicitly refer to document structure; exhaustive and specific

  • CO queries are run directly as given by the user; CAS queries require additional filtering and a search of the body portion

  • CAS retrieval returns a rank-ordered list of elements

  • INEX-EVAL – uses measures of recall and precision

    (see figure: exhaustivity and specificity are mapped to a single relevance value)

    Results are ranked
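To show how exhaustivity and specificity can be collapsed into one relevance value, here is a small sketch using strict and generalized quantization functions on 0 to 3 scales. The quantization tables follow INEX 2003 as best we know them and should be treated as an assumption to check against the official INEX-EVAL definitions; the precision/recall at rank k below is a deliberately simplified stand-in, not the actual INEX-EVAL computation.

```python
def strict_quant(e: int, s: int) -> float:
    """Strict: only highly exhaustive and highly specific elements count."""
    return 1.0 if (e, s) == (3, 3) else 0.0


def generalized_quant(e: int, s: int) -> float:
    """Generalized: partial credit for partially exhaustive/specific elements."""
    if (e, s) == (3, 3):
        return 1.0
    if (e, s) in {(2, 3), (3, 2), (3, 1)}:
        return 0.75
    if (e, s) in {(1, 3), (2, 2), (2, 1)}:
        return 0.5
    if (e, s) in {(1, 2), (1, 1)}:
        return 0.25
    return 0.0


def precision_recall_at(ranking, assessments, k, quant=generalized_quant):
    """Simplified precision/recall at rank k over quantized relevance values.

    ranking: element ids in rank order.
    assessments: element id -> (exhaustivity, specificity); unjudged = (0, 0).
    """
    gained = sum(quant(*assessments.get(el, (0, 0))) for el in ranking[:k])
    total = sum(quant(e, s) for e, s in assessments.values())
    return gained / k, (gained / total if total else 0.0)


assessments = {"a": (3, 3), "b": (2, 2), "c": (0, 0)}
print(precision_recall_at(["b", "c", "a"], assessments, k=2))
print(precision_recall_at(["b", "c", "a"], assessments, k=3, quant=strict_quant))
```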


FLEXIBLE RETRIEVAL SYSTEM

  • SMART format – documents and topics are translated into SMART format and indexed as extended vectors

  • Subjective vectors – contain content-bearing terms

  • Objective vectors – serve as filters on the results returned by CAS queries

  • Extended vector – the terms of each paragraph are placed in the body subvector, a subjective subvector

  • Lnu-ltu weighting

  • Dynamic flexible retrieval – builds a tree representation and produces a rank-ordered list using Lnu weights
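As an illustration of objective subvectors acting as filters on CAS results, the sketch below assumes each result carries a small dictionary of objective values. The field name `yr`, the element identifiers and the helper `filter_by_objective` are hypothetical; the point is only that filtering removes results without disturbing their rank order.

```python
from typing import Any, Callable

Result = dict[str, Any]


def filter_by_objective(results: list[Result],
                        predicate: Callable[[dict[str, Any]], bool]) -> list[Result]:
    """Keep results whose objective subvector satisfies the CAS constraint.

    Rank order is preserved because filtering never reorders the list.
    """
    return [r for r in results if predicate(r["objective"])]


results = [
    {"id": "d1:/article[1]/sec[2]", "score": 0.91, "objective": {"yr": 1998}},
    {"id": "d2:/article[1]/sec[1]", "score": 0.84, "objective": {"yr": 2001}},
    {"id": "d2:/article[1]", "score": 0.60, "objective": {"yr": 2001}},
]
recent = filter_by_objective(results, lambda obj: obj.get("yr", 0) >= 2000)
print([r["id"] for r in recent])  # ['d2:/article[1]/sec[1]', 'd2:/article[1]']
```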


METHOD FOR FLEXIBLE RETRIEVAL

  • Input – given a query Q, paragraph retrieval returns a rank-ordered list of terminal nodes (paragraphs)

  • The n top-ranked paragraphs are selected as input

  • This set of paragraphs identifies the documents from which elements are generated and returned as output

  • Document tree – requires structural information:

    Terminal nodes

    Pre-order traversal

    Terminal nodes are located in the paragraph index
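The input step can be sketched as selecting the n top-ranked paragraphs and grouping them by document, since each group seeds one document tree. The tuple layout `(doc_id, paragraph_xpath, score)` and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def top_n_by_document(ranked: list[tuple[str, str, float]],
                      n: int) -> dict[str, list[str]]:
    """Select the n top-ranked paragraphs and group their XPaths by document.

    ranked: (doc_id, paragraph_xpath, score) tuples in any order.
    Within each document, paragraphs stay in rank order.
    """
    top = sorted(ranked, key=lambda r: r[2], reverse=True)[:n]
    groups: dict[str, list[str]] = defaultdict(list)
    for doc_id, xpath, _score in top:
        groups[doc_id].append(xpath)
    return dict(groups)


ranked = [("d1", "/article[1]/bdy[1]/p[3]", 0.72),
          ("d2", "/article[1]/bdy[1]/p[1]", 0.65),
          ("d1", "/article[1]/bdy[1]/p[1]", 0.80),
          ("d3", "/article[1]/bdy[1]/p[4]", 0.10)]
print(top_n_by_document(ranked, n=3))
```

The number of groups, and therefore of trees, can never exceed n.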



CONSTRUCTION OF DOCUMENT TREE

  • For query Q, the n top-ranked paragraphs are used to build trees

  • Leaf elements (terminal nodes) are paragraph nodes

  • Each leaf is represented by a term-frequency weighted vector

  • Step 1 – gather all leaf (terminal) nodes

  • Step 2 – merge the children's vectors to form each parent's vector

  • The document schema determines how merging is done

  • Parent – contains the unique terms of its children; its term-frequency weighted vector represents the content of all its children

  • The process is repeated recursively up to the root
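A sketch of the bottom-up construction, assuming leaves are identified by XPaths and that merging means summing term frequencies (both assumptions). Instead of explicit recursion, each leaf vector is added to every ancestor, which yields the same parent vectors as recursively merging children. The function names are hypothetical, and the sketch ignores schema-specific merging rules.

```python
from collections import Counter
from collections.abc import Iterator


def ancestors(xpath: str) -> Iterator[str]:
    """Yield ancestor XPaths, nearest first: '/a[1]/b[1]/p[2]' -> '/a[1]/b[1]', '/a[1]'."""
    steps = xpath.strip("/").split("/")
    for i in range(len(steps) - 1, 0, -1):
        yield "/" + "/".join(steps[:i])


def build_element_vectors(leaves: dict[str, Counter]) -> dict[str, Counter]:
    """Term-frequency vector for every element in one document tree.

    Each leaf (paragraph) vector is added to the vector of every ancestor,
    equivalent to recursively merging children into their parents.
    """
    vectors: dict[str, Counter] = {}
    for xpath, tf in leaves.items():
        vectors.setdefault(xpath, Counter()).update(tf)
        for anc in ancestors(xpath):
            vectors.setdefault(anc, Counter()).update(tf)
    return vectors


leaves = {
    "/article[1]/bdy[1]/sec[1]/p[1]": Counter({"xml": 2, "tree": 1}),
    "/article[1]/bdy[1]/sec[1]/p[2]": Counter({"xml": 1, "rank": 1}),
    "/article[1]/bdy[1]/sec[2]/p[1]": Counter({"pivot": 1}),
}
vectors = build_element_vectors(leaves)
print(vectors["/article[1]/bdy[1]/sec[1]"])  # Counter({'xml': 3, 'tree': 1, 'rank': 1})
```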


RANKING OF ELEMENTS

  • The set of elements of each document tree is generated

  • Problem – structured retrieval requires a rank-ordered list of elements

  • Method used – an all-element index (a separate representation, with weighting information, for each element of each document)

  • Lnu weights – suit elements of variable length and do not require global frequency information

  • Normalization and length – failing to normalize for length results in biased weights

  • Pivot – the document length at which the probability of relevance equals the probability of retrieval

  • Slope – the amount of tilting applied to the normalization

  • Pivoted normalization – reduces the difference between the probabilities of retrieval and relevance

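  • Lnu term weights (tf = term frequency, avg tf = average term frequency in the element, u = number of unique terms in the element):

    $$w_{\mathrm{Lnu}} = \frac{\left(1+\log \mathit{tf}\right) / \left(1+\log \overline{\mathit{tf}}\right)}{(1-\mathit{slope}) + \mathit{slope} \cdot u / \mathit{pivot}}$$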


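  • ltu weighting (query terms) – N is the collection size (number of elements) and n_k the number of elements containing term k:

    $$w_{\mathrm{ltu}} = \frac{\left(1+\log \mathit{tf}\right) \cdot \log\left(N / n_k\right)}{(1-\mathit{slope}) + \mathit{slope} \cdot u / \mathit{pivot}}$$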
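The two weighting formulas translate directly into code. This sketch assumes natural logarithms and represents vectors as term-to-value dictionaries; the function names are hypothetical. Elements get Lnu weights, the query gets ltu weights, and an element is scored by the inner product of the two.

```python
import math


def pivot_norm(num_unique: int, slope: float, pivot: float) -> float:
    """Pivoted unique normalization: (1 - slope) + slope * (unique terms / pivot)."""
    return (1.0 - slope) + slope * (num_unique / pivot)


def lnu_weights(tf: dict[str, int], slope: float, pivot: float) -> dict[str, float]:
    """Lnu weights for one element's term-frequency vector (natural log assumed)."""
    if not tf:
        return {}
    avg_tf = sum(tf.values()) / len(tf)
    norm = pivot_norm(len(tf), slope, pivot)
    return {t: (1 + math.log(f)) / (1 + math.log(avg_tf)) / norm for t, f in tf.items()}


def ltu_weights(tf: dict[str, int], N: int, n_k: dict[str, int],
                slope: float, pivot: float) -> dict[str, float]:
    """ltu weights for the query; terms with no element frequency are skipped."""
    norm = pivot_norm(len(tf), slope, pivot)
    return {t: (1 + math.log(f)) * math.log(N / n_k[t]) / norm
            for t, f in tf.items() if n_k.get(t)}


def correlation(element_w: dict[str, float], query_w: dict[str, float]) -> float:
    """Inner product used to rank an element against the query."""
    return sum(w * element_w.get(t, 0.0) for t, w in query_w.items())


element = lnu_weights({"xml": 3, "tree": 1}, slope=0.2, pivot=2.0)
query = ltu_weights({"xml": 1}, N=1000, n_k={"xml": 50}, slope=0.2, pivot=2.0)
print(round(correlation(element, query), 4))
```

With slope = 0 the denominator is 1 and no length normalization takes place. With slope > 0, elements with more unique terms than the pivot are down-weighted.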

  • N and n_k depend on the element type and should be known from indexing

  • As we move up the tree, N is obtained by counting the elements of each type

  • n_k – obtained from the inverted-file entries in the paragraph index, using the (given) mapping between identifiers and XPaths
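One way to derive N and n_k for a given element type from paragraph-level data alone is sketched below. It assumes the paragraph index supplies every paragraph as a `(doc_id, xpath)` pair and that postings map each term to the paragraphs containing it. These layouts and the function names are hypothetical. An element counts toward n_k if any paragraph beneath it contains the term.

```python
import re


def element_type(step: str) -> str:
    """'sec[2]' -> 'sec'."""
    return re.sub(r"\[\d+\]$", "", step)


def elements_of_type(doc_id: str, xpath: str, etype: str) -> set[tuple[str, str]]:
    """(doc_id, XPath) of every element of type etype on the path to a paragraph."""
    steps = xpath.strip("/").split("/")
    return {(doc_id, "/" + "/".join(steps[: i + 1]))
            for i, step in enumerate(steps) if element_type(step) == etype}


def element_stats(paragraphs: list[tuple[str, str]],
                  postings: dict[str, list[tuple[str, str]]],
                  etype: str) -> tuple[int, dict[str, int]]:
    """N (elements of this type) and n_k (those containing term k)."""
    all_elems: set[tuple[str, str]] = set()
    for doc_id, xp in paragraphs:
        all_elems |= elements_of_type(doc_id, xp, etype)
    n_k: dict[str, int] = {}
    for term, plist in postings.items():
        elems: set[tuple[str, str]] = set()
        for doc_id, xp in plist:
            elems |= elements_of_type(doc_id, xp, etype)
        n_k[term] = len(elems)
    return len(all_elems), n_k


paragraphs = [("d1", "/article[1]/sec[1]/p[1]"), ("d1", "/article[1]/sec[1]/p[2]"),
              ("d1", "/article[1]/sec[2]/p[1]"), ("d2", "/article[1]/sec[1]/p[1]")]
postings = {"xml": [("d1", "/article[1]/sec[1]/p[1]"), ("d1", "/article[1]/sec[1]/p[2]")],
            "tree": [("d1", "/article[1]/sec[2]/p[1]"), ("d2", "/article[1]/sec[1]/p[1]")]}
print(element_stats(paragraphs, postings, "sec"))  # (3, {'xml': 1, 'tree': 2})
```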


OUTPUT OF FLEXIBLE RETRIEVAL

  • Select another leaf node, gather its siblings, construct the document tree, calculate Lnu term weights and correlate the elements with the ltu-weighted query to produce another rank-ordered list

  • After the n top-ranked paragraphs are exhausted and the last list is produced, the lists are merged

  • The result is a single set of elements rank-ordered by their correlation with Q

  • Comparison – flexible retrieval vs. the all-element index

    Identical results when the set of n paragraphs input to flexible retrieval contains all paragraphs and the same values are used for Lnu-ltu weighting
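The final merge step can be sketched as concatenating the per-tree lists and sorting by correlation. This assumes that scores from different trees are directly comparable because they were all computed against the same query Q with the same weighting parameters. The element identifiers and the function name are hypothetical.

```python
from itertools import chain
from typing import Optional


def merge_rankings(per_tree: list[list[tuple[str, float]]],
                   top_k: Optional[int] = None) -> list[tuple[str, float]]:
    """Merge per-tree (element_id, correlation) lists into one rank-ordered list.

    Ties are broken by element id so the output is deterministic.
    """
    merged = sorted(chain.from_iterable(per_tree), key=lambda r: (-r[1], r[0]))
    return merged if top_k is None else merged[:top_k]


tree_a = [("d1:/article[1]/sec[1]", 0.7), ("d1:/article[1]", 0.4)]
tree_b = [("d2:/article[1]", 0.5)]
print(merge_rankings([tree_a, tree_b], top_k=2))
# [('d1:/article[1]/sec[1]', 0.7), ('d2:/article[1]', 0.5)]
```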



EXPERIMENTS

  • Paragraph indexing – results in a set of extended vectors, each representing a paragraph

  • CO – the body subvector represents the subjective portion and is the important one, since the body holds the content of the element (not its type)

  • Tree Representation


FACTORS OF INTEREST

  • Slope, pivot for Lnu-ltu

  • Effective structure retrieval

  • These values can be determined empirically and applied from one collection to another (they are generic)

  • n – the number of paragraphs input; sets an upper bound on the number of document trees per query

  • The actual number of trees depends on how many of these paragraphs belong to the same group or the same document


EXPERIMENTS DONE

  • All-element and dynamic/flexible retrieval experiments and results

    - body-only retrieval

  • The correlation between element and query vectors is computed on body elements only

    Table 1


RESULTS

  • Tables


  • Results are equivalent

  • Flexible retrieval is more efficient in file space

    The time required for indexing is halved

  • Dynamic retrieval costs more on a per-query basis, depending on n; the total number of trees required is not known exactly in advance

  • Another factor – the value of n_k


DISCUSSIONS AND CONCLUSIONS

  • Flexible retrieval dynamically produces a rank-ordered list of elements from a single index built at the level of the basic indexing node (the paragraph)

  • Basic functions – SMART and the extended vector model

  • Results – demonstrate flexible retrieval capabilities

  • Attempt to incorporate other subvectors and internal-node weighting

  • INEX – exhaustivity and specificity: results are exhaustive; research on specificity is ongoing; the results reflect this

  • Flexible retrieval is a better way of retrieval than all-element indexing


