xml keyword search refinement n.
Skip this Video
Loading SlideShow in 5 Seconds..
XML Keyword Search Refinement PowerPoint Presentation
Download Presentation
XML Keyword Search Refinement

Loading in 2 Seconds...

play fullscreen
1 / 32

XML Keyword Search Refinement - PowerPoint PPT Presentation

  • Uploaded on

XML Keyword Search Refinement. 郭青松. Outline. Introduction Query Refinement in Traditional IR XML Keyword Query Refinement My work. Why we need query refinement?. User express their query intention by keywords, but their don’t know how to formulate good query Lack of experience

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'XML Keyword Search Refinement' - kerry-duran

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
  • Introduction
  • Query Refinement in Traditional IR
  • XML Keyword Query Refinement
  • My work
why we need query refinement
Why we need query refinement?
  • User express their query intention by keywords, but their don’t know how to formulate good query
    • Lack of experience
    • Too many expression forms
    • Unfamiliar with the system
    • Have no idea about the data
  • „Query Refinement
    • Refine the query and get good results
what is query refinement
What is Query Refinement?
  • Query expansion(query reformulation)
  • Given an ill-formed query from the user, we refine the query and help the user to better retrieve documents.
  • The goal is to improve precisionand/or recall.
  • Example:
    • “cars” “car”, “automobile”, “auto”
xml search
XML Search
  • Tag + Keyword search
    • book: xml
  • Path Expression + Keyword search (CAS Queries)
    • /book[./title about “xml db”]
  • Structure query
    • XPath, XQuery
  • Keyword search (CO Queries)
    • “xml”
xml keywords search vs ir
XML Keywords Search VS IR
  • IR
    • Flat HTML pages
    • Whole page returned
  • XML
    • Model(tree、graph)
    • Structural(semi-structural)
    • Semantic-based query(LCA, SLCA…)
    • Information fragment returned
need of xml keyword query refinement
Need of XML Keyword Query Refinement
  • Hard to know the XML content
    • Especially big xml document
    • Information fragments(LCA\SLCA)
    • Easily affect the results(Precision )
    • Huge difference of query results
  • IR style refinement methods is not suitable for xml
    • Only content be considered
    • Need structure information to form a good query
  • Introduction
  • Query Refinement in Traditional IR
  • XML Keyword Query Refinement
  • My work
  • Spelling Correction
  • Word Splitting/Word Merging
  • Phrase Segmentation
  • Word Stemming
  • Acronym Expansion
  • Add/Delete Terms
  • Substitution
classes of query refinement
Classes of Query Refinement
  • Relevance feedback
    • Users mark documents(relevant, nonrelevant)
    • Reweight the terms in the query
  • Automatic query Refinement
    • System analysis the relevance of documents and query, give refined query automatically
    • Global analysis
    • Local analysis
relevance feedback
Relevance Feedback
  • Began in the 1960s
  • Improvement in recall and precision
  • Basic process as follows
    • The user issues their initial query
    • The system returns an initial result set.
    • The user then marks some returned documents as relevant or nonrelevant.
    • The system then re-weights the terms and refine the query results
relevance feedback models
Relevance Feedback Models
  • Boolean.
    • Terms appear in document: relevance
  • Vector Space.
    • q=(t1, t2,…, tn) d=(w1, w2,…, wn)
  • Probabilistic.
    • Relevance of a query and documents evaluate as probability
    • Probabilistic ranking principle
rocchio algorithm for vector space model
Rocchio algorithm for vector-space model
  • qm :refined query vector
  • q0: the original query vector
  • Dr :relevantdocuments, Dnr: nonrelevant documents
  • α, β, γ: weights attached to each term

Average relevant- document vector

Average non-relevant document vector

global analysis 1
  • Using all documents to compute the similarity of query q and terms in the documents
  • Similarity Thesaurus based
global analysis 2

Similarity of terms



Select r terms with highest sim value and adding into initial query , reformulate the new query

local analysis
Local analysis
  • Local analysis: Using initial query results(especially documents front ,local documents) to refine the query
  • Local clustering
    • Clustering the term of local documents
    • Query refined with the relevant cluster
    • Similarity of terms in query and terms in documents
  • Local context analysis(LCA)
    • Get the most similar term in local documents with the query q to expanse
    • Similarity of q and terms in documents

Company name

  • Introduction
  • Query Refinement in Traditional IR
  • XML Keyword Query Refinement
  • My work
xml refinement manner 1
XML Refinement Manner(1)
  • Query refined form
    • Keywords query  New Keywords Query
      • Treat as traditional IR problem
      • IR with XML Keyword search Semantics
    • Keywords  Structural Query
  • User participant
    • Manually(User Interactive )
      • Structural Feedback
    • Automatic

Company name

xml refinement manner 2
XML Refinement Manner (2)
  • Manually Refined to new Keywords Query
    • IR(consider the structure of xml)
  • Manually Transform to Structural Query
    • Relevance Feedback
  • Automatic Refined to new Keywords Query
    • Lu jiaheng:
    • AutomaticTransform to Structural Query
    • NLP
automatic refined to new keywords query 1
Automatic Refined to new Keywords Query(1)
  • Query  Refined Query
    • Rule based
  • Operation
    • Term merging:
    • Term splitting:
    • Term substitution:
    • Term deletion
automatic refined to new keywords query 2
Automatic Refined to new Keywords Query(2)
  • Ranking Refined query candidates set S(RQ)
  • Refinement cost
    • Cost: the step of “op” from “Q” to “RQ”
    • Dynamic programming
  • Efficient Refinement Algorithms
    • Avoid the multiple scan invert list
    • stack-based ,stack-based, short-list-eager approach
  • RQ candidates have the same refinement cost
    • Q={XML, Jim, 2001}{XML, 2001}, {Jim, 2001} or {XML, Jim}
  • Natural Language Query (NLQ)  NEXI
  • NEXI(Narrowed Extended XPath I)
    • //A[about(//B,C)]
    • A: path expression,
    • B :relative path expression to A
    • C is the content requirement.
    • ‘about’ clause represents an individual information request.
nlpx lexical and semantic tagging
NLPX—Lexical and Semantic Tagging
  • structural words: content requirements
  • boundary words: Path expression
  • instruction words
    • R :return request , S :support request.

Find sections about compression in

articles about information retrieval

Tagged: Find/XIN sections/XST about/XBD

compression/NN in/IN articles/XST about/XBD information/NN retrieval/NN

nlpx template matching
NLPX—Template Matching
  • most queries correspond to a small set of patterns
  • formulate grammar templates with patterns

Query: Request+

Request : CO_Request | CAS_Request

CO_Request: NounPhrase+

CAS_Request: SupportRequest | ReturnRequest

SupportRequest: Structure [Bound] NounPhrase+

ReturnRequest: Instruction Structure [Bound] NounPhrase+

Grammar Templates

Request 1 Request 2

Structural: /article/sec /articlec

Content: compression information retrieval

Instruction: R S

Information Requests

nlpx nexi query production
NLPX—NEXI Query Production
  • merge the information request into NEXI query.
  • A[about(.,C)]
    • A :the request structural attribute and
    • C : the request content attribute.

//article[about(.,information retrieval)]//sec[about (.,compression)]

query generation process
Query generation process
  • Create target component
    • Break up the query into units
  • Generate initial target
    • combinations of input target components
  • Generate queries
    • modifying a target component
    • combing two components
  • Breaks up the input query into terms
    • Structure( XML tags or attributes)
    • Content term(refer to text)
  • Create component
    • Structure term  unbound target
    • Content term  binding to a bound target
  • Probability enumeration
target component and target sets
Target component and target sets

Query: Papers by jennifer widom

  • {//author[~’jennifer widom’]} 0.6842
  • {//editor[~’jennifer widom’]} 0.3150
  • {//title[~’jennifer widom’]} 0.0004
  • {//article} {//author[∼‘jennifer widom’]} 0.3421
  • {//inproceedings} {//author[∼‘jennifer widom’]} 0.3421
  • {//inproceedings} {//editor[∼‘jennifer widom’]} 0.1577
  • {//article} {//editor[∼‘jennifer widom’]} 0.1577
  • {//inproceedings} {//title[∼‘jennifer widom’]} 0.0002
  • {//article} {//title[∼‘jennifer widom’]} 0.0002
  • papers
  • {//article} 0.5000
  • {//inproceedings} 0.5000
  • Jennifer widom
transformation operators 1
Transformation Operators(1)
  • Aggregation: merge targets with same tag
    • {//a}, {//a[~’x’]}  {//a[~’x’]}
    • {//a[~’x’]} , {//a[~’y’]}  {//a[~’x y’]}
    • Prefix expansion: add an ancestor condition
    • {//b}  {//a//b}
    • {//b[~’x’]}  {//a//b[~’x’]}
  • Ordering: combine targets
    • {//a}, {//b}  {//a//b} or {//a[//b]}
    • {//a}, {//b[~’x’]}  {//a//b[~’x’]} or {//a[//b[~’x’]]}
  • Two stronger assumption
    • Keyword query non-ambiguity
    • Availability of XML thesaurus
  • Accuracy:
    • terms classification didn’t consider specific XML context
  • Time costly:
    • Term classification
    • Targets create scan the XML documents
  • Introduction
  • Query Refinement in Traditional IR
  • XML Keyword Query Refinement
  • My work

Thank You !