What do you mean? – Determining the Intent of Keyword Queries on Structured Data. Wolf Siberski. Overview. Motivation Approaches in keyword search on structured data QUICK – Query Intent Construction for Keywords User interaction Algorithm Evaluation Conclusion.

What do you mean? – Determining the Intent of Keyword Queries on Structured Data

Wolf Siberski

  • Motivation
  • Approaches in keyword search on structured data
  • QUICK – Query Intent Construction for Keywords
    • User interaction
    • Algorithm
    • Evaluation
  • Conclusion
The Information Search Process


What exactly do I want to know?


How do I express my search request?

Sutcliffe/Ennis: Towards a cognitive theory of information retrieval

IMDB Example – Keyword search

Have they been working together?

Brad Pitt

Angelina Jolie

In which movies did they both act?

Brad Pitt Angelina Jolie

IMDb Brad Pitt Angelina Jolie

IMDB Example – Database search

Are they working together, too?

Brad Pitt

Angelina Jolie

In which movies did they both act?

SELECT M.Title, M.Year
FROM Movie M, Actor A1, Actor A2, ActsIn R1, ActsIn R2
WHERE A1.Name = 'Brad Pitt' AND A2.Name = 'Angelina Jolie'
  AND R1.ActorId = A1.Id AND R2.ActorId = A2.Id
  AND R1.MovieId = R2.MovieId AND M.Id = R1.MovieId

  • Trend: general information captured as structured data (DBpedia, LinkedData, etc.)
  • Limited support for complex information needs
    • Keywords: Limited expressivity, but user-friendly
    • Structured Queries: High expressivity, but difficult to master

→ New ways to access this data are required

IR on Structured Data (Incomplete)
  • Not a new idea (Universal Relation, 1984)
  • Relevance notion for structured data
    • Extract data subgraphs (tuple joins) matching the query
    • Rank results according to relevance score
    • Can serve the 'head' of the user distribution, but not the long tail
    • Low quality of relevance judgements [Coffman/Weaver, CIKM10]
  • Form builder
    • Enable visual construction of user-defined query forms
    • Requires exploration of database schema
QUICK – Keyword Search on Databases
  • User starts with keyword search
  • QUICK guides user through query construction process
  • Combines
    • Ease-of-use of keyword search
    • Expressivity of database queries

G. Zenz, X. Zhou, E. Minack, W. Siberski, and W. Nejdl: From keywords to semantic queries – Incremental query construction on the semantic web. Journal of Web Semantics, Elsevier, 2009.

QUICK Search Process

Brad Pitt Angelina Jolie




Is “Brad” part of a movie title?

Is “Brad” part of an actor name?

Compute possible query intentions

Compute selection options

Selection options

Select intended interpretation

“Brad” is part of an actor name

Refined Interpretation

Find movies where both Brad Pitt and Angelina Jolie are actors


101 Biggest Ce… 2004

Mr. & Mrs. Smith 2005

Stars on Trial 2005

Select intended query


Compute results


Evaluate results

QUICK – Concepts
  • RDF Schema
  • Query Template
    • Query pattern on the schema
    • Contains only free variables
  • Semantic Query
    • Interpretation of a keyword query
    • Produced from query template by binding keywords
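The relation between a query template and a semantic query can be sketched in a few lines of Python. This is only an illustration, not QUICK's actual data model: the class names, the `bind` helper, and the property names `hasActor` and `name` are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryTemplate:
    """A query pattern on the schema; every term is a free variable."""
    patterns: tuple  # (subject_var, property, object_var) triples

    def variables(self):
        # Collect every variable that appears as subject or object.
        return {term for s, _, o in self.patterns for term in (s, o)}


@dataclass(frozen=True)
class SemanticQuery:
    """One interpretation of a keyword query: a template plus bindings."""
    template: QueryTemplate
    bindings: tuple  # sorted (variable, keyword) pairs


def bind(template, keyword_bindings):
    """Produce a semantic query by binding keywords to template variables."""
    unknown = set(keyword_bindings) - template.variables()
    if unknown:
        raise ValueError(f"not template variables: {sorted(unknown)}")
    return SemanticQuery(template, tuple(sorted(keyword_bindings.items())))


# Hypothetical template: a movie with two actors, each with a name.
both_act_in = QueryTemplate((
    ("?movie", "hasActor", "?a1"), ("?a1", "name", "?n1"),
    ("?movie", "hasActor", "?a2"), ("?a2", "name", "?n2"),
))
query = bind(both_act_in, {"?n1": "Brad Pitt", "?n2": "Angelina Jolie"})
```

Binding the same keywords to a different template (for example, one where "Brad" matches a movie title) would yield a different semantic query, which is the ambiguity the user resolves during query construction.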
Query Guide
  • Query Hierarchy
    • Semantic queries ordered by sub-query relationship
  • Query Guide
    • Graph including paths to all possible queries
Query Guide Construction – Offline Stage
  • Generate all Query Templates
    • Start with one-variable queries
    • Produce all possible combinations
    • Repeat until max. join path length reached
  • Build Inverted Index
    • Terms -> Attributes
    • Enables fast keyword-query mapping at runtime
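The two offline steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not QUICK's implementation: a template is reduced to a set of schema classes plus the schema edges joining them, each schema edge is used at most once (so a template with two actor variables is not modelled), and `max_joins` stands in for the maximum join path length. The function names and the toy rows are invented for the example.

```python
from collections import defaultdict


def generate_templates(classes, edges, max_joins):
    """Enumerate connected templates, growing one join at a time."""
    # Start with one-variable queries: a single class and no joins.
    level = {(frozenset([c]), frozenset()) for c in classes}
    templates = set(level)
    for _ in range(max_joins):
        grown = set()
        for nodes, used in level:
            for edge in edges:
                src, _, dst = edge
                # Extend only along unused edges that touch the template.
                if edge not in used and (src in nodes or dst in nodes):
                    grown.add((nodes | {src, dst}, used | {edge}))
        grown -= templates
        templates |= grown
        level = grown
    return templates


def build_inverted_index(rows):
    """Map each lower-cased term to the attributes it occurs in."""
    index = defaultdict(set)
    for attribute, value in rows:
        for term in value.lower().split():
            index[term].add(attribute)
    return dict(index)


# Toy schema and invented rows, for illustration only.
templates = generate_templates(
    ["Movie", "Actor"], [("Movie", "hasActor", "Actor")], max_joins=2)
index = build_inverted_index([
    ("Actor.name", "Brad Pitt"),
    ("Actor.name", "Angelina Jolie"),
    ("Movie.title", "Brad Status"),
])
```

With this index, looking up the term `brad` returns both `Actor.name` and `Movie.title`, which are the two candidate interpretations a user would be asked to choose between.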
Query Guide Construction – Online Stage
  • Identify possible queries (leaves of query guide)
  • Extract partial query graph from template graph
    • Problem: query space can be very large

→ Find minimal query guide

  • Cost function: # of steps + # of inspected suggestions
    • Minimal guide: smallest maximum cost
  • Depth/width tradeoff:

Too flat

Too deep


ln(n) split
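To make the depth/width tradeoff concrete, here is a small worked calculation. It assumes a perfectly balanced guide in which every step offers `fan_out` suggestions and, in the worst case, the user inspects all of them; this model and the function name are assumptions for illustration, and the "ln(n) split" label on the slide is not elaborated in the transcript, so it is not modelled here.

```python
def worst_case_cost(n_queries, fan_out):
    """Cost = number of steps + number of inspected suggestions."""
    if n_queries < 1 or fan_out < 2:
        raise ValueError("need n_queries >= 1 and fan_out >= 2")
    depth = 0
    reachable = 1
    # Smallest depth at which fan_out ** depth covers all leaf queries.
    while reachable < n_queries:
        reachable *= fan_out
        depth += 1
    # One click per step, plus up to fan_out inspected suggestions per step.
    return depth + depth * fan_out


n = 64
too_flat = worst_case_cost(n, 64)   # one step, 64 suggestions to inspect
too_deep = worst_case_cost(n, 2)    # six steps, two suggestions each
best_fan_out = min(range(2, n + 1), key=lambda b: worst_case_cost(n, b))
```

Under this toy model, a single flat list of 64 queries costs 65, a binary guide costs 18, and a fan-out of 4 gives the lowest cost of 15, so neither extreme is optimal.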

Greedy Query Guide Construction
  • Finding minimal guide: NP-hard
  • Use approach similar to set cover approximation
    • Determine nodes (=refinement options) top-down
    • Greedily select node leading to the lowest cost
      • Cost estimation: minimally incurred cost
    • Repeat until all nodes are covered
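For reference, the textbook greedy set cover approximation that the slide compares the approach to looks roughly like this. It is not QUICK's algorithm: it picks the option covering the most still-uncovered queries, whereas the slide describes selecting the node with the lowest estimated cost. The option names and query identifiers are invented for the example.

```python
def greedy_set_cover(universe, options):
    """Pick options until every element of the universe is covered."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Greedy step: the option covering the most uncovered elements.
        best = max(options, key=lambda name: len(options[name] & uncovered))
        gained = options[best] & uncovered
        if not gained:
            raise ValueError("options do not cover the universe")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical refinement options and the candidate queries each one covers.
options = {
    "actor name": {"q1", "q2", "q3"},
    "movie title": {"q3", "q4"},
    "director name": {"q4", "q5"},
    "year": {"q5"},
}
cover = greedy_set_cover({"q1", "q2", "q3", "q4", "q5"}, options)
```

In this example the greedy choice is "actor name" first (three queries), then "director name" (the remaining two), so two options suffice to cover all five queries.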
Evaluation – Experiment Settings
  • IMDB database
    • Semantic Web representation
  • Queries from AOL query log
    • Selection criteria
      • Movie-related
      • 2-5 keywords
      • Refers to at least 2 entities
    • Manual assessment of query intention
  • Search process
    • Manual input of keywords
    • Selection of correct option according to query intention
Evaluation – Guide Quality
  • Intended construction option usually among top 3
  • Usually 3-5 clicks needed to construct query
  • Effective also for large query spaces
Conclusion
  • Query construction with QUICK
    • Highly effective construction process
    • All intentions can be constructed
    • No query language or schema knowledge required
  • Further directions
    • Combine with relevance heuristics (IQP)
    • More flexible user interaction
      • Use facets for keyword bindings
      • Better multi term support
    • Optimized query guide generation
      • Exploit entity notion (QUnits)
      • Progressive query guide creation
    • Connect to QbE/Query Form Creation
Evaluation – Performance
  • Initialization takes too much time for long queries
    • RDF store as bottleneck (creation of query hierarchy)
  • After initialization, response time is ok
  • Identification of semantic queries
    • Index template subsets by attribute to enable fast filtering of queries without results
    • Enable fast conjunction of template subsets (e.g., 'and' on bitsets)
  • QCG generation
    • Parallel subquery computation
    • Caching of frequent subqueries
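The bitset idea for identifying semantic queries can be sketched as follows. This is a hedged illustration, not the authors' design: it assumes each attribute is indexed by a bitmask of the template ids that contain it, uses Python integers as bitsets, and the function name, attribute names, and template numbering are invented.

```python
def candidate_templates(keyword_attributes, attribute_masks, n_templates):
    """Return ids of templates that can match every keyword."""
    result = (1 << n_templates) - 1  # start with all templates allowed
    for attributes in keyword_attributes:
        # A keyword fits a template if any of its attributes occurs there.
        keyword_mask = 0
        for attribute in attributes:
            keyword_mask |= attribute_masks.get(attribute, 0)
        # Every keyword must fit, so combine with a bitwise 'and'.
        result &= keyword_mask
    return [i for i in range(n_templates) if (result >> i) & 1]


# Hypothetical templates: 0 = {Actor.name}, 1 = {Movie.title},
# 2 = {Actor.name, Movie.title}. Bit i is set if template i has the attribute.
attribute_masks = {"Actor.name": 0b101, "Movie.title": 0b110}
matches = candidate_templates(
    [{"Actor.name", "Movie.title"},  # attributes containing "brad"
     {"Actor.name"}],                # attributes containing "jolie"
    attribute_masks,
    n_templates=3,
)
```

Here template 1 is filtered out because it has no attribute in which "jolie" occurs, so no query built from it could return results.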
Misc Ideas
  • Use Google's KDD annotated Named Entity Recognition test set (Piggyback,
Cross Connections
  • Thomas Gottron: Traditional features (e.g. TF) not useful for very short text
  • Hinrich Schütze: entity-related queries are often ambiguous
  • Michael Granitzer: cycle of refinement/exploration
  • Norbert Fuhr: generate clusters based on possible queries and let users select the right cluster