
Exploiting Semantics with Structured Queries





Presentation Transcript


  1. Exploiting Semantics with Structured Queries Jose Ramón Pérez-Agüera & Hugo Zaragoza U. Complutense de Madrid Yahoo! Research (Barcelona) CLEF 2008

2. Query expansion makes term independence a big issue… we are double counting “meanings”!!!

3. Term independence assumption gets worse with query expansion… (example 1)

Verde que te quiero verde. Verde viento. Verdes ramas.
El barco sobre la mar y el caballo en la montaña.
Con la sombra en la cintura ella sueña en su baranda,
verde carne, pelo verde, con ojos de fría plata.
Bajo la luna gitana, las cosas la están mirando
y ella no puede mirarlas. […]

(“Green, how I want you green. Green wind. Green branches. The ship out on the sea and the horse on the mountain. With the shadow at her waist she dreams on her balcony, green flesh, green hair, with eyes of cold silver. Under the gypsy moon, things are watching her and she cannot watch them. […]”)

Sense-annotated:
verde3 que te quiero verde2. verde3 viento. verde1 ramas.
El barco sobre la mar y el caballo en la montaña.
Con la sombra en la cintura ella sueña en su baranda,
verde5 carne, pelo verde1, con ojos de fría plata. […]

q: verde pelo [CLEF EFE94, 2001 Spanish topics]
q1: verde1 pelo
q2: verde1 verde2 pelo
q3: verde1 verde2 verde3 verde4 verde5 pelo
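To see the double counting concretely: under term independence the score of a query is a sum of independent per-term scores, so every added sense variant of “verde” contributes again. A minimal sketch (toy scorer, made-up per-term weights):

    # Toy additive scorer under the term-independence assumption.
    # Per-term weights are invented for illustration only.
    def score(query_terms, doc_term_weights):
        return sum(doc_term_weights.get(t, 0.0) for t in query_terms)

    doc = {"verde1": 2.1, "verde2": 1.9, "pelo": 1.5}
    print(score(["verde1", "pelo"], doc))            # q1: 3.6
    print(score(["verde1", "verde2", "pelo"], doc))  # q2: 5.5 -- "green" counted twice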

4. Term independence assumption gets worse with query expansion… (example 2): −46% !!! [figure: retrieval performance drop] [CLEF EFE94, 2001 Spanish topics] [Pérez-Agüera, Zaragoza and Araujo, NLDB 2008]

5. Term independence assumption gets worse with query expansion… (example 3)
• BM25’s dependence on tf is saturating: tf = 1, 2, 3, 4, … 10 yields ever-smaller per-occurrence gains [figure: BM25 saturation curve]. Splitting one meaning across several expansion terms sidesteps this saturation.
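The saturation behind this example is the standard BM25 term-frequency component; a small sketch with the usual default parameters (not necessarily the ones used in the talk):

    def bm25_tf(tf, k1=1.2, b=0.75, dl=100.0, avdl=100.0):
        # Standard BM25 tf saturation: each extra occurrence adds less.
        K = k1 * ((1 - b) + b * dl / avdl)
        return tf * (k1 + 1) / (K + tf)

    for tf in [1, 2, 3, 4, 10]:
        print(tf, round(bm25_tf(tf), 3))
    # 1 -> 1.0, 2 -> 1.375, 3 -> 1.571, 4 -> 1.692, 10 -> 1.964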

6. Query Expansion (example of state of the art)
• Term selection: Divergence From Randomness (DFR) expansion model, Bo1 [8,6]: w(t) = tf_x · log2((1 + P_n)/P_n) + log2(1 + P_n), with P_n = F/N (F: collection frequency of t, N: number of documents) and tf_x the frequency of t in the top x = 1 retrieved document; the top 40 terms are selected.
• Term weighting: Rocchio [9] (β = 0.3).
• Performance prediction: AvICTF [5] (cheap); expand only when AvICTF > 9.0.
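A sketch of the Bo1 selection weight as published for DFR expansion (names are mine: F is the candidate term’s frequency in the whole collection, N the number of documents, tf_x its frequency in the top x retrieved documents):

    import math

    def bo1_weight(tf_x, F, N):
        # Bo1: informativeness of a candidate term in the pseudo-relevant
        # set relative to its expected frequency under randomness.
        p_n = F / N
        return tf_x * math.log2((1 + p_n) / p_n) + math.log2(1 + p_n)

    # As on the slide, keep the 40 highest-weighted candidates, e.g.:
    # expansion = sorted(candidates, key=weight_of_candidate, reverse=True)[:40]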

7. Results in CLEF 2008 Robust-WSD Task
• Standard query expansion: 3rd team in CLEF Robust out of 8; the 1st team was well ahead of everyone.
• It seems no one improved GMAP, so they reported MAP.

8. Query expansion makes term independence a big issue… we are double counting “meanings”!!!

9. Query Clauses Idea: “Cheap Barcelona Italian Restaurants”
{cheap, barcelona, italian, restaurant}
Expansion: {cheap, barcelona, italian, restaurant, inexpensive, affordable, Sagrada Familia, Ramblas, Gràcia, Barceloneta, pizzeria, trattoria, café}
Structure: collect related meanings in clauses:
c1: {cheap, inexpensive, affordable}
c2: {Barcelona, Sagrada Familia, Ramblas, Gràcia, Barceloneta, …}
c3: {Italian_restaurant, pizzeria, trattoria, café}
Clause independence, not term independence.
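As a data structure the idea is just a bag of bags: each clause pools one meaning, and only clauses are treated as independent. A hypothetical sketch:

    query_clauses = [
        {"cheap", "inexpensive", "affordable"},                   # c1
        {"barcelona", "sagrada_familia", "ramblas",
         "gracia", "barceloneta"},                                # c2
        {"italian_restaurant", "pizzeria", "trattoria", "cafe"},  # c3
    ]
    # A document matching both "cheap" and "inexpensive" pools that
    # evidence inside c1 instead of being rewarded twice.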

10. Query Clauses Idea
[diagram: query terms term 1 and term 2, plus an expansion term e1]

11. Query Clauses Idea
[diagram: c1 = {term 1, term e1}; c2 = {term 2, term e2, term e3}; c3 = {term e4}]
(same idea as BM25-F on fields [10])

12. Query Clauses Model
• Bag of words: the query is a set of terms q = {t1, …, tn} and each document a vector of term frequencies.
• Query clauses (a bag of bags of weighted words): q = {c1, …, cm}, each clause ci a set of weighted terms {(t, w_t)}.
• Matrix notation: let W be the matrix with one row per clause and one column per term, W_ij the weight of term j in clause i; then redefine each document as d′ = W·d, projecting its term frequencies onto clause frequencies.
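A NumPy sketch of the projection (toy numbers; W has one row per clause and one column per vocabulary term, W[i, j] being the weight of term j in clause i):

    import numpy as np

    d = np.array([2, 0, 1, 3, 0, 1], dtype=float)  # one document's term frequencies

    W = np.array([
        [1.0, 0.8, 0.0, 0.0, 0.0, 0.0],  # c1: original term plus one expansion term
        [0.0, 0.0, 1.0, 0.5, 0.5, 0.0],  # c2
        [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],  # c3
    ])

    d_clauses = W @ d
    print(d_clauses)  # [2.  2.5 1. ] -- the document as clause frequencies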

13. Query Clauses: implementation of W1 and W2
In general the projection is query-dependent and needs to be done online:
• clause term frequency: tf(c, d) = Σ_{t∈c} w_t · tf(t, d)
• clause collection frequency: cf(c) = Σ_{t∈c} w_t · cf(t)
• clause document likelihood: P(c|d) = Σ_{t∈c} w_t · P(t|d)
• clause collection likelihood: P(c|C) = Σ_{t∈c} w_t · P(t|C)
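The same weighted sums in dictionary form (variable names are mine), computed online once the query’s clauses are known:

    def clause_tf(clause, doc_tf):
        # clause term frequency: weighted sum of member-term frequencies
        return sum(w * doc_tf.get(t, 0) for t, w in clause.items())

    def clause_cf(clause, coll_tf):
        # clause collection frequency, built the same way
        return sum(w * coll_tf.get(t, 0) for t, w in clause.items())

    c1 = {"cheap": 1.0, "inexpensive": 0.7, "affordable": 0.6}
    print(clause_tf(c1, {"cheap": 2, "affordable": 1}))  # 2.6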

14. Query Clauses: implementation of W1 and W2 (cont.)
IDF is not straightforward; there are several possibilities:
• min, max or avg over member-term idfs (leads to inconsistent situations for small weights)
• expected clause idf: idf(c) = Σ_{t∈c} (w_t / Σ_{t′∈c} w_{t′}) · idf(t)
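The expected clause idf is just an expectation of the member idfs with the clause weights normalised to a distribution; a minimal sketch:

    def expected_clause_idf(clause, idf):
        # E[idf(t)] with term weights normalised to sum to 1
        total = sum(clause.values())
        return sum((w / total) * idf[t] for t, w in clause.items())

    c1 = {"cheap": 1.0, "inexpensive": 0.7}
    print(expected_clause_idf(c1, {"cheap": 2.0, "inexpensive": 4.5}))  # ~3.03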

15. How can we construct the clauses?
• Idea: use WordNet to expand each term in the query as a clause. ✗
• Idea: use statistical methods to expand each term in the query. ✗
• Idea: use query expansion to find terms, use statistical methods to group them into clauses. ✗
• Idea: use query expansion to find terms, use WordNet to group them into clauses. ✓
• There exist several semantic similarity measures based on WordNet [11]: WN(s1, s2).
• We construct a clause for every original query term, and we add to it the expanded terms with WN(s1, s2) < k.
• To be conservative, all terms not in an original clause are added together into a new “Other” clause.
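A sketch of the chosen strategy using NLTK’s WordNet interface (assumes the wordnet corpus is installed; I use Wu-Palmer similarity [11] with a “similar enough” threshold, whereas the slide states the test as WN(s1, s2) < k, i.e. possibly over a distance):

    from nltk.corpus import wordnet as wn

    def wn_sim(w1, w2):
        # Best Wu-Palmer similarity over all synset pairs of the two words.
        scores = [a.wup_similarity(b)
                  for a in wn.synsets(w1) for b in wn.synsets(w2)]
        return max((s for s in scores if s is not None), default=0.0)

    def build_clauses(query_terms, expansion_terms, k=0.8):
        clauses = {t: {t} for t in query_terms}  # one clause per original term
        other = set()                            # conservative catch-all clause
        for e in expansion_terms:
            best = max(query_terms, key=lambda t: wn_sim(t, e))
            (clauses[best] if wn_sim(best, e) >= k else other).add(e)
        if other:
            clauses["OTHER"] = other
        return clauses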

16. Results in CLEF 2008 Robust-WSD Task — implementation:
• DFR expansion: 40 new terms extracted for each query.
• Query clauses: built by grouping the DFR-expanded terms via WordNet similarity.
• Ranking: BM25 with standard parameters, applied on clauses.
[diagram: query → DFR expansion → WordNet similarity grouping → query clauses]
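Putting the pieces together, a sketch of BM25 scored over clause frequencies rather than term frequencies (standard parameters; clause idf is the expected-idf variant from slide 14, with a Robertson-style per-term idf):

    import math

    def bm25_on_clauses(doc_tf, dl, avdl, clauses, df, N, k1=1.2, b=0.75):
        # clauses: {name: {term: weight}}; df: per-term document frequencies
        K = k1 * ((1 - b) + b * dl / avdl)
        score = 0.0
        for clause in clauses.values():
            ctf = sum(w * doc_tf.get(t, 0) for t, w in clause.items())
            total = sum(clause.values())
            cidf = sum((w / total) * math.log(
                           (N - df.get(t, 0) + 0.5) / (df.get(t, 0) + 0.5) + 1)
                       for t, w in clause.items())
            score += cidf * ctf * (k1 + 1) / (K + ctf)
        return score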

17. Results in CLEF 2008 Robust-WSD Task (overall results)
• Query clauses: 4% relative improvement.
• 2nd team in CLEF Robust; the 1st team was well ahead without use of WSD.

18. Biblio
[10] H. Zaragoza, N. Craswell, M. Taylor, S. Saria, and S. Robertson. Microsoft Cambridge at TREC 13: Web and HARD tracks. In Text REtrieval Conference (TREC-13), 2004.
[11] Z. Wu and M. Palmer. Verb semantics and lexical selection. In 32nd Annual Meeting of the Association for Computational Linguistics (ACL), 1994.
