TOPIC CENTRIC QUERY ROUTING

TOPIC CENTRIC QUERY ROUTING. Research Methods (CS689) 11/21/00. By Anupam Khanal. Introduction: What is query routing? Searching online can be both rewarding and frustrating. General search engines such as Yahoo and Lycos return much irrelevant information in response to a user's query.

Presentation Transcript


  1. TOPIC CENTRIC QUERY ROUTING Research Methods (CS689) 11/21/00 By Anupam Khanal

  2. Introduction: • What is query routing? • Searching online can be both rewarding and frustrating. • General search engines such as Yahoo and Lycos return much irrelevant information in response to a user's query. • In such a context, query routing attempts to dynamically route each user's query to the appropriate specialized search engine.

  3. Problem Description • There are many general search engines, such as Yahoo, Lycos, AltaVista, etc. • There are also many topic-specific search engines, such as VacationSpot.com, KidsHealth.com, etc. • However, many casual users are not familiar with all of these topic-specific search engines. • In such a context, topic centric query expansion is important.

  4. Why Topic Centric Query Routing: • Before discussing the importance of topic centric routing, it is important to examine other query routing systems. • Manual Query Routing Services: - provide a categorized list of specialized search engines - users have to choose the search engines themselves - although a keyword search interface is provided, the terms accepted as keywords are limited. • Query Routing based on Centroids: - uses centroids, which are summaries of databases - each summary consists of a complete list of the terms in a database and their frequencies.

  5. - a search engine is located by deciding which databases are relevant to a user query, comparing the query with each centroid. - this technique cannot be applied to most of the topic-specific search engines on the Web, because access to their internal databases is restricted. • Query Routing Without Centroids: - instead of centroids, these systems generate a short text describing the contents of each database. - a search engine is located only if the search keywords are contained in such text.
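To make the contrast between the two automatic routing styles above concrete, here is a small Python sketch. It assumes that a centroid is a plain term-to-frequency map, that an engine matches when every search keyword appears in its centroid (or, for routing without centroids, in its short descriptive text), and that centroid matches are ranked by summed keyword frequency. The scoring rule and the engine names are hypothetical illustrations, not details from the presentation.

```python
from typing import Dict, List

Centroid = Dict[str, int]  # hypothetical summary: term -> frequency in the database

def route_by_centroids(query: str, centroids: Dict[str, Centroid]) -> List[str]:
    """Centroid-based routing: keep engines whose centroid contains every query
    keyword, ranked by the summed keyword frequencies (assumed scoring rule)."""
    keywords = query.lower().split()
    scored = []
    for engine, centroid in centroids.items():
        if all(k in centroid for k in keywords):
            scored.append((sum(centroid[k] for k in keywords), engine))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # highest score first, ties by name
    return [engine for _, engine in scored]

def route_by_descriptions(query: str, descriptions: Dict[str, str]) -> List[str]:
    """Routing without centroids: keep engines whose short descriptive text
    contains every query keyword."""
    keywords = query.lower().split()
    return sorted(
        engine for engine, text in descriptions.items()
        if all(k in text.lower().split() for k in keywords)
    )

if __name__ == "__main__":
    centroids = {
        "travel-db": {"vacation": 40, "beach": 25, "hotel": 30},
        "health-db": {"children": 50, "vaccine": 12, "vacation": 2},
    }
    print(route_by_centroids("vacation", centroids))  # ['travel-db', 'health-db']
    descriptions = {
        "travel-db": "vacation and travel planning",
        "health-db": "health information for children",
    }
    print(route_by_descriptions("children", descriptions))  # ['health-db']
```

The description-based variant needs no access to an engine's internal database, but a query is routed to an engine only if its keywords happen to appear in that engine's short text.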

  6. Research Objective: • In such a context, Topic Centric Query Routing is appropriate, as it uses the routing model to expand the query. • The general framework of the query routing model is as follows: • Getting relevant terms from the Web: - the routing model does not use any special dictionaries; instead, it uses the Web as the source of relevant terms. - it finds Web documents relevant to the user query dynamically, by submitting that query to a general search engine. - the relevant terms are extracted from those documents.

  7. • Co-occurrence based evaluation of term relevance: - the mutual relevance of terms is evaluated on the basis of their co-occurrences in the documents. - the co-occurrences of the search keywords are counted in all the documents retrieved by the general search engine. - the routing model lists all the distinct terms contained in all documents and, for each term, counts the number of documents that contain both the search keyword and that term. • Using a pseudo-feedback technique: - it is difficult to determine term relevance from the results of only a single search on the general search engine.

  8. - even relevant terms often have few co-occurrences in the documents selected by the first search. - in such a context, the query routing model re-evaluates these low-co-occurrence terms: it selects terms to re-evaluate from the first search results, formulates new queries by adding the selected terms to the original query, and performs the co-occurrence based evaluation for each formulated query.
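The three parts of the framework above (getting terms from the Web, co-occurrence counting, and pseudo-feedback) can be sketched in Python as follows. This is an illustration under stated assumptions, not the system's actual implementation: `fake_search` is a hypothetical stand-in for a general search engine, the tokenizer is a simple regular expression, a document counts toward a term only if it contains all search keywords plus that term, and "exceeds a threshold" is read as strictly greater than.

```python
import re
from collections import Counter
from typing import Callable, Dict, List, Set

# Hypothetical stand-in for a general search engine: query -> retrieved document texts.
SearchFn = Callable[[str], List[str]]

def terms_of(text: str) -> Set[str]:
    """Distinct lowercase alphanumeric terms in a piece of text (simple tokenizer, an assumption)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def cooccurrence_counts(query: str, docs: List[str]) -> Dict[str, int]:
    """For every distinct non-keyword term, count the documents that contain
    all search keywords of `query` and that term."""
    keys = terms_of(query)
    counts = Counter()
    for doc in docs:
        terms = terms_of(doc)
        if keys <= terms:                # the document mentions every keyword
            counts.update(terms - keys)  # each term counted at most once per document
    return dict(counts)

def high_terms(counts: Dict[str, int], threshold: int) -> Set[str]:
    """Terms whose co-occurrence count exceeds the threshold."""
    return {t for t, c in counts.items() if c > threshold}

def reevaluate(query: str, low_terms: List[str], search: SearchFn, threshold: int) -> Set[str]:
    """Pseudo-feedback: append each low-co-occurrence term to the original query,
    search again, and collect the high co-occurrence terms of each new result set."""
    found: Set[str] = set()
    for term in low_terms:
        new_query = f"{query} {term}"
        found |= high_terms(cooccurrence_counts(new_query, search(new_query)), threshold)
    return found

if __name__ == "__main__":
    def fake_search(q: str) -> List[str]:
        # Canned results in place of a real general search engine.
        if "monty" in q.split():
            return ["python monty comedy film", "python monty comedy troupe", "python monty film"]
        return ["python programming language", "python programming", "monty python"]

    first = cooccurrence_counts("python", fake_search("python"))
    print(sorted(high_terms(first, threshold=1)))                             # ['programming']
    print(sorted(reevaluate("python", ["monty"], fake_search, threshold=1)))  # ['comedy', 'film']
```

In this toy data, "monty" co-occurs with "python" in only one document of the first search and so stays below the threshold, but re-searching with "python monty" surfaces "comedy" and "film" as high co-occurrence terms, which is the effect the pseudo-feedback step is meant to capture.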

  9. Query expansion procedure • 1. Get a document set D0 relevant to a user query Q0, whose search keywords are w01, ..., w0n, by sending Q0 to a general search engine. • 2. Count co-occurrences of the search keywords and other terms in the document set D0. • 3. Let WH0 be the set of terms whose co-occurrences exceed a certain threshold, and WL0 the set of the other terms. WH0 is considered relevant to the query Q0 and will be part of the query expansion result. • 4. Pick at most four topic terms wt1-wt4 from WL0. • 5. Formulate four queries QT1-QT4 by combining wt1-wt4 with Q0 (for example, QT1="w01 ... w0n wt1"). Figure 4: Query expansion procedure.

  10. • 6. Cluster all terms in D0 into at most three clusters: W1={w11, ..., w1m}, W2={w21, ..., w2k} and W3={w31, ..., w3j}. • 7. Formulate three queries Q1-Q3 by combining W1-W3 with Q0 (for example, Q1="w01 ... w0n w11 ... w1m"). • 8. Get document sets DT1-DT4 and D1-D3 by sending QT1-QT4 and Q1-Q3 independently to a general search engine. • 9. Count co-occurrences in DT1-DT4 and D1-D3. The sets of high co-occurrence terms WTH1-WTH4 and WH1-WH3, together with WH0 from step 3, are the query expansion results.
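Steps 3 to 7 of the procedure can be sketched in Python as below; steps 1, 2, 8 and 9 need a live search engine and are left out. The slides do not say how topic terms are picked or how terms are clustered, so both are assumptions here: step 4 takes the highest-count WL0 terms, and step 6 uses a simple agglomerative clustering on the sets of documents each term appears in (Jaccard overlap). Excluding the query keywords from clustering is also an assumption, and the sample documents are invented.

```python
import re
from typing import Dict, List, Set, Tuple

def split_terms(counts: Dict[str, int], threshold: int) -> Tuple[Set[str], Set[str]]:
    """Step 3: WH0 = terms whose co-occurrence count exceeds the threshold, WL0 = the rest."""
    wh0 = {t for t, c in counts.items() if c > threshold}
    return wh0, set(counts) - wh0

def pick_topic_terms(counts: Dict[str, int], wl0: Set[str], limit: int = 4) -> List[str]:
    """Step 4 (assumed rule): the `limit` WL0 terms with the highest counts, ties alphabetical."""
    return sorted(wl0, key=lambda t: (-counts[t], t))[:limit]

def term_occurrences(docs: List[str], keywords: Set[str]) -> Dict[str, Set[int]]:
    """Map each non-keyword term in D0 to the indices of the documents containing it."""
    occ: Dict[str, Set[int]] = {}
    for i, doc in enumerate(docs):
        for t in set(re.findall(r"[a-z0-9]+", doc.lower())) - keywords:
            occ.setdefault(t, set()).add(i)
    return occ

def jaccard(a: Set[int], b: Set[int]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_terms(occ: Dict[str, Set[int]], max_clusters: int = 3) -> List[List[str]]:
    """Step 6 (assumed method): repeatedly merge the two clusters whose document
    sets overlap most (Jaccard) until at most `max_clusters` clusters remain."""
    terms = sorted(occ)
    clusters = [[t] for t in terms]
    docsets = [set(occ[t]) for t in terms]
    while len(clusters) > max_clusters:
        best_i, best_j, best_s = 0, 1, -1.0
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = jaccard(docsets[i], docsets[j])
                if s > best_s:
                    best_i, best_j, best_s = i, j, s
        clusters[best_i] += clusters[best_j]
        docsets[best_i] |= docsets[best_j]
        del clusters[best_j], docsets[best_j]
    return clusters

def append_terms(q0: str, terms: List[str]) -> str:
    """Steps 5 and 7: a new query is the original keywords followed by extra terms."""
    return " ".join([q0] + terms)

if __name__ == "__main__":
    d0 = ["python monty comedy", "monty python comedy film",
          "python java programming", "jpython java programming python", "python programming"]
    occ = term_occurrences(d0, {"python"})
    counts = {t: len(ids) for t, ids in occ.items()}  # every document contains "python"
    wh0, wl0 = split_terms(counts, threshold=2)
    print(sorted(wh0))  # ['programming']
    qt = [append_terms("python", [t]) for t in pick_topic_terms(counts, wl0)]
    print(qt)  # ['python comedy', 'python java', 'python monty', 'python film']
    q = [append_terms("python", c) for c in cluster_terms(occ)]
    print(q)   # ['python comedy monty film', 'python java programming', 'python jpython']
```

The clustering choice is only one of many possibilities; any method that yields at most three term groups fits step 6 as described on the slide.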

  11. Query Routing Result • User query: “python” • If you are looking for information about… • movie - monty python (phrase that explains the topic) • Recommended topic search engines (each with Search/Go to links): - [1600] Search the Internet Movie Database - [1600] The Roger Ebert Movie Files - [1600] Horror Search

  12. Other Topics… • Object oriented programming in python: - [7500] Index to Object Oriented Information Sources - [3600] Unix Programming • jpython - python in java: - [6300] java.sun.com – The Source for Java™ Technology - [5641] Gamelan - The official Java Directory - [4921] JCentral – Search the web for Java - [4266] Index to Object Oriented Information Sources

  13. Importance of Topic Centric Query Routing • A query routing model is used. • The query routing model does not generate centroids. • It consists of an offline pre-processing component and an online interface. • The offline component takes as input a set of search engines and creates, for each engine, an approximate textual model of that engine's content or scope.

  14. • The online component takes a user query as input and applies a novel query expansion technique to it. • It then clusters the output of the query expansion to suggest multiple topics the user may be interested in. • Each topic is associated with a set of search engines, e.g., for the query “python”.

  15. • The query expansion model can automatically obtain terms relevant to a query from the Web. • With the query expansion model, there is no need to maintain a massive dictionary of terms covering a wide range of fields.
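The online association of topics with search engines described above might look like the following Python sketch. The presentation does not detail the offline textual models, so each model is assumed here to be a set of terms, and an engine's score for a topic is assumed to be the number of expanded topic terms it shares with that model. The engine names are borrowed from the example results for “python”, but their term sets are invented, and the scores do not reproduce the bracketed numbers shown there.

```python
from typing import Dict, List, Set

# Hypothetical offline models: engine name -> set of terms approximating its content.
EngineModels = Dict[str, Set[str]]

def engines_for_topic(topic_terms: Set[str], models: EngineModels, top_k: int = 3) -> List[str]:
    """Rank engines by how many of the topic's expanded terms appear in each
    engine's textual model; engines with no overlap are dropped."""
    scored = [(len(topic_terms & model), name) for name, model in models.items()]
    scored = [(score, name) for score, name in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best overlap first, then by name
    return [name for _, name in scored[:top_k]]

def route(topics: Dict[str, Set[str]], models: EngineModels) -> Dict[str, List[str]]:
    """Online step: associate each suggested topic with its best-matching engines."""
    return {label: engines_for_topic(terms, models) for label, terms in topics.items()}

if __name__ == "__main__":
    models = {
        "Internet Movie Database": {"movie", "film", "actor", "comedy"},
        "Gamelan": {"java", "applet", "programming"},
        "Unix Programming": {"unix", "programming", "shell"},
    }
    topics = {
        "movie - monty python": {"monty", "comedy", "film"},
        "jpython - python in java": {"java", "jpython", "programming"},
    }
    for label, engines in route(topics, models).items():
        print(label, "->", engines)
```

Here the movie topic routes only to the Internet Movie Database, while the JPython topic routes to Gamelan first and Unix Programming second, since Gamelan's toy model shares two of its terms and Unix Programming's shares one.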

  16. Conclusion • Topic centric query routing uses a query expansion model. • The query expansion model obtains all the information needed for query routing from the Web. • Thus, the query routing model is an intelligent agent that uses the Web as its knowledge source and identifies the topics of given queries dynamically through query expansion.
