
MetaSearch



Presentation Transcript


  1. MetaSearch
  R. Manmatha, Center for Intelligent Information Retrieval, Computer Science Department, University of Massachusetts, Amherst.

  2. Introduction
  • MetaSearch / Distributed Retrieval
    • A well-defined problem.
    • Language models are a good way to solve these problems.
  • Grand challenge
    • Massively distributed, multi-lingual retrieval.

  3. MetaSearch
  • Combine results from different search engines.
    • A single database, or highly overlapped databases (example: the Web).
    • Multiple databases or multi-lingual databases.
  • Challenges
    • Incompatible scores, even if the same search engine is used for different databases.
    • Collection differences and engine differences.
    • Document scores depend on the query; combining on a per-query basis makes training difficult.
  • Current solutions involve learning how to map scores between different systems (a normalization sketch follows this slide).
  • An alternative approach aggregates ranks.
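As a rough illustration of the score-mapping idea above (a generic sketch, not the method proposed on these slides), per-engine min-max normalization puts scores from different engines on a common [0, 1] scale before they are merged. The engine runs and document IDs below are invented.

```python
def min_max_normalize(results):
    """Map one engine's raw scores onto [0, 1] so runs become comparable.

    results: list of (doc_id, raw_score) pairs from a single engine.
    Returns a dict mapping doc_id -> normalized score.
    """
    scores = [s for _, s in results]
    lo, hi = min(scores), max(scores)
    if hi == lo:                                  # degenerate run: all scores equal
        return {d: 1.0 for d, _ in results}
    return {d: (s - lo) / (hi - lo) for d, s in results}

# Hypothetical runs for the same query from two engines with incompatible scales.
engine_a = [("d1", 42.0), ("d2", 37.5), ("d3", 12.0)]   # e.g. tf-idf-style scores
engine_b = [("d2", 0.91), ("d4", 0.88), ("d1", 0.15)]   # e.g. probabilities

print(min_max_normalize(engine_a))
print(min_max_normalize(engine_b))
```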

  4. Current Solutions for MetaSearch – Single Database Case
  • Solutions
    • Reasonable solutions map scores by simple normalization, by equalizing score distributions, or by training.
    • Rank-based methods, e.g. Borda counts, Markov chains.
    • Mapped scores are usually combined using linear weighting (see the fusion sketch after this slide).
    • Performance improvement is about 5 to 10%.
    • The search engines need to be similar in performance, which may explain why simple normalization schemes work.
  • Other approaches
    • A Markov-chain approach has been tried; however, results on standard datasets are not available for comparison.
    • It shouldn't be difficult to try more standard LM approaches.
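A minimal sketch of the linear weighting mentioned above, in the spirit of CombSUM-style fusion: normalized runs are summed with per-engine weights. The runs and weights here are illustrative; in practice the weights would come from training data.

```python
def weighted_combine(normalized_runs, weights):
    """Linearly combine normalized runs (CombSUM-style fusion).

    normalized_runs: one dict per engine, mapping doc_id -> score in [0, 1].
    weights: one non-negative weight per engine (e.g. learned on training queries).
    Returns documents sorted by fused score, best first.
    """
    fused = {}
    for run, w in zip(normalized_runs, weights):
        for doc, score in run.items():
            fused[doc] = fused.get(doc, 0.0) + w * score
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical runs whose scores have already been normalized onto [0, 1].
run_a = {"d1": 1.0, "d2": 0.85, "d3": 0.0}
run_b = {"d2": 1.0, "d4": 0.96, "d1": 0.0}
print(weighted_combine([run_a, run_b], weights=[0.6, 0.4]))
```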

  5. Challenges – MetaSearch for Single Databases
  • Can one effectively combine search engines that differ widely in performance?
    • Improve performance even when using poorly performing engines? How?
    • Or use a resource-selection-like approach to eliminate poorly performing engines on a per-query basis.
  • Techniques from other fields
    • Techniques from economics and the social sciences for vote aggregation may be useful (Borda count, Condorcet, ...); a Borda-count sketch follows this slide.
  • LM approaches
    • Could improve performance by characterizing scores at a finer granularity than, say, score distributions.
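As a concrete instance of the vote-aggregation idea, here is a small Borda-count sketch: each engine "votes" by rank, and a document missing from an engine's list simply earns no points from that engine. The ranked lists are invented.

```python
def borda_fuse(ranked_lists):
    """Borda-count rank aggregation over several engines' ranked lists.

    ranked_lists: one list of doc_ids per engine, best document first.
    A document at 0-based rank r in a list of length n earns n - r points;
    documents an engine did not retrieve earn nothing from that engine.
    """
    points = {}
    for ranking in ranked_lists:
        n = len(ranking)
        for rank, doc in enumerate(ranking):
            points[doc] = points.get(doc, 0) + (n - rank)
    return sorted(points.items(), key=lambda kv: kv[1], reverse=True)

# Three hypothetical engines that disagree about the best document.
print(borda_fuse([["d1", "d2", "d3"],
                  ["d2", "d1", "d4"],
                  ["d2", "d3", "d1"]]))
```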

  6. Multiple Databases
  • Two main factors determine variation in document scores:
    • Search-engine scoring functions.
    • Collection variations, which essentially change the IDF.
  • Effective score normalization requires:
    • Disregarding databases that are unlikely to have the answer (resource selection).
    • Normalizing out collection variations on a per-query basis; today this is done with mostly ad hoc normalizing functions.
  • Language models
    • Resource descriptions already provide language models for collections (see the sketch after this slide).
    • These could be used to factor out collection variations.
    • Tricky to do for different search engines.
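One way to read the "resource descriptions already provide language models" bullet, sketched here under simple assumptions: score each collection by the query's log-likelihood under a smoothed unigram model built from that collection's resource description, then use the scores for resource selection and normalization. The collections, term counts, and smoothing constant below are invented.

```python
import math

def collection_log_likelihood(query_terms, coll_tf, bg_tf, lam=0.5):
    """Query log-likelihood under a Jelinek-Mercer smoothed collection unigram model.

    coll_tf: term counts from one collection's resource description.
    bg_tf:   background term counts (e.g. pooled over all collections) for smoothing.
    """
    coll_len = sum(coll_tf.values())
    bg_len = sum(bg_tf.values())
    ll = 0.0
    for t in query_terms:
        p_coll = coll_tf.get(t, 0) / coll_len
        p_bg = bg_tf.get(t, 0) / bg_len
        ll += math.log(lam * p_coll + (1 - lam) * p_bg + 1e-12)
    return ll

# Hypothetical resource descriptions for two collections and a pooled background model.
news = {"election": 120, "court": 40, "genome": 1}
bio = {"genome": 200, "protein": 150, "election": 2}
background = {"election": 122, "court": 40, "genome": 201, "protein": 150}

query = ["genome", "protein"]
for name, tf in [("news", news), ("bio", bio)]:
    print(name, collection_log_likelihood(query, tf, background))
```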

  7. Multi-lingual Databases
  • Normalizing scores across multiple multi-lingual databases is a difficult problem.
  • One possibility (sketched after this slide):
    • Create language models for each database.
    • Use simple translation models to map across databases.
    • Use these to normalize scores.
  • Still difficult.
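A very rough sketch of what the "simple translation models" bullet might look like (an assumption, not a method from the slides): estimate a query term's probability in a foreign-language collection by summing over dictionary translations weighted by translation probabilities, and use those probabilities to score and normalize across databases. The toy translation table and term counts are invented.

```python
def translated_term_prob(term, trans_probs, foreign_tf):
    """P(term | foreign collection) under a word-level translation model:
    sum over translations f of P(term -> f) * P(f | collection).
    """
    foreign_len = sum(foreign_tf.values())
    p = 0.0
    for foreign_word, t_prob in trans_probs.get(term, {}).items():
        p += t_prob * foreign_tf.get(foreign_word, 0) / foreign_len
    return p

# Toy English -> French translation table and term counts for a French collection.
translation_table = {"dog": {"chien": 0.9, "canin": 0.1}}
french_tf = {"chien": 30, "chat": 25, "canin": 2}

print(translated_term_prob("dog", translation_table, french_tf))
```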

  8. Distributed Web Search
  • Distribute web search over multiple sites/servers.
    • Localized / regional.
    • Domain dependent.
    • Possibly no central coordination.
  • Server selection / database selection, with or without explicit queries.
  • Research issues
    • Partial representations of the world.
    • Trust, reliability.
    • Peer to peer.

  9. Challenges
  • Formal methods for resource descriptions, ranking, combination
    • Example: language modeling.
    • Beyond collections as big documents.
  • Multi-lingual retrieval
    • Combining the outputs of systems searching databases in many languages.
  • Peer-to-peer systems
    • Beyond broadcasting simple keyword searches.
    • Non-centralized.
    • Networking considerations, e.g. availability, latency, transfer time.
  • Distributed web search
    • Data, Web data.
