Search Engine
Guided by Dr. A. J. Agrawal
By Chetan R. Rathod
Information Retrieval
“Information retrieval is a field concerned with the structure, analysis, organization, storage, searching, and retrieval of information.” (Salton, 1968)
Information retrieval is the activity of obtaining information resources relevant to an information need from a collection of information resources. Searches can be based on metadata or on full-text (or other content-based) indexing.
IR and Search Engine
• A search engine is the practical application of information retrieval techniques to large-scale text collections.
• Web search engines are the best-known examples, but there are many others.
• Open source search engines are important for research and development.
Search Engine Issues
Performance
• Measuring and improving the efficiency of search.
• Indexes are data structures designed to improve search efficiency.
Dynamic data
• The “collection” for most real applications is constantly changing in terms of updates, additions, and deletions.
• Acquiring or “crawling” the documents is a major task.
• Updating the indexes while processing queries is also a design issue.
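As an illustration of the index and dynamic-data points above, here is a minimal sketch of an inverted index, one common index structure, that also supports additions, updates, and deletions. The class name `InvertedIndex`, its methods, and the whitespace tokenizer are assumptions made for this example; a real engine would add proper tokenization, on-disk storage, and ranking.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy in-memory inverted index: term -> set of document ids (hypothetical API)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.doc_terms = {}               # doc id -> its terms, needed for deletion

    def add(self, doc_id, text):
        """Add a document; an existing id is replaced, which models an update."""
        if doc_id in self.doc_terms:
            self.remove(doc_id)
        terms = set(text.lower().split())  # assumption: whitespace tokenization
        self.doc_terms[doc_id] = terms
        for term in terms:
            self.postings[term].add(doc_id)

    def remove(self, doc_id):
        """Delete a document and drop postings lists that become empty."""
        for term in self.doc_terms.pop(doc_id, set()):
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]

    def search(self, query):
        """Return ids of documents that contain every query term (boolean AND)."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = InvertedIndex()
index.add(1, "web search engines crawl the web")
index.add(2, "open source search engines")
index.add(3, "information retrieval")
print(sorted(index.search("search engines")))  # [1, 2]
index.remove(2)                                # the collection changes
print(sorted(index.search("search engines")))  # [1]
```

A query only touches the postings lists of its own terms instead of scanning every document, which is where the efficiency gain comes from. The extra `doc_terms` map is the price of cheap deletions.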
Search Engine Issues (continued)
Scalability
• Making everything work with millions of users every day and many terabytes of documents.
• Distributed processing is essential.
Adaptability
• Changing and tuning search engine components, such as the ranking algorithm, indexing strategy, and interface, for different applications.
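The two ideas above can be sketched together: documents are partitioned across shards, each shard returns its own top results, and the partial results are merged. The ranking function is a parameter, so it can be swapped per application. Everything here (`Shard`, `ShardedEngine`, `term_count_score`, partitioning by `doc_id % num_shards`) is a hypothetical illustration, and the shards are queried one after another in a single process; a real system would run them in parallel on separate machines.

```python
import heapq
from collections import Counter


def term_count_score(text, query_terms):
    """Hypothetical default ranking: total occurrences of the query terms."""
    counts = Counter(text.lower().split())
    return sum(counts[term] for term in query_terms)


class Shard:
    """One partition of the collection (stands in for one machine)."""

    def __init__(self):
        self.docs = {}  # doc id -> text

    def add(self, doc_id, text):
        self.docs[doc_id] = text

    def top_k(self, query_terms, k, scorer):
        """Return up to k (score, doc_id) pairs with a non-zero score."""
        scored = ((scorer(text, query_terms), doc_id)
                  for doc_id, text in self.docs.items())
        return heapq.nlargest(k, (pair for pair in scored if pair[0] > 0))


class ShardedEngine:
    def __init__(self, num_shards, scorer=term_count_score):
        self.shards = [Shard() for _ in range(num_shards)]
        self.scorer = scorer  # swappable ranking component

    def add(self, doc_id, text):
        # Assumption: integer ids, partitioned by remainder.
        self.shards[doc_id % len(self.shards)].add(doc_id, text)

    def search(self, query, k=3):
        terms = query.lower().split()
        partial = [hit for shard in self.shards
                   for hit in shard.top_k(terms, k, self.scorer)]
        # Merge the per-shard results into one global top-k list.
        return [doc_id for _, doc_id in heapq.nlargest(k, partial)]


engine = ShardedEngine(num_shards=2)
engine.add(1, "search engines index the web")
engine.add(2, "search search search")
engine.add(3, "cooking recipes")
engine.add(4, "web search")
print(engine.search("search web"))  # [2, 4, 1]
```

Because each shard only needs to return its own best k hits, the merge step stays small no matter how large the collection grows. Passing a different `scorer` to `ShardedEngine` changes the ranking without touching the partitioning code.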
Search Engine Architecture
A software architecture consists of software components, the interfaces provided by those components, and the relationships between them.
The architecture of a search engine is determined by two requirements:
1. Effectiveness (quality of results)
2. Efficiency (response time and throughput)
Document Data Store
A simple database to manage large numbers of documents and structured data.
Document components are typically stored in a compressed form for efficiency.
Structured data consists of document metadata and other information extracted from the documents, such as links and anchor text.
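A minimal sketch of such a store, assuming an in-memory dictionary as the "database" and `zlib` as the compression scheme. The class `DocumentStore`, its method names, and the example URL are hypothetical; the slide does not specify a particular design.

```python
import zlib


class DocumentStore:
    """Toy document store: compressed content plus structured data per document."""

    def __init__(self):
        self._content = {}     # doc id -> zlib-compressed bytes
        self._structured = {}  # doc id -> metadata and extracted links

    def put(self, doc_id, text, metadata=None, links=None):
        """Store text in compressed form; links are (url, anchor text) pairs."""
        self._content[doc_id] = zlib.compress(text.encode("utf-8"))
        self._structured[doc_id] = {
            "metadata": dict(metadata or {}),
            "links": list(links or []),
        }

    def get_text(self, doc_id):
        """Decompress and return the original text."""
        return zlib.decompress(self._content[doc_id]).decode("utf-8")

    def get_structured(self, doc_id):
        return self._structured[doc_id]

    def stored_size(self, doc_id):
        """Number of bytes actually held for the document content."""
        return len(self._content[doc_id])


store = DocumentStore()
page = "search engine " * 200
store.put(
    "doc1",
    page,
    metadata={"title": "Intro"},
    links=[("http://example.com/a", "an example link")],
)
print(store.get_text("doc1") == page)                       # True
print(store.stored_size("doc1") < len(page.encode("utf-8")))  # True
```

Keeping the structured data uncompressed and separate from the content lets components such as ranking read links and anchor text without decompressing the whole document.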
Conclusion
The two goals of a search engine, efficiency and effectiveness, are met through indexing and query processing.
Future Scope
One direction for future research is further improvement of the indexing and ranking algorithms. It is expected that the search engine selection index can be optimized by adjusting the weights of the terms in the search keywords according to the engines that the user selected. Another way to improve accuracy is to rank topic-specific search engines by using collocations.
References
W. Bruce Croft, Donald Metzler, Trevor Strohman, “Search Engines: Information Retrieval in Practice”
AllSearchEngines.com, http://www.allsearchengines.com
Information Retrieval, http://en.wikipedia.org/wiki/Information_retrieval
Google, http://www.google.co.in