
Online Database vs. Web Search Engines


lynsey

Presentation Transcript


  1. Online Database vs. Web Search Engines 571-Information Access and Retrieval

  2. Online Database

  3. Overview of Online Databases over 30 Years (William, 2006) • From 1975 to 2005, the number of databases increased considerably, from 301 to 17,539 • Database records grew from 52 million to 21.02 billion, and database entries from 301 to 16,532. • The number of producers has not grown as fast as the number of databases because one producer might publish multiple databases. • The number of publishers increased from 200 to 3,208 between 1975 and 2005. • In 2005, the average producer produced 5.13 databases. • Since each vendor might provide services from multiple databases, the number of vendors grew at a slower pace, from 105 to 2,811.

  4. Types of search • Known item search • Specific-information search • Subject search • Exploring/Browsing information • Others

  5. General search steps • Search plan • System access • Database selection (Optional) • Search query formulation • Preliminary results evaluation • Search query reformulation (Optional) • Final results evaluation (Optional)

  6. Some Search Strategies • Building blocks: combine sub-searches • Citation pearl growing: use the index terms to retrieve further similar citations • Successive fractions: reduce the set using narrower index terms • Most specific facet first: start with the most specific concept
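
The building-blocks and successive-fractions strategies can be sketched with plain Python sets. This is only an illustration: the `INDEX` data, the record IDs, and the helper `any_of` are made up for the example and do not come from any real database.

```python
# Toy index: term -> set of record IDs (hypothetical data for illustration only).
INDEX = {
    "teenager": {1, 2, 3, 5},
    "adolescent": {2, 4, 6},
    "smoking": {1, 2, 4, 7},
    "tobacco": {3, 4, 8},
    "prevention": {2, 4, 9},
}


def any_of(*terms):
    """One building block: OR together the synonyms for a single concept."""
    result = set()
    for term in terms:
        result |= INDEX.get(term, set())
    return result


# Building blocks: search each concept separately, then AND the blocks together.
youth = any_of("teenager", "adolescent")
tobacco_use = any_of("smoking", "tobacco")
combined = youth & tobacco_use

# Successive fractions: shrink the set step by step with a narrower term.
narrowed = combined & any_of("prevention")

print(sorted(combined))  # [1, 2, 3, 4]
print(sorted(narrowed))  # [2, 4]
```

Keeping each concept as its own set makes it easy to swap synonyms in or out of one block without rebuilding the whole search statement.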

  7. Search Strategy Formulation • Imagine the title and keywords of relevant documents • Boolean operators: and, or, not • Proximity operators: adj, near, freq, atleast • Search fields/segments: au, co, ti, de • Use controlled vocabulary to identify context • Truncation: string, plural, single character
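
Truncation and proximity operators can be illustrated with a short sketch. The symbols used here (`*` for any string, `?` for a single character) and the function names are assumptions for the example; real systems each define their own operator syntax, so check the system's documentation.

```python
import re

DOC = "online databases support controlled vocabulary and database searching"


def truncation_to_regex(pattern):
    """Translate a truncated term: '*' matches any string, '?' exactly one character."""
    escaped = re.escape(pattern).replace(r"\*", r"\w*").replace(r"\?", r"\w")
    return re.compile(escaped)


def match_terms(pattern, text):
    """Return the words in text that match the truncated pattern."""
    regex = truncation_to_regex(pattern)
    return [word for word in text.split() if regex.fullmatch(word)]


def near(term_a, term_b, text, max_distance=3):
    """True if the two terms occur within max_distance words of each other, in either order."""
    words = text.lower().split()
    pos_a = [i for i, w in enumerate(words) if w == term_a]
    pos_b = [i for i, w in enumerate(words) if w == term_b]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)


print(match_terms("databas*", DOC))            # ['databases', 'database']
print(match_terms("database?", DOC))           # ['databases']
print(near("controlled", "vocabulary", DOC, 1))  # True
print(near("online", "searching", DOC, 3))       # False
```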

  8. How to Find Related Words • Personal knowledge: terminology, relevant documents • Term mapping provided by the system • Feedback from search results: title, descriptor, text • Others

  9. Search Strategy Reformulation • System: search fields, vocabulary, "more like this", refine search, limit/focus search • User: relevance feedback

  10. Narrow search • Find the right database • Add another word or phrase • Negative feedback (exclude one aspect of the search statement) • Exclude related terminology • Restrict to certain fields: title, descriptor, frequency, etc. • Restrict to certain types of publication • Restrict to a certain time range • Restrict to a certain language

  11. Evaluate search results • Known item: title, author, publication, date • Specific information: Key Word In Context (KWIC) • Subject information: title, abstract, descriptor, full text
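
A Key Word In Context (KWIC) display shows each hit with a few surrounding words so that specific information can be judged quickly. A minimal sketch follows, assuming whitespace tokenization and a hypothetical `kwic` helper; production systems format and tokenize differently.

```python
def kwic(text, keyword, width=2):
    """Return each occurrence of keyword with up to `width` words of context on each side."""
    words = text.split()
    lines = []
    for i, word in enumerate(words):
        # Ignore case and trailing punctuation when comparing.
        if word.lower().strip(".,;:!?") == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{word}] {right}".strip())
    return lines


sample = "Online databases index records. Web search engines index pages."
for line in kwic(sample, "index"):
    print(line)
# Online databases [index] records. Web
# search engines [index] pages.
```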

  12. Tutorials for online databases • http://www.uwm.edu/Libraries/ris/courses/sois510/ • http://training.dialog.com/onlinecourses/recorded/ • http://www.sois.uwm.edu/DE_Info/cahansen/WT3/WT3.html

  13. Web Search Engines

  14. Characteristics of web IR • Web documents: stored in a distributed manner, growing in size, deep and surface documents, multiple formats, varying in quality, frequently changing, others • Users: various user groups, others • Systems

  15. What is a search engine? • Diagram: Users, Search Engine, Internet

  16. Key components • Data collection • Web spider or crawler • Data processing • Ranking • Indexing • Query formulation • Interface • Matching • Result display

  17. How does ranking work? • Literal match • Measure of word significance: the frequency of word occurrence (term frequency) • Location: relative position of a word • Examples • http://www.searchenginewatch.com/webmasters/work.html • http://www.searchenginewatch.com/webmasters/rank.html
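
Term frequency and word location can be combined into a toy relevance score. The sketch below is an assumption-laden illustration: the `score` function, the `title_boost` weight, and the sample documents are invented, and real engines use many more signals.

```python
from collections import Counter


def score(query_terms, title, body, title_boost=2.0):
    """Toy score: normalized term frequency in the body plus a bonus for title hits."""
    title_words = title.lower().split()
    body_words = body.lower().split()
    counts = Counter(body_words)
    total = 0.0
    for term in query_terms:
        term = term.lower()
        if body_words:
            total += counts[term] / len(body_words)  # term frequency
        if term in title_words:
            total += title_boost  # location: the word appears in the title
    return total


docs = {
    "doc_a": ("Search engine ranking",
              "ranking depends on term frequency and ranking location"),
    "doc_b": ("Web crawlers",
              "crawlers collect pages before ranking happens"),
}
ranked = sorted(docs, key=lambda d: score(["ranking"], *docs[d]), reverse=True)
print(ranked)  # ['doc_a', 'doc_b']
```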

  18. How does ranking work? (Cont’d) • Hyperlinks (Brin & Page, 1998) • PR(A) = (1 - d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn)) * • PR(A): PageRank of document A • T1 … Tn: documents that link to A • C(T): number of outgoing links from document T • d: damping factor between 0 and 1, usually set to 0.85 • * http://infolab.stanford.edu/~backrub/google.html
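
The PageRank formula on this slide can be evaluated by simple iteration. The sketch below applies the formula as written, PR(A) = (1 - d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn)), to a three-page graph invented for the example. The function name, the fixed iteration count, and the assumption that every page has at least one outgoing link are simplifications, not a description of any production system.

```python
def pagerank(links, d=0.85, iterations=50):
    """links maps each page to the list of pages it links to.

    Assumes every page has at least one outgoing link (no dangling pages).
    """
    pages = list(links)
    pr = {page: 1.0 for page in pages}
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Sum PR(T)/C(T) over every page T that links to this page.
            incoming = sum(pr[src] / len(links[src])
                           for src in pages if page in links[src])
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Hypothetical graph: A links to B and C, B links to C, C links to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 2))
```

In this graph C receives links from both A and B, so it ends up with the highest score, and A follows closely because C passes its whole score to A.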

  19. Other Types of Search Engines • Directories: hierarchically organized indexes that allow you to browse through lists of web sites by category or subject • Meta-search engines: query multiple search engines simultaneously and return a complete set of hits • Specialized search engines: create a database of sites on a specific topic using robots or spiders; for specific user groups • Visualization
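
A meta-search engine has to merge the hit lists returned by several engines. One possible merging scheme, sketched below, sums reciprocal ranks so that URLs ranked highly by more engines come first. The scheme, the `merge_results` name, and the `example.com` result lists are assumptions for illustration; the slide does not say how any particular meta-search engine merges results, and the lists stand in for live queries.

```python
from collections import defaultdict


def merge_results(result_lists):
    """Merge ranked URL lists; each URL scores the sum of 1/rank over all lists."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / rank
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Stand-ins for the ranked results of two different engines.
engine_a = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]
engine_b = ["http://example.com/b", "http://example.com/d"]

merged = merge_results([engine_a, engine_b])
print(merged)
```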

  20. Examples of Directories • Yahoo Directory http://dir.yahoo.com/ • The Internet Public Library http://www.ipl.org/ • Librarians’ Index to the Internet http://sunsite.berkeley.edu/InternetIndex • INFOMINE, from the University of California, is a good example of an academic subject directory

  21. Examples of Meta-Search Engines • MetaCrawler www.metacrawler.com • Ixquick http://ixquick.com/ • Clusty http://clusty.com/ • Mamma www.mamma.com

  22. More examples of Specialized Search Engines • Career Mosaic www.careermosaic.com • Diseases, Disorders and related topics www.mic.ki.se/Diseases/index.html • The Day in History www.historychannel.com/today • Shareware.com www.shareware.com

  23. User Behaviors • Web queries are short, rarely modified, and very simple in structure • Users rarely use advanced search features; when they do, about half of the uses contain mistakes • Users view only the first one or two pages of results • Users show no interest in relevance feedback

  24. User search patterns in different environments (Jansen & Pooch, 2001)

  25. Appendix A: Tips • Most search engines employ the principles of Boolean logic in the formulation of search queries. If you take the time to understand the basics of Boolean logic, you will have a better chance of search success. • Search engines tend to have a default Boolean logic. This means that the space between multiple search terms defaults to either OR logic or AND logic. This has become a de facto standard. It is imperative that you know which logical operator is the default. Nowadays, the default logic tends to be AND, but you should always check the site's Help file to make sure. • Another de facto standard is the requirement to enclose phrases in quotation marks, e.g., "death penalty".
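
The effect of the default operator and of quoted phrases can be seen in a small sketch. The `matches` function and its `default_operator` parameter are hypothetical, and matching is by simple substring, which is cruder than what a real engine does.

```python
import shlex


def parse_query(query):
    """Split a query into terms, keeping "quoted phrases" together."""
    return shlex.split(query)


def matches(query, text, default_operator="AND"):
    """Apply the default operator to the space between search terms."""
    terms = [term.lower() for term in parse_query(query)]
    text = text.lower()
    hits = [term in text for term in terms]
    return all(hits) if default_operator == "AND" else any(hits)


query = '"death penalty" appeal'
doc_1 = "The court heard an appeal in the death penalty case."
doc_2 = "The penalty for a late appeal is a fine."

print(parse_query(query))           # ['death penalty', 'appeal']
print(matches(query, doc_1))        # True
print(matches(query, doc_2))        # False: the phrase is missing
print(matches(query, doc_2, "OR"))  # True: one term is enough under OR
```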

  26. Appendix A (Cont’d) • If they are available, use proximity operators (e.g., NEAR) rather than specifying an AND relationship between your keywords. This will make sure that your search terms are located near each other in the full-text document. The closer together your terms appear, the more likely the document is to be relevant. Google does proximity searching by default. • Field searching is another extremely important way of limiting your search results in large search engines that contain millions of full-text files. For example, TITLE:slavery in a search engine such as AltaVista will bring you more relevant hits than merely searching on the keyword slavery. • To enhance subject searches, try the URL field to narrow your results. The URL field offers a good way to search for certain subject terms because of the make-up of the URL.

  27. Appendix A (Cont’) • The Internet is a self-publishing medium. It is not a library of evaluated publications selected by professionals. Rather, the Internet is a bulletin board containing everything from the definitive to the spurious. Everything, everything must be analyzed for its appropriateness for research use. • Before you select a search tool, always think about your topic and what you are trying to find. Once you begin your research, be sure to try out a handful of sites. Don't rely on a single site. • Don't just Google everything! Google is great, but there are other useful tools on the Web, too. Google has become so popular that many people use this tool exclusively, and miss out on others that might be more useful for their particular search. • Others?

  28. Appendix B: Anatomy of a URL • This is a URL on the CNN home page: http://www.cnn.com/feedback/comments.html • This URL is typical of addresses hosted in domains in the United States • Protocol: http • Host computer name: www • Second-level domain name: cnn • Top-level domain name: com • Directory name: feedback • File name: comments.html • The directory name and file name often contain subject terms. These can be searched with the URL field. For example, URL:slavery will give you more relevant results than the keyword slavery by searching for this term as a directory name or a file name.
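
The same breakdown can be reproduced with Python's standard `urllib.parse` module. This is a minimal sketch for this one URL; the variable names are illustrative, and splitting the host on dots is a simplification that does not hold for every domain.

```python
from urllib.parse import urlsplit

url = "http://www.cnn.com/feedback/comments.html"
parts = urlsplit(url)

protocol = parts.scheme
# Simplification: assumes a three-label host such as www.cnn.com.
host_name, second_level, top_level = parts.hostname.split(".")
# Split the path into the directory and the file name.
directory, file_name = parts.path.strip("/").rsplit("/", 1)

print(protocol)      # http
print(host_name)     # www
print(second_level)  # cnn
print(top_level)     # com
print(directory)     # feedback
print(file_name)     # comments.html
```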

  29. Appendix C • Search engine comparison chart • http://www.infopeople.org/search/chart.html • http://www.searchengineshowdown.com/features/ • Tutorials • Google Tutorial
