
Eric Sieverts



Presentation Transcript


  1. Eric Sieverts; Institute for Media & Information Management (Hogeschool van Amsterdam); University Library Utrecht IT Department

  2. Google and/or/not databases
     • why use search engines ?
     • functionality of search engines (including the latest technology)
     • what is hidden from search engines ?
     • search engines vs. databases
     • why would people prefer google ?
     • what is up for us, librarians ?
     Eric Sieverts | e.sieverts@library.uu.nl | http://www.library.uu.nl/medew/it/eric | Bielefeld 2002 Conference, 7 febr 2002

  3. why use search engines ?
     • easy to use best match technique
     • such a good relevance ranking (at least some of them)
     • still a lot of additional (hidden) functionality
     • recent language technological methods
     • such large collections

  4. why use search engines ? some common document ranking parameters
     • the more terms from your query in a document, the better (now for most engines only "all the terms")
     • the more prominent a term in a document, the better (in <title>, in the first few sentences, in a <meta> tag)
     • the more frequently repeated a search term, the better
     • the closer together the terms in a document, the better
     • the more uncommon a search term, the higher its weight
     • the more "popular" a web-page, the better (more hyperlinks pointing to it, more people visiting it, ..) (google's strong point)
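A minimal sketch of how the ranking parameters listed on this slide could be combined into one best-match score. It is not the formula of google or any other engine: the function names (`tokenize`, `min_gap`, `score`), the sample documents, and every weight are assumptions chosen only for illustration.

```python
import math
import re


def tokenize(text):
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def min_gap(body, terms):
    """Smallest distance (in words) between two different query terms, or None."""
    positions = [(i, t) for i, t in enumerate(body) if t in terms]
    gaps = [j - i for (i, a), (j, b) in zip(positions, positions[1:]) if a != b]
    return min(gaps) if gaps else None


def score(query, doc, collection, inlinks=0):
    """Toy score; all weights are arbitrary illustrative choices."""
    q_terms = set(tokenize(query))
    if not q_terms:
        return 0.0
    title, body = tokenize(doc["title"]), tokenize(doc["body"])
    total, matched = 0.0, 0
    for term in q_terms:
        tf = body.count(term)
        in_title = term in title
        if tf == 0 and not in_title:
            continue
        matched += 1
        # The more uncommon a term in the collection, the higher its weight.
        df = sum(1 for d in collection
                 if term in tokenize(d["title"] + " " + d["body"]))
        idf = math.log(1 + len(collection) / max(df, 1))
        # The more often a term is repeated, the better (diminishing returns).
        total += idf * (1 + math.log(1 + tf))
        # The more prominent a term (here: in the title), the better.
        if in_title:
            total += 2.0 * idf
    # The more query terms a document contains, the better.
    total *= matched / len(q_terms)
    # The closer together the terms, the better.
    gap = min_gap(body, q_terms)
    if gap:
        total *= 1 + 1 / gap
    # The more "popular" a page (here: inbound links only), the better.
    return total * (1 + math.log(1 + inlinks))


docs = [
    {"title": "library databases",
     "body": "library databases offer controlled indexing of journals"},
    {"title": "cooking", "body": "a library of recipes"},
    {"title": "gardening", "body": "roses and tulips"},
]
ranked = sorted(docs, key=lambda d: score("library databases", d, docs),
                reverse=True)
```

With these sample documents, `ranked` puts the page that contains both query terms, in its title and adjacent in its text, ahead of the page that mentions only one of them.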

  5. why use search engines ? google offers a lot of additional functionality
     • boolean search (if you really want to - I do occasionally!)
     • "citation" search (other web-pages linking to "this" site)
     • similarity search (means here: similar linking patterns; not really better than word-based similarity search)
     • disappeared documents in result set can be retrieved from archive cache
     • many other document types than just plain html
     • also image search, usenet archives, integration of open directory subject tree
     see: google, google advanced search

  6. why use search engines ? modern language technology aboard: categorisation of result sets
     • (formerly) northernlight's custom search folders (rule-based method)
     • teoma (statistics-based method)
     • wisenut (statistics-based method)
     • fast-alltheweb (statistics-based method)
     see: teoma, wisenut

  7. why use search engines ? search engine "sizes": see for instance "search engine watch", december 2001

  8. what is hidden from (most) search engines ? (and consequently from their users !)
     • non-HTML documents: flash, office-files, pdf (not fundamentally impossible, as google demonstrates)
     • "real-time" data (too difficult to keep track)
     • dynamically generated, database-driven pages (for fear of spider traps; but google seems to do it)
     • all information hidden in searchable databases (spiders cannot fill out database search forms)
     • to-be-paid-for or licensed information (bibliographic databases, full-text scientific journals, ....)
     • all information that is not (yet) on the web

  9. search engines vs. databases
     besides - for us obvious - differences in content: differences in functionality
     but do users use all of this ?? despite its importance !!

  10. why do students "graduate on google" ? why do so many users prefer the use of search engines ?
     • apparent simplicity of search engine interface
     • too many separate other search systems to address
     • overwhelming choice of databases (example)
     • overwhelming choice of digital primary sources (example)
     • plethora of different database system interfaces
     • interfaces crowded with "functionality"
     what would you use ?
     • if you didn't know the difference
     • if you didn't know what you'd miss

  11. do you miss so much with only google ?
     • google also indexes .PDF, .DOC, .PPT, .XLS, .RTF
     • the web also contains preprints, reports, projects etc. that are NOT in databases
     • many scientists (and others) put copies of their published articles on their personal websites
     that seems fine, but you still get low recall, because:
     • the web remains a very fragmented incomplete mess (behind that simple google screen)
     • it is not indexed consistently and in a controlled way
     but for many users lousy recall is no problem at all .....

  12. what is up for libraries ?
     • realise better integrated access to all our precious (and expensive) information sources
     • realise more advanced retrieval possibilities while keeping the advantages of controlled indexing as well

  13. integrated system: local central index solution
     [diagram; labels: search, central index, indexer, indexing rules for targets, internet, full-text links, document text files]
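A minimal sketch of the central index idea named on this slide, assuming the usual reading of the diagram labels: an indexer copies the text of documents from several targets into one local index, and searches then run against that index only and return links to the full text. The class `CentralIndex`, its methods, the target names, and the example.org links are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class CentralIndex:
    """One local inverted index fed from several targets (illustrative only)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> {(target, doc_id), ...}
        self.links = {}                   # (target, doc_id) -> full-text link

    def add(self, target, doc_id, text, link):
        """Indexer step: store the link and index every word of the text."""
        key = (target, doc_id)
        self.links[key] = link
        for term in tokenize(text):
            self.postings[term].add(key)

    def search(self, query):
        """Return links of documents that contain all query terms."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(self.links[k] for k in hits)


index = CentralIndex()
index.add("catalogue", 1, "search engines and databases",
          "http://example.org/catalogue/1")
index.add("journals", 7, "relevance ranking in search engines",
          "http://example.org/journals/7")
index.add("journals", 8, "controlled indexing in databases",
          "http://example.org/journals/8")
results = index.search("search engines")
```

The point of the design is that one query is answered from one index, whatever target a document came from; the price is that the index has to be rebuilt or updated whenever a target changes.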

  14. integrated system: metasearch / portal solution
     [diagram; labels: search, query-generator / result-collector, configuration data for targets, internet, protocols (Z39.50, http, xml, internal api), and for each target: search, index, files]
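A minimal sketch of the metasearch idea named on this slide, assuming the usual reading of the diagram labels: nothing is indexed centrally; a query-generator sends each query to every configured target, and a result-collector merges what comes back. The targets below are in-memory stand-ins, not real Z39.50 or http clients, and `make_target`, `TARGETS`, and `metasearch` are hypothetical names.

```python
def make_target(name, records):
    """Return a stand-in search function over an in-memory list of titles."""
    def search(terms):
        return [{"source": name, "title": r}
                for r in records
                if all(t in r.lower() for t in terms)]
    return search


# Configuration data for targets. The protocol field is descriptive only:
# a real portal would pick a Z39.50, http or xml client here.
TARGETS = {
    "catalogue": {
        "protocol": "Z39.50",
        "search": make_target("catalogue", [
            "Search engines and databases",
            "Controlled indexing in libraries",
        ]),
    },
    "journals": {
        "protocol": "http",
        "search": make_target("journals", [
            "Search Engines and Databases",
            "Relevance ranking on the web",
        ]),
    },
}


def metasearch(query, targets):
    """Send the query to every target and merge the results without duplicates."""
    terms = query.lower().split()
    merged, seen = [], set()
    for name, config in targets.items():
        try:
            hits = config["search"](terms)
        except Exception:
            continue  # one unreachable target should not break the whole search
        for hit in hits:
            key = hit["title"].lower()
            if key not in seen:
                seen.add(key)
                merged.append(hit)
    return merged


merged_hits = metasearch("search engines", TARGETS)
```

Compared with a central index, this keeps the data at the targets and always searches their current content, but every search is only as fast and as capable as the slowest and simplest target.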

  15. and a look into the (near) future .... competition between “ “ and "our databases" will continue
