
Fumbles in the dark: measuring the Invisible Web

Colin Reddy & Paul Wouters, Networked Research and Digital Information (Nerdi), NIWI-KNAW, www.nerdi.knaw.nl


Presentation Transcript


  1. Fumbles in the dark: measuring the Invisible Web
     Colin Reddy & Paul Wouters, Networked Research and Digital Information (Nerdi), NIWI-KNAW, www.nerdi.knaw.nl

  2. Initial Questions
     • What is the Invisible Web?
     • Why is it there?
     • How does it affect Web indicators?

  3. (Some) Initial Problems
     • The dynamic nature of the Web
     • Establishing how search engine policies affect the material they find
     • Invisibility is not simply related to file type

  4. Definition of Invisibility
     Information can be called invisible in a certain search context (of a specific search technology) if:
     • that information is not part of the results of the search, and
     • that information does meet the criteria of relevance as formulated in the search, and
     • that information would in principle be retrievable if an observer knew its exact location on the Web.
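
The three conditions of this definition can be read as a single predicate. The following is a minimal sketch, not the authors' implementation: the `Item` class, its field names, and the example URLs are hypothetical, and it assumes that relevance and retrievability have already been judged for each item by some other means.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """A piece of information on the Web (hypothetical structure)."""
    url: str
    relevant: bool      # meets the relevance criteria formulated in the search
    retrievable: bool   # could be fetched by an observer who knew the exact URL


def is_invisible(item: Item, search_results: set[str]) -> bool:
    """Return True only if all three conditions of the definition hold."""
    return (
        item.url not in search_results  # not part of the search results
        and item.relevant               # but relevant to the search
        and item.retrievable            # and retrievable in principle
    )


# Illustrative data only, not taken from the study.
results = {"http://example.org/index.html"}
hidden = Item("http://example.org/data/report.pdf", relevant=True, retrievable=True)
listed = Item("http://example.org/index.html", relevant=True, retrievable=True)

print(is_invisible(hidden, results))  # True
print(is_invisible(listed, results))  # False
```

Because the result set belongs to one search on one search technology, the same item can be invisible in one search context and visible in another.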

  5. New Question
     What characteristics of information on the Web might increase or decrease the probability that it will be invisible to a specified set of search engines at a particular point in time?

  6. Factors involved
     • the number of in-links to the page containing the information
     • the depth at which the information is located within a subdomain
     • the file extension and the MIME type of the file containing the information
     • the metatags with which the Web page is marked

  7. More factors
     • the updating frequency of the Web site or page
     • the accessibility of the information
     • the format of the URL at which the information is located, and lastly
     • the total of these “visibility characteristics” of the in-linking pages.
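
Some of the factors listed on the two slides above can be read directly from a URL. The sketch below is illustrative only and rests on several assumptions: `url_features` is a hypothetical helper, "depth" is taken to mean the number of path segments (the slides do not define it), the MIME type is merely guessed from the file extension rather than read from the server's `Content-Type` header, and "format of the URL" is reduced to whether a query string is present. In-links, metatags, and updating frequency cannot be derived from the URL and are not covered.

```python
import mimetypes
import posixpath
from urllib.parse import urlsplit


def url_features(url: str) -> dict:
    """Extract a few URL-derived visibility characteristics (assumed definitions)."""
    parts = urlsplit(url)
    # Assumption: depth = number of non-empty path segments below the host.
    segments = [segment for segment in parts.path.split("/") if segment]
    extension = posixpath.splitext(parts.path)[1].lower()
    # Only a guess from the extension; the served MIME type may differ.
    guessed_mime, _encoding = mimetypes.guess_type(parts.path)
    return {
        "host": parts.netloc,
        "depth": len(segments),
        "extension": extension,
        "guessed_mime": guessed_mime,
        "has_query": bool(parts.query),  # crude proxy for a dynamic URL format
    }


# Hypothetical URL, not from the study's sample.
features = url_features("http://www.example.org/research/papers/report.pdf?id=3")
print(features["depth"], features["extension"], features["has_query"])
```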

  8. The experiment: Aims
     • To quantify the amount of information residing on websites that is ‘invisible’
     • To establish the characteristics of the information that determine the probability of it being ‘invisible’

  9. The experiment: Method
     • Establish the contents of a website independently of a search engine, and then compare this “control map” with the results returned from search engines.

  10. The experiment
     • Microsoft Search Analyst was used to map the entire contents of a Web site.
     • At the same moment in time (important to minimise the effect of changes in website content over time), search engines and indexes were queried for the same information.
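
The comparison of the “control map” with search engine results amounts to a set difference. The following is a minimal sketch under stated assumptions, not the procedure actually used in the study: the function names, the engine labels `engine_a` and `engine_b`, and all URLs are hypothetical, URLs are assumed to be normalised already, and every page in the control map is treated as relevant and retrievable.

```python
from __future__ import annotations


def invisible_urls(control_map: set[str], engine_results: set[str]) -> set[str]:
    """URLs found by the independent site map but missing from the engine."""
    return control_map - engine_results


def invisible_fraction(control_map: set[str], engine_results: set[str]) -> float:
    """Share of the control map that the engine did not return."""
    if not control_map:
        return 0.0
    return len(invisible_urls(control_map, engine_results)) / len(control_map)


# Illustrative data only, not results from the study.
control_map = {
    "http://example.org/",
    "http://example.org/staff.html",
    "http://example.org/papers/a.pdf",
    "http://example.org/db?id=1",
}
results_by_engine = {
    # URLs outside the control map (e.g. stale pages) do not affect the fraction.
    "engine_a": {
        "http://example.org/",
        "http://example.org/staff.html",
        "http://example.org/papers/a.pdf",
        "http://example.org/old.html",
    },
    "engine_b": {"http://example.org/"},
}

for name in sorted(results_by_engine):
    print(name, invisible_fraction(control_map, results_by_engine[name]))
# engine_a 0.25
# engine_b 0.75
```

Taking both snapshots at the same moment matters here: a page added to the site after the engine's crawl would otherwise be counted as invisible.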

  11. Services used
     • Search engines: Google, Fast, AltaVista, Inktomi
     • Indexes: Yahoo, Open Directory, Look Smart, Galaxy

  12. Sample
     • Web sites of 99 plant genetics institutes in the European Research Area
     • Science Citation Index used to find articles, from which a list of institutes was compiled
     • Search engines used to find the institutes' URLs
