Lies, Damn lies and Web Statistics

Lies, Damn Lies and Web Statistics. IWMW 2005: Whose Web Is It Anyway? Dr. Mike Lowndes, Interactive Media Manager, Natural History Museum, London. The museum houses 350 permanent scientific staff, plus postgraduate students, and is one of the largest UK research institutes in the natural sciences.


Presentation Transcript


  1. Lies, Damn Lies and Web Statistics • IWMW 2005: Whose Web Is It Anyway? • Dr. Mike Lowndes, Interactive Media Manager, Natural History Museum, London • The museum houses 350 permanent scientific staff, plus postgraduate students; one of the largest UK research institutes in the natural sciences.

  2. Contents • Why bother? • Issues with web logs • Issues with analytic tools • Browser tracking • Comparison between approaches • Known issues with browser tracking • Nedstat input and findings from Newcastle University

  3. Why bother? • Web log analysis is currently the main method used to quantify web site usage for reporting. • Results are used by the government as performance indicators for institutional websites. • Yet the results are not accurate or meaningful most of the time, and are no good for absolute measurement of usage. • They can be used for: trend analysis; content preferences; ROI estimation; checking and fixing your site; understanding users' behaviour; testing assumed pathways.

  4. Issues with server logs • Dynamic IP: many users share the same IP number over time, and the same user is assigned many IP numbers over time. • Proxies: several or many users sit behind one IP number. • Caches (which can be ‘in’ proxies): commonly requested files are cached closer to the users, and caches can form the top 20-50 hosts accessing sites. • Robots and spiders: few visits but lots of hits; analytic packages cannot keep up to date with all of them for exclusion. • Syndication: RSS feeds generate huge logs, but are not ‘read’ by humans initially; click-through configuration. • Reporting by analysis tools: often weekly or monthly reports, because real-time reporting is very labour- and server-intensive; reports are often complex and techy.
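
To make the robot and proxy issues concrete, here is a minimal sketch that parses log lines, drops requests whose user agent looks like a robot, and counts hits per host. It assumes the Apache Combined Log Format; the sample lines, the `ROBOT_MARKERS` list and the function names are hypothetical and deliberately incomplete, which is the point the slide makes about keeping exclusion lists up to date.

```python
import re
from collections import Counter

# Combined Log Format:
# host ident user [time] "request" status bytes "referrer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Hypothetical, incomplete list of substrings that mark a robot user agent.
ROBOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def parse_line(line):
    """Return the named fields of one log line, or None if it does not parse."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def is_robot(agent):
    """Guess whether a user-agent string belongs to a robot."""
    agent = agent.lower()
    return any(marker in agent for marker in ROBOT_MARKERS)


def summarise(lines):
    """Count robot hits, remaining hits, and remaining hits per host."""
    human_hits = 0
    robot_hits = 0
    hosts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue  # skip malformed lines
        if is_robot(record["agent"]):
            robot_hits += 1
            continue
        human_hits += 1
        hosts[record["host"]] += 1
    return {"human_hits": human_hits, "robot_hits": robot_hits, "hosts": hosts}


# Invented sample: two different browsers behind one proxy IP, plus a robot.
SAMPLE_LOG = [
    '192.0.2.10 - - [01/Jul/2005:10:00:00 +0100] "GET /visiting/ HTTP/1.1" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
    '192.0.2.10 - - [01/Jul/2005:10:00:05 +0100] "GET /visiting/map.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (Macintosh) Safari/412"',
    '198.51.100.7 - - [01/Jul/2005:10:01:00 +0100] "GET /robots.txt HTTP/1.0" 200 64 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '198.51.100.7 - - [01/Jul/2005:10:01:01 +0100] "GET /visiting/ HTTP/1.0" 200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
]

if __name__ == "__main__":
    print(summarise(SAMPLE_LOG))
```

In this invented sample the two browsers behind the proxy collapse into a single host, so a host count would report one visitor where there were probably two.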

  5. Issues with log analysis tools • WebTrends vs Summary.net • 1. Natural History Museum: Summary SP (summary.net) Version 4.2.1, unregistered demo, default configuration • 2. UKOLN (Bath): WebTrends (www.webtrends.com) Version 5, default configuration • Both tools were applied to the same log file. • Default configurations were used, so robots were not removed. • Note: the WebTrends documentation is not clear on this point.

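  6. Measurement discrepancies

     | Measurement | Summary SP | Webtrends 7 |
     | --- | --- | --- |
     | Connections (hits) | - | +0.67% |
     | Page views (page hits) | - | +5.00% |
     | Visits (user sessions) | - | +0.07% |
     | Failed hits | - | +0.30% |
     | Average visit duration | - | -30.0% (+250%) |
     | Browsers: IE | 75% | 86% |
     | Browsers: Netscape compatible | 2% | 4% |
     | Referrers | | |
     | Top Level Domains (1) | US | US |
     | Top Level Domains (2) | UK | UK |
     | Top Level Domains (3) | AUS | CAN |
     | Top Level Domains (4) | NETHER | NETHER |
     | Top Level Domains (5) | CAN | AUS |
     | Top Level Domains (6) | JAP | JAP |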

  7. Comparison between tools • Not a single measurement was identical. • Most measurements were within 5% of each other. • Visit duration measurements were widely different and can depend on configuration; possible bug in WebTrends version 5. • Page view measurements were quite different. • Results are broadly similar, but direct comparisons, especially of page views, are not really justified.
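
One way configuration can change visit counts and durations is the inactivity timeout a tool uses to split a host's requests into visits. The sketch below is an illustration under that assumption, not a description of how either tool works internally; the request times and function names are invented.

```python
from datetime import datetime, timedelta


def sessionise(timestamps, timeout_minutes):
    """Split one host's request times into visits.

    A new visit starts whenever the gap since the previous request
    is longer than the inactivity timeout.
    """
    timeout = timedelta(minutes=timeout_minutes)
    visits = []
    for ts in sorted(timestamps):
        if visits and ts - visits[-1][-1] <= timeout:
            visits[-1].append(ts)
        else:
            visits.append([ts])
    return visits


def average_duration_seconds(visits):
    """Average of (last request - first request) over all visits.

    Logs cannot show how long the final page was read, so a
    single-request visit counts as zero seconds.
    """
    durations = [(visit[-1] - visit[0]).total_seconds() for visit in visits]
    return sum(durations) / len(durations) if durations else 0.0


# Invented request times for one host, in minutes after 10:00.
START = datetime(2005, 7, 1, 10, 0, 0)
REQUEST_TIMES = [START + timedelta(minutes=m) for m in (0, 2, 5, 25, 27, 70)]

if __name__ == "__main__":
    for timeout in (30, 10):
        visits = sessionise(REQUEST_TIMES, timeout)
        print(timeout, len(visits), average_duration_seconds(visits))
```

With these invented times, a 30-minute timeout gives 2 visits averaging 810 seconds, while a 10-minute timeout gives 3 visits averaging 140 seconds, from exactly the same requests.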

  8. Browser tracking • Does it have fewer inaccuracies and distortions? • Is it easier on the web team? • Is it affordable? • Does it give us more information / better information?

  9. Browser tracking • Requires code to be added to pages. • Uses an image, sourced from the tracking website; also uses JavaScript and cookies for gathering extended and repeat-visit information. • Usually hosted services. • Provide near real-time tracking. • Few of the issues distorting logs affect these measurements (according to the blurb). • Main players: Nedstat, Nielsen//NetRatings, WebSideStory
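
The tracking side of the image-plus-cookie mechanism can be sketched roughly as follows. This is a generic illustration, not the implementation of any of the services named above: the cookie name `visitor_id`, the function `handle_beacon` and the one-year lifetime are all assumptions. Each image request returns a 1x1 GIF and sets or reads a cookie, which is what allows repeat visits to be recognised.

```python
import base64
import uuid
from http.cookies import SimpleCookie

# A transparent 1x1 GIF, served as the tracking image.
PIXEL_GIF = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)

ONE_YEAR_SECONDS = 365 * 24 * 3600  # assumed cookie lifetime


def handle_beacon(cookie_header, page):
    """Handle one image request from a tagged page.

    cookie_header: the raw Cookie header sent by the browser, or None.
    page: the page that embedded the image (hypothetical parameter).
    Returns (record, response_headers, body).
    """
    incoming = SimpleCookie()
    incoming.load(cookie_header or "")

    if "visitor_id" in incoming:
        visitor_id = incoming["visitor_id"].value
        repeat_visit = True
    else:
        visitor_id = uuid.uuid4().hex  # first time this browser is seen
        repeat_visit = False

    outgoing = SimpleCookie()
    outgoing["visitor_id"] = visitor_id
    outgoing["visitor_id"]["max-age"] = ONE_YEAR_SECONDS
    outgoing["visitor_id"]["path"] = "/"

    headers = [
        ("Content-Type", "image/gif"),
        # Ask caches not to store the image, so every page view
        # reaches the tracking server.
        ("Cache-Control", "no-store"),
        ("Set-Cookie", outgoing["visitor_id"].OutputString()),
    ]
    record = {"visitor_id": visitor_id, "repeat_visit": repeat_visit, "page": page}
    return record, headers, PIXEL_GIF


if __name__ == "__main__":
    first, first_headers, _ = handle_beacon(None, "/visiting/")
    again, _, _ = handle_beacon("visitor_id=" + first["visitor_id"], "/visiting/map.html")
    print(first["repeat_visit"], again["repeat_visit"])
```

If the browser refuses or deletes the cookie, every request looks like a new visitor, so repeat-visit figures depend on cookies surviving.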

  10. Comparison between tools • Summary SP vs Nielsen//NetRatings • Run on one section of a site over a month. • ‘Visiting’ section of the Natural History Museum site: small but popular and easily tagged.

  11. Results 1 – visits and visitors

  12. Results 2 – pages viewed

  13. Results 3 – country • Depends on the quality of the geographical IP database, not the mode of tracking?

  14. Conclusions regarding traditional log analysis • Assuming browser tracking is more accurate: • We have fewer visit sessions than we thought, but more visitors: fewer visits (sessions), possibly due to robot exclusion; more visitors (unique users), possibly due to the masking effect of proxies/caches and browser caches. • Visit duration is much shorter than thought, possibly due to robots/spiders and cache updating. • Country information is roughly accurate so long as a geographical lookup is used. • Activity of popular pages, which are often cached, will be underestimated by log analysis.
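
The masking effect of proxies on unique-visitor counts can be illustrated with a toy comparison of counting by IP number (as a log analyser might) against counting by cookie ID (as a browser tracker might). The records below are invented for illustration and are not the museum's data.

```python
# Hypothetical page-view records:
# (client IP seen by the server, cookie ID set by the tracker)
PAGE_VIEWS = [
    ("203.0.113.5", "a1"),  # three people behind one proxy
    ("203.0.113.5", "b2"),
    ("203.0.113.5", "c3"),
    ("192.0.2.44", "d4"),   # one person whose dynamic IP changed
    ("192.0.2.45", "d4"),
]


def unique_visitors_by_ip(views):
    """Count distinct IP numbers."""
    return len({ip for ip, _ in views})


def unique_visitors_by_cookie(views):
    """Count distinct cookie IDs."""
    return len({cookie for _, cookie in views})


if __name__ == "__main__":
    print("by IP:", unique_visitors_by_ip(PAGE_VIEWS))
    print("by cookie:", unique_visitors_by_cookie(PAGE_VIEWS))
```

Here counting by IP gives 3 visitors and counting by cookie gives 4: the proxy hides two people, while the dynamic IP adds one spurious visitor. Which effect dominates on a real site depends on its audience.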

  15. Browser tracking advantages • Almost real-time analysis, incremental data. • Better repeat user tracking and individual pathway analysis. • Configurable, graphical reports for non-techies; a techie still needs to configure those reports, however, as an understanding of web analytics is required. • Cut our monthly staff time down from 1.5 days to 1 hour. • Appears to be more accurate in describing the activity of real people, but we would like to see some independent research.

  16. Issues with browser tracking • Setup is not trivial: you need to add code to every page. • Multiple server / ownership issues. • Does not always work (or get full user details) if JavaScript is turned off or cookies are disallowed. • Does not work with text-only browsers. • Unknown compatibility with PDAs, mobiles etc. • Questions: Would we get different results with different hosted services? (ABCE: industry standards for measurement.) Are cookies often deleted unless the user is confident in the source? This would affect the measurement of repeat visitors and behaviour. • Political issues: issues with external hosting of institutional data; security of personal data with external hosting, e.g. measurements of student and staff use of a VLE.

  17. Next steps • Many private sector and public sector sites have already moved to browser tracking. • About 6 national museums are currently discussing hosted browser tracking. • 5 universities are currently involved in a trial of Nedstat.

  18. Thank you
