
Web Analytics

Xuejiao Liu, INF 385F: WIRED, Fall 2004.


Presentation Transcript


  1. Web Analytics Xuejiao Liu INF 385F: WIRED Fall 2004

  2. Outline • Introduction • What is Web Analytics • Why Web Analytics matters • Secondary readings • Log file analysis • Web usage mining • Data preparation • KDD process • Document access in repositories

  3. Log File Lowdown (Michael Calore, 2001) • Log file • What is in a log file • Traffic • Audience • Browsers/Platforms • Errors • Referers

  4. Log File Lowdown • Sample log file entry: adsl-63-183-164.ilm.bellsouth.net - - [09/May/2001:13:42:07 -0700] "GET /about.htm HTTP/1.1" 200 3741 "http://www.e-angelica.com" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)" • Log file analyzers: WebTrends, Sawmill, Analog, Webalizer, HTTP-analyze
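The sample entry follows the NCSA Combined Log Format: host, identity, user, timestamp, request line, status code, bytes sent, referer and user agent. The following is a minimal Python sketch of the first thing a log analyzer does with such lines, assuming well-formed Combined Log Format input. The regex and the `parse_line` / `summarize` helpers are illustrative and are not taken from WebTrends or any other tool named above. `summarize` tallies a few of the measures listed on slide 3 (traffic, errors, referers, browsers).

```python
import re
from collections import Counter

# Combined Log Format: host ident user [time] "method path protocol" status bytes "referer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A "-" in the bytes field means no body was sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry

def summarize(lines):
    """Tally traffic, errors, referers and browsers over the parseable lines."""
    entries = [e for e in (parse_line(line) for line in lines) if e is not None]
    return {
        "hits": len(entries),
        "bytes": sum(e["size"] for e in entries),
        "errors": sum(1 for e in entries if e["status"] >= 400),
        "referers": Counter(e["referer"] for e in entries),
        "agents": Counter(e["agent"] for e in entries),
    }

sample = ('adsl-63-183-164.ilm.bellsouth.net - - [09/May/2001:13:42:07 -0700] '
          '"GET /about.htm HTTP/1.1" 200 3741 "http://www.e-angelica.com" '
          '"Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"')
print(summarize([sample]))
```

Real analyzers also have to cope with malformed lines, other log formats and very large files. This sketch simply skips any line that does not match.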

  5. WebTrends • Log file analyzer • Advantages • Fast and effective • User-friendly interface • Feature-rich • Supports multiple operating systems • Disadvantages • Not free

  6. WebTrends

  7. The KDD Process for Extracting Useful Knowledge from Volumes of Data (Fayyad, U., G. Piatetsky-Shapiro, et al. 1996) • KDD: Knowledge Discovery in Databases • The value of data • Definitions • KDD • Data mining

  8. The KDD Process • The KDD process • 1. Creating a target dataset • 2. Preprocessing and data cleaning • 3. Data reduction and projection • 4. Data mining • Choosing the data mining function • Choosing the data mining algorithm • 5. Interpretation and evaluation
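The five steps can be chained end to end on a toy dataset. This is a minimal Python sketch that assumes hypothetical page-request records. Using frequency counting as the "mining function" and a support threshold of 2 are placeholders chosen for brevity. Fayyad et al. do not prescribe either of them.

```python
from collections import Counter

# Hypothetical raw records: (user_id, page, status, seconds_on_page)
RAW = [
    ("u1", "/index.htm", 200, 12),
    ("u1", "/about.htm", 200, 40),
    ("u2", "/index.htm", 200, None),  # missing value
    ("u2", "/logo.gif", 200, 0),      # image request, not a page view
    ("u3", "/about.htm", 404, 1),     # failed request
    ("u3", "/index.htm", 200, 8),
]

def create_target_dataset(records):
    # Step 1: select the records relevant to the question (successful requests).
    return [r for r in records if r[2] == 200]

def preprocess(records):
    # Step 2: clean by dropping records with missing values and non-page requests.
    return [r for r in records if r[3] is not None and r[1].endswith(".htm")]

def reduce_and_project(records):
    # Step 3: keep only the attribute this task needs (the page).
    return [r[1] for r in records]

def mine(pages):
    # Step 4: the chosen mining function here is plain frequency counting.
    return Counter(pages)

def interpret(counts, min_support=2):
    # Step 5: keep only patterns that pass a (hypothetical) support threshold.
    return {page: n for page, n in counts.items() if n >= min_support}

result = interpret(mine(reduce_and_project(preprocess(create_target_dataset(RAW)))))
print(result)  # → {'/index.htm': 2}
```

In practice the process is iterative: an unhelpful result at step 5 usually sends the analyst back to an earlier step rather than forward.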

  9. The KDD Process • Data Mining • Data mining involves fitting models to, or determining patterns from, observed data • Components of a data mining algorithm • The model • The preference criterion • The search algorithm
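Here is a deliberately tiny Python sketch that makes the three components concrete:

- The model is a one-parameter line.
- The preference criterion is mean squared error.
- The search algorithm is gradient descent.

These specific choices and the sample data are illustrative assumptions. Fayyad et al. describe the decomposition, not this particular instance.

```python
# Model: the family of patterns we can express, here lines through the origin y = w * x.
def model(w, x):
    return w * x

# Preference criterion: mean squared error (lower is better).
def criterion(w, xs, ys):
    return sum((model(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Search algorithm: gradient descent on w to minimize the criterion.
def search(xs, ys, lr=0.01, steps=500):
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        grad = sum(2 * (model(w, x) - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return w

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
w = search(xs, ys)
print(round(w, 4))  # → 1.99
```

Each component can be swapped independently. For example, a richer model or a different criterion leaves the search loop's structure unchanged.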

  10. The KDD Process • Data Mining • Model functions: classification, regression, clustering, dependency modeling, link analysis • Goals of data mining: prediction and description

  11. Data Preparation for Mining World Wide Web Browsing Patterns (Cooley, R. W., B. Mobasher, et al. 1999) • Web usage mining vs. data mining • The WEBMINER process • Preprocessing • Mining algorithms • Pattern analysis

  12. Data Preparation • Preprocessing • Data cleaning • User identification • Session identification • Path completion • Formatting
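The cleaning, user-identification and session-identification steps can be sketched with simple heuristics. This is a minimal Python sketch that assumes entries have already been parsed. The following are all assumptions chosen for illustration, not parameters taken from the slides:

- the list of ignored file suffixes;
- the rule that the same IP with a different user agent counts as a different user;
- the 30-minute session timeout.

```python
from datetime import datetime, timedelta

# Hypothetical parsed log entries: (ip, user_agent, timestamp, url)
ENTRIES = [
    ("1.2.3.4", "Mozilla/4.0", "2001-05-09 13:00:00", "/A.htm"),
    ("1.2.3.4", "Mozilla/4.0", "2001-05-09 13:00:01", "/logo.gif"),
    ("1.2.3.4", "Mozilla/4.0", "2001-05-09 13:05:00", "/B.htm"),
    ("1.2.3.4", "Lynx/2.8", "2001-05-09 13:06:00", "/A.htm"),
    ("1.2.3.4", "Mozilla/4.0", "2001-05-09 14:30:00", "/C.htm"),
]

IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js")  # assumed list
TIMEOUT = timedelta(minutes=30)  # assumed session timeout

def clean(entries):
    # Data cleaning: drop embedded resources so each remaining entry is a page view.
    return [e for e in entries if not e[3].lower().endswith(IGNORED_SUFFIXES)]

def identify_users(entries):
    # User identification heuristic: same IP but different user agent => different user.
    users = {}
    for ip, agent, ts, url in entries:
        users.setdefault((ip, agent), []).append((datetime.fromisoformat(ts), url))
    return users

def identify_sessions(users):
    # Session identification: start a new session when the gap between views exceeds TIMEOUT.
    sessions = []
    for user, views in users.items():
        views.sort()
        current = [views[0]]
        for prev, view in zip(views, views[1:]):
            if view[0] - prev[0] > TIMEOUT:
                sessions.append((user, [url for _, url in current]))
                current = []
            current.append(view)
        sessions.append((user, [url for _, url in current]))
    return sessions

for user, pages in identify_sessions(identify_users(clean(ENTRIES))):
    print(user, pages)
```

Here the Mozilla user's 85-minute gap splits their views into two sessions. The Lynx request from the same IP is treated as a separate user.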

  13. Data Preparation

  14. Data Preparation
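Path completion fills in page views that never reach the server log, typically because the browser's back button served a cached copy. This sketch uses one simplified heuristic, which is an assumption rather than something taken from the slides: if a request's referrer is an earlier page rather than the page just viewed, assume the user backed up through the intervening pages to reach that referrer. The hypothetical `complete_path` helper below implements that rule in Python.

```python
def complete_path(requests):
    """requests: list of (url, referer) in time order; referer is None if unknown.

    Returns the visited path with inferred back-button revisits inserted.
    """
    path = []
    for url, referer in requests:
        if path and referer is not None and referer != path[-1] and referer in path:
            # Find the most recent occurrence of the referer in the path so far.
            idx = len(path) - 1 - path[::-1].index(referer)
            # Assume the user went back through each page after it (served from cache).
            path.extend(path[idx:-1][::-1])
        path.append(url)
    return path

requests = [("/A.htm", None), ("/B.htm", "/A.htm"), ("/C.htm", "/B.htm"), ("/D.htm", "/A.htm")]
print(complete_path(requests))
# → ['/A.htm', '/B.htm', '/C.htm', '/B.htm', '/A.htm', '/D.htm']
```

A referrer that never appeared in the session, such as an external site, is left alone. Such a request is treated as a fresh entry point rather than a back-navigation.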

  15. Tracking the Growth of a Site (Nielsen, Jakob, 1998) • Exponential growth of the web and the internet • Statistical method • Logarithmic conversion, so that exponential growth can be fit by linear regression • Statistical analysis • Hypothesis: the site is growing (number of pageviews and date are correlated) • R² and significance

  16. Tracking the Growth of a Site • R² = 0.96, p < 0.001

  17. Tracking the Growth of a Site • Predict growth rate • Clean noise • Confidence interval
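Slides 15 to 17 describe fitting a line to log-transformed traffic, then reading off R², a growth rate and a confidence interval. The following is a minimal Python sketch of that kind of analysis, with three assumptions:

- The pageview counts are hypothetical, not Nielsen's data.
- Logs are base 10.
- The Student-t critical value is supplied by hand so that the example needs only the standard library.

A p-value would come from comparing slope / standard error against the same t distribution. That step is omitted here.

```python
import math

def growth_analysis(days, pageviews, t_crit):
    """Fit log10(pageviews) = slope * day + intercept by least squares.

    t_crit is the two-sided Student-t critical value for n - 2 degrees of freedom.
    """
    ys = [math.log10(v) for v in pageviews]
    n = len(days)
    mx, my = sum(days) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in days)
    slope = sum((x - mx) * (y - my) for x, y in zip(days, ys)) / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(days, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    se = math.sqrt(ss_res / (n - 2) / sxx)  # standard error of the slope
    return {
        "r_squared": 1 - ss_res / ss_tot,
        "slope_ci": (slope - t_crit * se, slope + t_crit * se),
        "doubling_days": math.log10(2) / slope,  # days for traffic to double
        "annual_growth": 10 ** (slope * 365),    # multiplicative growth per year
    }

# Hypothetical data; 3.182 is the 95% two-sided t value for 5 - 2 = 3 degrees of freedom.
days = [0, 100, 200, 300, 400]
views = [1000, 2100, 3900, 8300, 15800]
result = growth_analysis(days, views, t_crit=3.182)
print(result)
```

Working on the log scale is what makes "growth rate" a single number. A constant slope in log10(pageviews) corresponds to a constant doubling time.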

  18. Predicting Document Access in Large, Multimedia Repositories (Recker, M. R. and J. E. Pitkow, 1996) • Patterns of document requests in network-accessible multimedia databases • Main idea • Two related domains: human memory and libraries • Borrow models and research results from them

  19. Predicting Document Access • The model: human memory (Anderson and Schooler) • The relationship between recency and performance is a power function • The relationship between frequency and performance is a power function • Two parameters for performance: need probability p and need odds p/(1-p) • The linear function: • Log(Need odds) = a Log(Frequency) + b
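Because the relationship is a power function, taking logs of both sides gives a straight line with slope a and intercept b. Ordinary least squares on the logged values therefore estimates both parameters. This is a minimal Python sketch, assuming base-10 logarithms and synthetic observations generated from a = 0.8, b = -1, so the fit should recover those values. These values are not from Anderson and Schooler.

```python
import math

def need_odds(p):
    # Need odds as defined on the slide: p / (1 - p).
    return p / (1 - p)

def fit_power_law(frequencies, probabilities):
    """Least-squares fit of log10(need odds) = a * log10(frequency) + b."""
    xs = [math.log10(f) for f in frequencies]
    ys = [math.log10(need_odds(p)) for p in probabilities]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

# Synthetic, noise-free observations: odds = 10**b * frequency**a with a = 0.8, b = -1.
freqs = [1, 2, 5, 10, 20]
probs = [o / (1 + o) for o in (10 ** -1 * f ** 0.8 for f in freqs)]
a, b = fit_power_law(freqs, probs)
print(round(a, 3), round(b, 3))  # → 0.8 -1.0
```

Modelling odds rather than raw probability keeps predictions inside (0, 1) after converting back with p = odds / (1 + odds).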

  20. Predicting Document Access • Applying the human memory analysis to document requests • Dataset: log file of the Georgia Tech WWW repository • A dynamic information ecology • Frequency analysis • Regression equation: • Log(Need Odds) = 0.99 Log(Frequency) - 1.30 • Recency analysis • Regression equation: • Log(Need Odds) = -1.15 Log(Days) + 0.41 • Combining recency and frequency
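The two reported regressions can be turned back into predictions of need probability. This is a minimal Python sketch, assuming base-10 logarithms (the slides do not state the base) and recency measured in days since the last access. It converts need odds back to probability with p = odds / (1 + odds).

```python
import math

def odds_to_probability(odds):
    return odds / (1 + odds)

def odds_from_frequency(frequency):
    # Reported fit: Log(Need Odds) = 0.99 Log(Frequency) - 1.30 (base 10 assumed)
    return 10 ** (0.99 * math.log10(frequency) - 1.30)

def odds_from_recency(days):
    # Reported fit: Log(Need Odds) = -1.15 Log(Days) + 0.41 (base 10 assumed)
    return 10 ** (-1.15 * math.log10(days) + 0.41)

for f in (1, 10, 100):
    print("frequency", f, round(odds_to_probability(odds_from_frequency(f)), 3))
for d in (1, 10, 100):
    print("days", d, round(odds_to_probability(odds_from_recency(d)), 3))
```

The signs of the two slopes carry the qualitative result. Need rises with past frequency and falls with time since the last access.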

  21. Predicting Document Access • Conclusion • Recency and frequency of past document access are strong predictors of future document access • Recency proved to be a stronger predictor than frequency • Applications for the design of information systems • Determine optimal ordering of retrieved items • Inform design decisions • Design of caching algorithms
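The ordering and caching applications can be illustrated by ranking documents on the need odds predicted from the reported recency regression, then choosing the lowest-ranked document for eviction. This is a minimal Python sketch, assuming base-10 logarithms. The document names and access days are hypothetical. Because the recency equation is monotone in days since the last access, this ranking coincides with least-recently-used ordering. A combined recency-and-frequency model, whose exact form the slides do not give, would change only the scoring function.

```python
import math

def need_odds_from_recency(days_since_access):
    # Reported recency fit (base 10 assumed): Log(Need Odds) = -1.15 Log(Days) + 0.41
    return 10 ** (-1.15 * math.log10(days_since_access) + 0.41)

def rank_documents(last_access_day, today):
    """Order documents by predicted need odds, highest first.

    last_access_day maps document id -> day number of its most recent access.
    """
    def score(doc):
        days = max(today - last_access_day[doc], 1)  # clamp: log of 0 is undefined
        return need_odds_from_recency(days)
    return sorted(last_access_day, key=score, reverse=True)

def choose_eviction(last_access_day, today):
    # A cache would evict the document with the lowest predicted need.
    return rank_documents(last_access_day, today)[-1]

docs = {"syllabus.html": 99, "lecture1.ps": 60, "video.mpg": 90}
print(rank_documents(docs, today=100))   # → ['syllabus.html', 'video.mpg', 'lecture1.ps']
print(choose_eviction(docs, today=100))  # → lecture1.ps
```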
