Guide to the Clickstream Data

Presentation Transcript


  1. Guide to the Clickstream Data. Petr Berka, University of Economics, Prague, berka@vse.cz

  2. Web Usage Mining Domain
     • click-stream: a sequential series of page view requests (a page view is what is displayed in the user’s browser at one time)
     • server session: the click-stream of page views for a single user on a particular web site
     • user session: the click-stream of page views for a single user across the entire web
     Clickstream Data, Discovery Challenge 2005

  3. The Clickstream Data
     • About 3 million records (24 days) from the web server log of an online shop
     • Each record contains the time, IP address, session ID, page request, and referer
     • There are hundreds of thousands of sessions, most of them very short; sessions longer than one page average 16 pages
     • Every page request in this shop has the same structure: page type / content ID (product ID)
     • Page types are, for example, dp (detail of product), sb (shopping basket), ct (contact)
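The "page type / content ID" structure described above can be split mechanically. The following is a minimal sketch, assuming the content ID is always passed as an `id` query parameter, as in the sample requests shown in the transcript (`/dp/?id=124`). The function name `parse_page_request` is made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs


def parse_page_request(request):
    """Split a request such as '/dp/?id=124' into (page_type, content_id).

    Assumption: the content ID is carried in an 'id' query parameter.
    """
    parts = urlsplit(request.strip())
    # The first non-empty path segment is taken as the page type.
    segments = [s for s in parts.path.split("/") if s]
    page_type = segments[0] if segments else ""  # '' stands for the home page '/'
    ids = parse_qs(parts.query).get("id")
    content_id = ids[0] if ids else None
    return page_type, content_id


print(parse_page_request("/dp/?id=124"))  # ('dp', '124')
print(parse_page_request("/sb/"))         # ('sb', None)
print(parse_page_request("/"))            # ('', None)
```

Returning the ID as a string avoids guessing whether every content ID in the full log is numeric.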

  4. Example of the Data
     unix time ;IP address    ; session ID                     ; page request; referer
     1074589200;193.179.144.2 ;1993441e8a0a4d7a4407ed9554b64ed1;/dp/?id=124 ;www.google.cz;
     1074589201;194.213.35.234;3995b2c0599f1782e2b40582823b1c94;/dp/?id=182 ;
     1074589202;194.138.39.56 ;2fd3213f2edaf82b27562d28a2a747aa;/ ;www.seznam.cz;
     1074589233;193.179.144.2 ;1993441e8a0a4d7a4407ed9554b64ed1;/dp/?id=148 ;/dp/?id=124;
     1074589245;193.179.144.2 ;1993441e8a0a4d7a4407ed9554b64ed1;/sb/ ;/dp/?id=148;
     1074589248;194.138.39.56 ;2fd3213f2edaf82b27562d28a2a747aa;/contacts/ ; /;
     1074589290;193.179.144.2 ;1993441e8a0a4d7a4407ed9554b64ed1;/sb/ ;/sb/;
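One way to read records like these and group them into server sessions is sketched below. It assumes semicolon-separated fields in the order shown in the example (time, IP address, session ID, page request, referer), padding with spaces, and a referer that may be missing. The names `parse_record` and `group_by_session` are illustrative only.

```python
from collections import defaultdict

# Sample records copied from the "Example of the Data" slide.
SAMPLE = """\
1074589200;193.179.144.2 ;1993441e8a0a4d7a4407ed9554b64ed1;/dp/?id=124 ;www.google.cz;
1074589201;194.213.35.234;3995b2c0599f1782e2b40582823b1c94;/dp/?id=182 ;
1074589202;194.138.39.56 ;2fd3213f2edaf82b27562d28a2a747aa;/ ;www.seznam.cz;
1074589233;193.179.144.2 ;1993441e8a0a4d7a4407ed9554b64ed1;/dp/?id=148 ;/dp/?id=124;
1074589245;193.179.144.2 ;1993441e8a0a4d7a4407ed9554b64ed1;/sb/ ;/dp/?id=148;
1074589248;194.138.39.56 ;2fd3213f2edaf82b27562d28a2a747aa;/contacts/ ; /;
1074589290;193.179.144.2 ;1993441e8a0a4d7a4407ed9554b64ed1;/sb/ ;/sb/;
"""


def parse_record(line):
    """Turn one log line into a dict; a missing referer becomes ''."""
    fields = [f.strip() for f in line.strip().rstrip(";").split(";")]
    fields += [""] * (5 - len(fields))  # pad when the referer is absent
    ts, ip, session_id, request, referer = fields[:5]
    return {"time": int(ts), "ip": ip, "session": session_id,
            "request": request, "referer": referer}


def group_by_session(records):
    """Collect records per session ID, keeping their original order."""
    sessions = defaultdict(list)
    for rec in records:
        sessions[rec["session"]].append(rec)
    return sessions


records = [parse_record(line) for line in SAMPLE.splitlines() if line.strip()]
sessions = group_by_session(records)
for sid, recs in sessions.items():
    print(sid[:8], len(recs))
# 1993441e 4
# 3995b2c0 1
# 2fd3213f 2
```

Grouping by session ID rather than by IP address follows the server-session definition given earlier; several users can share one IP address.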

  5. Data Description
     • table “obchod” (shop): name of the internet shop (7 entries)
     • table “kategorie” (category): info about the category of products (64 entries)
     • table “list” (sheet): info about a specific product of a more detailed type (157 entries)
     • table “znacka” (brand): name of the producer or brand of a product (197 entries)
     • table “tema” (theme): info about themes discussed in the on-line advice (36 entries)

  6. Data Summary (1/3)
     • 3 617 171 page requests
     • 522 410 sessions
       • 318 523 single-page sessions
       • 203 887 sessions with length > 1
         • avg. length 16
         • median 8
         • mode 2
         • longest 15 454
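The counts on this slide can be cross-checked with a little arithmetic. The sketch below uses only the four totals reported above. It suggests that the average length of 16 refers to sessions longer than one page, since the average over all sessions comes out at roughly 7 pages. The variable names are illustrative.

```python
# Totals as reported on the "Data Summary (1/3)" slide.
page_requests = 3_617_171
sessions = 522_410
single_page = 318_523
multi_page = 203_887

# Single-page and multi-page sessions should add up to all sessions.
assert single_page + multi_page == sessions

# Average length over all sessions.
avg_all = page_requests / sessions

# Average length over sessions with length > 1: every single-page
# session contributes exactly one request, so subtract those first.
avg_multi = (page_requests - single_page) / multi_page

print(round(avg_all, 1))    # 6.9
print(round(avg_multi, 1))  # 16.2
```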

  7. Data Summary (2/3): time spent during a session
     • avg. time 00:24:46
     • median 00:03:08
     • mode 00:00:09
     • longest 433:27:53
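The slides do not say how session time was measured. A minimal sketch, assuming it is the difference between the last and the first unix timestamp of a session, is given below. The helpers `session_duration` and `format_duration` are hypothetical names. The hour field is not wrapped at 24, so that values such as the longest session above can be printed.

```python
def session_duration(timestamps):
    """Seconds between the first and last request of one session.

    Assumption: time spent = last timestamp - first timestamp.
    """
    return max(timestamps) - min(timestamps)


def format_duration(seconds):
    """Format seconds as HH:MM:SS, letting hours grow past 24."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Timestamps of session 1993441e... from the "Example of the Data" slide.
times = [1074589200, 1074589233, 1074589245, 1074589290]
print(format_duration(session_duration(times)))  # 00:01:30

# 433 h 27 min 53 s expressed in seconds.
print(format_duration(433 * 3600 + 27 * 60 + 53))  # 433:27:53
```

Under this definition a single-page session has a duration of zero, because the log carries no signal for when the user left the last page.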

  8. Data Summary (3/3)
     [Figure: distribution of sessions with length > 1]
