
An OLAM Framework for Web Usage Mining and Business Intelligence Reporting

Xiaohua (Tony) Hu, Drexel University, Philadelphia, PA 19104


Presentation Transcript


  1. An OLAM Framework for Web Usage Mining and Business Intelligence Reporting. Xiaohua (Tony) Hu, Drexel University, Philadelphia, PA 19104

  2. Outline • Introduction • Data Capture • Data Webhouse Construction • Mining, OLAP and Business Reporting • Pattern Evaluation and Development • Q&A

  3. Benefits of Web Usage Mining • Targeting customers based on usage behavior or profile (personalization) • Adjusting web content and structure dynamically based on users’ page access patterns (adaptive web site) • Enhancing service quality and delivery to the end user (cross-selling, up-selling) • Improving web server performance based on web traffic analysis • Identifying hot areas and killer areas of the web site

  4. Web Usage Mining Steps • Data capture (clickstream, sales, customers, products, promotions, shipping, etc.) • Data processing: ETL from OLTP to DW • Pattern discovery, OLAP cubes, and reports • Pattern evaluation and deployment

  5. Data Capture • Web server logs recording visitors’ clickstream behavior (page template, cookie, transfer log, time stamp, IP address, agent, referrer, etc.) • Product information (product hierarchy, manufacturer, price, color, size, etc.) • Content information of the web site (images, GIFs, video clips, etc.) • Customer purchase data (product quantities, payment amount and method, shipping address, etc.) • Customer demographics (age, gender, income, education level, lifestyle, etc.)

  6. Issues in Clickstream Capture • Distinguishing sessions • Using cookies to track customers • Tagging templates • Logging business events • Recording query strings • Detecting crawlers
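Distinguishing sessions is often done by grouping each visitor's clicks and starting a new session after a period of inactivity. The sketch below assumes a 30-minute timeout (a common convention, not a value from this presentation) and a visitor key such as a cookie ID; the function name `sessionize` and the sample log are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Assumed inactivity threshold: 30 minutes is a common convention,
# not a value taken from the presentation.
SESSION_TIMEOUT = timedelta(minutes=30)

# (visitor_key, timestamp, page); visitor_key could be a cookie ID.
Request = Tuple[str, datetime, str]

def sessionize(requests: List[Request],
               timeout: timedelta = SESSION_TIMEOUT) -> List[List[Request]]:
    """Split each visitor's requests into sessions at gaps longer than `timeout`."""
    by_visitor: Dict[str, List[Request]] = {}
    for req in requests:
        by_visitor.setdefault(req[0], []).append(req)

    sessions: List[List[Request]] = []
    for clicks in by_visitor.values():
        clicks.sort(key=lambda r: r[1])          # order each visitor's clicks by time
        current = [clicks[0]]
        for prev, req in zip(clicks, clicks[1:]):
            if req[1] - prev[1] > timeout:       # long gap: close the open session
                sessions.append(current)
                current = []
            current.append(req)
        sessions.append(current)
    return sessions

t0 = datetime(2003, 1, 1, 9, 0)
log = [("cookie-A", t0, "main"),
       ("cookie-A", t0 + timedelta(minutes=5), "pna"),
       ("cookie-A", t0 + timedelta(hours=2), "main"),
       ("cookie-B", t0, "login")]
print(len(sessionize(log)))  # 3
```

Where visitors block cookies, the visitor key is often approximated by IP address plus user agent, which can merge different users behind one proxy.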

  7. What Kind of Clickstream Information Needs to Be Recorded? • Request (click) data: template, product, and assortment; time stamps for each click; compile and execution times; query string information; referring page information; the request sequence number within a session • Cookie data: the visitor’s cookie (this ID is temporary if the user has cookies turned off) • Session data: session length; browser (user agent) and IP address of the client; the user’s cookie ID; the user ID, if the user logged in; whether the session timed out; the total number of requests in the session; whether the session belongs to a user who “opts out”; the total number of sessions that have come from users with this cookie ID

  8. Web Log Data • Designed for debugging purposes, not for analysis
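Because web logs are written for debugging, each line has to be parsed into fields before analysis. A minimal sketch, assuming the server writes the NCSA Combined Log Format (the presentation does not name a format); `parse_line` and the sample line are hypothetical.

```python
import re
from datetime import datetime
from typing import Dict, Optional

# NCSA Combined Log Format (an assumption; the slides do not name a format):
# host ident authuser [time] "request" status bytes "referrer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line: str) -> Optional[Dict[str, object]]:
    """Turn one log line into a record, or return None if it does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    rec: Dict[str, object] = dict(fields)
    rec["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    # The request is "METHOD URL PROTOCOL"; the URL keeps its query string.
    parts = fields["request"].split()
    rec["url"] = parts[1] if len(parts) >= 2 else ""
    return rec

sample = ('192.0.2.10 - - [10/Oct/2002:13:55:36 -0500] '
          '"GET /OF_Main.jsp?id=7 HTTP/1.0" 200 2326 '
          '"http://www.example.com/" "Mozilla/4.0"')
print(parse_line(sample)["url"])  # /OF_Main.jsp?id=7
```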

  9. Crawler Session • Crawlers are programs, such as search engines and shopping bots, that visit your site • It is very important to filter out crawler sessions (on some of our clients’ sites, crawler sessions account for up to 30% of sessions)

  10. Techniques to Identify Crawler Sessions • Build a model to identify crawler sessions from typical bot traits: crawlers commonly turn off images, have empty referrers, and request pages too fast; friendly bots visit the robots.txt file; the navigation pattern is a depth-first or breadth-first search of the site; and bots never purchase • Create invisible links in the web pages, which human visitors cannot see but crawlers follow
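The crawler traits listed above can be combined into a simple score. This sketch is a hand-written rule set standing in for the model the slide mentions, not a trained classifier; the thresholds, the trap URL, and all names are assumptions, and the depth-first/breadth-first traversal signal is omitted for brevity.

```python
from dataclasses import dataclass
from typing import List

IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png")
TRAP_URL = "/trap/invisible-link.html"  # hypothetical hidden link only bots follow

@dataclass
class SessionStats:
    urls: List[str]          # requested URLs in order
    referrers: List[str]     # referrer per request; "" or "-" when empty
    duration_seconds: float  # last request time minus first request time
    purchased: bool

def crawler_signals(s: SessionStats, max_hits_per_minute: float = 30.0) -> int:
    """Count how many of the slide's crawler traits a session shows (0 to 6)."""
    minutes = max(s.duration_seconds, 1.0) / 60.0     # floor at 1 s to avoid /0
    signals = [
        not any(u.lower().endswith(IMAGE_SUFFIXES) for u in s.urls),  # no images
        all(r in ("", "-") for r in s.referrers),                     # empty referrers
        any(u.endswith("/robots.txt") for u in s.urls),               # friendly bot
        len(s.urls) / minutes > max_hits_per_minute,                  # too fast
        not s.purchased,                                              # never buys
        TRAP_URL in s.urls,                                           # took the trap
    ]
    return sum(signals)

def is_crawler(s: SessionStats, threshold: int = 4) -> bool:
    # Following the invisible link is treated as conclusive on its own.
    return TRAP_URL in s.urls or crawler_signals(s) >= threshold
```

A scored rule set like this is easy to audit; once labeled sessions exist, the same signals can serve as features for a learned classifier.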

  11. OLTP vs DSS

  12. What is OLAM? • OLAP (On-Line Analytical Processing): precalculate summary information to enable drilling, pivoting, slicing/dicing, and filtering, so the business can be analyzed from multiple angles or views (dimensions) • OLAM (On-Line Analytical Mining): an integration of data mining, data warehousing, and OLAP technologies

  13. Data Webhouse Construction • Requirement Analysis of the Data Webhouse • Data Webhouse Schema Design: Dimensions, Fact Tables, Aggregation/Summary Tables

  14. Requirement Analysis of the Data Webhouse 1. Web site activity (hourly, daily, weekly, monthly, quarterly, etc.) 2. Product sales (by region, by brand, by domain, by browser type, by time, etc.) 3. Customers (by type, by age, by gender, by region, buyer vs. visitor, heavy buyer vs. light buyer, etc.) 4. Vendors (by type, by region, by price range, etc.) 5. Referrers (by domain, by sales amount, by number of visits, etc.) 6. Navigational behavior patterns (top entry page, top exit page, killer page, hot page, etc.) 7. Click conversion ratio 8. Shipments (regular, express mail, etc.) 9. Payments (cash, credit card, e-money, etc.)
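Most of these requirements are aggregates "by" some dimension. As one example, the click conversion ratio can be sketched as purchasing sessions divided by all sessions, grouped by a dimension such as referrer domain. This session-level definition is an assumption (conversion is sometimes measured per visitor or per click instead), and the function name and domains are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def conversion_ratio_by(sessions: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Purchasing sessions / all sessions for each dimension value.

    Each input pair is (dimension_value, purchase_flag), e.g. a referrer domain.
    """
    total: Dict[str, int] = defaultdict(int)
    bought: Dict[str, int] = defaultdict(int)
    for key, purchased in sessions:
        total[key] += 1
        bought[key] += int(purchased)
    return {key: bought[key] / total[key] for key in total}

ratios = conversion_ratio_by([("example.com", True), ("example.com", False),
                              ("example.org", False), ("example.com", False)])
print(ratios["example.org"])  # 0.0
```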

  15. Data Webhouse Schema Design • Define the Source Data • Choose the Grain of the Fact Tables • Choose the Dimensions Appropriate for the Grain • Choose the Facts Appropriate for That Grain

  16. Appropriate Dimensions • Session Dimension • Page Dimension • Time Dimension • User Dimension • Product Dimension
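A minimal star-schema sketch of these dimensions, using Python's built-in `sqlite3` module. The grain (one fact row per page request), every table and column name, and the sample rows are assumptions for illustration, not the schema used in this work.

```python
import sqlite3

# Assumed grain: one fact row per page request. All names are hypothetical.
# SQLite does not enforce REFERENCES unless PRAGMA foreign_keys is turned on.
SCHEMA = """
CREATE TABLE session_dim (session_key INTEGER PRIMARY KEY, referrer TEXT,
                          agent TEXT, purchase_flag INTEGER);
CREATE TABLE page_dim    (page_key INTEGER PRIMARY KEY, template TEXT, category TEXT);
CREATE TABLE time_dim    (time_key INTEGER PRIMARY KEY, hour INTEGER);
CREATE TABLE user_dim    (user_key INTEGER PRIMARY KEY, gender TEXT, country TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, brand TEXT, price REAL);
CREATE TABLE page_request_fact (
    session_key     INTEGER REFERENCES session_dim,
    page_key        INTEGER REFERENCES page_dim,
    time_key        INTEGER REFERENCES time_dim,
    user_key        INTEGER REFERENCES user_dim,
    product_key     INTEGER REFERENCES product_dim,  -- NULL if no product shown
    seconds_on_page REAL                             -- additive fact
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO page_dim VALUES (?, ?, ?)",
                 [(1, "OF_Main.jsp", "home"), (2, "pna/pa_main.jsp", "catalog")])
conn.executemany("INSERT INTO page_request_fact VALUES (?, ?, ?, ?, ?, ?)",
                 [(1, 1, 9, 1, None, 12.0),
                  (1, 2, 9, 1, None, 40.0),
                  (2, 1, 14, 2, None, 5.0)])

# Roll up facts along the page dimension: hits and average seconds per category.
rows = conn.execute("""
    SELECT p.category, COUNT(*) AS hits, AVG(f.seconds_on_page) AS avg_seconds
    FROM page_request_fact AS f
    JOIN page_dim AS p ON f.page_key = p.page_key
    GROUP BY p.category
    ORDER BY hits DESC
""").fetchall()
print(rows)  # [('home', 2, 8.5), ('catalog', 1, 40.0)]
```

The final query is a roll-up along the page dimension; grouping by attributes of another dimension instead yields the other "by ..." reports.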

  17. Session Attributes • Session Length • Referrer • Agent • Host Name • IP Address • Cookie_id • First Request Time • Last Request Time • Average Time Per Page • Purchase Flag • Time Out Flag • Many more …
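Several of these session attributes can be derived from a session's clicks once sessions are identified. A sketch, assuming each click is a `(timestamp, page)` pair and that a hypothetical set of page templates marks a completed purchase. Average time per page is taken over the gaps between consecutive requests, because the time spent on the last page is not observable from the log.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple

# Hypothetical: page templates that indicate a completed purchase.
PURCHASE_PAGES = {"cart/checkout_done.jsp"}

@dataclass
class SessionAttributes:
    first_request_time: datetime
    last_request_time: datetime
    session_length_seconds: float
    request_count: int
    average_seconds_per_page: Optional[float]
    purchase_flag: bool

def session_attributes(clicks: List[Tuple[datetime, str]]) -> SessionAttributes:
    """Derive session-dimension attributes from one session's (time, page) clicks."""
    clicks = sorted(clicks)
    times = [t for t, _ in clicks]
    length = (times[-1] - times[0]).total_seconds()
    # Dwell time is only observable between consecutive requests, so the last
    # page is excluded; single-click sessions get None.
    avg = length / (len(times) - 1) if len(times) > 1 else None
    return SessionAttributes(
        first_request_time=times[0],
        last_request_time=times[-1],
        session_length_seconds=length,
        request_count=len(clicks),
        average_seconds_per_page=avg,
        purchase_flag=any(page in PURCHASE_PAGES for _, page in clicks),
    )
```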

  18. Customer Attributes • Address: City, State/Province, Country • Gender, Age, Profession, Education, Marital Status • Contact Info: Email, Phone • Repeat Visit Flag • Frequent Buyer Flag • Heavy Spender Flag • Reader/Browser Flag • Many more …

  19. Page Attributes • Page Template • Page Location • Page Type • Page Category • Page Description • Registration Page Flag • Shipping Page Flag • Checkout Page Flag • Many more …

  20. Promotion Attributes • Promotion Name • Price Reduction Percentage • Adv Type • Coupon Type • Begin Date • End Date • Promotion Region • Many more …

  21. Date Attributes • Day, Week, Month, Quarter, Year • Day number in Month, Day Number in Quarter, Day Number in Year • Week number in Month, Week Number in Quarter, Week Number in Year • Weekday Flag • Weekend Flag • Season • Many more …
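A date dimension is usually generated ahead of time, one row per calendar day. This sketch derives the attributes above from a Python `date`; the week-numbering rules (7-day buckets within month and quarter, ISO 8601 weeks within the year), the northern-hemisphere season mapping, and the key format are assumptions.

```python
from datetime import date
from typing import Dict, Union

# Meteorological seasons for the northern hemisphere (an assumption; a
# retailer might define seasons by its own selling calendar instead).
SEASONS = {12: "Winter", 1: "Winter", 2: "Winter", 3: "Spring", 4: "Spring",
           5: "Spring", 6: "Summer", 7: "Summer", 8: "Summer", 9: "Fall",
           10: "Fall", 11: "Fall"}

def date_dimension_row(d: date) -> Dict[str, Union[int, str, bool]]:
    """Build one date-dimension row for calendar day `d`."""
    quarter = (d.month - 1) // 3 + 1
    quarter_start = date(d.year, 3 * (quarter - 1) + 1, 1)
    day_in_quarter = (d - quarter_start).days + 1
    return {
        "date_key": int(d.strftime("%Y%m%d")),       # e.g. 20030515
        "day_name": d.strftime("%A"),
        "month": d.month,
        "quarter": quarter,
        "year": d.year,
        "day_number_in_month": d.day,
        "day_number_in_quarter": day_in_quarter,
        "day_number_in_year": d.timetuple().tm_yday,
        # Simple 7-day buckets counted from the 1st of the month/quarter.
        "week_number_in_month": (d.day - 1) // 7 + 1,
        "week_number_in_quarter": (day_in_quarter - 1) // 7 + 1,
        "week_number_in_year": d.isocalendar()[1],    # ISO 8601 week
        "weekday_flag": d.weekday() < 5,              # Monday..Friday
        "weekend_flag": d.weekday() >= 5,
        "season": SEASONS[d.month],
    }
```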

  22. Time Attributes • Second, Minute, Hour • Early Morning Flag • Late Afternoon Flag • Lunch Time Flag • Dinner Time Flag • Late Evening Flag • Many more …
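The time-of-day attributes can live in their own small dimension, separate from the date dimension, so it needs only 86,400 rows (one per second). The hour ranges for each flag below are hypothetical, since the slide names the flags but not their boundaries; the function name is also illustrative.

```python
from typing import Dict, Union

# Hypothetical day-part boundaries (start hour inclusive, end hour exclusive).
DAY_PARTS = {
    "early_morning_flag": (5, 8),
    "lunch_time_flag": (11, 14),
    "late_afternoon_flag": (15, 18),
    "dinner_time_flag": (18, 20),
    "late_evening_flag": (21, 24),
}

def time_dimension_row(second_of_day: int) -> Dict[str, Union[int, bool]]:
    """Build one row of a time-of-day dimension keyed by second of the day."""
    if not 0 <= second_of_day < 86400:
        raise ValueError("second_of_day must be in [0, 86400)")
    hour, rest = divmod(second_of_day, 3600)
    minute, second = divmod(rest, 60)
    row: Dict[str, Union[int, bool]] = {
        "time_key": second_of_day, "hour": hour, "minute": minute, "second": second,
    }
    for flag, (start, end) in DAY_PARTS.items():
        row[flag] = start <= hour < end
    return row

# One row per second of the day.
time_dim = [time_dimension_row(s) for s in range(86400)]
```

Keeping time of day out of the date dimension avoids a row per timestamp; a fact row simply carries both a date key and a time key.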

  23. OLAP • View data from multiple views and angles • Immediate response to business queries • Ability to drill down and roll up the multidimensional data in the cube • Analyze business measures such as profit, revenue, and quantity from different angles, perspectives, and factors
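Roll-up, drill-down, and slicing all reduce to grouping fact rows by a chosen subset of dimensions, optionally after filtering. A toy sketch on made-up fact rows with hypothetical names; a real OLAP server precomputes and stores these aggregates rather than scanning the facts for each query.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

DIMS = ("region", "brand", "month")
# Toy fact rows (region, brand, month, revenue); the numbers are made up.
FACTS: List[Tuple[str, str, str, float]] = [
    ("East", "Bloom", "2003-01", 120.0),
    ("East", "Philosophy", "2003-01", 80.0),
    ("West", "Bloom", "2003-01", 50.0),
    ("West", "Bloom", "2003-02", 70.0),
]

def aggregate(facts: List[Tuple[str, str, str, float]], keep: Tuple[str, ...],
              where: Optional[Dict[str, str]] = None) -> Dict[Tuple[str, ...], float]:
    """Sum revenue grouped by the `keep` dimensions after an optional slice."""
    where = where or {}
    idx = {name: i for i, name in enumerate(DIMS)}
    totals: Dict[Tuple[str, ...], float] = defaultdict(float)
    for row in facts:
        if all(row[idx[d]] == v for d, v in where.items()):  # slice/dice filter
            totals[tuple(row[idx[d]] for d in keep)] += row[3]
    return dict(totals)

print(aggregate(FACTS, ("region", "brand")))                     # drill down
print(aggregate(FACTS, ("region",)))                             # roll up
# {('East',): 200.0, ('West',): 120.0}
print(aggregate(FACTS, ("brand",), where={"month": "2003-01"}))  # slice
```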

  24. Some Fact Tables

  25. Some Dimension and Summary Tables in Webhouse

  26. Search Argument Findings

  27. Top 20 Paths Leading to Non-Purchased Sessions
      Count  Path
      14622  main
       3731  main->main
        790  main->main->main
        329  main->main->login
        303  main->main->main->main
        274  login
        216  main->main->pna->pna
        212  pna
        192  main->main->pna->pna->pna
        185  main->main->eDealer
        180  mc
        175  main->main->pna
        169  main->main->pna->pna->pna->pna->pna
        166  main->main->pna->pna->pna->pna->pna->pna
        160  main->main->pna->pna->pna->pna->pna->pna->pna
        147  main->main->pna->pna->pna->pna
        131  main->main->mc->mc->mc->mc
        118  main->main->pna->pna->pna->pna->pna->pna->pna->pna
        111  main->main->mc->mc->mc
        106  main->main->pna->pna->pna->pna->pna->pna->pna->pna->pna
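Path reports like this one can be produced by counting each session's full page sequence. A sketch, assuming sessions are already identified, pages have been mapped to site areas such as `main` or `pna`, and each session carries a purchase flag; the report does not say whether it counted whole-session paths or path prefixes, so whole-session paths are an assumption, as is the function name.

```python
from collections import Counter
from typing import List, Sequence, Tuple

def top_paths(sessions: Sequence[Tuple[List[str], bool]], n: int = 20,
              purchased: bool = False) -> List[Tuple[str, int]]:
    """Return the n most frequent complete paths (areas joined by '->')
    among sessions whose purchase flag equals `purchased`."""
    counts = Counter("->".join(pages) for pages, bought in sessions
                     if bought == purchased and pages)
    return counts.most_common(n)

sessions = [(["main"], False), (["main", "main"], False),
            (["main"], False), (["main", "pna"], True)]
print(top_paths(sessions))  # [('main', 2), ('main->main', 1)]
```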

  28. Top 20 Paths That Start at OF_Main.jsp and Exit at OF_Main.jsp
      Count  Path
        154  OF_Main.jsp->splash.jsp->OF_Main.jsp
        122  OF_Main.jsp->OF_Main.jsp
         52  OF_Main.jsp->splash.jsp->OF_Main.jsp->OF_Main.jsp
         28  OF_Main.jsp->OF_Main.jsp->OF_Main.jsp
         25  OF_Main.jsp->splash.jsp->OF_Main.jsp->OF_Main.jsp->OF_Main.jsp
         23  OF_Main.jsp->OF_Main.jsp->splash.jsp->OF_Main.jsp
         16  OF_Main.jsp->splash.jsp->pna/pa_main.jsp->OF_Main.jsp
         15  OF_Main.jsp->splash.jsp->login/ln_login.jsp->OF_Main.jsp
         13  OF_Main.jsp->OF_Main.jsp->OF_Main.jsp->OF_Main.jsp
         13  OF_Main.jsp->splash.jsp->mc/MC_main.jsp->OF_Main.jsp
         11  OF_Main.jsp->splash.jsp->dealer_positioning.jsp->OF_Main.jsp
         11  OF_Main.jsp->splash.jsp->pna/pa_main.jsp->pna/pa_family.jsp->OF_Main.jsp
         10  OF_Main.jsp->splash.jsp->login/ln_login.jsp->login/ln_loginopp.jsp->login/ln_message.jsp->OF_Main.jsp
          9  OF_Main.jsp->splash.jsp->OF_Main.jsp->OF_Main.jsp->OF_Main.jsp->OF_Main.jsp
          7  OF_Main.jsp->splash.jsp->cart/sc_listing.jsp->OF_Main.jsp
          7  OF_Main.jsp->splash.jsp->login/ln_login.jsp->login/ln_login_step.jsp->OF_Main.jsp
          6  OF_Main.jsp->browser_message.jsp->OF_Main.jsp
          5  OF_Main.jsp->dealer_positioning.jsp->OF_Main.jsp
          5  OF_Main.jsp->OF_Main.jsp->OF_Main.jsp->OF_Main.jsp->OF_Main.jsp
          5  OF_Main.jsp->OF_Main.jsp->OF_Main.jsp->OF_Main.jsp->OF_Main.jsp->OF_Main.jsp

  29. Single/Multiple visitors/buyers

  30. Web Usage Mining Methods • Construct cubes from the data webhouse: roll up and drill down the OLAP cubes to find the top domains, top products, top hot spots, web activity, most frequently accessed time periods, etc. • Perform data mining on the data webhouse: find association patterns for cross-selling and up-selling, build links between pages, discover sequential patterns and trends in web access, and improve system design through web caching, web page prefetching, and web page swapping

  31. Mining the web data • Association Rules • Classification/Prediction • Clustering

  32. Data Mining - Association • Path link analysis: explore, understand, and predict browsing patterns • Shopping cart analysis: cross-sell and up-sell to increase wallet share
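For path link analysis, one simple way to predict browsing is a first-order Markov model: count how often each page is followed by each other page, then predict the most frequent successor. This is one illustrative technique, not necessarily the one used in this work; the function names and sessions are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional

def transition_counts(sessions: List[List[str]]) -> Dict[str, Counter]:
    """Count page-to-page transitions (a first-order Markov model of browsing)."""
    model: Dict[str, Counter] = defaultdict(Counter)
    for pages in sessions:
        for here, nxt in zip(pages, pages[1:]):
            model[here][nxt] += 1
    return model

def transition_probability(model: Dict[str, Counter], here: str, nxt: str) -> float:
    """Estimated P(next page = nxt | current page = here)."""
    followers = model.get(here)
    if not followers:
        return 0.0
    return followers[nxt] / sum(followers.values())

def predict_next(model: Dict[str, Counter], page: str) -> Optional[str]:
    """Most frequent next page after `page`, or None if it was never followed."""
    followers = model.get(page)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

model = transition_counts([["main", "pna", "pna"], ["main", "pna", "cart"],
                           ["main", "login"]])
print(predict_next(model, "main"))  # pna
```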

  33. Gloss Example
       #  Relations  Lift  Support(%)  Confidence(%)  Rule
       1          2  1.56        1.89          18.58  Bloom ==> Dirty_Girl
       2          2  1.56        1.89          15.91  Dirty_Girl ==> Bloom
       3          2  1.13        1.50          11.52  Philosophy ==> Bloom
       4          2  1.13        1.50          14.75  Bloom ==> Philosophy
       5          2  1.66        1.41          11.87  Dirty_Girl ==> Blue_Q
       6          2  1.66        1.41          19.75  Blue_Q ==> Dirty_Girl
       7          2  3.12        1.32          18.41  Tony_And_Tina ==> Girl
       8          2  1.41        1.32          10.14  Philosophy ==> Tony_And_Tina
       9          2  1.41        1.32          18.41  Tony_And_Tina ==> Philosophy
      10          2  2.96        1.32          18.88  Demeter_Fragrances ==> Smell_This
      11          2  3.12        1.32          22.45  Girl ==> Tony_And_Tina
      12          2  2.96        1.32          20.75  Smell_This ==> Demeter_Fragrances
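The three measures in this table can be computed from shopping baskets: support(X ==> Y) is the share of baskets containing both X and Y, confidence is support(X and Y) / support(X), and lift is confidence / support(Y). Because lift and support are symmetric in X and Y, a rule and its mirror (rules 1 and 2, for example) share both values and differ only in confidence, as the table shows. The sketch below brute-forces two-item rules over toy baskets; the function name and baskets are hypothetical, and a production system would use Apriori or FP-growth instead.

```python
from itertools import permutations
from typing import List, Set, Tuple

def pair_rules(baskets: List[Set[str]], min_support: float = 0.0
               ) -> List[Tuple[str, str, float, float, float]]:
    """Score every two-item rule X ==> Y as (X, Y, lift, support%, confidence%)."""
    n = len(baskets)
    items = sorted(set().union(*baskets))

    def support(*wanted: str) -> float:
        # Fraction of baskets that contain every wanted item.
        return sum(1 for b in baskets if set(wanted) <= b) / n

    rules = []
    for x, y in permutations(items, 2):
        s_xy = support(x, y)
        if s_xy == 0 or s_xy < min_support:
            continue
        confidence = s_xy / support(x)
        lift = confidence / support(y)
        rules.append((x, y, lift, 100 * s_xy, 100 * confidence))
    return sorted(rules, key=lambda r: r[3], reverse=True)

baskets = [{"Bloom", "Dirty_Girl"}, {"Bloom"},
           {"Dirty_Girl", "Blue_Q"}, {"Bloom", "Dirty_Girl", "Blue_Q"}]
for rule in pair_rules(baskets):
    print(rule)
```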

  34. Data Mining - Classification • Understand customers via rules, decision trees, etc. • Build prediction models for target-oriented marketing campaigns

  35. Data Mining - Clustering • Discover groups/segments with similar behaviors or profiles
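The slide does not name a clustering algorithm; k-means (Lloyd's algorithm) is one common choice for segmenting sessions or customers. A sketch, assuming each session is described by numeric features such as pages viewed and minutes on site, which should be put on comparable scales first; all names and data are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def kmeans(points: Sequence[Point], k: int, iterations: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means: returns (centroids, cluster label for each point)."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members)
                                           for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments will not change again
        centroids = new_centroids
    return centroids, labels

# Toy sessions: (pages viewed, minutes on site), already on similar scales.
sessions = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (10.0, 10.0), (10.5, 9.5), (9.0, 10.0)]
centroids, labels = kmeans(sessions, k=2)
print(labels)
```

The number of clusters k must be chosen up front; in practice several values are tried and the segments are judged by how interpretable they are for marketing.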

  36. Questions?
