
Warehousing on the Web

Learn about the benefits and capabilities of using the web for data warehousing, including managing clickstreams and bringing existing data warehouses online. Discover how web-based data warehousing can enhance customer insights and drive business growth.

ginata


Presentation Transcript


  1. Warehousing on the Web Webhouse

  2. Why Utilize the Web? • What is the data Webhouse • Managing clickstreams • WWW today • ROI • DSS

  3. Data Webhouse • Defined by Ralph Kimball • Two distinct focuses • Bringing the web to the warehouse • Clickstream data as a source of information • Bringing existing data warehouses to the web • Fully distributed environment

  4. Required Capabilities • Capture clickstream logs and convert to tables for analysis • Merge customer demographic and account info with the clickstream tables • Interpret customer paths through the website • Identify abandoned sessions • Use DW to drive customer responses appearing on your website • DW querying and reporting available through web browsers • Attach multimedia to DW • DW security
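As a rough illustration of the first capability, the sketch below converts one web server log line into a row that could be loaded into a table. It assumes the server writes Common Log Format, which is only one of the many log formats in use; the function name, field names, and sample line are made up for this example.

```python
import re
from datetime import datetime

# Common Log Format: host ident authuser [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_clf_line(line):
    """Turn one log line into a dict (one table row), or None if it does not match."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None
    return {
        "host": m.group("host"),
        "timestamp": datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z"),
        "method": m.group("method"),
        "path": m.group("path"),
        "status": int(m.group("status")),
        # The server writes "-" when no bytes were sent.
        "bytes": 0 if m.group("size") == "-" else int(m.group("size")),
    }

# Hypothetical sample line for illustration only.
sample = '192.0.2.7 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
row = parse_clf_line(sample)
```

Rows in this shape can then be joined with customer demographic and account data once a visitor has been identified.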

  5. Architecture – Web to Warehouse • Beyond a comprehensive real-time snapshot of the business, also want knowledge of customer behavior • Extended design factors • Timeliness – real-time • Data volume – no upper limit • Response time – less than 10 seconds

  6. Hot Response Cache • A file server holding complex file objects • As a file server it is an I/O engine (bandwidth) • Must hold objects which will be requested • Security is the responsibility of the requesting server • Extension of original operational data store (ODS) • Does not physically speed up the database; creates the illusion of speed by storing predictable answers
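The idea of storing predictable answers can be sketched in a few lines. This is a minimal in-memory stand-in, not the file-server design the slide describes; the class name, the `slow_report` function, and the region keys are all hypothetical.

```python
class HotResponseCache:
    """Holds precomputed answers for requests that are expected to recur."""

    def __init__(self, compute):
        self._compute = compute  # slow path, e.g. a warehouse query (hypothetical)
        self._store = {}
        self.hits = 0
        self.misses = 0

    def preload(self, keys):
        """Compute answers ahead of time for predictable requests."""
        for key in keys:
            self._store[key] = self._compute(key)

    def get(self, key):
        """Serve a stored answer if present; otherwise compute and keep it."""
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        value = self._compute(key)
        self._store[key] = value
        return value

def slow_report(region):
    # Stand-in for an expensive database query.
    return f"sales report for {region}"

cache = HotResponseCache(slow_report)
cache.preload(["east", "west"])
answer = cache.get("east")   # served from the cache
other = cache.get("north")   # falls through to the slow path once
```

The database itself is no faster here; requests that were predicted simply never reach it.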

  7. Who are our users? • Traditional • Power users • need database connectivity • Analysts • want to manipulate existing data • Report viewers • view standardized reports • Web • Our customers • Our business partners • Our employees

  8. Clickstreams • Clickstream is not just another data source • Distributed nature leads to multiple data sources which require synchronization • Multiple parties • More than a dozen log file formats for capturing clickstream data • Search specification • Basic form of clickstream data is stateless • Log shows isolated page retrieval events • Clickstream data is anonymous • Today's Promotions • Clickthroughs and referrals as a revenue source

  9. Clickstreams • Clickstream post-processor – receives raw log data from the web server and normalizes it into a format which can be combined with application-derived data for insertion into the DW • Today's Promotions • Clickthroughs and referrals as a revenue source
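Because each log entry is an isolated, stateless page retrieval, one job a post-processor typically takes on is grouping events into sessions. The sketch below does this with an inactivity timeout. The 30-minute cutoff, the use of the host address as the visitor key, and the `/cart` and `/checkout` paths are assumptions for illustration, not details from the slides.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed cutoff, not from the slides

def sessionize(events):
    """Group (visitor, timestamp, path) events into sessions.

    A new session starts when the same visitor has been idle longer
    than SESSION_TIMEOUT. Returns a list of dicts, one per session.
    """
    sessions = []
    open_session = {}  # visitor -> session dict currently being built
    for visitor, ts, path in sorted(events, key=lambda e: (e[0], e[1])):
        current = open_session.get(visitor)
        if current is None or ts - current["end"] > SESSION_TIMEOUT:
            current = {"visitor": visitor, "start": ts, "end": ts, "paths": []}
            sessions.append(current)
            open_session[visitor] = current
        current["end"] = ts
        current["paths"].append(path)
    return sessions

# Hypothetical events for illustration only.
t0 = datetime(2000, 10, 10, 13, 0)
events = [
    ("192.0.2.7", t0, "/index.html"),
    ("192.0.2.7", t0 + timedelta(minutes=5), "/cart"),
    ("192.0.2.7", t0 + timedelta(hours=2), "/index.html"),
    ("198.51.100.3", t0 + timedelta(minutes=1), "/promo"),
]
sessions = sessionize(events)

# One possible definition of an abandoned session: reached the cart, never checked out.
abandoned = [s for s in sessions
             if "/cart" in s["paths"] and "/checkout" not in s["paths"]]
```

A real post-processor would usually prefer a cookie or login identifier over the host address, since many visitors can share one address.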

  10. Why Bring DW to Web? • Primary function of DW is to publish information – web is a good partner • Need distributed DW – web provides universal connectivity • Universal front-end – web browser

  11. Web Pushes Data Warehouse • User interface effectiveness measurable • Queries and updates mixed • Speed expected – 10 second rule • Global • 24 x 7 expected • International characters, dates, addresses • Expanded multimedia • Animation, zoomable images, maps, video clips • Need material in digital form • Enterprise information portal will require items to be searchable

  12. Web Pushes Data Warehouse • Mass customization • Dynamically created web pages – XML • Fully distributed • Linking together all the data marts • Security and Privacy • Publish only to those who need to know • User profiles and access profiles defined in one place • Full-time expert security person

  13. Second Generation User Interface Guidelines • Near-instantaneous performance • Website Design • Design for lowest common denominator • Measure page performance on a continuous basis • Paint navigation buttons immediately • Disclose content progressively • Implement page caching • Cache data, reports • Improve web server bandwidth • Improve server throughput

  14. Second Generation User Interface Guidelines • Data Webhouse design • Adapt all web design responses • Select appropriate DBMS software – dimensional models, OLAP • Use indexes, aggregations • Partition files • Increase RAM • Use parallel processing
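To make the "indexes, aggregations" guideline concrete, here is a small sketch using Python's built-in `sqlite3` module. The table names, columns, and rows are invented for the example, and a production webhouse would use its own DBMS and aggregate strategy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page_hits (day TEXT, page TEXT, hits INTEGER);
    INSERT INTO page_hits VALUES
        ('2000-10-10', '/index.html', 120),
        ('2000-10-10', '/cart', 30),
        ('2000-10-11', '/index.html', 150);
    -- Index to speed lookups by day.
    CREATE INDEX idx_page_hits_day ON page_hits (day);
    -- Aggregate table: one precomputed row per day.
    CREATE TABLE daily_hits AS
        SELECT day, SUM(hits) AS total_hits FROM page_hits GROUP BY day;
""")

# Reports read the small aggregate table instead of scanning the detail rows.
daily = conn.execute(
    "SELECT day, total_hits FROM daily_hits ORDER BY day"
).fetchall()
```

The trade-off is that the aggregate table must be refreshed whenever the detail table changes.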

  15. Meet User Expectations • Website design • Site navigation choices • Help choices • Communication with various groups – response must be assured • Headlines serious and define content • Indicate off-screen material • Survey customer needs and wants

  16. Meet User Expectations • Data Webhouse design • Report library • Folder of previous queries, reports … • Dimension browser – viewing dimension can assist report creation • Business metadata interface – understand organization's data assets

  17. Streamline Process • Business processes designed from ground up to work seamlessly on web • Website design • Reengineer to streamline process and make navigation easier, uniform interfaces • Remove barriers to reaching page • Minimize clicks and new windows • Allow interruption and return

  18. Streamline Process • Data Webhouse design • Build an explicit value chain for reporting and analysis around the application suite using conformed dimensions and facts • Drill across functions • Single user interface for reporting against all parts of business • Master report library and FAQs • Single login and single console access to webhouse
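Drilling across depends on the fact tables sharing a conformed dimension. The sketch below summarizes two fact tables separately and then merges the results on the shared key. The product names, measures, and function names are hypothetical.

```python
from collections import defaultdict

# Two fact tables from different business processes (made-up rows),
# both keyed by the same conformed product dimension.
order_facts = [("widget", 10), ("gadget", 4), ("widget", 5)]
shipment_facts = [("widget", 12), ("gadget", 4)]

def total_by_key(rows):
    """Sum a fact table's measure per dimension key."""
    totals = defaultdict(int)
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)

def drill_across(left, right):
    """Merge two per-key summaries on the shared (conformed) dimension key."""
    keys = sorted(set(left) | set(right))
    return [(k, left.get(k, 0), right.get(k, 0)) for k in keys]

# Each row: (product, units ordered, units shipped)
report = drill_across(total_by_key(order_facts), total_by_key(shipment_facts))
```

Summarizing each fact table before merging avoids the double counting that a direct join of the detail rows would cause.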

  19. Reassure Users • Website Design • Map of processes • Data Webhouse design • Provide status and lineage of current data • Provide status of running reports • Active notification • Allow for entry of NA if data not available • Time stamped dimensions • Time stamped reports

  20. Allow Problem Resolution • Website design • Allow backtracking, rollback, play forward • Keep old transactions • Easy error reporting • Acknowledge, track and follow up on all user inputs, show wait time • Assist searching • Data Webhouse design • Provide adequate end user support • Show aggregates in use and available • Show system load and percent completed

  21. Build Trust • Clearly state and observe website’s policies for using customer’s identity • Website design • Do not abuse privacy • Link to privacy statement • Use friendly pictures of people • Distinguish between ad content and editorial content

  22. Build Trust • Data Webhouse design • Two-factor security • What you know – password • What you possess – token • Track changes in employee and contractor status • Create and enforce roles for employees, contractors and customers • Manage webhouse security directly
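A two-factor check can be sketched as two independent tests that must both pass. The slides do not say what kind of token is meant; this example assumes a counter-based one-time-password token in the style of RFC 4226 (HOTP). The record layout, secret, and passwords are made up for illustration.

```python
import hashlib
import hmac
import os
import struct

def hash_password(password, salt):
    """Derive a password hash (what the user knows)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

def hotp(secret, counter, digits=6):
    """One-time code from a shared secret and counter, following RFC 4226."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation offset
    number = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(number % 10 ** digits).zfill(digits)

def two_factor_ok(record, password, code, counter):
    """Pass only if both the password and the token code check out."""
    knows = hmac.compare_digest(
        hash_password(password, record["salt"]), record["pw_hash"]
    )
    possesses = hmac.compare_digest(code, hotp(record["token_secret"], counter))
    return knows and possesses

# Hypothetical user record for illustration only.
salt = os.urandom(16)
record = {
    "salt": salt,
    "pw_hash": hash_password("correct horse", salt),
    "token_secret": b"12345678901234567890",
}
code_from_token = hotp(record["token_secret"], counter=0)
granted = two_factor_ok(record, "correct horse", code_from_token, counter=0)
denied = two_factor_ok(record, "wrong guess", code_from_token, counter=0)
```

A deployed system would also have to track and resynchronize the counter and limit retries, which this sketch leaves out.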

  23. Provide Communication Hooks • Website design • Provide useful links to others – internal and external • Remove links that invalidate the “back” button • Use copyable URLs • Use URL as medium of distribution

  24. Advantages of Web Today (1998 vs. 2000) • Immediate worldwide access • Centralized management (1998) - Decentralized (2000) • Thin client • Multi-platform (client and server) (1998) - Distributed (2000) • Little or no software distribution (1998) - Downloads (2000) A+

  25. Disadvantages of Web Today (1998 vs. 2000) • Immature technology (1998) - Teenager (2000) • Security (1998) - Solutions (2000) • Speed restricted by bandwidth - data and logic must both travel across internet • Design limited to least common denominator or access restricted to specific browser

  26. Vulnerabilities • Physical assets • Information assets • theft • modification • Software assets • Ability to conduct business

  27. Web Architecture (diagram) • Thin client applications: browser, applets/ActiveX, email, spreadsheet, word-processing • Communication layer (network/internet) • Internet server: analysis/statistics, graphics, report writer, SQL query • OLAP server: multidimensional database, summary/alternative relational tables • Database servers: data warehouse - relational database

  28. Business Management through Information • Analysis of historical records • order processing, inventory levels, shipments, receivables, customer history, etc. • Goals include: • Measures of efficiency • Anticipate changes (planning and forecasting) • Make adjustments • Integration of model and control function

  29. Rule-Based Management • Create Strategic rules • IF market demand increases THEN implement marketing campaign A3 • IF profit margin drops below value X THEN adjust overhead by … • Must not forget alert rules • If unanticipated condition, then notify CFO • Must not be too reactive • would cause thrashing
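The IF/THEN rules, the alert rule, and the warning about thrashing can be sketched together. In this minimal version a cooldown keeps a rule from firing again too soon, which is one simple way to avoid being too reactive. The thresholds, the `anomaly` flag, and the cooldown length are assumptions; the slides leave the values open ("value X").

```python
class Rule:
    def __init__(self, name, condition, action, cooldown=0):
        self.name = name
        self.condition = condition  # function(metrics) -> bool
        self.action = action        # label of the response to take
        self.cooldown = cooldown    # minimum periods between firings
        self.last_fired = None

def evaluate(rules, metrics, period):
    """Return actions for this period; alert if something unanticipated occurs."""
    actions = []
    matched = False
    for rule in rules:
        if not rule.condition(metrics):
            continue
        matched = True
        recently = (rule.last_fired is not None
                    and period - rule.last_fired < rule.cooldown)
        if not recently:  # the cooldown damps thrashing
            rule.last_fired = period
            actions.append(rule.action)
    # Alert rule: no strategic rule covers the situation, yet something is off.
    if not matched and metrics.get("anomaly", False):
        actions.append("notify CFO")
    return actions

# Hypothetical thresholds for illustration only.
rules = [
    Rule("demand up", lambda m: m["demand_change"] > 0.10,
         "implement marketing campaign A3", cooldown=3),
    Rule("margin low", lambda m: m["margin"] < 0.15,
         "adjust overhead", cooldown=3),
]
first = evaluate(rules, {"demand_change": 0.2, "margin": 0.3}, period=1)
second = evaluate(rules, {"demand_change": 0.2, "margin": 0.3}, period=2)
third = evaluate(rules, {"demand_change": 0.0, "margin": 0.3, "anomaly": True}, period=3)
```

In this run the campaign rule fires in period 1, stays quiet in period 2 because of the cooldown, and the unanticipated condition in period 3 falls through to the alert.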

  30. OLDM Decision Process • Simultaneous capture of: • Decision support information • Surveyed customer on-line in exchange for an additional discount • with business function inputs • Immediate computation or estimation of secondary information • based on planning and forecasting rules • Decision support information is: • available on-line • ready to use “as is” (management defined)

  31. OLDM Decision Process • Derived data becomes control information • Automation of analysis and decision support • immediately available to management • Problems documented on-line • Classes of problem and corrective action codified • problem recognition • decision rules

  32. OLDM Decision Process • Requires four types of information • Characteristics which identify a class of problem • Corrective action ( management responses by problem class) • Rules to implement actions • Record of result

  33. Potential of OLDM • Better managed business • knowledge asset capture and retention • consistency across enterprise • flexible, highly responsive • Close loop with customer • event and market driven but controlled • Direct customer interaction • via web, telephone, remote connection • Improved systems capacity planning and system management • Re-alignment of business and IT
