MD 240 Data Management: Warehousing, Analyzing, Mining and Visualization


    1. MD 240 Data Management: Warehousing, Analyzing, Mining and Visualization

    2. Agenda: Background; Data Management; Data Collection; Data Cleaning, Preparation & Warehousing; Data Analysis; Visual Methods for Discovery & Presentation; Marketing Transaction Databases

    3. Background: Until recently, it was difficult for analysts and managers to perform analyses related to their business activities. With the spread of PCs and networked devices, it has become easier than ever to collect data about activities in an organization, and more feasible to move analysis from a task of the statistician in the back office to salespeople, managers, and analysts closer to the front office.

    4. Background: Difficulties with data analysis for business intelligence: data amount increasing exponentially; multiple sources of data … increasing all the time; only a small portion of the total data collected is usually useful for making a decision; increasing need for external data; differing legal requirements about data collection in different countries; selection of a data management tool from the many available tools; data security, quality, integrity, etc.

    5. Data Management

    6. Data Management: Data Management Process (Data Life Cycle Process): data collection; data stored in databases; pre-process databases (clean out junk; get data close to what decision-makers need); transformation of data (make it ready for analysis); store in data warehouse; use data mining tools to discover patterns (create knowledge); presentation of results.
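The life cycle on this slide can be sketched as a small pipeline. This is only an illustration under stated assumptions: the records, field names, and the functions `clean`, `transform`, and `totals_by_region` are hypothetical and do not come from the presentation.

```python
# Hypothetical raw records, as they might arrive from data collection.
raw_records = [
    {"customer": " Alice ", "region": "east", "amount": "120.50"},
    {"customer": "Bob", "region": "WEST", "amount": "n/a"},   # junk amount
    {"customer": "", "region": "east", "amount": "40"},       # missing customer
    {"customer": "Carol", "region": "East", "amount": "75"},
]

def clean(records):
    """Pre-process: drop rows with a missing customer or a non-numeric amount."""
    kept = []
    for row in records:
        name = row["customer"].strip()
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # "clean out junk"
        if name:
            kept.append({"customer": name, "region": row["region"], "amount": amount})
    return kept

def transform(records):
    """Transform: give each region a single lowercase representation."""
    return [{**row, "region": row["region"].lower()} for row in records]

def totals_by_region(warehouse):
    """A stand-in for analysis: total amount per region."""
    totals = {}
    for row in warehouse:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals

# Store the prepared data in a list that plays the role of the data warehouse.
warehouse = transform(clean(raw_records))

# Presentation of results.
for region, total in sorted(totals_by_region(warehouse).items()):
    print(f"{region}: {total:.2f}")
```

Real cleaning and transformation rules depend on what the decision-makers need; the two rules here are chosen only to keep the sketch short.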

    7. Data Management Data Management Process

    8. Data Management Data Management Process

    9. Data Collection

    10. Step #1: Data Collection Data Sources

    11. Step #1: Data Collection. Data Strategy: fundamental philosophy guiding data collection. GIGO: “garbage in, garbage out”.

    12. Step #1: Data Collection. Data Sources: internal data (data/info. about organizational activities); personal data (data/info. documenting employees’ activities); external data (government, competitors, suppliers); the Internet (“screen scraping” data out of the browser); commercial database services (online databases).

    13. Step #1: Data Collection. Data Capture and Input. Past: type in by hand (time consuming; costly; many typing errors). Now: objective is to automate (save paper storage costs of leasing warehouses; faster access to documents and information in documents). Document Management Systems: scanners for digitizing archived paper documents; databases for archiving, search, retrieval.

    14. Step #1: Data Collection. Data Quality (DQ). Intrinsic DQ: accuracy, objectivity, believability, and reputation. Accessibility DQ: accessibility and access security. Contextual DQ: relevance, value added, timeliness, completeness. Representation DQ: interpretability, ease of understanding, concise representation, and consistent representation.
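Two of the dimensions above, completeness (contextual DQ) and consistent representation (representation DQ), lend themselves to simple automated checks. The sketch below assumes hypothetical records and a hypothetical expected date format of `%Y-%m-%d`; neither comes from the presentation.

```python
from datetime import datetime

# Hypothetical customer records with one missing e-mail and one odd date format.
records = [
    {"id": 1, "email": "a@example.com", "signup": "2002-07-01"},
    {"id": 2, "email": "", "signup": "2002-07-03"},
    {"id": 3, "email": "c@example.com", "signup": "07/05/2002"},
]

def completeness(rows, field):
    """Fraction of rows in which the field is filled in."""
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)

def inconsistent_dates(rows, field, fmt="%Y-%m-%d"):
    """Return the ids of rows whose date does not follow the expected format."""
    bad = []
    for r in rows:
        try:
            datetime.strptime(r[field], fmt)
        except ValueError:
            bad.append(r["id"])
    return bad

print(f"email completeness: {completeness(records, 'email'):.2f}")
print("rows with inconsistent signup dates:", inconsistent_dates(records, "signup"))
```

Dimensions such as believability or value added are judgments about the data and are not captured by checks of this kind.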

    15. Data Cleaning, Preparation & Warehousing

    16. Steps #2-#5: Data Warehousing. Transactional Processing: store data in databases. Objectives of TPS: standardized transactions; simple computations (non-complex; not very mathematical or statistically oriented); high volume; low cost.

    17. Steps #2-#5: Data Warehousing. Transaction vs. Analytical Processing. Task objectives for a useful analytical data delivery system: easy data access by end users; quicker decision making; accurate and effective decision making; flexible decision making.

    18. Steps #2-#5: Data Warehousing. Transaction vs. Analytical Processing. Characteristics of a useful analytical data delivery system: business representation of data for end users; client-server or Web-based environment that provides end users with query and reporting capability; server-based repository (data warehouse).

    19. Steps #2-#5: Data Warehousing. Data Warehouse and Data Marts. Data Warehouse: establishes a data repository that makes operational data accessible in a form readily acceptable for analytical processing activities. “Metadata”: for faster indexing and searching within the data warehouse (data summaries; information on how the data have been organized). Data Mart: dedicated to a functional area, or dedicated to a regional area.

    20. Steps #2-#5: Data Warehousing Data Warehouse and Data Marts

    21. Steps #2-#5: Data Warehousing. Characteristics of Data Warehousing. Desirable characteristics for a data warehouse: Organization (organized by subject; extraneous items removed); Consistency (identical measurement and representation of same data); Time variant (varies over time; “time-series” data); Nonvolatile (data are not updated once entered); Relational (table-based structure, RDBMS).
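The "time variant" and "nonvolatile" characteristics can be illustrated with an append-only store in which every entry carries a date and is never changed after loading. This is a minimal sketch, not a warehouse design: the `Fact` and `Warehouse` classes and the sample figures are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)  # frozen: a fact cannot be modified once created
class Fact:
    subject: str   # facts are organized by subject
    as_of: date    # each fact is tied to a point in time
    value: float

class Warehouse:
    """Append-only store: facts are loaded but never updated or deleted."""

    def __init__(self):
        self._facts = []

    def load(self, fact):
        self._facts.append(fact)

    def history(self, subject):
        """Return the time series for one subject, oldest first."""
        matching = [f for f in self._facts if f.subject == subject]
        return sorted(matching, key=lambda f: f.as_of)

wh = Warehouse()
wh.load(Fact("store_12_sales", date(2002, 2, 28), 1150.0))
wh.load(Fact("store_12_sales", date(2002, 1, 31), 1000.0))

for fact in wh.history("store_12_sales"):
    print(fact.as_of, fact.value)
```

A changed figure would be loaded as a new fact with a later date rather than overwriting the old one, which is what keeps the time series intact.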

    22. Steps #2-#5: Data Warehousing. Characteristics of Data Warehousing. Data warehousing is most suitable for organizations in which: end users need to access large amounts of data; operational data are stored in several different systems; different systems represent the same data in different formats; management relies on information for decision making; there is a large, diverse customer base; extensive end-user computing is performed.

    23. Data Analysis

    24. Step #6: Data Analysis. Knowledge Discovery in Databases (KDD). Foundations of KDD: massive data collection; powerful multiprocessor computers; “intelligent” data mining algorithms. Analyst/manager activities: ad hoc queries; OLAP queries; data mining.

    25. Step #6: Data Analysis. Ad Hoc Queries: let users access, navigate, and explore data in real time to make business decisions. Ad hoc query tool requirements: query creation is easy; customized query creation; easy-to-use interfaces for performing queries; many data sources are supported; seamless integration between analysis and reporting.

    26. Step #6: Data Analysis. OLAP Queries. OLAP: an approach by which important queries and calculations are turned into online tools that managers can use over and over again; decision support software that allows the user to quickly analyze information that has been summarized into multidimensional views and hierarchies. MOLAP … multidimensional OLAP; ROLAP … OLAP using relational databases; WOLAP … web-based OLAP.

    27. Step #6: Data Analysis. OLAP Queries. Capabilities of Online Analytical Processing (OLAP): access very large amounts of data; analyze the relationships between many types of business elements; involve aggregated data; compare aggregated data over hierarchical time periods; present data in different perspectives; involve complex calculations between data elements; able to respond quickly to user requests.
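Comparing aggregated data over hierarchical time periods amounts to rolling the same figures up from month to quarter to year. The sketch below shows that roll-up on hypothetical sales rows; the `roll_up` and `quarter` helpers are illustrative names, not part of any OLAP product.

```python
from collections import defaultdict

# Hypothetical sales rows: (month, region, amount).
sales = [
    ("2002-01", "east", 100),
    ("2002-02", "east", 150),
    ("2002-04", "east", 200),
    ("2002-01", "west", 80),
    ("2002-05", "west", 120),
]

def quarter(month):
    """Map 'YYYY-MM' to 'YYYY-Qn'."""
    year, m = month.split("-")
    return f"{year}-Q{(int(m) - 1) // 3 + 1}"

def roll_up(rows, level):
    """Aggregate amounts by (period, region) at the given level of the time hierarchy."""
    if level not in ("month", "quarter", "year"):
        raise ValueError(f"unknown level: {level}")
    totals = defaultdict(int)
    for month, region, amount in rows:
        if level == "month":
            period = month
        elif level == "quarter":
            period = quarter(month)
        else:
            period = month[:4]
        totals[(period, region)] += amount
    return dict(totals)

for level in ("quarter", "year"):
    print(level, sorted(roll_up(sales, level).items()))
```

Going from the yearly totals back to quarters or months is the "drill-down" direction of the same hierarchy.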

    28. Step #6: Data Analysis. OLAP Queries. OLAP Advantages: adapt existing decision making tools to the WWW, integrate them with distributed data stores; facilitates “drill-down”. OLAP Shortcomings: retrospective in nature; more of a reporting-oriented tool; a discovery-oriented tool for flexible data analysis of data already known to have importance; less of a prediction-oriented tool.

    29. Step #6: Data Analysis. Data Mining. Objectives of data mining: automate discovery of previously unknown patterns; automate prediction of trends, behaviors, and events.

    30. Step #6: Data Analysis. Data Mining. Nature and characteristics: data often buried deep within large databases (“Data wants to be Free!”); data may be consolidated in a data warehouse or kept in internet and intranet servers; usually client-server architecture.

    31. Step #6: Data Analysis. Data Mining. Nature and characteristics (cont’d): data mining tools extract information buried in corporate files or archived public records; the “miner” is often an end user; “striking it rich” usually involves finding unexpected, valuable results; parallel processing computers are often needed to make this analysis fast enough to be useful to the manager.

    32. Step #6: Data Analysis. Data Mining. Common types of data mining: mining of numerical data; text mining … group documents or identify themes or information within documents (documents; Web pages); Web site clickstream/event mining.

    33. Step #6: Data Analysis. Data Mining. Data mining yields five types of information. Association: e.g., correlation = 0.5; slope between X and Y = 0.73. Sequences: e.g., biggest, second biggest, etc. Classifications: e.g., there are 3 types of competitors; use data mining to classify Firm X as a “Type 1” competitor. Clusters: e.g., we don’t know how many types of customers there are … let’s try to discover if we can identify some similar customer groups. Forecasting.
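The association example on this slide quotes a correlation and a slope between X and Y. Both can be computed from the same sums of squares, as the sketch below shows. The data points are made up for illustration, so the results differ from the 0.5 and 0.73 quoted on the slide.

```python
from math import sqrt

def association(x, y):
    """Return (Pearson correlation, least-squares slope of y on x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    correlation = sxy / sqrt(sxx * syy)
    slope = sxy / sxx
    return correlation, slope

# Hypothetical observations, e.g. advertising spend (x) and units sold (y).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r, slope = association(x, y)
print(f"correlation = {r:.2f}, slope = {slope:.2f}")
```

The correlation measures how tightly the points follow a line, while the slope measures how much Y changes per unit of X; the two answer different questions about the same association.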

    34. Step #6: Data Analysis. Data Mining Techniques/Tools. Computer science: case-based reasoning; neural computing; intelligent agents; others (decision trees, genetic algorithms, nearest neighbor method, and rule reduction). Statistics: cluster analysis; most standard statistical tools (SAS, SPSS). Optimization.

    35. Step #6: Data Analysis Data Mining Techniques/Tools

    36. Step #6: Data Analysis. Data Mining Vendors. Vendors: SAS Enterprise Miner; SPSS Business Intelligence; Insightful (www.insightful.com); Microsoft Research; IBM; Blue Martini; Amdocs; DBMiner (www.dbminer.com); PrudSys (www.prudsys.de); Boston area … Torrent (www.torrent.com), ThinkAnalytics (www.thinkanalytics.com). Learning resources: Association for Computing Machinery (ACM) SIGKDD; KDD2002 conference (July 2002).

    37. Visual Methods for Discovery & Presentation

    38. Steps #6&7: Data Visualization. Multidimensionality: “real-world” data typically have more than 2 or 3 dimensions; managerial analyses may require presentation of up to 7 or 8 dimensions to fully communicate discoveries. Three factors: dimensions; measures; time. Solution: technology that is flexible enough so that data can be organized the way managers prefer to see the data.

    40. Steps #6&7: Data Visualization. Presenting Multidimensional Data. Data visualization involves presentation of data by digital technology: graphical user interfaces; digital images; geographical information systems; multidimensional tables and graphs; virtual reality; three-dimensional presentations; animation.

    41. Steps #6&7: Data Visualization. Presenting Multidimensional Data. Low Tech Solutions … for a few dimensions: multidimensional tables (reduce many dimensions down to 2D table format); “slicing and dicing”; data “rotation” (ability to easily switch the 3 variables being analyzed and rotate 3D graphs on a computer screen). High Tech Solutions … for many dimensions. See Edward Tufte’s books: The Visual Display of Quantitative Information; Envisioning Information; Visual Explanations.
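Reducing many dimensions to a 2D table, slicing, and rotation can all be shown on a tiny data cube. In the sketch below the cube is a plain dictionary keyed by (product, region, quarter); the dimension names, the figures, and the `slice_cube` and `pivot` helpers are hypothetical.

```python
# Hypothetical cube of units sold, keyed by (product, region, quarter).
DIMS = ("product", "region", "quarter")
cube = {
    ("widget", "east", "Q1"): 10,
    ("widget", "east", "Q2"): 12,
    ("widget", "west", "Q1"): 7,
    ("gadget", "east", "Q1"): 5,
    ("gadget", "west", "Q2"): 9,
}

def slice_cube(cube, dim, value):
    """Slice: keep only the cells where one dimension has a fixed value."""
    i = DIMS.index(dim)
    return {key: v for key, v in cube.items() if key[i] == value}

def pivot(cube, row_dim, col_dim):
    """Reduce the cube to a 2D table, summing over the remaining dimension."""
    r, c = DIMS.index(row_dim), DIMS.index(col_dim)
    table = {}
    for key, v in cube.items():
        row = table.setdefault(key[r], {})
        row[key[c]] = row.get(key[c], 0) + v
    return table

print(pivot(cube, "product", "region"))                           # one 2D view
print(pivot(cube, "region", "quarter"))                           # "rotated" view
print(pivot(slice_cube(cube, "quarter", "Q1"), "product", "region"))  # Q1 slice
```

Rotation here is just a different choice of row and column dimensions over the same cube, which is why tools can offer it cheaply once the data are stored multidimensionally.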

    42. Steps #6&7: Data Visualization. Geographical Information Systems (GIS). GIS: a computer-based system for capturing, storing, checking, integrating, manipulating, and displaying data using digitized maps. Plot data or present data analysis findings by: latitude and longitude; cities, major metropolitan areas; counties; states; nations.

    43. Steps #6&7: Data Visualization. Geographical Information Systems (GIS). Emerging GIS applications: sophisticated user interfaces (multimedia, 3D graphics, animated and interactive maps); integration of GIS and GPS (reengineer aviation and shipping industries); intelligent GIS (integration of GIS and ES); hand-held applications (deploy mapping tools to PDAs and Java-based cell phones); Web applications (ESRI’s ArcData GIS).

    44. Steps #6&7: Data Visualization. Geographical Information Systems (GIS). Vendors: ESRI (www.esri.com): Arc/Info; ArcData Online (www.esri.com/data/online/index.html). Resources: www.gis.com; www.gisday.com; www.state.ma.us/mgis/; www.northeastarc.org

    45. Steps #6&7: Data Visualization. Other Visualization Tools. Visual Interactive Modeling: visual modeling of a system. Visual Interactive Simulation: a visual front end to a simulation program; presents animation of system activities and statistical results during a simulation run. Real-time simulation … users can interact with the simulation model (prototyping, training, entertainment, video games). Virtual Reality: fake environments that attempt to fool viewers into perceiving that they are within a 3D world; usually involves a headset, gloves, and other forms of sensory input/output devices.

    46. Marketing Transaction Databases

    47. Application Area: Marketing. Marketing Transaction Database (MTD) … a new kind of database, oriented toward targeting and personalizing marketing messages in real time.

    48. Application Area: Marketing. Marketing Transaction Database (MTD). Purpose: targeting and personalization. Structure: liquid - driven by real-time marketing. Updates: real-time. Data level: individual detail. Data type: demographic (descriptive), behavioral, derivative. Advantages: allows real-time analysis and decision-making, CRM. Issues: emerging, no standards, not integrated with other systems.
