
Understanding Data Analytics and Data Mining


dworth


Presentation Transcript


  1. Understanding Data Analytics and Data Mining Introduction

  2. Introduction An important aspect of the decision-making process is the ability to transform seemingly unrelated data into useful information that influences a person’s decision. Understanding what data is needed to make effective decisions and where that data comes from is just one step in the process: the next step is mining or analyzing that data to draw useful conclusions that aid decision making. The Understanding Data Analysis and Data Mining presentation is designed to explore the general principles behind this second step and to support the organization in understanding its options for using data effectively in its business.

  3. Distinguishing Analysis and Mining The terms “data analysis” and “data mining” are sometimes used interchangeably, but they are distinctly different in practice. In data analysis, a hypothesis is formed first and the data is analyzed to support or disprove it. In data mining, no hypothesis is formed initially; the data is analyzed to identify interesting patterns from which a hypothesis can be drawn. Despite these differences, the techniques and methods for data analysis and data mining are similar.

  4. Knowledge Discovery in Databases The Knowledge Discovery in Databases process includes the following steps:
     • Selection
     • Preprocessing
     • Transformation
     • Data Mining
     • Interpretation/Evaluation
     • Knowledge Presentation
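The six steps above can be sketched as a chain of small functions. This is only an illustration under stated assumptions: the records, the attribute names, the age grouping, and the threshold of 100 are all hypothetical and do not come from the presentation.

```python
# Hypothetical toy records; None marks a missing value.
raw = [
    {"customer": "a", "age": 34, "spend": 120.0},
    {"customer": "b", "age": None, "spend": 80.0},
    {"customer": "c", "age": 61, "spend": 310.0},
    {"customer": "d", "age": 25, "spend": 95.0},
]

def select(rows):
    # Selection: keep only the attributes relevant to the question.
    return [{"age": r["age"], "spend": r["spend"]} for r in rows]

def preprocess(rows):
    # Preprocessing: drop records that contain missing values.
    return [r for r in rows if None not in r.values()]

def transform(rows):
    # Transformation: replace the raw age with a coarse age group.
    return [
        {"group": "under_40" if r["age"] < 40 else "40_plus", "spend": r["spend"]}
        for r in rows
    ]

def mine(rows):
    # Data mining: compute the average spend per age group.
    by_group = {}
    for r in rows:
        by_group.setdefault(r["group"], []).append(r["spend"])
    return {g: sum(v) / len(v) for g, v in by_group.items()}

def evaluate(pattern, threshold=100.0):
    # Interpretation/evaluation: keep groups above an assumed threshold.
    return {g: m for g, m in pattern.items() if m > threshold}

def present(pattern):
    # Knowledge presentation: render the findings as readable lines.
    return [f"{g}: average spend {m:.2f}" for g, m in sorted(pattern.items())]

result = present(evaluate(mine(transform(preprocess(select(raw))))))
for line in result:
    print(line)
```

Each function maps to one step, so a step can be replaced (for example, imputing missing values instead of dropping records) without touching the others.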

  5. Defining Data Data are a set of facts. Facts are true or proven. Data can come in a variety of types:
     • Relational data
     • Operational data
     • Transactional data

  6. Define Data Entry A data entry is a single instance or record in a database. Data entries are also called data objects. A data entry establishes a relationship between data elements, such as:
     • person and address
     • customers and purchases
     • events and outcomes

  7. Define Dimensions A dimension is a collection of facts about a measurable situation. Dimensions define the who, what, where, when, and how of a particular focus on the data. Dimensions shape how data patterns are identified and analyzed.

  8. Dimensions – Cube Schema The cube rendering is a product of online analytical processing (OLAP) and is used to show how the different dimensions of data can be viewed. Retail Example:
     • 4 retail locations
     • 10 products
     • 12 months
     • 2 age groups
     [Figure: cube with axes Location, Time, and Product]
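The arithmetic behind the retail example can be sketched as follows. The member names are hypothetical placeholders, and treating the two age groups as one cube per group is an assumption, since the figure shows only the Location, Time, and Product axes.

```python
from itertools import product

# Hypothetical member names for each dimension in the retail example.
locations = [f"location_{i}" for i in range(1, 5)]   # 4 retail locations
products = [f"product_{i}" for i in range(1, 11)]    # 10 products
months = list(range(1, 13))                          # 12 months
age_groups = ["group_1", "group_2"]                  # 2 age groups

# One cell per (location, product, month) combination, initialised to zero.
cube = {cell: 0 for cell in product(locations, products, months)}

cells_per_cube = len(cube)                       # 4 * 10 * 12
total_cells = cells_per_cube * len(age_groups)   # one cube per age group (assumption)

print(cells_per_cube, total_cells)
```

Every added dimension multiplies the number of cells, which is why the choice of dimensions matters for how a cube is stored and viewed.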

  9. Dimensions – Star Schema Star schemas are used to design how data is organized in data warehouses.
     [Figure: star schema diagram labeled Product, Location, Orders, Time, and Customer]
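A star schema along these lines might be sketched with Python's built-in sqlite3 module. It is assumed here that Orders is the central fact table and that Product, Location, Time, and Customer are its dimension tables; the table names, column names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (the points of the star).
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE dim_time     (time_id     INTEGER PRIMARY KEY, month INTEGER);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, age_group TEXT);

-- Fact table (the center of the star), assumed to be Orders.
CREATE TABLE fact_orders (
    order_id    INTEGER PRIMARY KEY,
    product_id  INTEGER REFERENCES dim_product(product_id),
    location_id INTEGER REFERENCES dim_location(location_id),
    time_id     INTEGER REFERENCES dim_time(time_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    amount      REAL
);

INSERT INTO dim_product  VALUES (1, 'widget'), (2, 'gadget');
INSERT INTO dim_location VALUES (1, 'north'), (2, 'south');
INSERT INTO dim_time     VALUES (1, 1), (2, 2);
INSERT INTO dim_customer VALUES (1, 'under_40'), (2, '40_plus');
INSERT INTO fact_orders VALUES
    (1, 1, 1, 1, 1, 10.0),
    (2, 1, 2, 1, 2, 15.0),
    (3, 2, 1, 2, 1, 20.0);
""")

# Join the fact table to one dimension and total the order amounts per product.
totals_by_product = conn.execute("""
    SELECT p.name, SUM(o.amount)
    FROM fact_orders AS o
    JOIN dim_product AS p ON p.product_id = o.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()

print(totals_by_product)
```

The fact table holds only keys and measures, so any dimension can be joined in to group or filter the same order rows.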

  10. Online Analytical Processing Online Analytical Processing (OLAP) is an approach for interactively analyzing multidimensional data from multiple perspectives.
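Two common ways of viewing the same data from different perspectives are rolling up (aggregating along some dimensions) and slicing (fixing one dimension to a single value). The sketch below uses hypothetical fact rows and helper names; it illustrates the idea and is not the interface of any particular OLAP product.

```python
from collections import defaultdict

# Hypothetical fact rows: (location, product, month, units sold).
DIMS = ("location", "product", "month")
facts = [
    ("north", "widget", 1, 5),
    ("north", "gadget", 1, 3),
    ("south", "widget", 1, 4),
    ("north", "widget", 2, 6),
    ("south", "gadget", 2, 2),
]

def roll_up(rows, keep):
    # Aggregate units over every dimension that is not listed in `keep`.
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)

def slice_rows(rows, dim, value):
    # Fix one dimension to a single value and keep the matching rows.
    i = DIMS.index(dim)
    return [row for row in rows if row[i] == value]

by_location = roll_up(facts, ["location"])
january_by_product = roll_up(slice_rows(facts, "month", 1), ["product"])

print(by_location)
print(january_by_product)
```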

  11. Defining Patterns A pattern is an expression of data which can be modeled. Data analysis and data mining focus on identifying, understanding, and drawing conclusions about interesting patterns. An interesting pattern has the following characteristics:
     • It can be understood easily by humans
     • It can be recreated, meaning it has some level of certainty to its validity
     • It can potentially be used by the organization
     • It is novel, innovative, and requires investigation
     • For data analysis, it validates and confirms the hypothesis

  12. Queries Queries are a mechanism for retrieving information from a database; each query poses a question of the data. Standard queries are predefined questions to ask of a database.
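A standard query can be thought of as a question written once and reused with different parameters. The sketch below uses Python's built-in sqlite3 module; the purchases table, its rows, and the total_spent helper are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE purchases (customer TEXT, product TEXT, amount REAL)")
db.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [("ann", "widget", 10.0), ("bob", "gadget", 20.0), ("ann", "gadget", 5.0)],
)

# The predefined question: "How much has this customer spent in total?"
STANDARD_QUERY = "SELECT SUM(amount) FROM purchases WHERE customer = ?"

def total_spent(customer):
    # Run the standard query with the customer name bound as a parameter.
    (total,) = db.execute(STANDARD_QUERY, (customer,)).fetchone()
    return total

print(total_spent("ann"))
```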

  13. Data Mining Techniques There are several techniques of note in data mining:
     • Characterization and discrimination
     • Associations and correlations
     • Classification and regression
     • Cluster analysis
     • Outlier analysis

  14. Characterization and Discrimination Characterization will describe the data in summary or general terms. Discrimination will describe the data, usually by means of comparison.
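As a rough illustration, characterization can be sketched as summarizing one group of records, and discrimination as comparing that summary against a contrasting group. The sales figures, region names, and function names below are hypothetical.

```python
from statistics import mean

# Hypothetical (region, sale amount) records.
sales = [("north", 120.0), ("north", 80.0), ("south", 200.0), ("south", 240.0)]

def characterize(values):
    # Characterization: describe one group in summary terms.
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
    }

def discriminate(target, contrast):
    # Discrimination: compare a target group against a contrasting group.
    return mean(target) - mean(contrast)

north = [amount for region, amount in sales if region == "north"]
south = [amount for region, amount in sales if region == "south"]

summary = characterize(north)
gap = discriminate(south, north)

print(summary)
print(gap)
```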

  15. Association and Correlation Associations and correlations are pattern relationships found among data objects. They are often used in frequent pattern mining.
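Two measures commonly used in frequent pattern mining are support (how often a set of items appears together) and confidence (how often a rule holds when its left-hand side is present). These terms are not defined in the presentation; the sketch below supplies them as an assumption, using hypothetical shopping baskets.

```python
# Hypothetical transactions: each basket is the set of items bought together.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    # Fraction of baskets that contain every item in `itemset`.
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent):
    # Of the baskets containing `antecedent`, the fraction that also
    # contain `consequent`.
    return support(antecedent | consequent) / support(antecedent)

bread_and_milk = support({"bread", "milk"})
bread_implies_milk = confidence({"bread"}, {"milk"})

print(bread_and_milk, bread_implies_milk)
```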

  16. Classification and Regression Classification attempts to find a model that describes and distinguishes predefined classes in the data set. Regression attempts to find a model that predicts missing or unavailable numerical values. Both are predictive approaches and use methods such as decision trees and neural networks.
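The difference between the two can be sketched with deliberately simple stand-ins: a least-squares line for regression and a nearest-neighbor rule for classification. Neither is one of the methods named on the slide (decision trees and neural networks); they are swapped in here only because they fit in a few lines. All data and names are hypothetical.

```python
# Regression: fit y = slope * x + intercept by ordinary least squares,
# then predict a numerical value that is missing from the data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

def fit_line(xs, ys):
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(xs, ys)
predicted = slope * 5.0 + intercept  # value for the unseen x = 5

# Classification: assign a predefined class label to a new value by
# copying the label of the closest labeled example (nearest neighbor).
labeled = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]

def classify(x):
    _, label = min(labeled, key=lambda pair: abs(pair[0] - x))
    return label

print(slope, intercept, predicted)
print(classify(7.5), classify(3.0))
```

Regression returns a number, while classification returns one of the labels that were defined in advance.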

  17. Cluster Analysis Data objects are analyzed without consulting class labels; clustering can instead be used to generate class labels. Image from visibleearth.nasa.gov
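One widely used clustering method is k-means, which the presentation does not name; it is used below only as an example of grouping values without any labels and then producing labels from the groups. The data and the starting centers are hypothetical.

```python
def k_means_1d(values, centers, iterations=10):
    # Alternate between assigning each value to its nearest center and
    # moving each center to the mean of the values assigned to it.
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        centers = [
            sum(c) / len(c) if c else centers[i]  # keep an empty cluster's center
            for i, c in enumerate(clusters)
        ]
    # The generated class label for each value is the index of its nearest center.
    labels = [
        min(range(len(centers)), key=lambda i: abs(v - centers[i]))
        for v in values
    ]
    return centers, labels

values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, labels = k_means_1d(values, centers=[1.0, 12.0])

print(centers)
print(labels)
```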

  18. Outlier Analysis Outlier analysis looks at abnormalities in data: data that does not behave as expected.
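A simple way to flag data that does not behave as expected is to measure how many standard deviations each value lies from the mean (a z-score). The readings and the cutoff of 2.0 are assumptions chosen for illustration; the presentation does not prescribe a method.

```python
from statistics import mean, pstdev

# Hypothetical readings with one value that is far from the rest.
readings = [10, 11, 9, 10, 12, 10, 50]

def find_outliers(values, cutoff=2.0):
    # Flag values whose distance from the mean exceeds `cutoff`
    # population standard deviations.
    center = mean(values)
    spread = pstdev(values)
    return [v for v in values if abs(v - center) / spread > cutoff]

outliers = find_outliers(readings)
print(outliers)
```

Because a large outlier also inflates the mean and the standard deviation, the cutoff usually needs tuning for the data at hand.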

  19. Standards The Cross-Industry Standard Process for Data Mining (CRISP-DM) was developed under the European Strategic Program on Research in Information Technology (ESPRIT). Sample, Explore, Modify, Model, and Assess (SEMMA) was developed by SAS Institute Inc.

  20. The Toolkit The Toolkit is designed to enable an organization to improve its capabilities in data warehousing and data analysis, while maintaining a level of neutrality between specific technical solutions. The Toolkit consists of two parts: an introduction to the concepts and terms used in these areas, and usable templates to pursue and implement specific technical solutions. The goal of the Data Warehouse and Data Analysis Toolkit is to define the contributing factors, major components, and their relationships, while providing the basic tools to take action based on the organization’s needs.

  21. Moving Forward The presentations found within the Toolkit provide education about the different facets of Data Warehousing and Data Analysis: they can be used for self-edification or as the foundation for presenting a case to different levels of the organization. The process document, Developing Data Analysis Capabilities, is intended to be a step-by-step guide to creating a Data Analysis foundation in your organization. Multiple templates have been created to support the process and aid organizations in their efforts to improve their Data Analysis capabilities.
