An Overview of Data Mining Techniques

Data and Regular Statistics • Given data of high integrity (large volumes of records with clean and complete attributes), regular statistics can already produce conclusions • Mean • Mode • Standard deviation • Frequency analysis (how many patients had pneumonia from 89-92 in Region 1) • Numerous visualizations can already be made • Bar graphs • Trend graphs • Pie charts
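The summary statistics and the frequency analysis listed above can be sketched with Python's standard library. All values below are made up for illustration and do not come from the presentation; the field names `region`, `year`, and `diagnosis` are hypothetical, and reading "89-92" as the years 1989 to 1992 is an assumption.

```python
from statistics import mean, mode, stdev

# Hypothetical patient ages (illustrative values only).
ages = [34, 45, 45, 52, 61, 45, 38]

age_mean = mean(ages)    # arithmetic average
age_mode = mode(ages)    # most frequent value
age_stdev = stdev(ages)  # sample standard deviation

# Hypothetical patient records (illustrative values only).
records = [
    {"region": 1, "year": 1989, "diagnosis": "pneumonia"},
    {"region": 1, "year": 1991, "diagnosis": "pneumonia"},
    {"region": 2, "year": 1990, "diagnosis": "pneumonia"},
    {"region": 1, "year": 1993, "diagnosis": "pneumonia"},
    {"region": 1, "year": 1990, "diagnosis": "asthma"},
]

# Frequency analysis: count pneumonia cases in Region 1,
# assuming "89-92" means the years 1989 through 1992.
pneumonia_count = sum(
    1
    for r in records
    if r["diagnosis"] == "pneumonia"
    and r["region"] == 1
    and 1989 <= r["year"] <= 1992
)

print(age_mean, age_mode, age_stdev, pneumonia_count)
```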

Data Mining • Data mining is the process of deriving information from a very large volume of data • Techniques go beyond basic statistical methods • A notion of “learning” is implied • Data that seem to be unrelated may turn out to be correlated • Climate condition correlated with probability that a civil war will occur • Shopping behavior correlated with risk of injury

Data Mining Techniques • k-Nearest Neighbor • k-means clustering

k-Nearest Neighbor • You already have a large volume of cases/records that are properly classified • You have a new case that is not yet part of your data set • k-NN can determine the classification of this new case

k-NN Process • Specify k • Select a good metric • Compute the distance between the new case and each record, column by column • Add all column distances for each row • Determine the k nearest neighbors and their relative weights • Make the prediction
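The steps above can be sketched as follows. This is a minimal illustration, not the presenter's implementation: it assumes numeric columns, uses the absolute difference as the per-column distance, and gives every neighbor the same weight (a plain majority vote). The function names and the toy data are hypothetical.

```python
from collections import Counter

def row_distance(row, query):
    """Sum the per-column distances (absolute differences) for one row."""
    return sum(abs(a - b) for a, b in zip(row, query))

def knn_predict(rows, labels, query, k):
    """Predict the label of `query` from its k nearest classified rows."""
    # Compute one total distance per row and pair it with the row's label.
    scored = [(row_distance(row, query), label) for row, label in zip(rows, labels)]
    # Sort by distance only; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0])
    # Keep the k nearest neighbors and take an unweighted majority vote.
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]

# Toy data, made up for illustration only.
rows = [(1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0), (1.5, 1.5)]
labels = ["A", "A", "B", "B", "A"]

prediction = knn_predict(rows, labels, query=(2.0, 2.0), k=3)
print(prediction)
```

An odd k avoids tied votes when there are only two classes.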

k-NN Example • Sample credit risk data • How would Maria, who is single, a high-income earner, and low in debt, be classified?

k-NN Example (continued) • Assume k=3 • Metric • Compare columns: 0 if the values are the same, 1 if they are different • Get the total for all columns
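The 0/1 column comparison can be written as a small function. The sample credit risk table is not reproduced in this transcript, so the record `other` below is hypothetical and is not one of the people in the example; the column names are also assumptions.

```python
def mismatch_distance(a, b):
    """Count the columns where two records differ (0 if same, 1 if different)."""
    return sum(0 if a[col] == b[col] else 1 for col in a)

# Maria's attributes as described in the example.
maria = {"marital_status": "single", "income": "high", "debt": "low"}

# Hypothetical record, made up for illustration only.
other = {"marital_status": "married", "income": "high", "debt": "high"}

distance = mismatch_distance(maria, other)
print(distance)  # marital_status and debt differ, income matches
```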

k-NN Example (continued) • Maria’s nearest neighbors (distance, risk class): • Harry (0, Poor) • Amber (1, Good) • Kaley (2, Poor) • Joe (2, Good) • Maria is predicted to be a “Poor” risk!
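The vote can be reproduced from the distances listed above. With k=3, Kaley and Joe tie for third place at distance 2, and the slides do not say how the tie was broken or which weights were used. The sketch below shows one possible reading: keep every neighbor tied with the k-th one and weight each vote by 1 / (1 + distance). That weighting formula is an assumption, not something stated in the presentation.

```python
from collections import defaultdict

# (name, distance to Maria, risk class) as listed in the example.
neighbors = [
    ("Harry", 0, "Poor"),
    ("Amber", 1, "Good"),
    ("Kaley", 2, "Poor"),
    ("Joe", 2, "Good"),
]
k = 3

# Rank by distance and keep everyone at least as close as the k-th neighbor,
# so that tied neighbors are not dropped arbitrarily.
ranked = sorted(neighbors, key=lambda n: n[1])
cutoff = ranked[k - 1][1]
selected = [n for n in ranked if n[1] <= cutoff]

# Assumed weighting: closer neighbors count more, weight = 1 / (1 + distance).
scores = defaultdict(float)
for _name, dist, label in selected:
    scores[label] += 1.0 / (1.0 + dist)

prediction = max(scores, key=scores.get)
print(prediction, dict(scores))
```

Under these assumptions the "Poor" votes outweigh the "Good" votes, which agrees with the prediction in the example. An unweighted vote over Harry, Amber, and Kaley alone gives the same answer.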

k-NN Applications • Could be applied to patient records (e.g., cancer) • Examples of attributes for cancer data: • Pathological findings • Radiological findings • Lab results • Surgical notes • Etc. • Predisposition (risk) for cancer may be determined
