PowerPoint Slideshow about 'Section 1.1' - isra


Presentation Transcript

Section 1.1

Background

Objectives
  • Discuss some of the history of data mining.
  • Define data mining and its uses.
Defining Characteristics
  • 1. The Data
      • Massive, operational, and opportunistic
  • 2. The Users and Sponsors
      • Business decision support
  • 3. The Methodology
      • Computer-intensive “ad hockery”
      • Multidisciplinary lineage
Data Mining, circa 1963
  • IBM 7090
  • 600 cases
  • “Machine storage limitations restricted the total number of variables which could be considered at one time to 25.”

Since 1963
  • Moore’s Law:
      • The information density on silicon-integrated circuits doubles every 18 to 24 months.
  • Parkinson’s Law:
      • Work expands to fill the time available for its completion.
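To put the Moore's Law doubling rate in numbers, here is a back-of-the-envelope calculation. It is not from the slides, and the 40-year horizon is an arbitrary illustration. A quantity that doubles every m months grows over T years by the factor G(T):

```latex
G(T) = 2^{12T/m}, \qquad
G(40)\big|_{m=24} = 2^{20} \approx 1.05 \times 10^{6}, \qquad
G(40)\big|_{m=18} = 2^{80/3} \approx 1.07 \times 10^{8}
```

Either rate compounds to six or more orders of magnitude over a few decades.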
Data Deluge
  • hospital patient registries
  • electronic point-of-sale data
  • remote sensing images
  • tax returns
  • stock trades
  • OLTP
  • telephone calls
  • airline reservations
  • credit card charges
  • catalog orders
  • bank transactions

The Data
                Experimental           Opportunistic
  Purpose       Research               Operational
  Value         Scientific             Commercial
  Generation    Actively controlled    Passively observed
  Size          Small                  Massive
  Hygiene       Clean                  Dirty
  State         Static                 Dynamic

Business Decision Support
  • Database Marketing
    • Target marketing
    • Customer relationship management
  • Credit Risk Management
    • Credit scoring
  • Fraud Detection
  • Healthcare Informatics
    • Clinical decision support
Multidisciplinary

Fields surrounding Data Mining:
  • Statistics
  • Pattern Recognition
  • Neurocomputing
  • Machine Learning
  • AI
  • Databases
  • KDD

Tower of Babel
  • “Bias”
      • STATISTICS: the expected difference between an estimator and what is being estimated
      • NEUROCOMPUTING: the constant term in a linear combination
      • MACHINE LEARNING: a reason for favoring any model that does not fit the data perfectly
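The statistical sense of “bias” can be checked exactly on a toy case. This sketch is an illustration assumed here, not taken from the slides. It enumerates every sample of size n drawn with replacement from a small, equally likely, integer-valued population. It then compares the average of two variance estimators with the true variance. Dividing by n is biased downward, while dividing by n - 1 is not. The helper names `variance` and `estimator_bias` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def variance(sample, ddof):
    """Sample variance with divisor n - ddof (ddof=0: plug-in, ddof=1: unbiased).

    Assumes integer-valued data so that exact Fraction arithmetic works.
    """
    n = len(sample)
    mean = Fraction(sum(sample), n)
    return sum((x - mean) ** 2 for x in sample) / (n - ddof)


def estimator_bias(population, n, ddof):
    """Expected estimate minus true variance, over all equally likely samples of size n."""
    k = len(population)
    pop_mean = Fraction(sum(population), k)
    sigma2 = sum((x - pop_mean) ** 2 for x in population) / k
    samples = list(product(population, repeat=n))  # every ordered draw with replacement
    expected = sum(variance(s, ddof) for s in samples) / len(samples)
    return expected - sigma2


print(estimator_bias([0, 1, 2], n=2, ddof=0))  # -1/3: divisor n underestimates
print(estimator_bias([0, 1, 2], n=2, ddof=1))  # 0: divisor n - 1 is unbiased
```

In the neurocomputing sense, by contrast, the bias is just the constant b in a unit's linear combination w·x + b.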

Steps in Data Mining/Analysis
  • 1. Specific Objectives
      • In terms of the subject matter
  • 2. Translation into Analytical Methods
  • 3. Data Examination
      • Data capacity
      • Preliminary results
  • 4. Refinement and Reformulation
Required Expertise
  • Domain
  • Data
  • Analytical Methods
Nuggets

“If you’ve got terabytes of data, and you’re relying on data mining to find interesting things in there for you, you’ve lost before you’ve even begun.”

— Herb Edelstein

What Is Data Mining?
  • IT
    • Complicated database queries
  • ML
    • Inductive learning from examples
  • Stat
    • What we were taught not to do
Problem Translation
  • Predictive Modeling
    • Supervised classification
  • Cluster Analysis
  • Association Rules
  • Something Else
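Cluster analysis, unlike predictive modeling, has no target. One common technique is k-means (Lloyd's algorithm). The slides do not prescribe it, so treat this as an illustrative sketch. The function name `kmeans`, the fixed iteration count and the caller-supplied starting centroids are all assumptions.

```python
from math import dist


def kmeans(points, centroids, iterations=10):
    """Lloyd's k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coords) / len(members) for coords in zip(*members)) if members else c
            for members, c in zip(clusters, centroids)
        ]
    return centroids, clusters


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, groups = kmeans(points, centroids=[(0, 0), (10, 10)])
print(centers)  # [(0.0, 0.5), (10.0, 10.5)]
```

The result depends on the starting centroids. In practice several starts are usually tried.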
Predictive Modeling

[Diagram: a data table with Cases as rows and Inputs and a Target as columns]
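In code, the same layout is a list of cases, each pairing a tuple of inputs with a target value. A predictive model is any rule that maps new inputs to a predicted target. This is a hypothetical minimal example: the toy data and the function name `predict_1nn` are invented here. It uses a one-nearest-neighbor rule, one of the simplest memory-based methods, which predicts the target of the closest stored case.

```python
from math import dist

# Hypothetical training cases: (inputs, binary target)
cases = [
    ((1.0, 2.0), 0),
    ((1.5, 1.8), 0),
    ((5.0, 8.0), 1),
    ((6.0, 9.0), 1),
]


def predict_1nn(inputs, training=cases):
    """Return the target of the training case whose inputs are closest (Euclidean)."""
    _, target = min(training, key=lambda case: dist(case[0], inputs))
    return target


print(predict_1nn((1.2, 1.9)))  # 0
print(predict_1nn((5.2, 8.1)))  # 1
```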

Types of Targets
  • Supervised Classification
    • Event/no event (binary target)
    • Class label (multiclass problem)
  • Regression
    • Continuous outcome
  • Survival Analysis
    • Time-to-event (possibly censored)
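Censoring is what sets survival targets apart. For some cases the event has not happened by the end of follow-up, so only a lower bound on their time is known. One standard way to use such cases is the Kaplan-Meier estimator, swapped in here for illustration because the slides do not name a method. At each observed event time t it multiplies in the factor 1 - d/n, where d is the number of events at t and n is the number of cases still at risk. The function and argument names are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier survival curve.

    times:  follow-up time for each case
    events: 1 if the event was observed at that time, 0 if the case was censored
    Returns [(t, S(t))] at each distinct event time.
    """
    event_counts = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # cases leaving the risk set at each time (events + censored)
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = event_counts[t]
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]  # censored cases count as at risk at t, but not afterwards
    return curve


print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

In this toy data the case censored at t = 2 still counts as at risk at t = 2, so the curve steps down to about 0.8, 0.6 and 0.3.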
Objectives
  • Define SEMMA.
  • Introduce the tools available in Enterprise Miner.
SEMMA
  • Sample
  • Explore
  • Modify
  • Model
  • Assess
Sample
  • Input Data Source
  • Sampling
  • Data Partition
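The Data Partition step splits the cases into training, validation and test sets. This is a minimal sketch of the idea, assuming a simple random split. The 40/30/30 fractions, the seed and the name `partition` are illustrative choices, not Enterprise Miner settings.

```python
import random


def partition(cases, fractions=(0.4, 0.3, 0.3), seed=12345):
    """Randomly split cases into training, validation and test sets."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n = len(shuffled)
    n_train = round(fractions[0] * n)
    n_valid = round(fractions[1] * n)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # the remainder absorbs any rounding
    return train, valid, test


print([len(part) for part in partition(range(100))])  # [40, 30, 30]
```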
Explore
  • Distribution Explorer
  • Multiplot
  • Insight
  • Association
  • Variable Selection
  • Link Analysis
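The Association node looks for association rules of the form A ⇒ B in transaction data. Two standard measures of a rule are support and confidence. Support is the share of transactions that contain every item in A and B. Confidence is the support of A and B together divided by the support of A. This minimal sketch uses made-up baskets, and the function names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent => consequent.

    Raises ZeroDivisionError if the antecedent never occurs.
    """
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(baskets, {"bread", "milk"}))       # 0.5
print(confidence(baskets, {"bread"}, {"milk"}))  # about 0.667
```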

Modify
  • Data Set Attributes
  • Transform Variables
  • Filter Outliers
  • Replacement
  • Clustering
  • SOM/Kohonen
  • Time Series
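The Replacement node fills in missing values. This is a minimal sketch of one simple strategy, mean imputation for a numeric column. It assumes missing values are represented as None; both that representation and the function name are assumptions made here.

```python
from statistics import mean


def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [x for x in column if x is not None]
    if not observed:
        raise ValueError("column has no observed values")
    fill = mean(observed)
    return [fill if x is None else x for x in column]


print(impute_mean([4.0, None, 6.0, None, 8.0]))  # [4.0, 6.0, 6.0, 6.0, 8.0]
```

Mean imputation keeps the column mean unchanged but shrinks its variance, which is one reason other replacement strategies exist.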
Model
  • Regression
  • Tree
  • Neural Network
  • Princomp/Dmneural
  • User Defined Model
  • Ensemble
  • Memory Based Reasoning
  • Two-Stage Model
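The Ensemble node combines the predictions of several models. One common combination rule, assumed here for illustration, averages the predicted event probabilities. The three scoring functions are hypothetical stand-ins for fitted models, with hard-coded outputs.

```python
def ensemble_average(models, inputs):
    """Average the predicted event probabilities of several models for one case."""
    predictions = [model(inputs) for model in models]
    return sum(predictions) / len(predictions)


# Hypothetical fitted models: each maps an input tuple to P(event)
def regression(x):
    return 0.25


def tree(x):
    return 0.5 if x[0] > 3 else 0.125


def neural_network(x):
    return 0.75


print(ensemble_average([regression, tree, neural_network], (4.0, 1.0)))  # 0.5
```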
Other Types of Nodes – Utility Nodes
  • Group Processing
  • Data Mining Database
  • SAS Code
  • Control Point
  • Subdiagram