Part II


Part II - PowerPoint PPT Presentation



PowerPoint Slideshow about 'Part II' - zanna


Presentation Transcript

Part II

Ch 5. Knowledge Discovery in Databases

Ch 6. The Data Warehouse

Ch 7. Formal Evaluation Technique

Tools for Knowledge Discovery

5.1 A KDD Process Model
  • 1. Goal identification
  • 2. Creating an initial target data set
  • 3. Data preprocessing
  • 4. Data transformation
  • 5. Data mining
  • 6. Interpretation and evaluation
  • 7. Taking action

Step 1: Goal Identification

Define the Problem.

Choose a Data Mining Tool.

Estimate Project Cost.

Estimate Project Completion Time.

Address Legal Issues.

Develop a Maintenance Plan.

Step 2: Creating a Target Dataset

(1) Flat file or spreadsheet format

(2) Relational Database

Collection of tables (rows and columns)

RDB → Reduce data redundancy (Decomposition)

DM → Uncover the inherent redundancy in data

(Join is required)

(3) (1) + (2) → Data transformation is required.

(4) Data Warehouse: Historical database designed specifically for decision support.
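
As a rough illustration of why a join is required when the target data set comes from a relational database, the sketch below uses Python's built-in `sqlite3` module. The `customer` and `purchase` tables, their columns, and their rows are hypothetical and exist only for this example.

```python
import sqlite3

# Hypothetical tables: names, columns, and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, age INTEGER, income REAL);
    CREATE TABLE purchase (cust_id INTEGER, item TEXT);
    INSERT INTO customer VALUES (1, 34, 52000.0), (2, 51, 78000.0);
    INSERT INTO purchase VALUES (1, 'laptop'), (1, 'phone'), (2, 'camera');
""")

# The join brings back the redundancy that decomposition removed:
# the customer attributes repeat on every purchase row.
rows = conn.execute("""
    SELECT c.cust_id, c.age, c.income, p.item
    FROM customer AS c
    JOIN purchase AS p ON p.cust_id = c.cust_id
    ORDER BY c.cust_id, p.item
""").fetchall()
conn.close()

for row in rows:
    print(row)
```

Each resulting row is one flat instance that a data mining tool could read directly.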

Step 3: Data Preprocessing

Noisy Data

  • Locate duplicate records.
  • Locate incorrect attribute values.
  • Outliers

Missing Data

  • Discard records with missing values.
  • Replace missing real-valued items with the class mean.
  • Replace missing values with values found within highly similar instances.
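
One of the missing-data options above, replacing a missing real-valued item with the class mean, can be sketched as follows. The function name `fill_with_class_mean`, the record layout (a list of dicts with `None` marking a missing value), and the sample data are assumptions made for illustration.

```python
from collections import defaultdict

def fill_with_class_mean(records, attr, class_attr):
    """Return copies of the records with None in `attr` replaced by
    the mean of `attr` over the records of the same class."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        if r[attr] is not None:
            sums[r[class_attr]] += r[attr]
            counts[r[class_attr]] += 1

    filled = []
    for r in records:
        r = dict(r)  # copy so the input is left unchanged
        cls = r[class_attr]
        # A class with no observed values keeps its missing entries.
        if r[attr] is None and counts[cls] > 0:
            r[attr] = sums[cls] / counts[cls]
        filled.append(r)
    return filled

# Hypothetical records; "income" has two missing values.
records = [
    {"income": 40.0, "label": "accept"},
    {"income": 60.0, "label": "accept"},
    {"income": None, "label": "accept"},
    {"income": 20.0, "label": "reject"},
    {"income": None, "label": "reject"},
]
filled = fill_with_class_mean(records, "income", "label")
print(filled[2]["income"], filled[4]["income"])
```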

Step 4: Data Transformation

(1) Data Normalization

(2) Data Type Conversion

(3) Attribute and Instance Selection


(1) Data Normalization

Decimal Scaling

Min-Max Normalization

Normalization using Z-scores

Logarithmic Normalization
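
The four normalization methods listed above can be sketched with the standard library alone. The slides give no formulas, so the usual textbook definitions are assumed here: decimal scaling divides by the smallest power of ten that brings every absolute value below 1, min-max maps onto [0, 1], and z-scores use the population standard deviation. All function names are illustrative.

```python
import math
import statistics

def decimal_scaling(values):
    """Divide by 10**k, with k the smallest integer making all |v| < 1."""
    max_abs = max(abs(v) for v in values)
    k = 0
    while max_abs / 10 ** k >= 1:
        k += 1
    return [v / 10 ** k for v in values]

def min_max(values):
    """Map values linearly onto [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def z_scores(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]

def log_normalize(values, base=2):
    """Replace each (strictly positive) value with its base-`base` logarithm."""
    return [math.log(v, base) for v in values]

print(decimal_scaling([200.0, 400.0, 980.0]))
print(min_max([2.0, 4.0, 10.0]))
print(z_scores([2.0, 4.0, 6.0]))
print(log_normalize([1.0, 8.0]))
```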

(2) Data Type Conversion

Categorical → Numeric equivalent
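
A minimal sketch of converting a categorical attribute to a numeric equivalent is shown below. The slides do not say which encoding is meant, so two common choices are assumed: integer codes, which impose an artificial ordering, and one-hot vectors, which do not. The function names and the sample values are hypothetical.

```python
def encode_as_codes(values):
    """Map each distinct category to an integer code (alphabetical order)."""
    categories = sorted(set(values))
    code_of = {c: i for i, c in enumerate(categories)}
    return [code_of[v] for v in values], code_of

def encode_one_hot(values):
    """Map each value to a 0/1 vector with one position per category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

colors = ["red", "green", "blue", "green"]
codes, code_of = encode_as_codes(colors)
print(codes, code_of)
print(encode_one_hot(colors))
```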


(3) Attribute and Instance Selection

a. Eliminating Attributes

1. Highly correlated

2. High domain predictability

3. Low attribute significance score
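
The first elimination criterion, highly correlated attributes, could be checked along these lines. This is a sketch under assumptions: Pearson correlation is used as the measure, the 0.95 threshold is arbitrary, and the attribute kept is simply the one seen first. The column names and values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / (sx * sy)

def drop_correlated(columns, threshold=0.95):
    """Keep an attribute only if it is not highly correlated
    with any attribute that has already been kept."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

columns = {
    "height_cm": [150.0, 160.0, 170.0, 180.0],
    "height_in": [59.1, 63.0, 66.9, 70.9],  # nearly a rescaling of height_cm
    "age": [30.0, 25.0, 41.0, 38.0],
}
print(drop_correlated(columns))
```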

b. Creating Attributes

  • Combining attributes, e.g. P/E ratio
  • Differences between the attributes
  • Percent increase or decrease

c. Instance Selection

  • Use instance typicality.
  • Supervised: use highly and moderately typical training instances.
  • Unsupervised: eliminate the most atypical instances → well-defined clusters.
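
The three ways of creating attributes named above amount to simple arithmetic on existing columns. The sketch below assumes a hypothetical stock record with `price`, `eps` (earnings per share), and `last_year_price` fields; none of these names come from the slides.

```python
def pe_ratio(price, earnings_per_share):
    """Combine two attributes into one: price divided by earnings per share."""
    return price / earnings_per_share

def difference(a, b):
    """Difference between two attributes."""
    return a - b

def percent_change(old, new):
    """Percent increase (positive) or decrease (negative) from old to new."""
    return (new - old) / old * 100.0

# Hypothetical instance; derived attributes are added alongside the originals.
stock = {"price": 50.0, "eps": 2.5, "last_year_price": 40.0}
stock["pe_ratio"] = pe_ratio(stock["price"], stock["eps"])
stock["price_diff"] = difference(stock["price"], stock["last_year_price"])
stock["price_pct_change"] = percent_change(stock["last_year_price"], stock["price"])
print(stock)
```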

Step 5: Data Mining

Choose training and test data.

Designate a set of input attributes.

If learning is supervised, choose one or more output attributes.

Select learning parameter values.

Invoke the data mining tool.
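
The first three items of this step can be sketched as below: choose training and test data, designate input attributes, and choose an output attribute for supervised learning. The 70/30 split, the fixed seed, the helper name `split_train_test`, and the instance fields are all assumptions for illustration. Invoking an actual data mining tool is left out.

```python
import random

def split_train_test(instances, test_fraction=0.3, seed=0):
    """Shuffle a copy of the instances and split it into (train, test)."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical instances with two input attributes and one output attribute.
instances = [
    {"age": 20 + i, "income": 30.0 + 2 * i, "label": "accept" if i % 2 else "reject"}
    for i in range(10)
]
train, test = split_train_test(instances)

input_attributes = ["age", "income"]
output_attribute = "label"  # only needed when learning is supervised

X_train = [[r[a] for a in input_attributes] for r in train]
y_train = [r[output_attribute] for r in train]
print(len(train), len(test))
```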


Step 6: Interpretation and Evaluation

Statistical analysis.

Heuristic analysis.

Experimental analysis.

Human analysis.


Step 7: Taking Action

Create a report.

Relocate retail items.

Mail promotional information.

Detect fraud.

Fund new research.


5.9 The CRISP-DM Process Model (Cross Industry Standard Process for Data Mining)

Business understanding

Data understanding

Data preparation

Modeling

Evaluation

Deployment

5.10 Experimenting with ESX

A Four-Step Model for Knowledge Discovery

  • Identify the goal.
  • Prepare the data.
  • Apply data mining.
  • Interpret and evaluate the results.

Experiment 1: Attribute Evaluation

Use unsupervised clustering to see how well the set of input attributes is able to define the classes.

- Domain Summary: eight, eleven/ nine, ten, twelve

- Class Summary: /nine, ten

Class 1 : Accept 84%

Class 2 : Reject 90%

Repeat the data mining process to create the best data model.

*Applying the Four-Step Process Model to the Credit Screening Dataset*


Experiment 2: Parameter Evaluation

*Applying the Four-Step Process Model to the Satellite Image Dataset*
