Shim 413 database applications for healthcare fall 2006

SHIM 413

Database Applications for Healthcare

Fall 2006

Slides by H. T. Bao



Outline of the presentation

  • Objectives, Prerequisite and Content

  • Brief Introduction to Lectures

  • Discussion and Conclusion

This presentation summarizes the content and organization of the lectures in the module “Knowledge Discovery and Data Mining”.


Objectives

This course provides:

  • fundamental techniques of knowledge discovery and data mining (KDD)

  • practical issues in using KDD, and KDD tools

  • case studies of KDD applications in the medical domain (healthcare)



Prerequisite for the course

There are no formal prerequisites, but the following are expected:

  • experience in using computers

  • basic knowledge of databases and statistics

  • advanced programming skills



Content of the course

Lecture 1: Overview of KDD

Lecture 2: Preparing data

Lecture 3: Decision tree induction

Lecture 4: Mining association rules

Lecture 5: Automatic cluster detection

Lecture 6: Artificial neural networks

Lecture 7: Evaluation of discovered knowledge



Outline of the presentation

  • Objectives, Prerequisite and Content

  • Brief Introduction to Lectures

  • Discussion and Conclusion

This presentation summarizes the content and organization of the lectures on the topic “Knowledge Discovery and Data Mining”.



Brief introduction to lectures

Lecture 1: Overview of KDD

Lecture 2: Preparing data

Lecture 3: Decision tree induction

Lecture 4: Mining association rules

Lecture 5: Automatic cluster detection

Lecture 6: Artificial neural networks

Lecture 7: Evaluation of discovered knowledge



Lecture 1: Overview of KDD

1. What is KDD, and why?

2. The KDD Process

3. KDD Applications

4. Data Mining Methods

5. Challenges for KDD



KDD: A Definition

KDD is the automatic extraction of non-obvious, hidden knowledge from large volumes of data.

10^6 to 10^12 bytes: we can never see the whole data set or fit it into a computer's memory.

What knowledge? How should it be represented and used?

Which data mining algorithms?
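Because data sets of this size cannot be loaded into memory at once, many mining algorithms are designed to make a single pass over the data. As a minimal sketch of that idea (an illustration, not taken from the lecture), Welford's algorithm below computes a mean and sample variance while keeping only three running numbers. The function name and the sample values are hypothetical.

```python
from typing import Iterable, Tuple


def streaming_mean_variance(values: Iterable[float]) -> Tuple[int, float, float]:
    """One pass over the data with O(1) memory (Welford's algorithm).

    Returns (count, mean, sample variance). Only three running numbers are
    kept, so `values` can be a generator that reads records from disk.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    variance = m2 / (count - 1) if count > 1 else 0.0
    return count, mean, variance


# Hypothetical usage: a generator stands in for a table read row by row.
readings = (v for v in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n, mu, var = streaming_mean_variance(readings)
print(n, round(mu, 3), round(var, 3))
```

The generator is consumed once and never materialized, which is the property that matters when the data is far larger than memory.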



Data, Information, Knowledge

We often see data as a string of bits, or numbers and symbols, or “objects” which we collect daily.

Information is data stripped of redundancy, and reduced to the minimum necessary to characterize the data.

Knowledge is integrated information, including facts and their relations, which have been perceived, discovered, or learned as our “mental pictures”.

Knowledge can be considered data at a high level of abstraction and generalization.



From Data to Knowledge

Medical data from Dr. Tsumoto, Tokyo Medical and Dental University (38 attributes)

...

10, M, 0, 10, 10, 0, 0, 0, SUBACUTE, 37, 2, 1, 0,15,-,-, 6000, 2, 0, abnormal, abnormal,-, 2852, 2148, 712, 97, 49, F,-,multiple,,2137, negative, n, n, ABSCESS,VIRUS

12, M, 0, 5, 5, 0, 0, 0, ACUTE, 38.5, 2, 1, 0,15, -,-, 10700,4,0,normal, abnormal, +, 1080, 680, 400, 71, 59, F,-,ABPC+CZX,, 70, negative, n, n, n, BACTERIA, BACTERIA

15, M, 0, 3, 2, 3, 0, 0, ACUTE, 39.3, 3, 1, 0,15, -, -, 6000, 0,0, normal, abnormal, +, 1124, 622, 502, 47, 63, F, -,FMOX+AMK, , 48, negative, n, n, n, BACTE(E), BACTERIA

16, M, 0, 32, 32, 0, 0, 0, SUBACUTE, 38, 2, 0, 0, 15, -, +, 12600, 4, 0,abnormal, abnormal, +, 41, 39, 2, 44, 57, F, -, ABPC+CZX, ?, ? ,negative, ?, n, n, ABSCESS, VIRUS

...

The rows contain numerical attributes, categorical attributes, missing values, and class labels.

IF cell_poly <= 220 AND Risk = n AND Loc_dat = + AND Nausea > 15
THEN Prediction = VIRUS [87.5%]

[confidence, predictive accuracy]
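The confidence attached to a rule such as the one above is the fraction of records satisfying the IF part that also satisfy the THEN part. A minimal sketch of that calculation follows; the attribute names come from the rule, but the records and the class column name `diagnosis` are hypothetical and are not the Tsumoto data set.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]


def rule_confidence(records: List[Record],
                    condition: Callable[[Record], bool],
                    target_attr: str,
                    target_value: object) -> Tuple[int, float]:
    """Return (number of records matching the IF part, confidence).

    Confidence = |IF part and THEN part hold| / |IF part holds|.
    """
    covered = [r for r in records if condition(r)]
    if not covered:
        return 0, 0.0
    correct = sum(1 for r in covered if r[target_attr] == target_value)
    return len(covered), correct / len(covered)


def virus_rule(r: Record) -> bool:
    """IF part of the rule on the slide."""
    return (r["cell_poly"] <= 220 and r["Risk"] == "n"
            and r["Loc_dat"] == "+" and r["Nausea"] > 15)


# Hypothetical toy records; "diagnosis" is an assumed class attribute name.
toy = [
    {"cell_poly": 150, "Risk": "n", "Loc_dat": "+", "Nausea": 20, "diagnosis": "VIRUS"},
    {"cell_poly": 200, "Risk": "n", "Loc_dat": "+", "Nausea": 30, "diagnosis": "VIRUS"},
    {"cell_poly": 100, "Risk": "n", "Loc_dat": "+", "Nausea": 16, "diagnosis": "BACTERIA"},
    {"cell_poly": 90, "Risk": "n", "Loc_dat": "+", "Nausea": 40, "diagnosis": "VIRUS"},
    {"cell_poly": 500, "Risk": "n", "Loc_dat": "+", "Nausea": 20, "diagnosis": "BACTERIA"},
]
covered, conf = rule_confidence(toy, virus_rule, "diagnosis", "VIRUS")
print(covered, conf)  # 4 0.75
```

Here four records satisfy the IF part and three of them are VIRUS, so the rule's confidence on this toy data is 75%.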



Data Rich, Knowledge Poor

How to acquire knowledge for knowledge-based systems remains the main difficult and crucial problem.

People gather and store so much data because they believe valuable assets are implicitly encoded in it.

[Figure: the knowledge base and inference engine of a knowledge-based system, marked with "?"]

Raw data is rarely of direct benefit. Its true value depends on the ability to extract information useful for decision support.

Tradition: via knowledge engineers

Manual data analysis is impractical.

New trend: via automatic programs



Benefits of Knowledge Discovery

[Figure: EDP, MIS and DSS placed along axes labelled Value and Volume, annotated Generate, Disseminate and Rapid Response]

EDP: Electronic Data Processing

MIS: Management Information Systems

DSS: Decision Support Systems



Lecture 1: Overview of KDD

1. What is KDD, and why?

2. The KDD Process

3. KDD Applications

4. Data Mining Methods

5. Challenges for KDD



The KDD process

“The non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data” (Fayyad, Piatetsky-Shapiro, Smyth, 1996)

  • non-trivial: a multi-step process

  • valid: justified patterns/models

  • novel: previously unknown

  • useful: can be used

  • understandable: by humans and machines



The Knowledge Discovery Process

1. Understand the domain and define problems

2. Collect and preprocess data

3. Data mining: extract patterns/models

4. Interpret and evaluate discovered knowledge

5. Put the results into practical use

Data mining: a step in the KDD process consisting of methods that produce useful patterns or models from the data, under some acceptable computational efficiency limitations.

KDD is inherently interactive and iterative.



The KDD Process

1. Create/select target database (data organized by function; data warehousing)

2. Select sampling technique and sample data; supply missing values; eliminate noisy data

3. Normalize values; transform values; create derived attributes; find important attributes & value ranges

4. Select DM task(s); select DM method(s); extract knowledge

5. Test knowledge; refine knowledge; transform to a different representation

Also shown in the figure: query & report generation; aggregation & sequences; advanced methods.
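Two of the preprocessing operations listed in the KDD process, supplying missing values and normalizing values, can be sketched in a few lines. This is a minimal illustration assuming that "?" or an empty field marks a missing value, that mean imputation is acceptable, and that min-max scaling to [0, 1] is the desired normalization; the slides do not prescribe these particular methods, and the column values are hypothetical.

```python
from typing import List, Optional


def parse_value(token: str) -> Optional[float]:
    """Treat '?' or an empty field as missing (an assumed encoding)."""
    token = token.strip()
    if token in ("?", ""):
        return None
    return float(token)


def impute_mean(column: List[Optional[float]]) -> List[float]:
    """Supply missing values with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def min_max_normalize(column: List[float]) -> List[float]:
    """Rescale values linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


raw = ["2", "4", "?", "6"]  # hypothetical numeric column with one gap
filled = impute_mean([parse_value(t) for t in raw])
print(filled)                     # [2.0, 4.0, 4.0, 6.0]
print(min_max_normalize(filled))  # [0.0, 0.5, 0.5, 1.0]
```

Mean imputation is the simplest choice; in practice the right strategy depends on why values are missing, which is part of understanding the domain in step 1.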



Main Contributing Areas of KDD

  • Statistics: infer information from data (deduction & induction, mainly numeric data)

  • Databases: store, access, search, update data (deduction)

  • Machine Learning: computer algorithms that improve automatically through experience (mainly induction, symbolic data)

KDD lies where these areas meet; the figure also marks data warehouses (integrated data) and OLAP (On-Line Analytical Processing).



Lecture 1: Overview of KDD

1. What is KDD, and why?

2. The KDD Process

3. KDD Applications

4. Data Mining Methods

5. Challenges for KDD



Potential Applications

Business information

- Marketing and sales data analysis

- Investment analysis

- Loan approval

- Fraud detection

- etc.

Manufacturing information

- Controlling and scheduling

- Network management

- Experiment result analysis

- etc.

Scientific information

- Sky survey cataloging

- Biosequence databases

- Geosciences: Quakefinder

- etc.

Personal information



KDD: Opportunity and Challenges

Factors driving KDD (data mining):

  • Competitive pressure

  • Data rich, knowledge poor (the resource)

  • Mature data mining technology

  • Enabling technology (interactive MIS, OLAP, parallel computing, Web, etc.)



KDD: A New and Fast Growing Area

KDD workshops: since 1989.

International conferences: KDD (USA), first in 1995; PAKDD (Asia), first in 1997; PKDD (Europe), first in 1997.

ECML/PKDD 2004 (Pisa, Italy)

Industry interest and competition: IBM, Microsoft, Silicon Graphics, Sun, Boeing, NASA, SAS, SPSS, …

About 80% of the Fortune 500 companies are involved in data mining projects or use data mining systems.

Japan: the FGCS (Fifth Generation Computer Systems) project (logic programming and reasoning).

“Knowledge Discovery is the most desirable end-product of computing.” (Wiederhold, Stanford Univ.)



Lecture 1: Overview of KDD

1. What is KDD, and why?

2. The KDD Process

3. KDD Applications

4. Data Mining Methods

5. Challenges for KDD



Primary Tasks of Data Mining

  • Classification: finding the description of several predefined classes and classifying a data item into one of them.

  • Clustering: identifying a finite set of categories or clusters to describe the data.

  • Dependency modeling: finding a model which describes significant dependencies between variables.

  • Regression: mapping a data item to a real-valued prediction variable.

  • Deviation and change detection: discovering the most significant changes in the data.

  • Summarization: finding a compact description for a subset of data.
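Clustering, one of the tasks above, is the subject of Lecture 5 (automatic cluster detection). The slides do not say which algorithm the lecture uses; as one common example, here is a minimal k-means (Lloyd's algorithm) sketch on 2-D points. Initializing the centroids to the first k points is a simplifying assumption made for reproducibility, and the sample points are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int,
           iterations: int = 100) -> Tuple[List[Point], List[int]]:
    """Plain k-means on 2-D points; returns (centroids, cluster label per point).

    Centroids start at the first k points; real tools usually use random
    or k-means++ initialization instead.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (squared distance).
        labels = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                        + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((sum(p[0] for p in members) / len(members),
                                      sum(p[1] for p in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, labels


pts = [(1.0, 1.0), (8.0, 8.0), (1.0, 2.0), (2.0, 1.0), (9.0, 8.0), (8.0, 9.0)]
centers, groups = kmeans(pts, 2)
print(groups)  # [0, 1, 0, 0, 1, 1]
```

Note that k must be chosen in advance; choosing it, and judging whether the clusters are meaningful, is part of evaluating discovered knowledge.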



Classification

“What factors determine cancerous cells?”

[Figure: Cancerous Cell Data and Examples feed a Data Mining Algorithm (rule induction, decision tree, or neural network) that finds General patterns, used by a Classification Algorithm]



Classification: Rule Induction

“What factors determine a cell is cancerous?”

If Color = light and Tails = 1 and Nuclei = 2
Then Healthy Cell (certainty = 92%)

If Color = dark and Tails = 2 and Nuclei = 2
Then Cancerous Cell (certainty = 87%)
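Once induced, rules like these are applied by checking each rule's conditions against a new cell. A minimal sketch follows; the slide does not say how overlapping rules or uncovered cells are handled, so returning the first matching rule, and `None` when no rule fires, are assumptions of this sketch.

```python
from typing import Dict, List, Optional, Tuple

Cell = Dict[str, object]

# The two rules from the slide: (conditions, predicted class, certainty).
RULES: List[Tuple[Dict[str, object], str, float]] = [
    ({"Color": "light", "Tails": 1, "Nuclei": 2}, "Healthy Cell", 0.92),
    ({"Color": "dark", "Tails": 2, "Nuclei": 2}, "Cancerous Cell", 0.87),
]


def classify(cell: Cell) -> Optional[Tuple[str, float]]:
    """Return (class, certainty) of the first rule whose conditions all hold, else None."""
    for conditions, label, certainty in RULES:
        if all(cell.get(attr) == value for attr, value in conditions.items()):
            return label, certainty
    return None


print(classify({"Color": "dark", "Tails": 2, "Nuclei": 2}))  # ('Cancerous Cell', 0.87)
print(classify({"Color": "dark", "Tails": 1, "Nuclei": 1}))  # None
```

The second call shows a cell that neither rule covers; a complete classifier would need a default class or further rules.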



Classification: Decision Trees

[Figure: a decision tree that tests Color (dark/light), then #nuclei (1/2), then #tails (1/2), with leaves labelled healthy or cancerous]
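Decision tree induction (Lecture 3) builds such a tree top-down by choosing, at each node, the attribute that best separates the classes. The slides do not name a splitting criterion; the sketch below uses ID3-style information gain as one well-known choice. The four training cells are hypothetical, constructed so that Color separates the classes perfectly.

```python
import math
from collections import Counter
from typing import Dict, List

Example = Dict[str, object]


def entropy(labels: List[object]) -> float:
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(examples: List[Example], attr: str, target: str) -> float:
    """Entropy of the target minus the weighted entropy after splitting on attr."""
    base = entropy([e[target] for e in examples])
    remainder = 0.0
    for value in {e[attr] for e in examples}:
        subset = [e[target] for e in examples if e[attr] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return base - remainder


def best_split(examples: List[Example], attrs: List[str], target: str) -> str:
    """ID3-style choice of the attribute to test at a node."""
    return max(attrs, key=lambda a: information_gain(examples, a, target))


# Hypothetical training cells, not data from the lecture.
cells = [
    {"Color": "dark", "Nuclei": 2, "Tails": 2, "Class": "cancerous"},
    {"Color": "dark", "Nuclei": 1, "Tails": 1, "Class": "cancerous"},
    {"Color": "light", "Nuclei": 2, "Tails": 2, "Class": "healthy"},
    {"Color": "light", "Nuclei": 1, "Tails": 1, "Class": "healthy"},
]
print(best_split(cells, ["Color", "Nuclei", "Tails"], "Class"))  # Color
```

A full learner would call `best_split` recursively on each subset until the leaves are pure or no attributes remain.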



Classification: Neural Networks

“What factors determine a cell is cancerous?”

[Figure: a neural network with inputs Color = dark, # nuclei = 1, # tails = 2 and outputs Healthy and Cancerous]
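A neural network learns weights on its inputs instead of explicit rules. To keep the idea visible, the sketch below uses a single perceptron (one threshold unit) rather than a multi-layer network, with the three slide inputs encoded as 0/1 features and one output (1 = cancerous, 0 = healthy). The training labels are hypothetical: a cell is called cancerous exactly when it is dark and has two tails.

```python
from typing import List, Sequence, Tuple


def predict(weights: Sequence[float], bias: float, x: Sequence[int]) -> int:
    """Threshold unit: 1 (cancerous) if the weighted sum exceeds 0, else 0 (healthy)."""
    s = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if s > 0 else 0


def train_perceptron(data: List[Tuple[List[int], int]],
                     epochs: int = 100) -> Tuple[List[float], float]:
    """Classic perceptron learning rule with learning rate 1."""
    n = len(data[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in data:
            error = target - predict(weights, bias, x)
            if error:
                errors += 1
                weights = [w + error * xi for w, xi in zip(weights, x)]
                bias += error
        if errors == 0:  # every training example is classified correctly
            break
    return weights, bias


# Inputs: [Color = dark, # nuclei = 1, # tails = 2], each 1 (true) or 0 (false).
# Hypothetical labels: cancerous exactly when dark AND two tails.
data = [([c, n, t], int(c == 1 and t == 1))
        for c in (0, 1) for n in (0, 1) for t in (0, 1)]
weights, bias = train_perceptron(data)
print(all(predict(weights, bias, x) == y for x, y in data))  # True
```

Because these labels are linearly separable, the perceptron learning rule is guaranteed to converge; patterns that are not linearly separable need hidden layers, as in the networks covered in Lecture 6.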

