Data mining and its application and usage in medicine
By Radhika

Data Mining and Medicine

  • History

    • The past 20 years have centered on relational databases

      • More dimensions added to database queries

    • Medicine: one of the earliest and most successful areas of data mining

    • Mid-1800s: London was hit by an infectious disease (cholera)

      • Two competing theories

        • Miasma theory: bad air propagated the disease

        • Germ theory: the disease was water-borne

      • Advantages and caveats

        • Can discover trends even when we don't understand the reasons behind them

        • May also surface irrelevant patterns that confuse rather than enlighten

        • Protects against unaided human inference of patterns: provides quantifiable measures to aid human judgment

    • Data Mining

      • Seeks patterns that are persistent and meaningful

      • Also known as Knowledge Discovery in Databases (KDD)


The future of data mining

  • 10 biggest killers in the US

  • Data mining = the process of discovering interesting, meaningful, and actionable patterns hidden in large amounts of data


Major Issues in Medical Data Mining

  • Heterogeneity of medical data

    • Volume and complexity

    • Physician’s interpretation

    • Poor mathematical categorization

    • Canonical Form

    • Solution: standard vocabularies, interfaces between different data sources for integration, and well-designed electronic patient records

  • Ethical, Legal and Social Issues

    • Data Ownership

    • Lawsuits

    • Privacy and Security of Human Data

    • Expected benefits

    • Administrative Issues


Why Data Preprocessing?

  • Patient records consist of clinical and laboratory parameters and the results of specific investigations, and are specific to their tasks

    • Incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data

    • Noisy: containing errors or outliers

    • Inconsistent: containing discrepancies in codes or names

    • Temporal: parameters of chronic diseases evolve over time

  • No quality data, no quality mining results!

    • Data warehouse needs consistent integration of quality data

    • In the medical domain, handling incomplete, inconsistent, or noisy data requires people with domain knowledge (a cleaning sketch follows below)
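The three problems above can be sketched and repaired in a few lines of pandas; the column names and values below are hypothetical, and a real medical pipeline would rely on domain experts to choose the imputation rules and plausible ranges:

```python
import pandas as pd
import numpy as np

# Hypothetical patient-record extract showing the three problems above.
records = pd.DataFrame({
    "glucose": [148, 85, 183, np.nan, 137],        # incomplete: missing value
    "blood_pressure": [72, 66, 64, 40, 250],       # noisy: 250 is an outlier
    "sex": ["F", "female", "F", "F", "F"],         # inconsistent coding
})

# Incomplete: impute the missing glucose value with the column median.
records["glucose"] = records["glucose"].fillna(records["glucose"].median())

# Noisy: flag values outside a plausible range for expert review.
records["bp_suspect"] = ~records["blood_pressure"].between(40, 200)

# Inconsistent: map variant codes onto one standard vocabulary.
records["sex"] = records["sex"].replace({"female": "F", "male": "M"})
print(records)
```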


What is Data Mining? The KDD Process

[Figure: the KDD process as a pipeline: Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation → Knowledge]


From Tables and Spreadsheets to Data Cubes

  • A data warehouse is based on a multidimensional data model that views data in the form of a data cube

  • A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions

    • Dimension tables, such as item (item_name, brand, type), or time(day, week, month, quarter, year)

    • Fact table contains measures (such as dollars_sold) and keys to each of related dimension tables

  • W. H. Inmon:“A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.”
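A minimal sketch of the star-schema idea in pandas, with invented item and sales data standing in for the slide's example; the pivoted result is one 2-D face of the sales cube:

```python
import pandas as pd

# Dimension table: item (item_name, brand, type), as on the slide.
item = pd.DataFrame({
    "item_key": [1, 2],
    "item_name": ["glucometer", "test strips"],
    "brand": ["Acme", "Acme"],
    "type": ["device", "consumable"],
})

# Fact table: a measure (dollars_sold) plus a key into each dimension.
sales = pd.DataFrame({
    "item_key": [1, 1, 2, 2],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],   # simplified time dimension
    "dollars_sold": [500, 650, 120, 180],
})

# Joining facts to dimensions and aggregating yields one face of the cube.
cube_face = (sales.merge(item, on="item_key")
                  .pivot_table(values="dollars_sold",
                               index="type", columns="quarter", aggfunc="sum"))
print(cube_face)
```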


Data Warehouse vs. Heterogeneous DBMS

  • Data warehouse: update-driven, high performance

    • Information from heterogeneous sources is integrated in advance and stored in warehouses for direct query and analysis

    • Do not contain most current information

    • Query processing does not interfere with processing at local sources

    • Store and integrate historical information

    • Support complex multidimensional queries


Data Warehouse vs. Operational DBMS

  • OLTP (on-line transaction processing)

    • Major task of traditional relational DBMS

    • Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.

  • OLAP (on-line analytical processing)

    • Major task of data warehouse system

    • Data analysis and decision making

  • Distinct features (OLTP vs. OLAP):

    • User and system orientation: customer vs. market

    • Data contents: current, detailed vs. historical, consolidated

    • Database design: ER + application vs. star + subject

    • View: current, local vs. evolutionary, integrated

    • Access patterns: update vs. read-only but complex queries


Why Separate Data Warehouse?

  • High performance for both systems

    • DBMS tuned for OLTP: access methods, indexing, concurrency control, recovery

    • Warehouse tuned for OLAP: complex OLAP queries, multidimensional view, consolidation

  • Different functions and different data:

    • Missing data: Decision support requires historical data which operational DBs do not typically maintain

    • Data consolidation: DS requires consolidation (aggregation, summarization) of data from heterogeneous sources

    • Data quality: different sources typically use inconsistent data representations, codes and formats which have to be reconciled


Typical OLAP Operations

  • Roll up (drill-up): summarize data

    • by climbing up hierarchy or by dimension reduction

  • Drill down (roll down): reverse of roll-up

    • from higher level summary to lower level summary or detailed data, or introducing new dimensions

  • Slice and dice:

    • project and select

  • Pivot (rotate):

    • reorient the cube for visualization, e.g., view a 3D cube as a series of 2D planes

  • Other operations

    • drill across: involving (across) more than one fact table

    • drill through: through the bottom level of the cube to its back-end relational tables (using SQL)
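These operations can be mimicked with pandas on a toy cube; the dimensions and figures below are invented for illustration:

```python
import pandas as pd

# Toy cube: dollars_sold by (type, quarter, region).
df = pd.DataFrame({
    "type": ["device", "device", "consumable", "consumable"] * 2,
    "quarter": ["Q1", "Q2", "Q1", "Q2"] * 2,
    "region": ["East"] * 4 + ["West"] * 4,
    "dollars_sold": [500, 650, 120, 180, 300, 420, 90, 150],
})

# Roll-up: climb the hierarchy by aggregating the region dimension away.
rollup = df.groupby(["type", "quarter"])["dollars_sold"].sum()

# Slice: fix a single value along one dimension.
q1_slice = df[df["quarter"] == "Q1"]

# Dice: select on several dimensions at once.
dice = df[(df["quarter"] == "Q1") & (df["region"] == "East")]

# Pivot: reorient the cube into a different 2-D view.
pivot = df.pivot_table(values="dollars_sold", index="region",
                       columns="quarter", aggfunc="sum")
print(rollup, q1_slice, dice, pivot, sep="\n\n")
```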


Multi-Tiered Architecture

[Figure: three-tier warehouse architecture. Data sources (operational DBs and other sources) feed a monitor & integrator that extracts, transforms, loads, and refreshes data into the data-storage tier (data warehouse, data marts, metadata). An OLAP server sits on top of the warehouse and serves front-end tools for analysis, queries, reports, and data mining.]


Steps of a KDD Process

  • Learning the application domain:

    • relevant prior knowledge and goals of application

  • Creating a target data set: data selection

  • Data cleaning and preprocessing: (may take 60% of effort!)

  • Data reduction and transformation:

    • Find useful features, dimensionality/variable reduction, invariant representation.

  • Choosing functions of data mining

    • summarization, classification, regression, association, clustering.

  • Choosing the mining algorithm(s)

  • Data mining: search for patterns of interest

  • Pattern evaluation and knowledge presentation

    • visualization, transformation, removing redundant patterns, etc.

  • Use of discovered knowledge


Common Techniques in Data Mining

  • Predictive Data Mining

    • Most important

    • Classification: relate one set of variables in the data to categorical response variables

    • Regression: estimate some continuous value

  • Descriptive Data Mining

    • Clustering: Discovering groups of similar instances

    • Association rule extraction

      • Associations among variables or observations

    • Summarization of group descriptions


Leukemia

  • Different types of cells look very similar

  • Given a number of samples (patients)

    • can we diagnose the disease accurately?

    • Predict the outcome of treatment?

    • Recommend the best treatment based on previous treatments?

  • Solution: Data mining on micro-array data

  • 38 training patients and 34 test patients, each with ~7,000 attributes

  • 2 classes: Acute Lymphoblastic Leukemia (ALL) vs. Acute Myeloid Leukemia (AML)
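A sketch of this setup in scikit-learn. The arrays are random stand-ins with the stated shapes (real micro-array expression data would be needed for a meaningful result), and a nearest-neighbor classifier, discussed later in this deck, stands in for the model:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Stand-ins for the micro-array data: 38 training and 34 test patients,
# ~7000 expression values each; labels are ALL (0) vs. AML (1).
X_train = rng.normal(size=(38, 7000))
y_train = rng.integers(0, 2, size=38)
X_test = rng.normal(size=(34, 7000))

clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
predictions = clf.predict(X_test)   # predicted class for each test patient
```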


Clustering/Instance Based Learning

  • Uses specific instances to perform classification rather than general IF-THEN rules

  • Nearest Neighbor classifier

  • Among the most studied algorithms for medical purposes

  • Clustering– Partitioning a data set into several groups (clusters) such that

    • Homogeneity: Objects belonging to the same cluster are similar to each other

    • Separation: Objects belonging to different clusters are dissimilar to each other

  • Three elements

    • The set of objects

    • The set of attributes

    • Distance measure


Measure the Dissimilarity of Objects

  • Find best matching instance

  • Distance function

    • Measure the dissimilarity between a pair of data objects

  • Things to consider

    • Usually very different for interval-scaled, boolean, nominal, ordinal and ratio-scaled variables

    • Weights should be associated with different variables based on the application and data semantics

  • Quality of a clustering result depends on both the distance measure adopted and its implementation


Minkowski Distance

  • Minkowski distance: a generalization, d(i, j) = (Σk |xik − xjk|^q)^(1/q)

  • If q = 2, d is Euclidean distance

  • If q = 1, d is Manhattan distance

[Figure: for Xi = (1, 7) and Xj = (7, 1), the q = 1 (Manhattan) distance is 6 + 6 = 12, and the q = 2 (Euclidean) distance is √(6² + 6²) ≈ 8.48]
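The figure's numbers can be checked in a few lines of Python (`minkowski` here is a helper defined for this sketch, not a library call):

```python
def minkowski(x, y, q):
    """Minkowski distance: d(x, y) = (sum_k |x_k - y_k|^q)^(1/q)."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1 / q)

xi, xj = (1, 7), (7, 1)
print(minkowski(xi, xj, q=1))   # Manhattan: |1-7| + |7-1| = 12
print(minkowski(xi, xj, q=2))   # Euclidean: sqrt(36 + 36) ≈ 8.485
```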


Binary Variables

  • A contingency table for binary data counts, over all variables, how often objects i and j take each pair of values (q: both 1; r: i=1, j=0; s: i=0, j=1; t: both 0):

                      Object j
                       1    0
        Object i   1   q    r
                   0   s    t

  • Simple matching coefficient: d(i, j) = (r + s) / (q + r + s + t)
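A small sketch of the coefficient, applied to two hypothetical objects with five symmetric binary variables:

```python
def simple_matching(i, j):
    """Simple matching dissimilarity: (r + s) / (q + r + s + t), i.e. the
    fraction of binary variables on which the two objects disagree."""
    disagreements = sum(a != b for a, b in zip(i, j))
    return disagreements / len(i)

obj_i = [1, 0, 1, 1, 0]
obj_j = [1, 1, 0, 1, 0]
print(simple_matching(obj_i, obj_j))   # 2 disagreements out of 5 -> 0.4
```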


Dissimilarity between Binary Variables

  • Example

[Figure: worked example computing the dissimilarity between Object 1 and Object 2 from their binary attribute values]


k-Means Algorithm

  • Initialization

    • Arbitrarily choose k objects as the initial cluster centers (centroids)

  • Iteration until no change

    • For each object Oi

      • Calculate the distances between Oi and the k centroids

      • (Re)assign Oi to the cluster whose centroid is the closest to Oi

    • Update the cluster centroids based on current assignment
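A minimal NumPy sketch of this procedure, written for clarity rather than speed:

```python
import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    """Choose k objects as initial centroids, then alternate between
    assigning each object to its nearest centroid and recomputing the
    centroids, until the assignment stops changing."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Recompute each centroid; keep the old one if its cluster emptied.
        centroids = np.array([X[labels == c].mean(axis=0)
                              if np.any(labels == c) else centroids[c]
                              for c in range(k)])
    return labels, centroids

labels, centers = k_means(np.random.default_rng(1).normal(size=(100, 2)), k=3)
```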


k-Means Clustering Method

[Figure: one k-means iteration: compute each cluster's mean, relocate objects to the nearest mean, and form the new clusters]


Dataset

  • Data set from UCI repository

  • http://kdd.ics.uci.edu/

  • 768 female Pima Indians evaluated for diabetes

  • After data cleaning, 392 entries remain


Hierarchical Clustering

  • Groups observations based on dissimilarity

  • Compacts database into “labels” that represent the observations

  • Measure of similarity/Dissimilarity

    • Euclidean Distance

    • Manhattan Distance

  • Types of Clustering

    • Single Link

    • Average Link

    • Complete Link
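A short sketch using SciPy's hierarchical-clustering routines on random stand-in data; the `method` argument selects among the single-, average-, and complete-link variants listed above (drawing the dendrogram additionally requires matplotlib):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))           # 20 observations, 4 attributes

# Build the cluster tree; method can be "single", "average", or "complete".
Z = linkage(X, method="average", metric="euclidean")

labels = fcluster(Z, t=3, criterion="maxclust")   # cut the tree into 3 groups
dendrogram(Z)                                     # draw the tree
```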


Hierarchical Clustering: Comparison

[Figure: six points clustered by single-link, complete-link, average-link, and centroid-distance linkage; each method partitions the same points differently]


Compare Dendrograms

[Figure: dendrograms for the four linkage methods; single-link, complete-link, and average-link order the leaves 1 2 5 3 6 4, while centroid distance orders them 2 5 3 6 4 1]


Which Distance Measure is Better?

  • Each method has both advantages and disadvantages; application-dependent

  • Single-link

    • Can find irregular-shaped clusters

    • Sensitive to outliers

  • Complete-link, Average-link, and Centroid distance

    • Robust to outliers

    • Tend to break large clusters

    • Prefer spherical clusters


Dendrogram from Dataset (Single-Link)

  • Corresponds to a minimum spanning tree through the observations

  • The single observation that is last to join the cluster is a patient whose blood pressure and skin thickness are in the bottom quartile and whose BMI is in the bottom half

  • Her insulin level, however, was the largest in the dataset, and she is a 59-year-old diabetic


Dendrogram from Dataset (Complete-Link)

  • Uses the maximum dissimilarity between observations in one cluster and those in another


Dendrogram from Dataset (Average-Link)

  • Uses the average dissimilarity between observations in one cluster and those in another


Supervised versus Unsupervised Learning

  • Supervised learning (classification)

    • Supervision: Training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations

    • New data is classified based on training set

  • Unsupervised learning (clustering)

    • Class labels of training data are unknown

    • Given a set of measurements, observations, etc., need to establish existence of classes or clusters in data


Classification and Prediction

  • Derive models that use patient-specific information to aid clinical decision making

  • A priori decision on the predictors and the variables to predict

  • No method to find predictors that are not present in the data

  • Numeric Response

    • Least Squares Regression

  • Categorical Response

    • Classification trees

    • Neural Networks

    • Support Vector Machine

  • Decision models

    • Prognosis, Diagnosis and treatment planning

    • Embed in clinical information systems


Least Squares Regression

  • Find a linear function of the predictor variables that minimizes the sum of squared differences with the response

  • Supervised learning technique

  • Predict insulin in our dataset from glucose and BMI (see the sketch below)
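A sketch with scikit-learn; the glucose, BMI, and insulin arrays are synthetic stand-ins for the Pima columns, and their coefficients are made up:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic stand-ins for the 392 cleaned Pima records.
glucose = rng.normal(120, 30, size=392)
bmi = rng.normal(32, 7, size=392)
insulin = 1.5 * glucose + 3.0 * bmi + rng.normal(0, 20, size=392)

X = np.column_stack([glucose, bmi])
model = LinearRegression().fit(X, insulin)   # minimizes sum of squared errors
print(model.coef_, model.intercept_)
print(model.predict([[140, 35]]))            # insulin estimate for one patient
```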


Decision Trees

  • Decision tree

    • Each internal node tests an attribute

    • Each branch corresponds to attribute value

    • Each leaf node assigns a classification

  • ID3 algorithm

    • Uses training objects with known class labels to classify testing objects

    • Rank attributes with information gain measure

    • Minimal height

      • least number of tests to classify an object

    • Used in commercial tools, e.g., Clementine

    • ASSISTANT

      • Deal with medical datasets

      • Incomplete data

      • Discretize continuous variables

      • Prune unreliable parts of tree

      • Classify data
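A short scikit-learn sketch on synthetic data. Setting `criterion="entropy"` makes the tree rank splits by information gain, as ID3 does; the feature names are invented:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))             # three clinical attributes
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # synthetic diagnosis label

# Each internal node tests one attribute; each leaf assigns a class.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3).fit(X, y)
print(export_text(tree, feature_names=["glucose", "bmi", "age"]))
```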



Algorithm for Decision Tree Induction

  • Basic algorithm (a greedy algorithm)

    • Attributes are categorical (if continuous-valued, they are discretized in advance)

    • Tree is constructed in a top-down recursive divide-and-conquer manner

    • At start, all training examples are at the root

    • Test attributes are selected on basis of a heuristic or statistical measure (e.g., information gain)

    • Examples are partitioned recursively based on selected attributes



Construction of A Decision Tree for “Condition X”

[Figure: decision tree for “Condition X” built from patients P1–P14 (root: 9 yes, 5 no), split first on Age:

  <=30 → [P1,P2,P8,P9,P11] (2 yes, 3 no), split on History:
    yes → [P9,P11] (2 yes, 0 no) → YES
    no → [P1,P2,P8] (0 yes, 3 no) → NO

  30…40 → [P3,P7,P12,P13] (4 yes, 0 no) → YES

  >40 → [P4,P5,P6,P10,P14] (3 yes, 2 no), split on Vision:
    fair → [P4,P5,P10] (3 yes, 0 no) → YES
    excellent → [P6,P14] (0 yes, 2 no) → NO]


Entropy and Information Gain

  • S contains si tuples of class Ci for i = {1, ..., m}

  • Information required to classify an arbitrary tuple: I(s1, ..., sm) = −Σi (si/s) log2(si/s)

  • Entropy of attribute A with values {a1, a2, ..., av}: E(A) = Σj ((s1j + ... + smj)/s) · I(s1j, ..., smj)

  • Information gained by branching on attribute A: Gain(A) = I(s1, ..., sm) − E(A)


Entropy and Information Gain

  • Select attribute with the highest information gain (or greatest entropy reduction)

    • Such attribute minimizes information needed to classify samples
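A small Python implementation of these formulas, applied to a toy slice of the "Condition X" example above:

```python
from collections import Counter
from math import log2

def info(labels):
    """Expected information I(s1, ..., sm) = -sum (si/s) * log2(si/s)."""
    s = len(labels)
    return -sum((c / s) * log2(c / s) for c in Counter(labels).values())

def gain(rows, attr, label):
    """Gain(A) = I(S) - E(A), where E(A) is the weighted info of the
    subsets produced by branching on each value of attribute A."""
    total = info([r[label] for r in rows])
    e_a = 0.0
    for v in {r[attr] for r in rows}:
        subset = [r[label] for r in rows if r[attr] == v]
        e_a += len(subset) / len(rows) * info(subset)
    return total - e_a

rows = [{"age": "<=30", "cond": "no"}, {"age": "<=30", "cond": "yes"},
        {"age": "31..40", "cond": "yes"}, {"age": ">40", "cond": "yes"},
        {"age": ">40", "cond": "no"}]
print(gain(rows, "age", "cond"))   # information gained by branching on age
```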


Rule Induction

  • IF conditions THEN Conclusion

  • E.g., CN2

    • Concept description:

      • Characterization: provides a concise and succinct summarization of given collection of data

      • Comparison: provides descriptions comparing two or more collections of data

  • Training set, testing set

  • Imprecise

  • Predictive Accuracy

    • P / (P + N)


Example used in a Clinic

  • Hip arthroplasty: the trauma surgeon predicts the patient's long-term clinical status after surgery

  • Outcome evaluated during follow-ups for 2 years

  • 2 modeling techniques

    • Naïve Bayesian classifier

    • Decision trees

  • Bayesian classifier

    • P(outcome=good) = 0.55 (11/20 good)

    • Probability gets updated as more attributes are considered

    • P(timing=good | outcome=good) = 9/11 ≈ 0.818

    • P(outcome=bad) = 9/20; P(timing=good | outcome=bad) = 5/9
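The update can be verified in a few lines of Python using the slide's numbers; after observing good timing, the probability of a good outcome rises from 0.55 to about 0.64:

```python
# Priors from the slide: 11 of 20 outcomes were good, 9 were bad.
p_good, p_bad = 11 / 20, 9 / 20
p_timing_given_good = 9 / 11   # 9 of the 11 good outcomes had good timing
p_timing_given_bad = 5 / 9     # 5 of the 9 bad outcomes had good timing

# Bayesian update after observing timing = good.
joint_good = p_timing_given_good * p_good   # = 9/20
joint_bad = p_timing_given_bad * p_bad      # = 5/20
posterior_good = joint_good / (joint_good + joint_bad)
print(posterior_good)   # ~0.643
```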



Bayesian Classification

  • Bayesian classifier vs. decision tree

    • Decision tree: predict the class label

    • Bayesian classifier: a statistical classifier; predicts class membership probabilities

  • Based on Bayes theorem; estimate posterior probability

  • Naïve Bayesian classifier:

    • Simple classifier that assumes attribute independence

    • High speed when applied to large databases

    • Comparable in performance to decision trees


Bayes Theorem

  • Let X be a data sample whose class label is unknown

  • Let Hi be the hypothesis that X belongs to a particular class Ci

  • P(Hi) is class prior probability that X belongs to a particular class Ci

    • Can be estimated by ni/n from training data samples

    • n is the total number of training data samples

    • ni is the number of training data samples of class Ci

Bayes theorem: P(Hi | X) = P(X | Hi) · P(Hi) / P(X)


More classification Techniques

  • Neural Networks

    • Modeled on the pattern-recognition properties of biological nervous systems

    • Most frequently used

      • Multi-layer perceptrons

        • Input layer (with bias) connected by weights to hidden and output layers

      • Backpropagation neural networks

  • Support Vector Machines

    • Separate the database into mutually exclusive regions

      • Transform to another problem space

      • Kernel functions (dot product)

      • Output of new points predicted by position

  • Comparison with classification trees

    • Not possible to know which features or combination of features most influence a prediction


Multilayer Perceptrons

  • Applies non-linear transfer functions to weighted sums of inputs

  • Werbos algorithm

    • Random weights

    • Training set, Testing set
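A minimal scikit-learn sketch on synthetic data: one hidden layer of logistic units, random initial weights, fit by backpropagation:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))                        # eight input attributes
y = (np.tanh(X[:, 0]) + X[:, 1] ** 2 > 1).astype(int)

# Non-linear transfer function ("logistic") applied to weighted input sums;
# training starts from random weights and runs backpropagation.
mlp = MLPClassifier(hidden_layer_sizes=(16,), activation="logistic",
                    max_iter=2000, random_state=0).fit(X, y)
print(mlp.score(X, y))   # accuracy on the training set
```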


Support Vector Machines

  • Three steps

    • Identify the support vectors

    • Find the maximal distance (margin) between the classes' closest points

    • Place the decision boundary perpendicular across that margin

  • Allows some points to be misclassified

  • Pima Indian data with X1 (glucose) and X2 (BMI), as sketched below
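A hedged sketch with scikit-learn's SVC on synthetic stand-ins for the two Pima features; the parameter `C` governs how many points may be misclassified, and the kernel (here linear, i.e. a plain dot product) defines the problem space:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-ins: X1 = glucose, X2 = BMI, with a made-up label rule.
X = np.column_stack([rng.normal(120, 30, 392), rng.normal(32, 7, 392)])
y = (0.02 * X[:, 0] + 0.05 * X[:, 1] + rng.normal(0, 0.5, 392) > 4).astype(int)

svm = SVC(kernel="linear", C=1.0).fit(X, y)
print(svm.support_vectors_.shape)    # the support vectors that were found
print(svm.predict([[150, 40]]))      # a new point's position decides its class
```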


What is Association Rule Mining?

  • Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories

Example of Association Rules

{High LDL, Low HDL} → {Heart Failure}

  • People who have high LDL (“bad” cholesterol) and low HDL (“good” cholesterol) are at higher risk of heart failure


Association Rule Mining

  • Market Basket Analysis

    • Groups of items bought together are placed together

    • Healthcare

      • Understanding associations among patients with demands for similar treatments and services

    • Goal : find items for which joint probability of occurrence is high

    • Basket of binary valued variables

    • Results take the form of association rules, augmented with support and confidence


Association Rule Mining

[Figure: Venn diagram over the transaction database D, showing the transactions containing X, the transactions containing Y, and their overlap containing both X and Y]

  • Association Rule

    • An implication expression of the form X → Y, where X and Y are itemsets and X ∩ Y = ∅

  • Rule Evaluation Metrics

    • Support (s): the fraction of transactions that contain both X and Y: s(X → Y) = σ(X ∪ Y) / |D|

    • Confidence (c): how often items in Y appear in transactions that contain X: c(X → Y) = σ(X ∪ Y) / σ(X)


The Apriori Algorithm

  • Starts with the frequent 1-itemsets

  • Include only those “items” that pass threshold

  • Use 1-itemset to generate 2-itemsets

  • Stop when threshold not satisfied by any itemset

  • L1 = {frequent items};

    for (k = 1; Lk ≠ ∅; k++) do

    • Candidate Generation: Ck+1 = candidates generated from Lk

    • Candidate Counting: for each transaction t in the database, increment the count of every candidate in Ck+1 that is contained in t

    • Lk+1 = candidates in Ck+1 with support ≥ min_sup

      return ∪k Lk;


Apriori-based Mining

Database D (min_sup = 0.5, i.e. a support count of at least 2):

  TID   Items
  10    a, c, d
  20    b, c, e
  30    a, b, c, e
  40    b, e

Scan D → 1-candidates with counts: a:2, b:3, c:3, d:1, e:3

Frequent 1-itemsets: a:2, b:3, c:3, e:3 (d is dropped)

2-candidates: ab, ac, ae, bc, be, ce

Scan D (counting): ab:1, ac:2, ae:1, bc:2, be:3, ce:2

Frequent 2-itemsets: ac:2, bc:2, be:3, ce:2

3-candidates: bce

Scan D → Frequent 3-itemsets: bce:2
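A compact Python implementation of the level-wise procedure; run on the four transactions above, it reproduces the trace (a:2, b:3, c:3, e:3, ac:2, bc:2, be:3, ce:2, bce:2):

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Level-wise Apriori: keep itemsets whose support count meets the
    threshold; build (k+1)-candidates only from frequent k-itemsets."""
    min_count = min_sup * len(transactions)
    items = {i for t in transactions for i in t}
    level = [frozenset([i]) for i in items]
    frequent = {}
    while level:
        counts = {c: sum(c <= t for t in transactions) for c in level}
        freq = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(freq)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset.
        candidates = {a | b for a in freq for b in freq
                      if len(a | b) == len(a) + 1}
        level = [c for c in candidates
                 if all(frozenset(s) in freq
                        for s in combinations(c, len(c) - 1))]
    return frequent

D = [{"a", "c", "d"}, {"b", "c", "e"}, {"a", "b", "c", "e"}, {"b", "e"}]
print(apriori(D, min_sup=0.5))
```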


Principal Component Analysis

  • Principal Components

    • With a large number of variables, it is likely that some subsets of the variables are highly correlated with each other; PCA reduces the number of variables while retaining the variability in the dataset

    • PCs are linear combinations of the variables in the database

      • The variance of each PC is maximized

        • Displays as much of the spread of the original data as possible

      • PCs are orthogonal to each other

        • Minimizes the overlap between variables

      • Each component is normalized so that its sum of squares is unity

        • Easier for mathematical analysis

    • Number of PCs < number of variables

      • Associations are found

      • A small number of PCs explains a large amount of the variance

    • Example: 768 female Pima Indians evaluated for diabetes

      • Number of times pregnant, two-hour oral glucose tolerance test (OGTT) plasma glucose, Diastolic blood pressure, Triceps skin fold thickness, Two-hour serum insulin, BMI, Diabetes pedigree function, Age, Diabetes onset within last 5 years
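A sketch with scikit-learn on random stand-ins for the eight numeric Pima predictors; standardizing first is an added assumption, but common practice so that no single variable's units dominate the components:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(768, 8))   # stand-in for the 8 numeric Pima variables

X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=4).fit(X_std)        # orthogonal linear combinations
print(pca.explained_variance_ratio_)        # variance captured per component
X_reduced = pca.transform(X_std)            # 768 x 4 instead of 768 x 8
```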



National Cancer Institute

  • CancerNet http://www.nci.nih.gov

  • CancerNet for Patients and the Public

  • CancerNet for Health Professionals

  • CancerNet for Basic Researchers

  • CancerLit


Conclusion

  • About three-quarters of a billion people's medical records are available electronically

  • Data mining in medicine distinct from other fields due to nature of data: heterogeneous, with ethical, legal and social constraints

  • Most commonly used technique is classification and prediction with different techniques applied for different cases

  • Association rules describe the data in the database

  • Medical data mining can be among the most rewarding applications, despite its difficulty


