Business Data Mining. Dr. Yukun Bao School of Management, HUST. CAN YOU BE A DOCTOR WITHOUT MEDICAL TRAINING?. Production Rules. IF Swollen Glands = Yes THEN Diagnosis = Strep Throat IF Swollen Glands = No & Fever = Yes THEN Diagnosis = Cold IF Swollen Glands = No & Fever = No
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
Business Data Mining
Dr. Yukun Bao
School of Management, HUST
Data Mining: Concepts and Techniques
Production Rules
Data Mining: Concepts and Techniques
What is classification? What is prediction?
Issues regarding classification and prediction
Classification by decision tree induction
Bayesian classification
Classification by back propagation
Prediction
Accuracy and error measures
Model selection
Summary
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
Training
Data
Classifier
(Model)
Classification
Algorithms
IF rank = ‘professor’
OR years > 6
THEN tenured = ‘yes’
Data Mining: Concepts and Techniques
Classifier
Testing
Data
Unseen Data
(Jeff, Professor, 4)
Tenured?
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
What is classification? What is prediction?
Issues regarding classification and prediction
Classification by decision tree induction
Bayesian classification
Classification by back propagation
Prediction
Accuracy and error measures
Model selection
Summary
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
What is classification? What is prediction?
Issues regarding classification and prediction
Classification by decision tree induction
Bayesian classification
Classification by back propagation
Prediction
Accuracy and error measures
Model selection
Summary
Data Mining: Concepts and Techniques
This follows an example of Quinlan’s ID3 (Playing Tennis)
Data Mining: Concepts and Techniques
age?
<=30
overcast
>40
31..40
student?
credit rating?
yes
excellent
fair
no
yes
no
yes
no
yes
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
Attribute Selection Measure: Information Gain (ID3/C4.5)
Data Mining: Concepts and Techniques
Class P: buys_computer = “yes”
Class N: buys_computer = “no”
means “age <=30” has 5 out of 14 samples, with 2 yes’es and 3 no’s. Hence
Similarly,
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
where pj is the relative frequency of class j in D
Data Mining: Concepts and Techniques
What is classification? What is prediction?
Issues regarding classification and prediction
Classification by decision tree induction
Bayesian classification
Classification by back propagation
Prediction
Accuracy and error measures
Model selection
Summary
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
posteriori = likelihood x prior/evidence
Data Mining: Concepts and Techniques
needs to be maximized
Data Mining: Concepts and Techniques
and P(xkCi) is
Data Mining: Concepts and Techniques
Class:
C1:buys_computer = ‘yes’
C2:buys_computer = ‘no’
Data sample
X = (age <=30,
Income = medium,
Student = yes
Credit_rating = Fair)
Data Mining: Concepts and Techniques
P(buys_computer = “no”) = 5/14= 0.357
P(age = “<=30”  buys_computer = “yes”) = 2/9 = 0.222
P(age = “<= 30”  buys_computer = “no”) = 3/5 = 0.6
P(income = “medium”  buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium”  buys_computer = “no”) = 2/5 = 0.4
P(student = “yes”  buys_computer = “yes) = 6/9 = 0.667
P(student = “yes”  buys_computer = “no”) = 1/5 = 0.2
P(credit_rating = “fair”  buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair”  buys_computer = “no”) = 2/5 = 0.4
P(XCi) : P(Xbuys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(Xbuys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
P(XCi)*P(Ci) : P(Xbuys_computer = “yes”) * P(buys_computer = “yes”) = 0.028
P(Xbuys_computer = “no”) * P(buys_computer = “no”) = 0.007
Therefore, X belongs to class (“buys_computer = yes”)
Data Mining: Concepts and Techniques
Prob(income = low) = 1/1003
Prob(income = medium) = 991/1003
Prob(income = high) = 11/1003
Data Mining: Concepts and Techniques
Symptoms: fever, cough etc., Disease: lung cancer, diabetes, etc.
Data Mining: Concepts and Techniques
What is classification? What is prediction?
Issues regarding classification and prediction
Classification by decision tree induction
Bayesian classification
Classification by back propagation
Prediction
Accuracy and error measures
Model selection
Summary
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques

mk
x0
w0
x1
w1
f
å
output y
xn
wn
Input
vector x
weight
vector w
weighted
sum
Activation
function
Data Mining: Concepts and Techniques
Output vector
Output layer
Hidden layer
wij
Input layer
Input vector: X
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
What is classification? What is prediction?
Issues regarding classification and prediction
Classification by decision tree induction
Bayesian classification
Rulebased classification
Classification by back propagation
Prediction
Accuracy and error measures
Model selection
Summary
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
y = w0 + w1 x
where w0 (yintercept) and w1 (slope) are regression coefficients
Data Mining: Concepts and Techniques
y = w0 + w1 x + w2 x2 + w3 x3
convertible to linear with new variables: x2 = x2, x3= x3
y = w0 + w1 x + w2 x2 + w3 x3
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
What is classification? What is prediction?
Issues regarding classification and prediction
Classification by decision tree induction
Bayesian classification
Classification by back propagation
Prediction
Accuracy and error measures
Model selection
Summary
Data Mining: Concepts and Techniques
sensitivity = tpos/pos /* true positive recognition rate */
specificity = tneg/neg /* true negative recognition rate */
precision = tpos/(tpos + fpos)
accuracy = sensitivity * pos/(pos + neg) + specificity * neg/(pos + neg)
Data Mining: Concepts and Techniques
The mean squarederror exaggerates the presence of outliers
Popularly use (square) root meansquare error, similarly, root relative squared error
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
What is classification? What is prediction?
Issues regarding classification and prediction
Classification by decision tree induction
Bayesian classification
Rulebased classification
Classification by back propagation
Prediction
Model selection
Summary
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
What is classification? What is prediction?
Issues regarding classification and prediction
Classification by decision tree induction
Bayesian classification
Classification by back propagation
Prediction
Model selection
Summary
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques