Loading in 2 Seconds...
Loading in 2 Seconds...
Data Warehousing/Mining Comp 150 DW Chapter 5: Concept Description: Characterization and Comparison. Instructor: Dan Hebert. Chapter 5: Concept Description: Characterization and Comparison. What is concept description? Data generalization and summarizationbased characterization
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
Instructor: Dan Hebert
1
2
3
4
Conceptual levels
5
useBig_University_DB
mine characteristics as “Science_Students”
in relevance to name, gender, major, birth_place, birth_date, residence, phone#, gpa
fromstudent
where status in “graduate”
Select name, gender, major, birth_place, birth_date, residence, phone#, gpa
from student
where status in {“Msc”, “MBA”, “PhD” }
å
t_weight
count(qa)/
=
count(qi)
=
i
1
Presentation of Generalized Results(continued)sunny
rain
overcast
Wind
Humidity
yes
high
weak
normal
strong
no
yes
no
yes
TopDown Induction of Decision TreeAttributes = {Outlook, Temperature, Humidity, Wind}
PlayTennis = {yes, no}
Candidate relation for Target class: Graduate students (=120)
Candidate relation for Contrasting class: Undergraduate students (=130)
Number of grad students in “Science”
Number of undergrad students in “Science”
Example: Analytical characterization (3)0.9183
0.9892
0
0.9988
0.7873
Initial target class working relation W0: Graduate students
use Big_University_DB
mine comparison as “grad_vs_undergrad_students”
in relevance toname, gender, major, birth_place, birth_date, residence, phone#, gpa
for “graduate_students”
where status in “graduate”
versus “undergraduate_students”
where status in “undergraduate”
analyze count%
from student
Prime generalized relation for the target class: Graduate students
Prime generalized relation for the contrasting class: Undergraduate students
Read: if X satisfies condition, there is a probability (dweight) that x is in the target class
Ü
X
,
graduate
_
student
(
X
)
=
Ù
=

Ù
=
birth
_
country
(
X
)
"
Canada
"
age
_
range
(
X
)
"
25
30
"
gpa
(
X
)
"
good
"
[
d
:
30
%]
Example: Quantitative Discriminant RuleCount distribution between graduate and undergraduate students for a generalized tuple
Crosstab showing associated tweight, dweight values and total number (in thousands) of TVs and computers sold at AllElectronics in 1998
Minimum, Q1, M, Q3, Maximum