Using type inference and induced rules to provide intensional answers

Using type inference and induced rules to provide intensional answers

Using Type Inference and Induced Rules to Provide Intensional Answers

Wesley W. Chu

Rei-Chi Lee

Qiming Chen


What is intensional answer

What is an Intensional Answer?

  • An intensional answer to a query provides the characteristics that describe the database values (the extensional answers) that satisfy the query

  • Intensional answers provide the users with:

    • Summarized or approximate descriptions about the extensional answers

    • Additional insight into the nature of extensional answers


An example of intensional answer

An Example of Intensional Answer

Consider a personnel database containing the relation:

EMPLOYEE = (ID, Name, Position, Salary)

To find the persons whose annual salary is more than 100K, the query can be specified as:

Q = SELECT * FROM EMPLOYEE WHERE Salary > 100K;

A traditional answer would be:

{“Smith”, “Jones”,...}

An intensional answer would be:

“All the managers.”
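As a rough illustration of the two answer forms, here is a minimal Python sketch; the table contents and the induced rule are invented for the example, not taken from the paper. The extensional answer enumerates the qualifying tuples, while the intensional answer reports the rule consequent that characterizes them.

    # Minimal sketch: extensional vs. intensional answers (illustrative data and rule).
    EMPLOYEE = [
        {"ID": 1, "Name": "Smith", "Position": "Manager", "Salary": 120_000},
        {"ID": 2, "Name": "Jones", "Position": "Manager", "Salary": 110_000},
        {"ID": 3, "Name": "Brown", "Position": "Clerk",   "Salary": 60_000},
    ]

    # Extensional answer: the qualifying database values themselves.
    extensional = [e["Name"] for e in EMPLOYEE if e["Salary"] > 100_000]

    # Induced rule (assumed to have been learned from the database content):
    #   IF Salary > 100K THEN Position = Manager
    # Intensional answer: the rule's consequent describes the whole answer set.
    intensional = "All the managers."

    print(extensional)   # ['Smith', 'Jones']
    print(intensional)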


Using type inference and induced rules to provide intensional answers

  • Prior Work:

    • Constraint-based approach for intensional query answering (Motro 89)

    • Aggregate response using type hierarchy (Shum 88)

      Only limited forms of intensional answers can be generated

  • New Approach:

    Use both type hierarchy and database intensional knowledge

  • Two Phases:

    • Knowledge Acquisition

      • Use rule induction to derive intensional knowledge from database content

    • Type Inference

      • Based on the type hierarchy, use the derived rules to generate specific intensional answers


Traditional views of type hierarchy

Traditional Views of Type Hierarchy

  • In semantic or object-oriented data modeling, there are two traditional views of type hierarchy:

    1. IS_A Hierarchy:

    A IS_A B means every member of type A is also a member of type B.

    2. PART_OF Hierarchy:

    A is PART_OF B means A is a component of B.

    These two views are mainly used for data modeling which provides a language for:

    • describing and storing data

    • accessing and manipulating the data


The notion of type hierarchy

The Notion of Type Hierarchy

Classes and Types:

  • Any of the entities being modeled that share some common characteristics are gathered into classes.

  • All elements of the class have the same class type.

    Type Hierarchy is a partial order for the set of types:

  • Types (referred to as super-types) at higher positions are more generalized than types at lower positions.

  • Types (referred to as sub-types) at lower positions are more specialized than types at higher positions.


An is a type hierarchy example

An IS_A Type Hierarchy Example


Type inference

Type Inference

  • Type inference is the process of traversing the type hierarchy, guided by the query condition and the induced rules.

  • The type hierarchy can be traversed in two directions:

    • Forward Inference

    • Backward Inference


Deriving intensional answers using forward inference

Deriving Intensional Answers Using Forward Inference

Forward Inference uses the known facts to derive more facts. That is, given a rule, “If X then Y” and a fact “X is true”, then we can conclude “Y is true”.

We perform forward inference by traversing along the type hierarchies downward from the type that is involved in the query. As a result,

  • The search scope for answering the query can be reduced

  • The lowest (most specific) type descriptions satisfying the query condition are returned as the intensional answers
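A minimal Python sketch of this downward traversal, assuming a toy IS_A hierarchy and per-type induced rules; all names and numbers here are invented for illustration (in the paper the rules are induced from the database content):

    # Sketch of forward inference: walk the IS_A hierarchy downward from the queried
    # type and keep the most specific subtypes whose induced rule satisfies the
    # query condition. Hierarchy and rule values are illustrative only.
    IS_A_CHILDREN = {"SHIP": ["SUBMARINE", "CARRIER"], "SUBMARINE": ["SSBN", "SSN"]}
    MIN_DISPLACEMENT = {"SSBN": 16000, "SSN": 6000, "CARRIER": 80000}  # induced rules

    def forward_inference(root, predicate):
        """Return the most specific types below `root` whose rule satisfies `predicate`."""
        answers = []
        for child in IS_A_CHILDREN.get(root, []):
            deeper = forward_inference(child, predicate)
            if deeper:
                answers.extend(deeper)
            elif child in MIN_DISPLACEMENT and predicate(MIN_DISPLACEMENT[child]):
                answers.append(child)
        return answers

    # "Displacement > 8000" is satisfied by SSBN (and CARRIER) but not SSN.
    print(forward_inference("SHIP", lambda d: d > 8000))   # ['SSBN', 'CARRIER']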


A forward inference example

A Forward Inference Example

To find the submarines whose displacement is greater than 8000, the query can be specified as:

Q = SELECT * FROM SUBMARINE WHERE Displacement > 8000;

The extensional answer to the query is:

id       name          class  type
SSBN730  Rhode Island  0101   SSBN
SSBN130  Typhoon       1301   SSBN

Using forward inference with R4, we can derive the following intensional answer:

“Ships are SSBN submarines.”


Derive intensional answers using backward inference

Derive Intensional Answers Using Backward Inference

  • Backward Inference uses the known facts to infer what must be true according to the type hierarchies and induced rules

  • Using backward inference, we traverse upward along the type hierarchies to provide the set of types, together with their constraints, as intensional answers.


A backward inference example

A Backward Inference Example

To find the names and classes of the SSBN submarines, the query can be specified as:

Q = SELECT Name, Class
    FROM SUBMARINE, CLASS
    WHERE Type = “SSBN”;

The extensional answer to the query is:

name                class
Nathaniel Hale      0103
Daniel Boone        0103
Sam Rayburn         0103
Lewis and Clark     0102
Mariano G. Vallejo  0102
Rhode Island        0101
Typhoon             1301

Using backward inference, we can derive the following intensional answer:

“Some ships have classes in the range of 0101 to 0103.”


Deriving intensional answers via type inference

Deriving Intensional Answers via Type Inference

Using forward inference, the intensional answer gives a set of type descriptions that includes the answers.

Using backward inference, the intensional answer gives only a description of partial answers.

Therefore,

  • the intensional answers derived from forward inference characterize a set of instances containing the extensional answers, whereas

  • the intensional answers derived from backward inference characterize a set of answers contained in the extensional answers.

    Forward inference and backward inference can also be combined to derive more specific intensional answers.


A forward and backward inference example

A Forward and Backward Inference Example

To find the names, classes, and types of the SUBMARINES equipped with sonar BQS-04, the query can be specified as:

Q = SELECT SUBMARINE.Name, SUBMARINE.Class, CLASS.Type
    FROM SUBMARINE, CLASS, INSTALL
    WHERE SUBMARINE.Class = CLASS.Class
    AND SUBMARINE.Id = INSTALL.Ship
    AND INSTALL.Sonar = “BQS-04”

The extensional answer to the query is:

name           class  type
Bonefish       0215   SSN
Seadragon      0212   SSN
Snook          0209   SSN
Robert E. Lee  0208   SSN

Using both forward inference and backward inference, we can derive the following intensional answer:

“Ship type SSN with class in the range of 0208 to 0215 is equipped with sonar BQS-04.”


Conclusions

Conclusions

In this research, we have proposed an approach to provide intensional answers using type inference and induced rules:

  • Type Inference

    • Forward inference

    • Backward inference

    • Combine forward and backward inference

    • Type inference with multiple type hierarchies

  • Rule Induction

    • Model-based inductive learning technique derives rules from database contents.

      For databases with a strong type hierarchy and semantic knowledge, type inference is more effective than integrity constraints for deriving intensional answers


Fault tolerant ddbms via data inference

Fault Tolerant DDBMS Via Data Inference

  • Network Partition

    • Causes:

      • Failures of:

        • Channels

        • Nodes

    • Effects:

      • Queries cannot be processed if the required data is inaccessible

      • Replicated files in different partitions may be inconsistent

      • Updates may only be allowed in one partition.

      • Transactions may be aborted


Conventional approach for handling network partitioning

Conventional Approach for Handling Network Partitioning

  • Based on syntax to serialize the operations

  • To assure data consistency

    • Not all queries can be processed

    • Based on data availability, determine which partition is allowed to perform database update

      POOR AVAILABILITY!!


New approach

New Approach

Exploit data and transaction semantics

  • Use Data Inference Approach

    • Assumption: Data are correlated

      • Examples

        • Salary and rank

        • Ship type and weapon

    • Infer inaccessible data from the accessible data

  • Use semantic information to permit update under network partitioning


Query processing system with data inference

Query Processing System with Data Inference

  • Consists of

    • DDBMS

    • Knowledge-Base (rule-based)

    • Inference Engine


Ddbms with data inference

DDBMS with Data Inference

(Architecture diagram, components only: Query Input; Query Parser and Analyzer; Information Module holding database fragments, allocation, and availability information; Inference System consisting of an Inference Engine and a rule-based knowledge-base system; DDBMS; Query Output.)


Fault tolerant ddbms with inference systems

Fault Tolerant DDBMS with Inference Systems

(Diagram: three sites, LA, SF, and NY, each with its own database (DB1, DB2, DB3), knowledge base (KB1, KB2, KB3), and inference engine (IE). The knowledge base KB holds the correlation rules SHIP(SID) --> INSTALL(TYPE) and INSTALL(TYPE) --> INSTALL(WEAPON).)


Architecture of distributed database with inference

Architecture of Distributed Database with Inference


Motivation of open data inference

Motivation of Open Data Inference

  • Correlated knowledge is incomplete

    • Incomplete rules

    • Incomplete objects


Example of incomplete object

Example of Incomplete Object

Rules (Type -------> Weapon):

IF type in {CG, CGN} THEN weapon = SAM01

IF type = DDG THEN weapon = SAM02

TYPE  WEAPON
CG    SAM01
CGN   SAM01
DDG   SAM02
SSGN  ??

Result: Incomplete rules generate an incomplete object.
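A small Python sketch of this step, assuming the two rules above are the only induced knowledge available (the helper below is illustrative, not the paper's algorithm): every type with no matching rule ends up with an unknown weapon, so the derived object is incomplete.

    # Apply the incomplete type -> weapon rules; unmatched types yield an unknown value.
    RULES = [(lambda t: t in ("CG", "CGN"), "SAM01"),
             (lambda t: t == "DDG",         "SAM02")]

    def infer_weapon(ship_type):
        for condition, weapon in RULES:
            if condition(ship_type):
                return weapon
        return None   # unknown, shown as '?' in the table

    for t in ["CG", "CGN", "DDG", "SSGN"]:
        print(t, infer_weapon(t) or "?")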


Merge of incomplete objects

Merge of Incomplete Objects

  • Observation:

    • Relational join is not adequate for combining incomplete objects

      • Lose information

  • Questions:

    • What kind of algebraic tools do we need to combine incomplete objects without losing information?

    • Any correctness criteria to evaluate the incomplete results?


Merge of incomplete objects1

Merge of Incomplete Objects

TYPE ---> WEAPON and WEAPON ---> WARFARE

Type  Weapon          Weapon  Warfare
CG    SAM01           SAM01   WF1C
CGN   SAM01           SAM03   WF1D
DDG   SAM02
SSGN  ?

Use relational join to combine the above two paths:

Type  Weapon  Warfare
CG    SAM01   WF1C
CGN   SAM01   WF1C

Other way to combine:

TYPE  WEAPON  WARFARE
CG    SAM01   WF1C
CGN   SAM01   WF1C
DDG   SAM02   ?
?     SAM03   WF1D
SSGN  ?       ?


New algebraic tools for incomplete objects

New Algebraic Tools for Incomplete Objects

  • S-REDUCTION

    • Reduce redundant tuples in the object

  • OPEN S-UNION

    • Combine incomplete objects


S reduction

S-Reduction

  • Remove redundant tuples in the object

  • Object RR with key attribute A is reduced to R

    RR                R
    A  B  C           A  B  C
    a  1  aa          a  1  aa
    b  2  _           b  2  bb
    c  _  cc          c  _  cc
    a  1  aa
    b  _  bb
    c  _  _
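A minimal Python sketch of this reduction, assuming '_' marks an unknown value; the merge policy shown (known values override unknowns for tuples sharing the key A) is the obvious reading of the table, and conflicts between known values are not handled here:

    # S-reduction sketch: merge tuples that agree on the key attribute A,
    # letting known values fill in unknowns ('_'). Illustrative only.
    RR = [("a", "1", "aa"), ("b", "2", "_"), ("c", "_", "cc"),
          ("a", "1", "aa"), ("b", "_", "bb"), ("c", "_", "_")]

    def s_reduce(tuples):
        merged = {}
        for key, *rest in tuples:
            current = merged.setdefault(key, ["_"] * len(rest))
            for i, value in enumerate(rest):
                if value != "_":
                    current[i] = value
        return [(k, *v) for k, v in merged.items()]

    print(s_reduce(RR))   # [('a', '1', 'aa'), ('b', '2', 'bb'), ('c', '_', 'cc')]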


Open s union

Open S-Union

  • Modify join operation to accommodate incomplete information

  • Used to combine closed/open objects

    R1 U R2 ----> R

    R1:               R2:
    sid   type        type  weapon
    s101  DD          DD    SAM01
    s102  DD          CG    -
    s103  CG

    R:
    sid   type  weapon
    s101  DD    SAM01
    s102  DD    SAM01
    s103  CG    -

Open s union and toleration

Open S-Union and Toleration

  • Performing open union on two objects R1, R2 generates the third object which tolerates both R1 and R2.

    R1 U R2 ----> R

    R1:               R2:
    sid   type        type  weapon
    s101  DD          DD    SAM01
    s102  DD          CG    -
    s103  CG

    R:
    sid   type  weapon
    s101  DD    SAM01
    s102  DD    SAM01
    s103  CG    -

  • R tolerates R1

  • R tolerates R2
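A minimal Python sketch of the open S-union on the shared attribute 'type'; the tuple values are taken from the table above, the padding of unmatched tuples with '-' follows the table, and the operation behaves like a full outer join rather than a plain relational join:

    # Open S-union sketch: join matched tuples, keep unmatched ones padded with '-'.
    R1 = [("s101", "DD"), ("s102", "DD"), ("s103", "CG")]   # (sid, type)
    R2 = [("DD", "SAM01"), ("CG", "-")]                     # (type, weapon)

    def open_s_union(r1, r2):
        weapon_of = dict(r2)
        result = [(sid, t, weapon_of.get(t, "-")) for sid, t in r1]
        # Keep any R2 tuple whose type never appears in R1 (none in this example).
        for t, w in r2:
            if t not in {typ for _, typ in r1}:
                result.append(("-", t, w))
        return result

    R = open_s_union(R1, R2)
    print(R)   # [('s101', 'DD', 'SAM01'), ('s102', 'DD', 'SAM01'), ('s103', 'CG', '-')]
    # R tolerates both R1 and R2: restricted to their attributes, it does not
    # contradict any known (non '-') value in either operand.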


Example of open inference

Example of Open Inference

site LA: SHIP(sid, sname, class)

site SF: INSTALL(sid, weapon)

site NY: CLASS(class, type, tname)

Query: Find the ship names that carry weapon ‘SAM01’

(assuming site SF is partitioned)

(Diagram: the three sites LA (SHIP), SF (INSTALL), and NY (CLASS); a network partition separates SF from the other two sites.)

Rule: If SHIP TYPE = DD, Then WEAPON = SAM01


Implementation

Implementation

Derive missing relations from accessible relations and correlated knowledge

Three types of derivations:

  • View mechanism to derive new relations based on certain source relations

  • Valuations of incomplete relations based on correlated knowledge

  • Combine two intermediate results via open s-union operation


Example of open inference1

Example of Open Inference

DERIVATION 1: select sid, type from SHIP, CLASS

DERIVATION 2: CLASS(type) --> INSTALL(weapon)

    R1 U R2 ----> INSTALL_INF

    R1:               R2:
    sid   type        type  weapon
    s101  DD          DD    SAM01
    s102  DD          CG    -
    s103  CG

    INSTALL_INF:
    sid   type  weapon
    s101  DD    SAM01
    s102  DD    SAM01
    s103  CG    -

  • INSTALL_INF can be used to replace missing relation INSTALL


Fault tolerant ddbms via inference techniques

Fault Tolerant DDBMS via Inference Techniques

  • Query Processing Under Network Partitioning

    • Open Inference: Inference with incomplete information

    • Algebraic tools for manipulating incomplete objects

    • Toleration: weaker correctness criteria for evaluating incomplete information


Conclusion

Conclusion

Data Inference is an effective method for providing database fault tolerance during network partitioning.


Intelligent dictionary and directory idd

Intelligent Dictionary and Directory (IDD)

  • The role of the IDD and the emerging technology of Object-Oriented Database Systems

  • The integration of Artificial Intelligence and Database Management tools and techniques to explore new architectures for the IDD

  • The support of future applications: heterogeneous, distributed, cooperating data/knowledge systems guided by active, intelligent dictionaries and directories and managed by Data and Knowledge Administrators


Using type inference and induced rules to provide intensional answers

  • Object-Oriented Dictionary Modeling

  • Information Resource Dictionary System

  • Functional Specification of the IDD

  • Role of Machine Learning in IDD

    • Mining Knowledge from Data

    • Schema Evolution

  • System Optimization Issues

  • Support for Hypermedia


  • The knowledge data model kdm

    The Knowledge/Data Model (KDM)

    • The KDM modeling primitives are:

      Generalization: Generalization provides the facility in the KDM to group similar objects into a more general object. This generalization hierarchy defines the inheritance structure.

      Classification: Classification provides a means whereby specific object instances can be considered as a higher-level object-type (an object-type is a collection of similar objects). This is done through the use of the “is-instance-of” relationship.

      Aggregation: Aggregation is an abstraction mechanism in which an object is related to its components via the “is-part-of” relationship.

      Membership: Membership is an abstraction mechanism that specifically supports the “is-a-member-of” relationship.


    The knowledge data model

    The Knowledge/Data Model

    • Temporal: Temporal relationship primitives relate object-types by means of synchronous and asynchronous relationships.

    • Constraints: This primitive is used to place a constraint on some aspect of an object, operation, or relationship via the “is-constraint-on” relationship.

    • Heuristic: A heuristic can be associated with an object via the “is-heuristic-on” relationship. These are used to allow the specifications of rules and knowledge to be associated with an object. In this way, object properties can be inferred using appropriate heuristics.


    The kdl template for object type specification

    The KDL Template for Object-Type Specification

    object-type: OBJECT-TYPE-NAME has

    [attributes:

    {ATTRIBUTE-NAME:

    [set of/list of] VALUE-TYPE

    /* default is single-valued */

    [composed of {ATTRIBUTE-NAME,}]

    [with constraints {predicate,}]

    [with heuristics {RULE,}];}]

    [subtypes:

    {OBJECT-TYPE-NAME,}]

    [supertypes:

    {OBJECT-TYPE-NAME,}]

    [constraints:

    {predicate,}]

    [heuristics:

    {rule,}]


    The kdl template for object type specification cont d

    The KDL Template for Object-Type Specification (Cont’d)

    /* successors, predecessors, and concurrents are temporal primitives */

    [successors:

    {OBJECT-TYPE-NAME,}]

    [predecessors:

    {OBJECT-TYPE-NAME,}]

    [concurrents:

    {OBJECT-TYPE-NAME,}]

    [members:

    {MEMBER-NAME: MEMBER-TYPE}]

    [instances:

    {INSTANCE,}]

    end-object-type


    Three services database schemata

    Three Services Database Schemata


    Knowledge source schemata in the kdm paradigm

    Knowledge Source Schemata in the KDM Paradigm


    Kdl object type specification template

    KDL Object Type Specification Template


    The thesaurus object meta schema

    The THESAURUS_OBJECT Meta-Schema


    The thesaurus object meta object type specification

    The THESAURUS_OBJECT Meta-Object-Type Specification


    The knowledge source object meta object type specification

    The KNOWLEDGE_SOURCE_OBJECT Meta-Object-Type Specification


    Three level specification local fm and federation

    Three-Level Specification – Local, FM, and Federation


    A sample export data knowledge task schema for a federation interface manager

    A sample export data/knowledge/task schema for a federation interface manager


    Conclusions1

    Conclusions

    • IDD based on the Knowledge Data Model can provide the modeling power needed to:

      • Extend the notions of the Information Resource Dictionary System

      • Support Object-Oriented DBMS

      • Act as an Intelligent Thesaurus to support Cooperating Knowledge Sources for Heterogeneous Databases

    • Schema Evolution will require a meta-level characterization of the KDM constructs so that inference tools can reason about the effects of changes to the schema.


    Intelligent heterogeneous autonomous database architecture inhead

    Intelligent Heterogeneous Autonomous Database Architecture (INHEAD)

    Reference:

    D. Weishar and L. Kerschberg, “An Intelligent Heterogeneous Autonomous Database Architecture for Semantic Heterogeneity Support”, Proceedings of the First International Workshop on Interoperability in Multi-Database Systems, Kyoto, Japan, pp. 152-155, 1991.


    Inhead

    INHEAD

    • Place query on blackboard

    • KS (domain experts) of the DBMS

    • KS cooperatively tries to find a solution to the query

    • If no solution is found, further clarification and additional information are requested from the users.

      The thesaurus performs semantic query processing of the user’s original query.

      The controller provides:

      • necessary query translation and optimization

      • integrates the results


    Idd for an intelligent front end to heterogeneous databases

    IDD for an Intelligent Front End to Heterogeneous Databases


    Blackboard

    BLACKBOARD

    • Dynamic Control - make inferences related to solution formation at each step

    • Focus of Attention - determine what part of the emerging solution should be attended to next

    • Flexibility of Programming the Control - knowledge about how control should be applied in various domains can be codified in control rules or in complex control regimes


    Using type inference and induced rules to provide intensional answers

    • Modularity

      • Well suited to the class of problems possessing one or more of the following characteristics:

        • The need to represent many specialized and distinct kinds of knowledge

        • The need to integrate disparate information

        • A natural domain hierarchy

        • Having continuous data input (e.g., Signal tracking)

        • Having sparse knowledge/data

    • Supporting semantic heterogeneity in a system of heterogeneous autonomous databases exhibits many of these characteristics.


    Opportunistic query processing

    Opportunistic Query Processing

    • Opportunistic - query can be processed based on goal, sub-goal, and hypothesis changes.

      • Redundant and overlapping data provide opportunities for parallel processing

    • Incremental Query Processing - processing can halt when the control structure determines that the query has been satisfied.


    The active and intelligent thesaurus

    The Active and Intelligent Thesaurus

    • Validating and performing consistency check on the input to the thesaurus itself

    • Indexing and converting data values

    • Translating queries using different variants of names

    • Actively participating in on-line HELP (i.e., offer suggestions)

    • The thesaurus can be used as:

      • A repository of knowledge of data item

      • An incorporation of newly discovered knowledge

      • An integration with existing knowledge


    Data knowledge packets

    Data/Knowledge Packets

    • Object Encapsulation

      • Encapsulating

        • Object structure

        • Relationships

        • Operations

        • Constraints

        • Rules

    • Data/Knowledge Packet allows the specification of abstract object types at the global level and the encapsulation of optional and structural semantics.


    An example the artillery movement problem

    An Example: The Artillery Movement Problem

    Goal: provision 10 M110 Howitzer Weapon System for departure to Middle East in 5 days.

    • Characteristics DB: describes the physical characteristics of the component parts of the weapons system

    • Weapon system DB: describes the components of weapons system

    • Logistics database: describes the logistics support required to sustain weapons systems in combat

    • Personnel DB for crew requisitioning

    • Ship DB for obtaining space on seagoing vessels.


    Using type inference and induced rules to provide intensional answers

    • Overall Goal: Provision 10 M110 Howitzer Weapon Systems for departure to the Middle East in 5 days.

    • Subgoals

      1.0 Determine availability of 10 M110 Howitzer Weapon Systems

      1.1 Determine the locations of such items, subject to constraints of being within 500 miles of Norfolk, Virginia

      1.2 Send requests for items to locations to hold for shipment

      2.0 Determine Availability of Logistic Support Units

      2.1 Specialize camouflage to desert conditions

      2.2 Specialize radar to desert night vision

      2.3 Specialize rations to high water content rations

      2.4 Specialize clothing to lightweight, chemically resistant


    Using type inference and induced rules to provide intensional answers

    3.0 Determine Availability of Sealift Capability along the Eastern Seaboard

    3.1 Calculate total weight and volume for each system

    3.2 Provision crews for each system

    3.3 Assign crews and weapons to ships

    3.3.1 Notify Crews

    3.3.2 Send shipment requisitions to sites holding weapons systems


    Uncertainty management using rough sets

    Uncertainty Management Using Rough Sets


    Why deal with uncertainty

    Why Deal with Uncertainty

    • Most tasks requiring intelligent behavior have some uncertainty

      Forms of uncertainty in KB systems

    • Uncertainty in the data

      • Missing data

      • Imprecise representation, etc.

    • Uncertainty in the knowledge-base

      • Best guesses

      • Not applicable in all domains


    Why deal with uncertainty cont d

    Why Deal with Uncertainty (cont’d)

    Some approaches to handle uncertainty:

    • Probability and Bayesian statistics

    • Confidence (or certainty) factors

    • Dempster-Shafer theory of evidence

    • Fuzzy sets and fuzzy logic

      Problems with these approaches

    • Make strong statistical assumptions, such as following a probability distribution model

      • E.g., Bayesian approach

    • Cannot recognize structural properties of data qualitatively

      • Represent through numbers

      • E.g., Fuzzy Logic – concept of “Tall”, “Very Tall”, etc.


    Rough sets

    Rough Sets

    • Good for reasoning from qualitative and imprecise data

      • No approximation by numbers

      • No probability distribution model required

    • Uses set theory to provide insight into the structural properties of data

    • Theory developed by Z. Pawlak in 1982

    • Well known experimental applications in

      • Medical diagnosis (Pawlak, Slowinski & Slowinski, 1986)

      • Machine learning (Wong & Ziarko, 1986b)

      • Information Retrieval (Gupta, 1988)

      • Conceptual Engineering design (Arciszewski and Ziarko, 1986)

      • Approximate Reasoning (Rasiowa and Epstein, 1987)

    • BASIC IDEA

      • Lower the degree of precision in the representation of objects

      • Make data regularities more visible and easier to characterize in terms of rules


    Example 1

    Example 1


    Rough sets vs classical sets

    Rough Sets vs. Classical Sets

    • Classical sets have well defined boundaries since the data representation is exact

    • Rough sets have fuzzy boundaries since knowledge is insufficient to determine exact membership of an object in the set

      • Example:

        • U: Universal set of all cars

        • X: Set of all fuel efficient cars

    • In the rough set approach, fuel efficiency is indirectly determined from attributes such as:

      • Weight of car

      • Size of engine

      • Number of cylinders, etc.

        Attribute Dependency

    • Qualitatively determine the significance of one or more attributes (such as Weight, Size) on a decision attribute (such as Fuel eff.)


    Indiscernibility relation ind equivalence class

    Indiscernibility Relation (IND) Equivalence Class


    Definitions

    Definitions


    Definition cont d

    Definition (cont’d)

    • Boundary region BND(X)

      • Consists of objects whose membership cannot be determined exactly.

      • BND(X) = (upper approximation of X) – (lower approximation of X)

    • Negative region NEG(X)

      • Union of those elementary sets of IND that are entirely outside X.

      • NEG(X) = U – (upper approximation of X)

    • Accuracy measure AM(X)

      • If lower approximation is different from upper approximation, the set is rough.

      • AM(X) = Card(lower approximation of X) / Card(upper approximation of X)


    Example 2

    Example 2


    Using type inference and induced rules to provide intensional answers

    What is the accuracy measure when the set of melons is classified on the attribute “size”?

    Let c be the condition attribute ‘size’

    Let x be the ‘set of all melons’

    x = {p1, p2, p4, p5}

    The elementary classes of the attribute ‘size’ are as follows:

    x1 = {p3, p6, p7, p9}   (size = small)

    x2 = {p1, p5}           (size = med)

    x3 = {p2, p4, p8}       (size = large)

    Lower approximation: IND(x, c) = {p1, p5}

    Upper approximation: IND(x, c) = {p1, p2, p4, p5, p8}

    BND(x, c) = {p2, p4, p8}

    NEG(x, c) = {p3, p6, p7, p9}

    Accuracy measure, AM(x, c) = 2/5
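    A small Python sketch that reproduces the computation above (the elementary classes and the set of melons are copied from this example):

        from fractions import Fraction

        # Rough-set approximations for the melon / 'size' example.
        elementary = [{"p3", "p6", "p7", "p9"},   # size = small
                      {"p1", "p5"},               # size = med
                      {"p2", "p4", "p8"}]         # size = large
        X = {"p1", "p2", "p4", "p5"}              # the set of all melons
        U = set().union(*elementary)

        lower = set().union(*[c for c in elementary if c <= X])   # classes inside X
        upper = set().union(*[c for c in elementary if c & X])    # classes meeting X
        boundary = upper - lower
        negative = U - upper
        accuracy = Fraction(len(lower), len(upper))

        print(sorted(lower), sorted(upper), sorted(boundary), sorted(negative), accuracy)
        # ['p1', 'p5'] ['p1', 'p2', 'p4', 'p5', 'p8'] ['p2', 'p4', 'p8'] ['p3', 'p6', 'p7', 'p9'] 2/5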


    Attribute dependency

    Attribute Dependency


    Example 3

    Example 3

    How useful is {shape, taste} in determining the {kind of products}?

    C: {shape, taste}

    D: {kind of product}

    D’= the elementary classes of IND(D)

    = {{set of all melons}, {set of all other fruits}}

    = {{p1, p2, p4, p5}, {p3, p6, p7, p8, p9}}

    Elementary classes of IND(C)

    = {

    {p1},              for (sph, sweet)

    {p2, p4, p5, p6},  for (cyl, sweet)

    {p3, p8},          for (sph, normal)

    {p9},              for (cyl, normal)

    {p7},              for (sph, sour)

    { },               for (cyl, sour)

    }


    Using type inference and induced rules to provide intensional answers

    POS (C, D) = the union of all positive regions

    = {

    p1 (contained in class Melon),

    p3, p8 (contained in class Other),

    p9, (contained in class Other),

    p7, (contained in class Other)

    }

    Dependency = card (POS(C, D))/card (U) = 5/9

    Since 0 < 5/9 < 1, we have a partial dependency of D = {kind of product} on C = {shape, taste}
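    A matching Python sketch of the dependency computation K(C, D) = card(POS(C, D)) / card(U); the elementary classes are copied from this example:

        from fractions import Fraction

        # Dependency of D = {kind of product} on C = {shape, taste}.
        C_classes = [{"p1"}, {"p2", "p4", "p5", "p6"}, {"p3", "p8"}, {"p9"}, {"p7"}]
        D_classes = [{"p1", "p2", "p4", "p5"},                   # melons
                     {"p3", "p6", "p7", "p8", "p9"}]             # other fruits
        U = set().union(*D_classes)

        # Positive region: C-classes contained entirely in a single decision class.
        POS = set().union(*[c for c in C_classes if any(c <= d for d in D_classes)])
        K = Fraction(len(POS), len(U))

        print(sorted(POS), K)   # ['p1', 'p3', 'p7', 'p8', 'p9'] 5/9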


    Interpretation of k c d

    Interpretation of K(C,D)

    • IF K(C,D) = 1, we have full dependency

      • Any class of object in D can be completely determined by the attributes in C.

    • IF 0 < K(C,D) < 1, we have only partial dependency

      • The class of only some objects in D can be completely determined by attributes in C.

    • IF K(C,D) = 0, we have no dependency

      • No object in D can be completely determined by the attributes in C.


    Using type inference and induced rules to provide intensional answers

    Similarly, we can calculate the dependency of {kind of product} on other attribute groupings:

    • Dependency on {shape, size} = 1

    • Dependency on {size, taste} = 1

    • Dependency on {shape, size, taste} = 1


    Minimal set of attributes or reducts

    Minimal Set of Attributes or REDUCTS

    • Objective

      • Find the minimal set (or sets) of interacting attributes that would have the same discriminating power as the original set of attributes.

      • This would allow us to eliminate irrelevant or noisy attributes without loss of essential information.

        In our example, {size, shape} and {size, taste} are minimal sets of attributes.

        Advantages:

    • Irrelevant attributes can be eliminated from a diagnostic procedure, thereby reducing the costs of testing and obtaining those values.

    • The knowledge-base system can form decision rules based on minimal sets.

      For example, we can form the rules

      if (size = large) and (taste = sweet) then kind of product = melon

      if (shape = cyl) and (size = small) then kind of product = other
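      The reduct search itself can be sketched as a brute-force check of every attribute subset. The fruit table below is a hypothetical completion that is consistent with the elementary classes and dependencies quoted in this example (the original slide's table is not reproduced in this transcript), so treat the data as illustrative:

          from itertools import combinations
          from fractions import Fraction

          TABLE = {  # pid: (shape, size, taste, kind) -- hypothetical completion
              "p1": ("sph", "med",   "sweet",  "melon"), "p2": ("cyl", "large", "sweet",  "melon"),
              "p3": ("sph", "small", "normal", "other"), "p4": ("cyl", "large", "sweet",  "melon"),
              "p5": ("cyl", "med",   "sweet",  "melon"), "p6": ("cyl", "small", "sweet",  "other"),
              "p7": ("sph", "small", "sour",   "other"), "p8": ("sph", "large", "normal", "other"),
              "p9": ("cyl", "small", "normal", "other"),
          }
          ATTRS = {"shape": 0, "size": 1, "taste": 2}

          def dependency(cond_attrs):
              """K(C, D): fraction of objects whose decision is determined by cond_attrs."""
              classes = {}
              for pid, row in TABLE.items():
                  classes.setdefault(tuple(row[ATTRS[a]] for a in cond_attrs), []).append(pid)
              pos = sum(len(pids) for pids in classes.values()
                        if len({TABLE[p][3] for p in pids}) == 1)
              return Fraction(pos, len(TABLE))

          full = dependency(tuple(ATTRS))
          reducts = [c for r in range(1, len(ATTRS) + 1)
                       for c in combinations(sorted(ATTRS), r)
                       if dependency(c) == full
                       and all(dependency(tuple(set(c) - {a})) < full for a in c)]
          print(reducts)   # [('shape', 'size'), ('size', 'taste')]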


    Deterministic and non deterministic rules

    Deterministic and Non-Deterministic Rules

    • Deterministic rules have only one outcome. They are obtained from the positive and negative regions of the approximation space.

    • Non-deterministic rules can have more than one outcome. They are formed from the boundary regions of the approximation space.

      Selection of the best minimal set

      If there is more than one minimal set, which one is the best?

    • If we assign a cost function to the attributes, selection can be based on minimum cost criterion

      • E.g., In medical domain, some diagnostic procedures are more expensive than others.

    • If there is no cost function, select the set with the minimal number of attributes


    Example 4

    Example 4


    Using type inference and induced rules to provide intensional answers

    • If Condition Attributes C = {size, cyl, turbo, fuelsys, displace, comp, power, trans, weight} and Decision Attribute D = {mileage}

      K(C,D) was calculated to be 1

    • If C = {size, power}, D = {mileage}

      K(C,D) was calculated to be 0.269

      Thus, “size” and “power” are definitely not good enough to determine mileage.

    • The following were determined to be minimal sets of attributes

      {cyl, fuelsys, comp, power, weight}

      {size, fuelsys, comp, power, weight}

      {size, fuelsys, displace, weight}

      {size, cyl, fuelsys, power, weight}

      {cyl, turbo, fuelsys, displace, comp, trans, weight}

      {size, cyl, fuelsys, comp, weight}

      {size, cyl, turbo, fuelsys, trans, weight}


    View of table after attribute reduction

    View of Table After Attribute Reduction

    • Best minimal set: {size, fuelsys, displace, weight}


    Set of rules produced from the reduced table

    Set of Rules Produced from the Reduced Table

    • Blanks represent “don’t cares”

    • CNo is the number of cases in the original table that support the given rule

      • It provides a measure of the strength of confidence in the rule. The higher the CNo, the more strongly the rule is confirmed.

    • Dno is the number of cases in the table with the same decision value

    • Interpreting Row 5: if (fuelsys = EFI) and (displace = small) then mileage = high


    Applications

    Applications

    • Speech recognition

      • The method of reduct computation was used to eliminate unnecessary spectral frequencies to find the best representation for a group of spoken words

    • Medical domain

      • Analysis of records of patients who suffered from duodenal ulcer (Pawlak, Slowinski & Slowinski, 1986)

      • Analysis of clinical data of patients with Cardiac valve diseases (Abdalla S. A. Mohammed, 1991)

    • Architecture

      • Structural design optimization by obtaining characteristic design rules from a database of existing designs and verified performance data (Arciszewski et al, 1987; Arciszewski, Ziarko, 1986)


    Summary

    Summary

    • The theory of rough sets is very good for handling qualitative, imprecise data.

      In this respect it is an improvement over probabilistic and statistical methods.

    • Since the data are not converted to numbers but handled in qualitative form, set theory is used to identify structural relationships.

    • The strength of the dependency of any set of condition attributes on a decision attribute can be determined numerically.

    • By forming minimal sets of attributes we can filter noisy or irrelevant attributes.

    • Minimal sets also identify strong data patterns that help the KB system form rules


    References

    References

    • “Rough Sets”, Zdzislaw Pawlak, Kluwer Academic Publishers, 1991.

    • “Rough Sets as the Basis of a Learning System” – Chapter 2, pp. 5-13.

      “An Application of the Rough Sets Model to Analysis of Data Tables” – Chapter 3, pp. 15-29.

    • “The Discovery, Analysis, and Representation of Data Dependencies in Databases”, Wojciech Ziarko, Knowledge Discovery in Databases by Shapiro, Frawley, pp. 195-209.

    • “Applications of Rough Set Theory for Clinical Data Analysis: A Case Study”, Abdalla S.A. Mohammed, Journal of Mathematical and Computer Modeling, Vol. 15, No. 10, pp. 19-37, 1991.

    • “Intelligent Information Retrieval Using Rough Set Approximations”, Padmini Srinivasan, Information Processing and Management, Vol. 25, No. 4, pp. 347-361, 1989.

    • “Uncertainty Management”, Avelino J. Gonzalez, Douglas D. Dankel, The Engineering of Knowledge-Base Systems, Chapter 8, pp. 232-262.


    Data mining concepts and techniques slides for textbook chapter 4

    Data Mining: Concepts and Techniques— Slides for Textbook — — Chapter 4 —

    ©Jiawei Han and Micheline Kamber

    Intelligent Database Systems Research Lab

    School of Computing Science

    Simon Fraser University, Canada

    http://www.cs.sfu.ca


    A data mining query language dmql

    A Data Mining Query Language (DMQL)

    • Motivation

      • A DMQL can provide the ability to support ad-hoc and interactive data mining

      • By providing a standardized language like SQL

        • Hope to achieve an effect similar to the one SQL has had on relational databases

        • Foundation for system development and evolution

        • Facilitate information exchange, technology transfer, commercialization and wide acceptance

    • Design

      • DMQL is designed with the primitives described earlier


    Syntax for dmql

    Syntax for DMQL

    • Syntax for specification of

      • task-relevant data

      • the kind of knowledge to be mined

      • concept hierarchy specification

      • interestingness measure

      • pattern presentation and visualization

    • Putting it all together — a DMQL query


    Syntax for task relevant data specification

    Syntax for task-relevant data specification

    • use database database_name, or use data warehouse data_warehouse_name

    • from relation(s)/cube(s) [where condition]

    • in relevance to att_or_dim_list

    • order by order_list

    • group by grouping_list

    • having condition


    Specification of task relevant data

    Specification of task-relevant data


    Syntax for specifying the kind of knowledge to be mined

    Syntax for specifying the kind of knowledge to be mined

    • Characterization

      Mine_Knowledge_Specification  ::= mine characteristics [as pattern_name] analyze measure(s)

    • Discrimination

      Mine_Knowledge_Specification  ::= mine comparison [as pattern_name] for target_class where target_condition  {versus contrast_class_i where contrast_condition_i}  analyze measure(s)

    • Association

      Mine_Knowledge_Specification  ::= mine associations [as pattern_name]


    Syntax for specifying the kind of knowledge to be mined cont

    Syntax for specifying the kind of knowledge to be mined (cont.)

    • Classification

      Mine_Knowledge_Specification  ::= mine classification [as pattern_name] analyze classifying_attribute_or_dimension

    • Prediction

      Mine_Knowledge_Specification  ::= mine prediction [as pattern_name] analyze prediction_attribute_or_dimension {set {attribute_or_dimension_i= value_i}}


    Syntax for concept hierarchy specification

    Syntax for concept hierarchy specification

    • To specify what concept hierarchies to use

      use hierarchy <hierarchy> for <attribute_or_dimension>

    • We use different syntax to define different type of hierarchies

      • schema hierarchies

        define hierarchy time_hierarchy on date as [date, month, quarter, year]

      • set-grouping hierarchies

        define hierarchy age_hierarchy for age on customer as

        level1: {young, middle_aged, senior} < level0: all

        level2: {20, ..., 39} < level1: young

        level2: {40, ..., 59} < level1: middle_aged

        level2: {60, ..., 89} < level1: senior


    Syntax for concept hierarchy specification cont

    Syntax for concept hierarchy specification (Cont.)

    • operation-derived hierarchies

      define hierarchy age_hierarchy for age on customer as

      {age_category(1), ..., age_category(5)} := cluster(default, age, 5) < all(age)

    • rule-based hierarchies

      define hierarchy profit_margin_hierarchy on item as

      level_1: low_profit_margin < level_0: all

      if (price - cost) < $50

      level_1: medium-profit_margin < level_0: all

      if ((price - cost) > $50) and ((price - cost) <= $250)

      level_1: high_profit_margin < level_0: all

      if (price - cost) > $250


    Syntax for interestingness measure specification

    Syntax for interestingness measure specification

    • Interestingness measures and thresholds can be specified by the user with the statement:

      with <interest_measure_name>  threshold = threshold_value

    • Example:

      with support threshold = 0.05

      with confidence threshold = 0.7 


    Syntax for pattern presentation and visualization specification

    Syntax for pattern presentation and visualization specification

    • We have syntax which allows users to specify the display of discovered patterns in one or more forms

      display as <result_form>

    • To facilitate interactive viewing at different concept level, the following syntax is defined:

      Multilevel_Manipulation  ::=   roll up on attribute_or_dimension | drill down on attribute_or_dimension | add attribute_or_dimension | drop attribute_or_dimension


    Putting it all together the full specification of a dmql query

    Putting it all together: the full specification of a DMQL query

    use database AllElectronics_db

    use hierarchy location_hierarchy for B.address

    mine characteristics as customerPurchasing

    analyze count%

    in relevance to C.age, I.type, I.place_made

    from customer C, item I, purchases P, items_sold S, works_at W, branch B

    where I.item_ID = S.item_ID and S.trans_ID = P.trans_ID

    and P.cust_ID = C.cust_ID and P.method_paid = ``AmEx''

    and P.empl_ID = W.empl_ID and W.branch_ID = B.branch_ID and B.address = ``Canada" and I.price >= 100

    with noise threshold = 0.05

    display as table


    Other data mining languages standardization efforts

    Other Data Mining Languages & Standardization Efforts

    • Association rule language specifications

      • MSQL (Imielinski & Virmani’99)

      • MineRule (Meo, Psaila, and Ceri ’96)

      • Query flocks based on Datalog syntax (Tsur et al’98)

    • OLEDB for DM (Microsoft’2000)

      • Based on OLE, OLE DB, OLE DB for OLAP

      • Integrating DBMS, data warehouse and data mining

    • CRISP-DM (CRoss-Industry Standard Process for Data Mining)

      • Providing a platform and process structure for effective data mining

      • Emphasis on deploying data mining technology to solve business problems


    Data mining concepts and techniques slides for textbook chapter 5

    Data Mining: Concepts and Techniques— Slides for Textbook — — Chapter 5 —

    ©Jiawei Han and Micheline Kamber

    Intelligent Database Systems Research Lab

    School of Computing Science

    Simon Fraser University, Canada

    http://www.cs.sfu.ca


    Chapter 5 concept description characterization and comparison

    Chapter 5: Concept Description: Characterization and Comparison

    • What is concept description?

    • Data generalization and summarization-based characterization

    • Analytical characterization: Analysis of attribute relevance

    • Mining class comparisons: Discriminating between different classes

    • Mining descriptive statistical measures in large databases

    • Discussion

    • Summary


    Data generalization and summarization based characterization

    (Figure: conceptual levels 1 through 5.)

    Data Generalization and Summarization-based Characterization

    • Data generalization

      • A process which abstracts a large set of task-relevant data in a database from low conceptual levels to higher ones.

      • Approaches:

        • Data cube approach(OLAP approach)

        • Attribute-oriented induction approach


    Characterization data cube approach without using ao induction

    Characterization: Data Cube Approach (without using AO-Induction)

    • Perform computations and store results in data cubes

    • Strength

      • An efficient implementation of data generalization

      • Computation of various kinds of measures

        • e.g., count( ), sum( ), average( ), max( )

      • Generalization and specialization can be performed on a data cube by roll-up and drill-down

    • Limitations

      • Handles only dimensions of simple nonnumeric data and measures of simple aggregated numeric values.

      • Lack of intelligent analysis: cannot tell which dimensions should be used or what level the generalization should reach


    Attribute oriented induction

    Attribute-Oriented Induction

    • Proposed in 1989 (KDD ‘89 workshop)

    • Not confined to categorical data or particular measures.

    • How is it done?

      • Collect the task-relevant data (initial relation) using a relational database query

      • Perform generalization by attribute removal or attribute generalization.

      • Apply aggregation by merging identical, generalized tuples and accumulating their respective counts.

      • Interactive presentation with users.


    Basic principles of attribute oriented induction

    Basic Principles of Attribute-Oriented Induction

    • Data focusing: task-relevant data, including dimensions, and the result is the initial relation.

    • Attribute-removal: remove attribute A if there is a large set of distinct values for A but (1) there is no generalization operator on A, or (2) A’s higher level concepts are expressed in terms of other attributes.

    • Attribute-generalization: If there is a large set of distinct values for A, and there exists a set of generalization operators on A, then select an operator and generalize A.

    • Attribute-threshold control: typical 2-8, specified/default.

    • Generalized relation threshold control: control the final relation/rule size.


    Basic algorithm for attribute oriented induction

    Basic Algorithm for Attribute-Oriented Induction

    • InitialRel: Query processing of task-relevant data, deriving the initial relation.

    • PreGen: Based on the analysis of the number of distinct values in each attribute, determine generalization plan for each attribute: removal? or how high to generalize?

    • PrimeGen: Based on the PreGen plan, perform generalization to the right level to derive a “prime generalized relation”, accumulating the counts.

    • Presentation: User interaction: (1) adjust levels by drilling, (2) pivoting, (3) mapping into rules, cross tabs, visualization presentations.

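    A compressed Python sketch of these four steps on a toy relation; the rows, the one-level concept hierarchies, and the thresholds are invented for illustration and are not the algorithm's actual data structures:

        from collections import Counter

        # Attribute-oriented induction sketch: drop attributes with no concept
        # hierarchy (attribute removal), climb the hierarchy for the rest
        # (attribute generalization), then merge identical generalized tuples
        # while accumulating counts (the prime generalized relation).
        ROWS = [{"name": "Smith", "birth_place": "Vancouver", "gpa": 3.8},
                {"name": "Chan",  "birth_place": "Victoria",  "gpa": 3.9},
                {"name": "Lee",   "birth_place": "Bombay",    "gpa": 2.9}]
        GENERALIZE = {  # one-level concept hierarchies
            "birth_place": {"Vancouver": "Canada", "Victoria": "Canada", "Bombay": "India"},
            "gpa": lambda g: "excellent" if g >= 3.5 else "good",
        }

        def aoi(rows):
            prime = Counter()
            for row in rows:
                generalized = []
                for attr, value in row.items():
                    if attr not in GENERALIZE:          # attribute removal (e.g. name)
                        continue
                    op = GENERALIZE[attr]
                    generalized.append((attr, op(value) if callable(op) else op[value]))
                prime[tuple(generalized)] += 1          # merge tuples, keep counts
            return prime

        for tup, count in aoi(ROWS).items():
            print(dict(tup), "count =", count)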


    Example

    Example

    • DMQL: Describe general characteristics of graduate students in the Big-University database

      use Big_University_DB

      mine characteristics as “Science_Students”

      in relevance to name, gender, major, birth_place, birth_date, residence, phone#, gpa

      from student

      where status in “graduate”

    • Corresponding SQL statement:

      Select name, gender, major, birth_place, birth_date, residence, phone#, gpa

      from student

      where status in {“Msc”, “MBA”, “PhD” }


    Class characterization an example

    Class Characterization: An Example

    Initial Relation

    Prime Generalized Relation



    Presentation of generalized results

    Presentation of Generalized Results

    • Generalized relation:

      • Relations where some or all attributes are generalized, with counts or other aggregation values accumulated.

    • Cross tabulation:

      • Mapping results into cross tabulation form (similar to contingency tables).

    • Visualization techniques:

      • Pie charts, bar charts, curves, cubes, and other visual forms.

    • Quantitative characteristic rules:

      • Mapping generalized result into characteristic rules with quantitative information associated with it, e.g.,


    Presentation generalized relation

    Presentation—Generalized Relation


    Presentation crosstab

    Presentation—Crosstab


    Implementation by cube technology

    Implementation by Cube Technology

    • Construct a data cube on-the-fly for the given data mining query

      • Facilitate efficient drill-down analysis

      • May increase the response time

      • A balanced solution: precomputation of “subprime” relation

    • Use a predefined & precomputed data cube

      • Construct a data cube beforehand

      • Facilitate not only the attribute-oriented induction, but also attribute relevance analysis, dicing, slicing, roll-up and drill-down

      • Cost of cube computation and the nontrivial storage overhead


    Characterization vs olap

    Characterization vs. OLAP

    • Similarity:

      • Presentation of data summarization at multiple levels of abstraction.

      • Interactive drilling, pivoting, slicing and dicing.

    • Differences:

      • Automated desired level allocation.

      • Dimension relevance analysis and ranking when there are many relevant dimensions.

      • Sophisticated typing on dimensions and measures.

      • Analytical characterization: data dispersion analysis.


    Attribute relevance analysis

    Attribute Relevance Analysis

    • Why?

      • Which dimensions should be included?

      • How high level of generalization?

      • Automatic vs. interactive

      • Reduce # attributes; easy to understand patterns

    • What?

      • statistical method for preprocessing data

        • filter out irrelevant or weakly relevant attributes

        • retain or rank the relevant attributes

      • relevance related to dimensions and levels

      • analytical characterization, analytical comparison


    Attribute relevance analysis cont d

    Attribute relevance analysis (cont’d)

    • How?

      • Data Collection

      • Analytical Generalization

        • Use information gain analysis (e.g., entropy or other measures) to identify highly relevant dimensions and levels.

      • Relevance Analysis

        • Sort and select the most relevant dimensions and levels.

      • Attribute-oriented Induction for class description

        • On selected dimension/level

      • OLAP operations (e.g. drilling, slicing) on relevance rules


    Relevance measures

    Relevance Measures

    • Quantitative relevance measure determines the classifying power of an attribute within a set of data.

    • Methods

      • information gain (ID3)

      • gain ratio (C4.5)

      • gini index

      • χ² (chi-square) contingency table statistics

      • uncertainty coefficient


    Entropy and information gain

    Entropy and Information Gain

    • S contains si tuples of class Ci for i = {1, …, m}

    • Information measures info required to classify any arbitrary tuple

    • Entropy of attribute A with values {a1,a2,…,av}

    • Information gained by branching on attribute A
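    The three bullets refer to the standard ID3 measures (the slide's own formula images are not reproduced in this transcript). In the usual notation, with s = |S| and s_{ij} the number of class-C_i tuples in the subset of S where A = a_j:

        I(s_1, \ldots, s_m) = - \sum_{i=1}^{m} \frac{s_i}{s} \log_2 \frac{s_i}{s}

        E(A) = \sum_{j=1}^{v} \frac{s_{1j} + \cdots + s_{mj}}{s} \, I(s_{1j}, \ldots, s_{mj})

        Gain(A) = I(s_1, \ldots, s_m) - E(A)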


    Example analytical characterization

    Example: Analytical Characterization

    • Task

      • Mine general characteristics describing graduate students using analytical characterization

    • Given

      • attributes name, gender, major, birth_place, birth_date, phone#, and gpa

      • Gen(ai) = concept hierarchies on ai

      • Ui = attribute analytical thresholds for ai

      • Ti = attribute generalization thresholds for ai

      • R = attribute relevance threshold


    Example analytical characterization cont d

    Example: Analytical Characterization (cont’d)

    • 1. Data collection

      • target class: graduate student

      • contrasting class: undergraduate student

    • 2. Analytical generalization using Ui

      • attribute removal

        • remove name and phone#

      • attribute generalization

        • generalize major, birth_place, birth_date and gpa

        • accumulate counts

      • candidate relation: gender, major, birth_country, age_range and gpa


    Example analytical characterization 2

    Example: Analytical characterization (2)

    Candidate relation for Target class: Graduate students (count = 120)

    Candidate relation for Contrasting class: Undergraduate students (count = 130)


    Example analytical characterization 3

    Number of grad students in “Science”

    Number of undergrad students in “Science”

    Example: Analytical characterization (3)

    • 3. Relevance analysis

      • Calculate expected info required to classify an arbitrary tuple

      • Calculate entropy of each attribute: e.g. major


    Example analytical characterization 4

    Example: Analytical Characterization (4)

    • Calculate expected info required to classify a given sample if S is partitioned according to the attribute

    • Calculate information gain for each attribute

      • Information gain for all attributes


    Example analytical characterization 5

    Example: Analytical characterization (5)

    • 4. Initial working relation (W0) derivation

      • R = 0.1

      • remove irrelevant/weakly relevant attributes from candidate relation => drop gender, birth_country

      • remove contrasting class candidate relation

    • 5. Perform attribute-oriented induction on W0 using Ti

    Initial target class working relation W0: Graduate students


    Chapter 5 concept description characterization and comparison1

    Chapter 5: Concept Description: Characterization and Comparison

    • What is concept description?

    • Data generalization and summarization-based characterization

    • Analytical characterization: Analysis of attribute relevance

    • Mining class comparisons: Discriminating between different classes

    • Mining descriptive statistical measures in large databases

    • Discussion

    • Summary


    Mining class comparisons

    Mining Class Comparisons

    • Comparison: Comparing two or more classes.

    • Method:

      • Partition the set of relevant data into the target class and the contrasting class(es)

      • Generalize both classes to the same high level concepts

      • Compare tuples with the same high level descriptions

      • Present for every tuple its description and two measures:

        • support - distribution within single class

        • comparison - distribution between classes

      • Highlight the tuples with strong discriminant features

    • Relevance Analysis:

      • Find attributes (features) which best distinguish different classes.


    Example analytical comparison

    Example: Analytical comparison

    • Task

      • Compare graduate and undergraduate students using discriminant rule.

      • DMQL query

    use Big_University_DB

    mine comparison as “grad_vs_undergrad_students”

    in relevance to name, gender, major, birth_place, birth_date, residence, phone#, gpa

    for “graduate_students”

    where status in “graduate”

    versus “undergraduate_students”

    where status in “undergraduate”

    analyze count%

    from student


    Example analytical comparison 2

    Example: Analytical comparison (2)

    • Given

      • attributes name, gender, major, birth_place, birth_date, residence, phone# and gpa

      • Gen(ai) = concept hierarchies on attributes ai

      • Ui = attribute analytical thresholds for attributes ai

      • Ti = attribute generalization thresholds for attributes ai

      • R = attribute relevance threshold


    Example analytical comparison 3

    Example: Analytical comparison (3)

    • 1. Data collection

      • target and contrasting classes

    • 2. Attribute relevance analysis

      • remove attributes name, gender, major, phone#

    • 3. Synchronous generalization

      • controlled by user-specified dimension thresholds

      • prime target and contrasting class(es) relations/cuboids


    Example analytical comparison 4

    Example: Analytical comparison (4)

    Prime generalized relation for the target class: Graduate students

    Prime generalized relation for the contrasting class: Undergraduate students


    Example analytical comparison 5

    Example: Analytical comparison (5)

    • 4. Drill down, roll up and other OLAP operations on target and contrasting classes to adjust levels of abstractions of resulting description

    • 5. Presentation

      • as generalized relations, crosstabs, bar charts, pie charts, or rules

      • contrasting measures to reflect comparison between target and contrasting classes

        • e.g. count%


    Quantitative discriminant rules

    Quantitative Discriminant Rules

    • Cj = target class

    • qa = a generalized tuple that covers some tuples of the target class

      • but it may also cover some tuples of the contrasting class

    • d-weight

      • range: [0, 1]

    • quantitative discriminant rule form
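    In the usual notation (the slide's formula images are not reproduced in this transcript), the d-weight of a generalized tuple q_a with respect to the target class C_j, and the rule form, are:

        d\text{-}weight = \frac{count(q_a \in C_j)}{\sum_{i=1}^{m} count(q_a \in C_i)}

        \forall X,\; target\_class(X) \Leftarrow condition(X) \quad [d : d\text{-}weight]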


    Example quantitative discriminant rule

    Example: Quantitative Discriminant Rule

    • Quantitative discriminant rule

      • where 90/(90+210) = 30%

    Count distribution between graduate and undergraduate students for a generalized tuple


    Class description

    Class Description

    • Quantitative characteristic rule

      • necessary

    • Quantitative discriminant rule

      • sufficient

    • Quantitative description rule

      • necessary and sufficient


    Example quantitative description rule

    Example: Quantitative Description Rule

    • Quantitative description rule for target class Europe

    Crosstab showing associated t-weight, d-weight values and total number (in thousands) of TVs and computers sold at AllElectronics in 1998


    Chapter 5 concept description characterization and comparison2

    Chapter 5: Concept Description: Characterization and Comparison

    • What is concept description?

    • Data generalization and summarization-based characterization

    • Analytical characterization: Analysis of attribute relevance

    • Mining class comparisons: Discriminating between different classes

    • Mining descriptive statistical measures in large databases

    • Discussion

    • Summary


    Mining data dispersion characteristics

    Mining Data Dispersion Characteristics

    • Motivation

      • To better understand the data: central tendency, variation and spread

    • Data dispersion characteristics

      • median, max, min, quantiles, outliers, variance, etc.

    • Numerical dimensions correspond to sorted intervals

      • Data dispersion: analyzed with multiple granularities of precision

      • Boxplot or quantile analysis on sorted intervals

    • Dispersion analysis on computed measures

      • Folding measures into numerical dimensions

      • Boxplot or quantile analysis on the transformed cube


    Comparison of entire vs factored version space

    Comparison of Entire vs. Factored Version Space


    Incremental and parallel mining of concept description

    Incremental and Parallel Mining of Concept Description

    • Incremental mining: revision based on newly added data ΔDB

      • Generalize ΔDB to the same level of abstraction in the generalized relation R to derive ΔR

      • Union R ∪ ΔR, i.e., merge counts and other statistical information to produce a new relation R’ (a merge sketch follows this list)

    • Similar philosophy can be applied to data sampling, parallel and/or distributed mining, etc.
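
    A minimal sketch of the merge step, assuming each generalized relation is stored as a Counter keyed by a generalized tuple; the attribute values below are made up for illustration.

        from collections import Counter

        # R: existing generalized relation; delta_R: newly added data generalized
        # to the same level of abstraction. Both map generalized tuples to counts.
        R = Counter({("M.A.", "Canada", "25-30"): 60, ("M.S.", "Foreign", "25-30"): 80})
        delta_R = Counter({("M.A.", "Canada", "25-30"): 5, ("Ph.D.", "Foreign", "30-35"): 12})

        R_new = R + delta_R   # union with merged counts, i.e., R' = R ∪ ΔR
        print(R_new)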


    Chapter 5 concept description characterization and comparison3

    Chapter 5: Concept Description: Characterization and Comparison

    • What is concept description?

    • Data generalization and summarization-based characterization

    • Analytical characterization: Analysis of attribute relevance

    • Mining class comparisons: Discriminating between different classes

    • Mining descriptive statistical measures in large databases

    • Discussion

    • Summary


    Summary1

    Summary

    • Concept description: characterization and discrimination

    • OLAP-based vs. attribute-oriented induction

    • Efficient implementation of AOI

    • Analytical characterization and comparison

    • Mining descriptive statistical measures in large databases

    • Discussion

      • Incremental and parallel mining of description

      • Descriptive mining of complex types of data


    References1

    References

    • Y. Cai, N. Cercone, and J. Han. Attribute-oriented induction in relational databases. In G. Piatetsky-Shapiro and W. J. Frawley, editors, Knowledge Discovery in Databases, pages 213-228. AAAI/MIT Press, 1991.

    • S. Chaudhuri and U. Dayal. An overview of data warehousing and OLAP technology. ACM SIGMOD Record, 26:65-74, 1997

    • C. Carter and H. Hamilton. Efficient attribute-oriented generalization for knowledge discovery from large databases. IEEE Trans. Knowledge and Data Engineering, 10:193-208, 1998.

    • W. Cleveland. Visualizing Data. Hobart Press, Summit NJ, 1993.

    • J. L. Devore. Probability and Statistics for Engineering and the Sciences, 4th ed. Duxbury Press, 1995.

    • T. G. Dietterich and R. S. Michalski. A comparative review of selected methods for learning from examples. In Michalski et al., editor, Machine Learning: An Artificial Intelligence Approach, Vol. 1, pages 41-82. Morgan Kaufmann, 1983.

    • J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, M. Venkatrao, F. Pellow, and H. Pirahesh. Data cube: A relational aggregation operator generalizing group-by, cross-tab and sub-totals. Data Mining and Knowledge Discovery, 1:29-54, 1997.

    • J. Han, Y. Cai, and N. Cercone. Data-driven discovery of quantitative rules in relational databases. IEEE Trans. Knowledge and Data Engineering, 5:29-40, 1993.


    References cont

    References (cont.)

    • J. Han and Y. Fu. Exploration of the power of attribute-oriented induction in data mining. In U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, pages 399-421. AAAI/MIT Press, 1996.

    • R. A. Johnson and D. A. Wichern. Applied Multivariate Statistical Analysis, 3rd ed. Prentice Hall, 1992.

    • E. Knorr and R. Ng. Algorithms for mining distance-based outliers in large datasets. VLDB'98, New York, NY, Aug. 1998.

    • H. Liu and H. Motoda. Feature Selection for Knowledge Discovery and Data Mining. Kluwer Academic Publishers, 1998.

    • R. S. Michalski. A theory and methodology of inductive learning. In Michalski et al., editor, Machine Learning: An Artificial Intelligence Approach, Vol. 1, Morgan Kaufmann, 1983.

    • T. M. Mitchell. Version spaces: A candidate elimination approach to rule learning. IJCAI'77, Cambridge, MA.

    • T. M. Mitchell. Generalization as search. Artificial Intelligence, 18:203-226, 1982.

    • T. M. Mitchell. Machine Learning. McGraw Hill, 1997.

    • J. R. Quinlan. Induction of decision trees. Machine Learning, 1:81-106, 1986.

    • D. Subramanian and J. Feigenbaum. Factorization in experiment generation. AAAI'86, Philadelphia, PA, Aug. 1986.


    Http www cs sfu ca han dmbook

    http://www.cs.sfu.ca/~han/dmbook

    Thank you !!!


    Data mining concepts and techniques slides for textbook chapter 6

    Data Mining: Concepts and Techniques— Slides for Textbook — — Chapter 6 —

    ©Jiawei Han and Micheline Kamber

    Intelligent Database Systems Research Lab

    School of Computing Science

    Simon Fraser University, Canada

    http://www.cs.sfu.ca


    Chapter 6 mining association rules in large databases

    Chapter 6: Mining Association Rules in Large Databases

    • Association rule mining

    • Mining single-dimensional Boolean association rules from transactional databases

    • Mining multilevel association rules from transactional databases

    • Mining multidimensional association rules from transactional databases and data warehouse

    • From association mining to correlation analysis

    • Constraint-based association mining

    • Summary


    What is association mining

    What Is Association Mining?

    • Association rule mining:

      • Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories.

    • Applications:

      • Basket data analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc.

    • Examples.

      • Rule form: “Body → Head [support, confidence]”.

      • buys(x, “diapers”) → buys(x, “beers”) [0.5%, 60%]

      • major(x, “CS”) ∧ takes(x, “DB”) → grade(x, “A”) [1%, 75%]


    Association rule basic concepts

    Association Rule: Basic Concepts

    • Given: (1) database of transactions, (2) each transaction is a list of items (purchased by a customer in a visit)

    • Find: all rules that correlate the presence of one set of items with that of another set of items

      • E.g., 98% of people who purchase tires and auto accessories also get automotive services done

    • Applications

      • * ⇒ Maintenance Agreement (What should the store do to boost Maintenance Agreement sales?)

      • Home Electronics ⇒ * (What other products should the store stock up on?)

      • Attached mailing in direct marketing

      • Detecting “ping-pong”ing of patients, faulty “collisions”


    Rule measures support and confidence

    (Venn diagram: customers who buy beer, customers who buy diapers, and the overlap of customers who buy both)

    Rule Measures: Support and Confidence

    • Find all the rules X & Y ⇒ Z with minimum confidence and support

      • support, s: probability that a transaction contains {X ∪ Y ∪ Z}

      • confidence, c: conditional probability that a transaction having {X ∪ Y} also contains Z

    With minimum support 50% and minimum confidence 50%, we have:

    • A ⇒ C (50%, 66.6%)

    • C ⇒ A (50%, 100%) (a support/confidence computation sketch follows below)
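
    A minimal sketch of how support and confidence are computed, assuming the four sample transactions commonly used with this example (chosen here to reproduce the numbers above).

        # Assumed example data: four transactions.
        transactions = [
            {"A", "B", "C"},   # TID 2000
            {"A", "C"},        # TID 1000
            {"A", "D"},        # TID 4000
            {"B", "E", "F"},   # TID 5000
        ]

        def support(itemset):
            # Fraction of transactions containing every item of `itemset`.
            itemset = set(itemset)
            return sum(itemset <= t for t in transactions) / len(transactions)

        def confidence(lhs, rhs):
            # Conditional probability that a transaction with `lhs` also has `rhs`.
            return support(set(lhs) | set(rhs)) / support(lhs)

        print(support({"A", "C"}))        # 0.5    -> A => C has 50% support
        print(confidence({"A"}, {"C"}))   # 0.666  -> A => C holds with 66.6% confidence
        print(confidence({"C"}, {"A"}))   # 1.0    -> C => A holds with 100% confidence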


    Association rule mining a road map

    Association Rule Mining: A Road Map

    • Boolean vs. quantitative associations (Based on the types of values handled)

      • buys(x, “SQLServer”) ∧ buys(x, “DMBook”) → buys(x, “DBMiner”) [0.2%, 60%]

      • age(x, “30..39”) ∧ income(x, “42..48K”) → buys(x, “PC”) [1%, 75%]

    • Single-dimensional vs. multi-dimensional associations (see the examples above)

    • Single level vs. multiple-level analysis

      • What brands of beers are associated with what brands of diapers?

    • Various extensions

      • Correlation, causality analysis

        • Association does not necessarily imply correlation or causality

      • Maxpatterns and closed itemsets

      • Constraints enforced

        • E.g., small sales (sum < 100) trigger big buys (sum > 1,000)?


    Chapter 6 mining association rules in large databases1

    Chapter 6: Mining Association Rules in Large Databases

    • Association rule mining

    • Mining single-dimensional Boolean association rules from transactional databases

    • Mining multilevel association rules from transactional databases

    • Mining multidimensional association rules from transactional databases and data warehouse

    • From association mining to correlation analysis

    • Constraint-based association mining

    • Summary


    Mining association rules an example

    Mining Association Rules—An Example

    For rule AC:

    support = support({AC}) = 50%

    confidence = support({AC})/support({A}) = 66.6%

    The Apriori principle:

    Any subset of a frequent itemset must be frequent

    Min. support 50%

    Min. confidence 50%


    Mining frequent itemsets the key step

    Mining Frequent Itemsets: the Key Step

    • Find the frequent itemsets: the sets of items that have minimum support

      • A subset of a frequent itemset must also be a frequent itemset

        • i.e., if {A, B} is a frequent itemset, both {A} and {B} must also be frequent itemsets

      • Iteratively find frequent itemsets with cardinality from 1 to k (k-itemset)

    • Use the frequent itemsets to generate association rules.


    The apriori algorithm

    The Apriori Algorithm

    • Join Step: Ck is generated by joining Lk-1 with itself

    • Prune Step: Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset

    • Pseudo-code:

      Ck: candidate itemsets of size k
      Lk: frequent itemsets of size k

      L1 = {frequent items};
      for (k = 1; Lk != ∅; k++) do begin
          Ck+1 = candidates generated from Lk;
          for each transaction t in database do
              increment the count of all candidates in Ck+1 that are contained in t
          Lk+1 = candidates in Ck+1 with min_support
      end
      return ∪k Lk;


    The apriori algorithm example

    The Apriori Algorithm — Example

    (Figure: the database D is scanned to count C1 and obtain L1; L1 is self-joined to form C2, which a second scan of D turns into L2; C3 is generated from L2 and a final scan of D yields L3.)


    How to generate candidates

    How to Generate Candidates?

    • Suppose the items in Lk-1 are listed in an order

    • Step 1: self-joining Lk-1

      insert into Ck
      select p.item1, p.item2, …, p.itemk-1, q.itemk-1
      from Lk-1 p, Lk-1 q
      where p.item1 = q.item1, …, p.itemk-2 = q.itemk-2, p.itemk-1 < q.itemk-1

    • Step 2: pruning

      forall itemsets c in Ck do
          forall (k-1)-subsets s of c do
              if (s is not in Lk-1) then delete c from Ck


    How to count supports of candidates

    How to Count Supports of Candidates?

    • Why is counting supports of candidates a problem?

      • The total number of candidates can be very huge

      • One transaction may contain many candidates

    • Method:

      • Candidate itemsets are stored in a hash-tree

      • Leaf node of hash-tree contains a list of itemsets and counts

      • Interior node contains a hash table

      • Subset function: finds all the candidates contained in a transaction


    Example of generating candidates

    Example of Generating Candidates

    • L3={abc, abd, acd, ace, bcd}

    • Self-joining: L3*L3

      • abcd from abc and abd

      • acde from acd and ace

    • Pruning:

      • acde is removed because ade is not in L3

    • C4={abcd}
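
    A sketch of the join and prune steps on this example, assuming itemsets are kept as sorted tuples; it reproduces C4 = {abcd} and prunes acde.

        from itertools import combinations

        def apriori_gen(L_prev, k):
            # Generate Ck from L(k-1): self-join on the first k-2 items, then prune.
            L_prev = [tuple(sorted(s)) for s in L_prev]
            candidates = set()
            for p in L_prev:
                for q in L_prev:
                    # Join p and q when they share the first k-2 items and p[k-2] < q[k-2].
                    if p[:k - 2] == q[:k - 2] and p[k - 2] < q[k - 2]:
                        candidates.add(p + (q[k - 2],))
            # Prune: every (k-1)-subset of a candidate must itself be frequent.
            frequent = set(L_prev)
            return {c for c in candidates
                    if all(s in frequent for s in combinations(c, k - 1))}

        L3 = [("a","b","c"), ("a","b","d"), ("a","c","d"), ("a","c","e"), ("b","c","d")]
        print(apriori_gen(L3, 4))   # {('a','b','c','d')}; acde is pruned because ade is not in L3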


    Methods to improve apriori s efficiency

    Methods to Improve Apriori’s Efficiency

    • Hash-based itemset counting: A k-itemset whose corresponding hashing bucket count is below the threshold cannot be frequent

    • Transaction reduction: A transaction that does not contain any frequent k-itemset is useless in subsequent scans

    • Partitioning: Any itemset that is potentially frequent in DB must be frequent in at least one of the partitions of DB

    • Sampling: mining on a subset of given data, lower support threshold + a method to determine the completeness

    • Dynamic itemset counting: add new candidate itemsets only when all of their subsets are estimated to be frequent


    Is apriori fast enough performance bottlenecks

    Is Apriori Fast Enough? — Performance Bottlenecks

    • The core of the Apriori algorithm:

      • Use frequent (k – 1)-itemsets to generate candidate frequent k-itemsets

      • Use database scan and pattern matching to collect counts for the candidate itemsets

    • The bottleneck of Apriori: candidate generation

      • Huge candidate sets:

        • 10^4 frequent 1-itemsets will generate more than 10^7 candidate 2-itemsets

        • To discover a frequent pattern of size 100, e.g., {a1, a2, …, a100}, one needs to generate 2^100 ≈ 10^30 candidates.

      • Multiple scans of database:

        • Needs (n + 1) scans, where n is the length of the longest pattern


    Mining frequent patterns without candidate generation

    Mining Frequent Patterns Without Candidate Generation

    • Compress a large database into a compact, Frequent-Pattern tree (FP-tree) structure

      • highly condensed, but complete for frequent pattern mining

      • avoid costly database scans

    • Develop an efficient, FP-tree-based frequent pattern mining method

      • A divide-and-conquer methodology: decompose mining tasks into smaller ones

      • Avoid candidate generation: sub-database test only!


    Construct fp tree from a transaction db

    TID    Items bought                  (ordered) frequent items
    100    {f, a, c, d, g, i, m, p}      {f, c, a, m, p}
    200    {a, b, c, f, l, m, o}         {f, c, a, b, m}
    300    {b, f, h, j, o}               {f, b}
    400    {b, c, k, s, p}               {c, b, p}
    500    {a, f, c, e, l, p, m, n}      {f, c, a, m, p}

    min_support = 0.5

    Header table (item : frequency, each with a head pointer into the node-links of the tree): f:4, c:4, a:3, b:3, m:3, p:3

    • Steps:

    • Scan DB once, find frequent 1-itemset (single item pattern)

    • Order frequent items in frequency descending order

    • Scan DB again, construct FP-tree

    Resulting FP-tree (node:count):

      {}
        f:4
          c:3
            a:3
              m:2
                p:2
              b:1
                m:1
          b:1
        c:1
          b:1
            p:1

    Construct FP-tree from a Transaction DB
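
    A minimal sketch of the two-pass construction over the five transactions above, using an absolute count threshold of 3 (min_support 0.5 over 5 transactions); the Node class and header layout are assumptions, not the authors' implementation.

        from collections import defaultdict

        class Node:
            def __init__(self, item, parent):
                self.item, self.parent, self.count = item, parent, 1
                self.children = {}

        def build_fp_tree(transactions, min_count):
            # Pass 1: find frequent items and their frequencies.
            freq = defaultdict(int)
            for t in transactions:
                for item in t:
                    freq[item] += 1
            frequent = {i for i, c in freq.items() if c >= min_count}

            def ordered(t):
                # Frequency-descending order; ties broken alphabetically, so c may
                # precede f here (the slide orders f first, but any fixed order works).
                return sorted((i for i in t if i in frequent), key=lambda i: (-freq[i], i))

            # Pass 2: insert each ordered, filtered transaction into the tree.
            root = Node(None, None)
            header = defaultdict(list)          # item -> node-link list
            for t in transactions:
                node = root
                for item in ordered(t):
                    if item in node.children:
                        node.children[item].count += 1
                    else:
                        node.children[item] = Node(item, node)
                        header[item].append(node.children[item])
                    node = node.children[item]
            return root, header

        db = [set("facdgimp"), set("abcflmo"), set("bfhjo"), set("bcksp"), set("afcelpmn")]
        root, header = build_fp_tree(db, min_count=3)
        print({item: sum(n.count for n in nodes) for item, nodes in header.items()})
        # expected totals: f:4, c:4, a:3, b:3, m:3, p:3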


    Benefits of the fp tree structure

    Benefits of the FP-tree Structure

    • Completeness:

      • never breaks a long pattern of any transaction

      • preserves complete information for frequent pattern mining

    • Compactness

      • reduce irrelevant information—infrequent items are gone

      • frequency descending ordering: more frequent items are more likely to be shared

      • never larger than the original database (not counting node-links and counts)

      • Example: For Connect-4 DB, compression ratio could be over 100


    Mining frequent patterns using fp tree

    Mining Frequent Patterns Using FP-tree

    • General idea (divide-and-conquer)

      • Recursively grow frequent pattern path using the FP-tree

    • Method

      • For each item, construct its conditional pattern-base, and then its conditional FP-tree

      • Repeat the process on each newly created conditional FP-tree

      • Until the resulting FP-tree is empty, or it contains only one path (a single path will generate all the combinations of its sub-paths, each of which is a frequent pattern)


    Major steps to mine fp tree

    Major Steps to Mine FP-tree

    • Construct conditional pattern base for each node in the FP-tree

    • Construct conditional FP-tree from each conditional pattern-base

    • Recursively mine conditional FP-trees and grow frequent patterns obtained so far

      • If the conditional FP-tree contains a single path, simply enumerate all the patterns


    Step 1 from fp tree to conditional pattern base

    (Figure: the FP-tree constructed earlier, with its header table f:4, c:4, a:3, b:3, m:3, p:3 and node-links.)

    Step 1: From FP-tree to Conditional Pattern Base

    • Starting at the frequent header table in the FP-tree

    • Traverse the FP-tree by following the link of each frequent item

    • Accumulate all of transformed prefix paths of that item to form a conditional pattern base

    Conditional pattern bases:

    item    conditional pattern base
    c       f:3
    a       fc:3
    b       fca:1, f:1, c:1
    m       fca:2, fcab:1
    p       fcam:2, cb:1


    Properties of fp tree for conditional pattern base construction

    Properties of FP-tree for Conditional Pattern Base Construction

    • Node-link property

      • For any frequent item ai,all the possible frequent patterns that contain ai can be obtained by following ai's node-links, starting from ai's head in the FP-tree header

    • Prefix path property

      • To calculate the frequent patterns for a node ai in a path P, only the prefix sub-path of ai in P needs to be accumulated, and its frequency count carries the same count as node ai.


    Step 2 construct conditional fp tree

    • m-conditional pattern base:

      • fca:2, fcab:1

    (Figure: the full FP-tree and header table shown earlier, next to the m-conditional FP-tree built from this pattern base.)

    m-conditional FP-tree:

      {}
        f:3
          c:3
            a:3

    All frequent patterns concerning m: m, fm, cm, am, fcm, fam, cam, fcam

    Step 2: Construct Conditional FP-tree

    • For each pattern-base

      • Accumulate the count for each item in the base

      • Construct the FP-tree for the frequent items of the pattern base


    Mining frequent patterns by creating conditional pattern bases

    item    conditional pattern-base       conditional FP-tree
    p       {(fcam:2), (cb:1)}             {(c:3)} | p
    m       {(fca:2), (fcab:1)}            {(f:3, c:3, a:3)} | m
    b       {(fca:1), (f:1), (c:1)}        empty
    a       {(fc:3)}                       {(f:3, c:3)} | a
    c       {(f:3)}                        {(f:3)} | c
    f       empty                          empty

    Mining Frequent Patterns by Creating Conditional Pattern-Bases


    Step 3 recursively mine the conditional fp tree

    Step 3: Recursively mine the conditional FP-tree

    Starting from the m-conditional FP-tree ({} → f:3 → c:3 → a:3):

      Cond. pattern base of “am”: (fc:3)   →   am-conditional FP-tree: {} → f:3 → c:3
      Cond. pattern base of “cm”: (f:3)    →   cm-conditional FP-tree: {} → f:3
      Cond. pattern base of “cam”: (f:3)   →   cam-conditional FP-tree: {} → f:3

    Single fp tree path generation

    Example: the m-conditional FP-tree is the single path {} → f:3 → c:3 → a:3, which generates all frequent patterns concerning m: m, fm, cm, am, fcm, fam, cam, fcam

    Single FP-tree Path Generation

    • Suppose an FP-tree T has a single path P

    • The complete set of frequent pattern of T can be generated by enumeration of all the combinations of the sub-paths of P


    Principles of frequent pattern growth

    Principles of Frequent Pattern Growth

    • Pattern growth property

      • Let α be a frequent itemset in DB, B be α’s conditional pattern base, and β be an itemset in B. Then α ∪ β is a frequent itemset in DB iff β is frequent in B.

    • “abcdef ” is a frequent pattern, if and only if

      • “abcde ” is a frequent pattern, and

      • “f ” is frequent in the set of transactions containing “abcde ”


    Why is frequent pattern growth fast

    Why Is Frequent Pattern Growth Fast?

    • Our performance study shows

      • FP-growth is an order of magnitude faster than Apriori, and is also faster than tree-projection

    • Reasoning

      • No candidate generation, no candidate test

      • Use compact data structure

      • Eliminate repeated database scan

      • Basic operation is counting and FP-tree building


    Fp growth vs apriori scalability with the support threshold

    FP-growth vs. Apriori: Scalability With the Support Threshold

    Data set T25I20D10K


    Fp growth vs tree projection scalability with support threshold

    FP-growth vs. Tree-Projection: Scalability with Support Threshold

    Data set T25I20D100K


    Presentation of association rules table form

    Presentation of Association Rules (Table Form )


    Using type inference and induced rules to provide intensional answers

    Visualization of Association Rule Using Plane Graph


    Using type inference and induced rules to provide intensional answers

    Visualization of Association Rule Using Rule Graph


    Iceberg queries

    Iceberg Queries

    • Iceberg query: compute aggregates over one or a set of attributes only for those groups whose aggregate value is above a certain threshold

    • Example:

      select P.custID, P.itemID, sum(P.qty)
      from purchase P
      group by P.custID, P.itemID
      having sum(P.qty) >= 10

    • Compute iceberg queries efficiently by Apriori:

      • First compute lower dimensions

      • Then compute higher dimensions only when all the lower ones are above the threshold
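
    A minimal sketch of the same iceberg aggregation in Python; the purchase rows (custID, itemID, qty) are hypothetical.

        from collections import defaultdict

        purchases = [(1, "milk", 6), (1, "milk", 5), (1, "bread", 2),
                     (2, "milk", 3), (2, "beer", 12)]

        totals = defaultdict(int)
        for cust, item, qty in purchases:
            totals[(cust, item)] += qty          # GROUP BY custID, itemID

        threshold = 10
        iceberg = {k: v for k, v in totals.items() if v >= threshold}   # HAVING sum(qty) >= 10
        print(iceberg)   # {(1, 'milk'): 11, (2, 'beer'): 12}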


    Chapter 6 mining association rules in large databases2

    Chapter 6: Mining Association Rules in Large Databases

    • Association rule mining

    • Mining single-dimensional Boolean association rules from transactional databases

    • Mining multilevel association rules from transactional databases

    • Mining multidimensional association rules from transactional databases and data warehouse

    • From association mining to correlation analysis

    • Constraint-based association mining

    • Summary


    Multiple level association rules

    (Figure: item concept hierarchy: food splits into bread and milk; milk into 2% and skim; bread into white and wheat; brand names such as Fraser and Sunset sit at the lowest level.)

    Multiple-Level Association Rules

    • Items often form hierarchy.

    • Items at the lower level are expected to have lower support.

    • Rules regarding itemsets at appropriate levels could be quite useful.

    • Transaction database can be encoded based on dimensions and levels

    • We can explore shared multi-level mining


    Mining multi level associations

    Mining Multi-Level Associations

    • A top_down, progressive deepening approach:

      • First find high-level strong rules:

        milk → bread [20%, 60%].

      • Then find their lower-level “weaker” rules:

        2% milk → wheat bread [6%, 50%].

    • Variations at mining multiple-level association rules.

      • Level-crossed association rules:

        2% milk → Wonder wheat bread

      • Association rules with multiple, alternative hierarchies:

        2% milk → Wonder bread


    Multi level association uniform support vs reduced support

    Multi-level Association: Uniform Support vs. Reduced Support

    • Uniform Support: the same minimum support for all levels

      • + One minimum support threshold. No need to examine itemsets containing any item whose ancestors do not have minimum support.

      • – Lower level items do not occur as frequently. If support threshold

        • too high ⇒ miss low-level associations

        • too low ⇒ generate too many high-level associations

    • Reduced Support: reduced minimum support at lower levels

      • There are 4 search strategies:

        • Level-by-level independent

        • Level-cross filtering by k-itemset

        • Level-cross filtering by single item

        • Controlled level-cross filtering by single item


    Uniform support

    Uniform Support

    Multi-level mining with uniform support

    Level 1

    min_sup = 5%

    Milk

    [support = 10%]

    2% Milk

    [support = 6%]

    Skim Milk

    [support = 4%]

    Level 2

    min_sup = 5%


    Reduced support

    Reduced Support

    Multi-level mining with reduced support

    Level 1

    min_sup = 5%

    Milk

    [support = 10%]

    2% Milk

    [support = 6%]

    Skim Milk

    [support = 4%]

    Level 2

    min_sup = 3%


    Multi level association redundancy filtering

    Multi-level Association: Redundancy Filtering

    • Some rules may be redundant due to “ancestor” relationships between items.

    • Example

      • milk → wheat bread [support = 8%, confidence = 70%]

      • 2% milk → wheat bread [support = 2%, confidence = 72%]

    • We say the first rule is an ancestor of the second rule.

    • A rule is redundant if its support is close to the “expected” value, based on the rule’s ancestor.


    Multi level mining progressive deepening

    Multi-Level Mining: Progressive Deepening

    • A top-down, progressive deepening approach:

      • First mine high-level frequent items:

        milk (15%), bread (10%)

      • Then mine their lower-level “weaker” frequent itemsets:

        2% milk (5%), wheat bread (4%)

    • Different min_support threshold across multi-levels lead to different algorithms:

      • If adopting the same min_support across multi-levels

        then toss t if any of t’s ancestors is infrequent.

      • If adopting reduced min_support at lower levels

        then examine only those descendants whose ancestors’ support is frequent/non-negligible.


    Progressive refinement of data mining quality

    Progressive Refinement of Data Mining Quality

    • Why progressive refinement?

      • Mining operator can be expensive or cheap, fine or rough

      • Trade speed with quality: step-by-step refinement.

    • Superset coverage property:

      • Preserve all the positive answers: allow a false positive but not a false negative.

    • Two- or multi-step mining:

      • First apply rough/cheap operator (superset coverage)

      • Then apply expensive algorithm on a substantially reduced candidate set (Koperski & Han, SSD’95).


    Progressive refinement mining of spatial association rules

    Progressive Refinement Mining of Spatial Association Rules

    • Hierarchy of spatial relationship:

      • “g_close_to”: near_by, touch, intersect, contain, etc.

      • First search for rough relationship and then refine it.

    • Two-step mining of spatial association:

      • Step 1: rough spatial computation (as a filter)

        • Using MBR or R-tree for rough estimation.

      • Step 2: detailed spatial algorithm (as refinement)

        • Apply only to those objects which have passed the rough spatial association test (no less than min_support)


    Chapter 6 mining association rules in large databases3

    Chapter 6: Mining Association Rules in Large Databases

    • Association rule mining

    • Mining single-dimensional Boolean association rules from transactional databases

    • Mining multilevel association rules from transactional databases

    • Mining multidimensional association rules from transactional databases and data warehouse

    • From association mining to correlation analysis

    • Constraint-based association mining

    • Summary


    Multi dimensional association concepts

    Multi-Dimensional Association: Concepts

    • Single-dimensional rules:

      buys(X, “milk”) ⇒ buys(X, “bread”)

    • Multi-dimensional rules: ≥ 2 dimensions or predicates

      • Inter-dimension association rules (no repeated predicates)

        age(X, “19-25”) ∧ occupation(X, “student”) ⇒ buys(X, “coke”)

      • hybrid-dimension association rules (repeated predicates)

        age(X, “19-25”) ∧ buys(X, “popcorn”) ⇒ buys(X, “coke”)

    • Categorical Attributes

      • finite number of possible values, no ordering among values

    • Quantitative Attributes

      • numeric, implicit ordering among values


    Techniques for mining md associations

    Techniques for Mining MD Associations

    • Search for frequent k-predicate set:

      • Example: {age, occupation, buys} is a 3-predicate set.

      • Techniques can be categorized by how quantitative attributes, such as age, are treated.

        1. Using static discretization of quantitative attributes

      • Quantitative attributes are statically discretized by using predefined concept hierarchies.

        2. Quantitative association rules

      • Quantitative attributes are dynamically discretized into “bins” based on the distribution of the data.

        3. Distance-based association rules

      • This is a dynamic discretization process that considers the distance between data points.


    Static discretization of quantitative attributes

    (Figure: lattice of cuboids: (), (age), (income), (buys), (age, income), (age, buys), (income, buys), (age, income, buys).)

    Static Discretization of Quantitative Attributes

    • Discretized prior to mining using concept hierarchy.

    • Numeric values are replaced by ranges.

    • In a relational database, finding all frequent k-predicate sets requires k or k+1 table scans.

    • Data cube is well suited for mining.

    • The cells of an n-dimensional

      cuboid correspond to the

      predicate sets.

    • Mining from data cubes can be much faster.


    Quantitative association rules

    Quantitative Association Rules

    • Numeric attributes are dynamically discretized

      • Such that the confidence or compactness of the rules mined is maximized.

    • 2-D quantitative association rules: Aquan1 ∧ Aquan2 ⇒ Acat

    • Cluster “adjacent” association rules to form general rules using a 2-D grid.

    • Example:

    age(X, “30-34”) ∧ income(X, “24K - 48K”) ⇒ buys(X, “high resolution TV”)


    Arcs association rule clustering system

    ARCS (Association Rule Clustering System)

    How does ARCS work?

    1. Binning

    2. Find frequent predicate sets

    3. Clustering

    4. Optimize


    Limitations of arcs

    Limitations of ARCS

    • Only quantitative attributes on LHS of rules.

    • Only 2 attributes on LHS. (2D limitation)

    • An alternative to ARCS

      • Non-grid-based

      • equi-depth binning

      • clustering based on a measure of partial completeness.

      • “Mining Quantitative Association Rules in Large Relational Tables” by R. Srikant and R. Agrawal.


    Mining distance based association rules

    • Binning methods do not capture the semantics of interval data

    • Distance-based partitioning gives a more meaningful discretization, considering:

      • density/number of points in an interval

      • “closeness” of points in an interval

    Mining Distance-based Association Rules


    Chapter 6 mining association rules in large databases4

    Chapter 6: Mining Association Rules in Large Databases

    • Association rule mining

    • Mining single-dimensional Boolean association rules from transactional databases

    • Mining multilevel association rules from transactional databases

    • Mining multidimensional association rules from transactional databases and data warehouse

    • From association mining to correlation analysis

    • Constraint-based association mining

    • Summary


    Interestingness measurements

    Interestingness Measurements

    • Objective measures

      Two popular measurements:

      • support; and

      • confidence

    • Subjective measures (Silberschatz & Tuzhilin, KDD95)

      A rule (pattern) is interesting if

      • it is unexpected (surprising to the user); and/or

      • actionable (the user can do something with it)


    Criticism to support and confidence

    Criticism to Support and Confidence

    • Example 1: (Aggarwal & Yu, PODS98)

      • Among 5000 students

        • 3000 play basketball

        • 3750 eat cereal

        • 2000 both play basketball and eat cereal

      • play basketball eat cereal [40%, 66.7%] is misleading because the overall percentage of students eating cereal is 75% which is higher than 66.7%.

      • play basketball not eat cereal [20%, 33.3%] is far more accurate, although with lower support and confidence


    Criticism to support and confidence cont

    Criticism to Support and Confidence (Cont.)

    • Example 2:

      • X and Y: positively correlated

      • X and Z: negatively related

      • yet the support and confidence of X ⇒ Z dominate

    • We need a measure of dependent or correlated events

    • P(B|A)/P(B) is also called the lift of the rule A ⇒ B


    Other interestingness measures interest

    Other Interestingness Measures: Interest

    • Interest (correlation, lift)

      • lift(A ⇒ B) = P(A ∧ B) / (P(A) P(B)), taking both P(A) and P(B) into consideration

      • P(A ∧ B) = P(A) P(B) if A and B are independent events, so the lift equals 1 in that case

      • A and B are negatively correlated if the value is less than 1; otherwise A and B are positively correlated (a computation sketch for the basketball/cereal example follows below)
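
    A minimal sketch computing the lift for the basketball/cereal counts from the criticism slide above.

        # Counts from the basketball/cereal example: 5000 students in total.
        n, basketball, cereal, both = 5000, 3000, 3750, 2000

        p_b, p_c, p_bc = basketball / n, cereal / n, both / n
        lift = p_bc / (p_b * p_c)      # interest / correlation of "basketball => cereal"
        print(round(lift, 3))          # 0.889 < 1: negatively correlated despite 66.7% confidence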


    Chapter 6 mining association rules in large databases5

    Chapter 6: Mining Association Rules in Large Databases

    • Association rule mining

    • Mining single-dimensional Boolean association rules from transactional databases

    • Mining multilevel association rules from transactional databases

    • Mining multidimensional association rules from transactional databases and data warehouse

    • From association mining to correlation analysis

    • Constraint-based association mining

    • Summary


    Constraint based mining

    Constraint-Based Mining

    • Interactive, exploratory mining of gigabytes of data?

      • Could it be real? — Making good use of constraints!

    • What kinds of constraints can be used in mining?

      • Knowledge type constraint: classification, association, etc.

      • Data constraint: SQL-like queries

        • Find product pairs sold together in Vancouver in Dec.’98.

      • Dimension/level constraints:

        • in relevance to region, price, brand, customer category.

      • Rule constraints

        • small sales (price < $10) triggers big sales (sum > $200).

      • Interestingness constraints:

        • strong rules (min_support ≥ 3%, min_confidence ≥ 60%).


    Rule constraints in association mining

    Rule Constraints in Association Mining

    • Two kinds of rule constraints:

      • Rule form constraints: meta-rule guided mining.

        • P(x, y) ∧ Q(x, w) → takes(x, “database systems”).

      • Rule (content) constraint: constraint-based query optimization (Ng, et al., SIGMOD’98).

        • sum(LHS) < 100 ^ min(LHS) > 20 ^ count(LHS) > 3 ^ sum(RHS) > 1000

    • 1-variable vs. 2-variable constraints (Lakshmanan, et al. SIGMOD’99):

      • 1-var: A constraint confining only one side (L/R) of the rule, e.g., as shown above.

      • 2-var: A constraint confining both sides (L and R).

        • sum(LHS) < min(RHS) ^ max(RHS) < 5* sum(LHS)


    Constrain based association query

    Constraint-Based Association Query

    • Database: (1) trans(TID, Itemset), (2) itemInfo(Item, Type, Price)

    • A constrained association query (CAQ) has the form {(S1, S2) | C},

      • where C is a set of constraints on S1, S2, including a frequency constraint

    • A classification of (single-variable) constraints:

      • Class constraint: S ⊆ A, e.g., S ⊆ Item

      • Domain constraint:

        • S θ v, θ ∈ {=, ≠, <, ≤, >, ≥}, e.g., S.Price < 100

        • v θ S, where θ is ∈ or ∉, e.g., snacks ∈ S.Type

        • V θ S, or S θ V, θ ∈ {⊆, ⊂, ⊄, =, ≠}

          • e.g., {snacks, sodas} ⊆ S.Type

      • Aggregation constraint: agg(S) θ v, where agg is one of {min, max, sum, count, avg}, and θ ∈ {=, ≠, <, ≤, >, ≥}

        • e.g., count(S1.Type) = 1, avg(S2.Price) ≥ 100


    Constrained association query optimization problem

    Constrained Association Query Optimization Problem

    • Given a CAQ = { (S1, S2) | C }, the algorithm should be :

      • sound: It only finds frequent sets that satisfy the given constraints C

      • complete: all frequent sets that satisfy the given constraints C are found

    • A naïve solution:

      • Apply Apriori to find all frequent sets, and then test them for constraint satisfaction one by one.

    • Our approach:

      • Comprehensive analysis of the properties of constraints and try to push them as deeply as possible inside the frequent set computation.


    Anti monotone and monotone constraints

    Anti-monotone and Monotone Constraints

    • A constraint Ca is anti-monotone iff. for any pattern S not satisfying Ca, none of the super-patterns of S can satisfy Ca

    • A constraint Cm is monotone iff. for any pattern S satisfying Cm, every super-pattern of S also satisfies it


    Succinct constraint

    Succinct Constraint

    • A subset of items Is ⊆ I is a succinct set if it can be expressed as σp(I) for some selection predicate p, where σ is the selection operator

    • SP ⊆ 2^I is a succinct power set if there is a fixed number of succinct sets I1, …, Ik ⊆ I, s.t. SP can be expressed in terms of the strict power sets of I1, …, Ik using union and minus

    • A constraint Cs is succinct provided SAT_Cs(I) is a succinct power set


    Convertible constraint

    Convertible Constraint

    • Suppose all items in patterns are listed in a total order R

    • A constraint C is convertible anti-monotone iff a pattern S satisfying the constraint implies that each suffix of S w.r.t. R also satisfies C

    • A constraint C is convertible monotone iff a pattern S satisfying the constraint implies that each pattern of which S is a suffix w.r.t. R also satisfies C


    Relationships among categories of constraints

    Relationships Among Categories of Constraints

    (Figure: diagram relating succinctness, anti-monotonicity, monotonicity, and convertible constraints; inconvertible constraints lie outside these categories.)


    Property of constraints anti monotone

    Property of Constraints: Anti-Monotone

    • Anti-monotonicity: If a set S violates the constraint, any superset of S violates the constraint.

    • Examples:

      • sum(S.Price) ≤ v is anti-monotone

      • sum(S.Price) ≥ v is not anti-monotone

      • sum(S.Price) = v is partly anti-monotone

    • Application:

      • Push “sum(S.price) ≤ 1000” deeply into the iterative frequent set computation (a pruning sketch follows below).
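
    A minimal sketch of how an anti-monotone constraint such as sum(S.price) ≤ 1000 prunes candidates before support counting; the item prices and candidate itemsets are hypothetical.

        # Hypothetical item prices.
        price = {"tv": 800, "pc": 900, "dvd": 150, "cd": 20}

        def violates_sum_le(itemset, limit=1000):
            # sum(S.price) <= limit is anti-monotone: once an itemset violates it,
            # every superset also violates it, so the candidate can be pruned early.
            return sum(price[i] for i in itemset) > limit

        candidates = [{"tv", "dvd"}, {"tv", "pc"}, {"dvd", "cd"}]
        survivors = [c for c in candidates if not violates_sum_le(c)]
        print(survivors)   # {"tv", "pc"} (1700 > 1000) is pruned; none of its supersets are generated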


    Characterization of anti monotonicity constraints

    Characterization of Anti-Monotonicity Constraints

    Constraint                         Anti-monotone?
    S θ v, θ ∈ {=, ≤, ≥}               yes
    v ∈ S                              no
    S ⊇ V                              no
    S ⊆ V                              yes
    S = V                              partly
    min(S) ≤ v                         no
    min(S) ≥ v                         yes
    min(S) = v                         partly
    max(S) ≤ v                         yes
    max(S) ≥ v                         no
    max(S) = v                         partly
    count(S) ≤ v                       yes
    count(S) ≥ v                       no
    count(S) = v                       partly
    sum(S) ≤ v                         yes
    sum(S) ≥ v                         no
    sum(S) = v                         partly
    avg(S) θ v, θ ∈ {=, ≤, ≥}          convertible
    (frequent constraint)              (yes)


    Example of convertible constraints avg s v

    Example of Convertible Constraints: avg(S) ≥ v

    • Let R be the value-descending order over the set of items

      • E.g., I = {9, 8, 6, 4, 3, 1}

    • avg(S) ≥ v is convertible monotone w.r.t. R

      • If S is a suffix of S1, avg(S1) ≥ avg(S)

        • {8, 4, 3} is a suffix of {9, 8, 4, 3}

        • avg({9, 8, 4, 3}) = 6 ≥ avg({8, 4, 3}) = 5

      • If S satisfies avg(S) ≥ v, so does S1

        • {8, 4, 3} satisfies the constraint avg(S) ≥ 4, so does {9, 8, 4, 3}


    Property of constraints succinctness

    Property of Constraints: Succinctness

    • Succinctness:

      • For any sets S1 and S2 satisfying C, S1 ∪ S2 satisfies C

      • Given A1, the set of single items satisfying C, any set S satisfying C is based on A1, i.e., S contains a subset belonging to A1

    • Example:

      • sum(S.Price) ≥ v is not succinct

      • min(S.Price) ≤ v is succinct

    • Optimization:

      • If C is succinct, then C is pre-counting prunable. The satisfaction of the constraint alone is not affected by the iterative support counting.


    Characterization of constraints by succinctness

    Characterization of Constraints by Succinctness

    Constraint                         Succinct?
    S θ v, θ ∈ {=, ≤, ≥}               yes
    v ∈ S                              yes
    S ⊇ V                              yes
    S ⊆ V                              yes
    S = V                              yes
    min(S) ≤ v                         yes
    min(S) ≥ v                         yes
    min(S) = v                         yes
    max(S) ≤ v                         yes
    max(S) ≥ v                         yes
    max(S) = v                         yes
    count(S) ≤ v                       weakly
    count(S) ≥ v                       weakly
    count(S) = v                       weakly
    sum(S) ≤ v                         no
    sum(S) ≥ v                         no
    sum(S) = v                         no
    avg(S) θ v, θ ∈ {=, ≤, ≥}          no
    (frequent constraint)              (no)


    Chapter 6 mining association rules in large databases6

    Chapter 6: Mining Association Rules in Large Databases

    • Association rule mining

    • Mining single-dimensional Boolean association rules from transactional databases

    • Mining multilevel association rules from transactional databases

    • Mining multidimensional association rules from transactional databases and data warehouse

    • From association mining to correlation analysis

    • Constraint-based association mining

    • Summary


    Why is the big pie still there

    Why Is the Big Pie Still There?

    • More on constraint-based mining of associations

      • Boolean vs. quantitative associations

        • Association on discrete vs. continuous data

      • From association to correlation and causal structure analysis.

        • Association does not necessarily imply correlation or causal relationships

      • From intra-transaction associations to inter-transaction associations

        • E.g., break the barriers of transactions (Lu, et al. TOIS’99).

      • From association analysis to classification and clustering analysis

        • E.g., clustering association rules


    Chapter 6 mining association rules in large databases7

    Chapter 6: Mining Association Rules in Large Databases

    • Association rule mining

    • Mining single-dimensional Boolean association rules from transactional databases

    • Mining multilevel association rules from transactional databases

    • Mining multidimensional association rules from transactional databases and data warehouse

    • From association mining to correlation analysis

    • Constraint-based association mining

    • Summary


    Summary2

    Summary

    • Association rule mining

      • probably the most significant contribution from the database community in KDD

      • A large number of papers have been published

    • Many interesting issues have been explored

    • An interesting research direction

      • Association analysis in other types of data: spatial data, multimedia data, time series data, etc.


    References2

    References

    • R. Agarwal, C. Aggarwal, and V. V. V. Prasad. A tree projection algorithm for generation of frequent itemsets. In Journal of Parallel and Distributed Computing (Special Issue on High Performance Data Mining), 2000.

    • R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. SIGMOD'93, 207-216, Washington, D.C.

    • R. Agrawal and R. Srikant. Fast algorithms for mining association rules. VLDB'94 487-499, Santiago, Chile.

    • R. Agrawal and R. Srikant. Mining sequential patterns. ICDE'95, 3-14, Taipei, Taiwan.

    • R. J. Bayardo. Efficiently mining long patterns from databases. SIGMOD'98, 85-93, Seattle, Washington.

    • S. Brin, R. Motwani, and C. Silverstein. Beyond market basket: Generalizing association rules to correlations. SIGMOD'97, 265-276, Tucson, Arizona.

    • S. Brin, R. Motwani, J. D. Ullman, and S. Tsur. Dynamic itemset counting and implication rules for market basket analysis. SIGMOD'97, 255-264, Tucson, Arizona, May 1997.

    • K. Beyer and R. Ramakrishnan. Bottom-up computation of sparse and iceberg cubes. SIGMOD'99, 359-370, Philadelphia, PA, June 1999.

    • D.W. Cheung, J. Han, V. Ng, and C.Y. Wong. Maintenance of discovered association rules in large databases: An incremental updating technique. ICDE'96, 106-114, New Orleans, LA.

    • M. Fang, N. Shivakumar, H. Garcia-Molina, R. Motwani, and J. D. Ullman. Computing iceberg queries efficiently. VLDB'98, 299-310, New York, NY, Aug. 1998.


    References 2

    References (2)

    • G. Grahne, L. Lakshmanan, and X. Wang. Efficient mining of constrained correlated sets. ICDE'00, 512-521, San Diego, CA, Feb. 2000.

    • Y. Fu and J. Han. Meta-rule-guided mining of association rules in relational databases. KDOOD'95, 39-46, Singapore, Dec. 1995.

    • T. Fukuda, Y. Morimoto, S. Morishita, and T. Tokuyama. Data mining using two-dimensional optimized association rules: Scheme, algorithms, and visualization. SIGMOD'96, 13-23, Montreal, Canada.

    • E.-H. Han, G. Karypis, and V. Kumar. Scalable parallel data mining for association rules. SIGMOD'97, 277-288, Tucson, Arizona.

    • J. Han, G. Dong, and Y. Yin. Efficient mining of partial periodic patterns in time series database. ICDE'99, Sydney, Australia.

    • J. Han and Y. Fu. Discovery of multiple-level association rules from large databases. VLDB'95, 420-431, Zurich, Switzerland.

    • J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. SIGMOD'00, 1-12, Dallas, TX, May 2000.

    • T. Imielinski and H. Mannila. A database perspective on knowledge discovery. Communications of ACM, 39:58-64, 1996.

    • M. Kamber, J. Han, and J. Y. Chiang. Metarule-guided mining of multi-dimensional association rules using data cubes. KDD'97, 207-210, Newport Beach, California.

    • M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, and A.I. Verkamo. Finding interesting rules from large sets of discovered association rules. CIKM'94, 401-408, Gaithersburg, Maryland.


    References 3

    References (3)

    • F. Korn, A. Labrinidis, Y. Kotidis, and C. Faloutsos. Ratio rules: A new paradigm for fast, quantifiable data mining. VLDB'98, 582-593, New York, NY.

    • B. Lent, A. Swami, and J. Widom. Clustering association rules. ICDE'97, 220-231, Birmingham, England.

    • H. Lu, J. Han, and L. Feng. Stock movement and n-dimensional inter-transaction association rules. SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD'98), 12:1-12:7, Seattle, Washington.

    • H. Mannila, H. Toivonen, and A. I. Verkamo. Efficient algorithms for discovering association rules. KDD'94, 181-192, Seattle, WA, July 1994.

    • H. Mannila, H Toivonen, and A. I. Verkamo. Discovery of frequent episodes in event sequences. Data Mining and Knowledge Discovery, 1:259-289, 1997.

    • R. Meo, G. Psaila, and S. Ceri. A new SQL-like operator for mining association rules. VLDB'96, 122-133, Bombay, India.

    • R.J. Miller and Y. Yang. Association rules over interval data. SIGMOD'97, 452-461, Tucson, Arizona.

    • R. Ng, L. V. S. Lakshmanan, J. Han, and A. Pang. Exploratory mining and pruning optimizations of constrained associations rules. SIGMOD'98, 13-24, Seattle, Washington.

    • N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. ICDT'99, 398-416, Jerusalem, Israel, Jan. 1999.


    References 4

    References (4)

    • J.S. Park, M.S. Chen, and P.S. Yu. An effective hash-based algorithm for mining association rules. SIGMOD'95, 175-186, San Jose, CA, May 1995.

    • J. Pei, J. Han, and R. Mao. CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets. DMKD'00, Dallas, TX, 11-20, May 2000.

    • J. Pei and J. Han. Can We Push More Constraints into Frequent Pattern Mining? KDD'00. Boston, MA. Aug. 2000.

    • G. Piatetsky-Shapiro. Discovery, analysis, and presentation of strong rules. In G. Piatetsky-Shapiro and W. J. Frawley, editors, Knowledge Discovery in Databases, 229-238. AAAI/MIT Press, 1991.

    • B. Ozden, S. Ramaswamy, and A. Silberschatz. Cyclic association rules. ICDE'98, 412-421, Orlando, FL.

    • S. Ramaswamy, S. Mahajan, and A. Silberschatz. On the discovery of interesting patterns in association rules. VLDB'98, 368-379, New York, NY.

    • S. Sarawagi, S. Thomas, and R. Agrawal. Integrating association rule mining with relational database systems: Alternatives and implications. SIGMOD'98, 343-354, Seattle, WA.

    • A. Savasere, E. Omiecinski, and S. Navathe. An efficient algorithm for mining association rules in large databases. VLDB'95, 432-443, Zurich, Switzerland.

    • A. Savasere, E. Omiecinski, and S. Navathe. Mining for strong negative associations in a large database of customer transactions. ICDE'98, 494-502, Orlando, FL, Feb. 1998.


    References 5

    References (5)

    • C. Silverstein, S. Brin, R. Motwani, and J. Ullman. Scalable techniques for mining causal structures. VLDB'98, 594-605, New York, NY.

    • R. Srikant and R. Agrawal. Mining generalized association rules. VLDB'95, 407-419, Zurich, Switzerland, Sept. 1995.

    • R. Srikant and R. Agrawal. Mining quantitative association rules in large relational tables. SIGMOD'96, 1-12, Montreal, Canada.

    • R. Srikant, Q. Vu, and R. Agrawal. Mining association rules with item constraints. KDD'97, 67-73, Newport Beach, California.

    • H. Toivonen. Sampling large databases for association rules. VLDB'96, 134-145, Bombay, India, Sept. 1996.

    • D. Tsur, J. D. Ullman, S. Abitboul, C. Clifton, R. Motwani, and S. Nestorov. Query flocks: A generalization of association-rule mining. SIGMOD'98, 1-12, Seattle, Washington.

    • K. Yoda, T. Fukuda, Y. Morimoto, S. Morishita, and T. Tokuyama. Computing optimized rectilinear regions for association rules. KDD'97, 96-103, Newport Beach, CA, Aug. 1997.

    • M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. Parallel algorithm for discovery of association rules. Data Mining and Knowledge Discovery, 1:343-374, 1997.

    • M. Zaki. Generating Non-Redundant Association Rules. KDD'00. Boston, MA. Aug. 2000.

    • O. R. Zaiane, J. Han, and H. Zhu. Mining Recurrent Items in Multimedia with Progressive Resolution Refinement. ICDE'00, 461-470, San Diego, CA, Feb. 2000.


    Http www cs sfu ca han dmbook1

    http://www.cs.sfu.ca/~han/dmbook

    Thank you !!!


    Data mining concepts and techniques slides for textbook chapter 7

    Data Mining: Concepts and Techniques— Slides for Textbook — — Chapter 7 —

    ©Jiawei Han and Micheline Kamber

    Intelligent Database Systems Research Lab

    School of Computing Science

    Simon Fraser University, Canada

    http://www.cs.sfu.ca


    Chapter 7 classification and prediction

    Chapter 7. Classification and Prediction

    • What is classification? What is prediction?

    • Issues regarding classification and prediction

    • Classification by decision tree induction

    • Bayesian Classification

    • Classification by backpropagation

    • Classification based on concepts from association rule mining

    • Other Classification Methods

    • Prediction

    • Classification accuracy

    • Summary


    Classification vs prediction

    Classification vs. Prediction

    • Classification:

      • predicts categorical class labels

      • classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute and uses it in classifying new data

    • Prediction:

      • models continuous-valued functions, i.e., predicts unknown or missing values

    • Typical Applications

      • credit approval

      • target marketing

      • medical diagnosis

      • treatment effectiveness analysis


    Classification a two step process

    Classification—A Two-Step Process

    • Model construction: describing a set of predetermined classes

      • Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute

      • The set of tuples used for model construction: training set

      • The model is represented as classification rules, decision trees, or mathematical formulae

    • Model usage: for classifying future or unknown objects

      • Estimate accuracy of the model

        • The known label of test sample is compared with the classified result from the model

        • Accuracy rate is the percentage of test set samples that are correctly classified by the model

        • Test set is independent of training set, otherwise over-fitting will occur


    Classification process 1 model construction

    Classification Process (1): Model Construction

    (Figure: the training data is fed to a classification algorithm, which produces a classifier (model), e.g. the rule IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’.)


    Classification process 2 use the model in prediction

    Classification Process (2): Use the Model in Prediction

    (Figure: the classifier is first applied to the testing data to estimate accuracy, then used to classify unseen data such as (Jeff, Professor, 4): Tenured?)


    Supervised vs unsupervised learning

    Supervised vs. Unsupervised Learning

    • Supervised learning (classification)

      • Supervision: The training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations

      • New data is classified based on the training set

    • Unsupervised learning(clustering)

      • The class labels of the training data are unknown

      • Given a set of measurements, observations, etc. with the aim of establishing the existence of classes or clusters in the data


    Chapter 7 classification and prediction1

    Chapter 7. Classification and Prediction

    • What is classification? What is prediction?

    • Issues regarding classification and prediction

    • Classification by decision tree induction

    • Bayesian Classification

    • Classification by backpropagation

    • Classification based on concepts from association rule mining

    • Other Classification Methods

    • Prediction

    • Classification accuracy

    • Summary


    Issues regarding classification and prediction 1 data preparation

    Issues regarding classification and prediction (1): Data Preparation

    • Data cleaning

      • Preprocess data in order to reduce noise and handle missing values

    • Relevance analysis (feature selection)

      • Remove the irrelevant or redundant attributes

    • Data transformation

      • Generalize and/or normalize data


    Issues regarding classification and prediction 2 evaluating classification methods

    Issues regarding classification and prediction (2): Evaluating Classification Methods

    • Predictive accuracy

    • Speed and scalability

      • time to construct the model

      • time to use the model

    • Robustness

      • handling noise and missing values

    • Scalability

      • efficiency in disk-resident databases

    • Interpretability:

      • understanding and insight provided by the model

    • Goodness of rules

      • decision tree size

      • compactness of classification rules


    Chapter 7 classification and prediction2

    Chapter 7. Classification and Prediction

    • What is classification? What is prediction?

    • Issues regarding classification and prediction

    • Classification by decision tree induction

    • Bayesian Classification

    • Classification by backpropagation

    • Classification based on concepts from association rule mining

    • Other Classification Methods

    • Prediction

    • Classification accuracy

    • Summary


    Classification by decision tree induction

    Classification by Decision Tree Induction

    • Decision tree

      • A flow-chart-like tree structure

      • Internal node denotes a test on an attribute

      • Branch represents an outcome of the test

      • Leaf nodes represent class labels or class distribution

    • Decision tree generation consists of two phases

      • Tree construction

        • At start, all the training examples are at the root

        • Partition examples recursively based on selected attributes

      • Tree pruning

        • Identify and remove branches that reflect noise or outliers

    • Use of decision tree: Classifying an unknown sample

      • Test the attribute values of the sample against the decision tree


    Training dataset

    Training Dataset

    This follows an example from Quinlan’s ID3


    Output a decision tree for buys computer

    Output: A Decision Tree for “buys_computer”

    (Figure: decision tree rooted at age? with branches “<=30”, “30..40”, “>40”; the “<=30” branch tests student? (no → no, yes → yes); the “30..40” branch is the leaf yes; the “>40” branch tests credit_rating? (excellent → no, fair → yes).)


    Algorithm for decision tree induction

    Algorithm for Decision Tree Induction

    • Basic algorithm (a greedy algorithm)

      • Tree is constructed in a top-down recursive divide-and-conquer manner

      • At start, all the training examples are at the root

      • Attributes are categorical (if continuous-valued, they are discretized in advance)

      • Examples are partitioned recursively based on selected attributes

      • Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)

    • Conditions for stopping partitioning

      • All samples for a given node belong to the same class

      • There are no remaining attributes for further partitioning – majority voting is employed for classifying the leaf

      • There are no samples left
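
    A minimal sketch of this greedy, top-down procedure in plain Python (the dict-based data layout and helper functions are illustrative assumptions, not part of the original algorithm statement):

      import math
      from collections import Counter

      def entropy(labels):
          """Expected information of a multiset of class labels."""
          total = len(labels)
          return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

      def info_gain(samples, labels, attr):
          """Reduction in entropy obtained by partitioning on one categorical attribute."""
          gain = entropy(labels)
          for value in set(s[attr] for s in samples):
              subset = [l for s, l in zip(samples, labels) if s[attr] == value]
              gain -= (len(subset) / len(labels)) * entropy(subset)
          return gain

      def build_tree(samples, labels, attrs):
          if len(set(labels)) == 1:                 # all samples belong to one class
              return labels[0]
          if not attrs:                             # no attributes left: majority voting
              return Counter(labels).most_common(1)[0][0]
          best = max(attrs, key=lambda a: info_gain(samples, labels, a))
          node = {best: {}}
          for value in set(s[best] for s in samples):
              idx = [i for i, s in enumerate(samples) if s[best] == value]
              node[best][value] = build_tree([samples[i] for i in idx],
                                             [labels[i] for i in idx],
                                             [a for a in attrs if a != best])
          return node

      # Usage (hypothetical): build_tree(list_of_attribute_dicts, list_of_labels, ["age", "student"])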


    Attribute Selection Measure

    • Information gain (ID3/C4.5)

      • All attributes are assumed to be categorical

      • Can be modified for continuous-valued attributes

    • Gini index (IBM IntelligentMiner)

      • All attributes are assumed continuous-valued

      • Assume there exist several possible split values for each attribute

      • May need other tools, such as clustering, to get the possible split values

      • Can be modified for categorical attributes


    Information Gain (ID3/C4.5)

    • Select the attribute with the highest information gain

    • Assume there are two classes, P and N

      • Let the set of examples S contain p elements of class P and n elements of class N

      • The amount of information needed to decide if an arbitrary example in S belongs to P or N is defined as

          I(p, n) = −(p/(p+n)) log2(p/(p+n)) − (n/(p+n)) log2(n/(p+n))


    Information Gain in Decision Tree Induction

    • Assume that using attribute A a set S will be partitioned into sets {S1, S2 , …, Sv}

      • If Si contains pi examples of P and ni examples of N, the entropy, or the expected information needed to classify objects in all subtrees Si, is

          E(A) = Σ i=1..v  ((pi + ni)/(p + n)) · I(pi, ni)

    • The encoding information that would be gained by branching on A is

          Gain(A) = I(p, n) − E(A)


    Attribute Selection by Information Gain Computation

    Class P: buys_computer = “yes”

    Class N: buys_computer = “no”

    I(p, n) = I(9, 5) = 0.940

    Compute the entropy for age:

      age <=30:   pi = 2, ni = 3, I(pi, ni) = 0.971

      age 30..40: pi = 4, ni = 0, I(pi, ni) = 0

      age >40:    pi = 3, ni = 2, I(pi, ni) = 0.971

      E(age) = (5/14)·I(2, 3) + (4/14)·I(4, 0) + (5/14)·I(3, 2) = 0.694

    Hence Gain(age) = I(9, 5) − E(age) = 0.246

    Similarly, Gain(income) = 0.029, Gain(student) = 0.151, and Gain(credit_rating) = 0.048, so age is chosen as the root test attribute. (A short computational check follows this slide.)
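
    A small Python check of these numbers, using the class counts above (the helper function is an assumption for illustration):

      import math

      def I(p, n):
          """Expected information for a two-class node with p and n examples."""
          total = p + n
          return -sum((x / total) * math.log2(x / total) for x in (p, n) if x > 0)

      p, n = 9, 5
      # class counts (pi, ni) inside the three age partitions: <=30, 30..40, >40
      partitions = [(2, 3), (4, 0), (3, 2)]

      E_age = sum(((pi + ni) / (p + n)) * I(pi, ni) for pi, ni in partitions)
      print(round(I(p, n), 3))            # 0.940
      print(round(I(p, n) - E_age, 3))    # about 0.247; the slide's 0.246 rounds I and E first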


    Gini Index (IBM IntelligentMiner)

    • If a data set T contains examples from n classes, the gini index, gini(T), is defined as

        gini(T) = 1 − Σ j=1..n  pj²

      where pj is the relative frequency of class j in T.

    • If a data set T of size N is split into two subsets T1 and T2 with sizes N1 and N2 respectively, the gini index of the split data, ginisplit(T), is defined as

        ginisplit(T) = (N1/N)·gini(T1) + (N2/N)·gini(T2)

    • The attribute that provides the smallest ginisplit(T) is chosen to split the node (all possible splitting points must be enumerated for each attribute). A small computational sketch follows this slide.
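
    A minimal sketch of these two measures in Python (the example class counts are hypothetical, chosen to total 14 samples with 9 yes / 5 no):

      def gini(counts):
          """Gini impurity of a node given its per-class example counts."""
          total = sum(counts)
          return 1.0 - sum((c / total) ** 2 for c in counts)

      def gini_split(counts1, counts2):
          """Size-weighted Gini impurity of a binary split into two subsets."""
          n1, n2 = sum(counts1), sum(counts2)
          n = n1 + n2
          return (n1 / n) * gini(counts1) + (n2 / n) * gini(counts2)

      print(round(gini([9, 5]), 3))                # impurity before splitting, about 0.459
      print(round(gini_split([7, 1], [2, 4]), 3))  # impurity of one candidate binary split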


    Extracting Classification Rules from Trees

    • Represent the knowledge in the form of IF-THEN rules

    • One rule is created for each path from the root to a leaf

    • Each attribute-value pair along a path forms a conjunction

    • The leaf node holds the class prediction

    • Rules are easier for humans to understand (a sketch that extracts them mechanically follows this slide)

    • Example

      IF age = “<=30” AND student = “no” THEN buys_computer = “no”

      IF age = “<=30” AND student = “yes” THEN buys_computer = “yes”

      IF age = “31…40” THEN buys_computer = “yes”

      IF age = “>40” AND credit_rating = “excellent” THEN buys_computer = “no”

      IF age = “>40” AND credit_rating = “fair” THEN buys_computer = “yes”
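
    A sketch that derives such rules automatically by walking the nested-dict tree produced by the earlier induction sketch (one rule per root-to-leaf path; the data layout is the same illustrative assumption as before):

      def extract_rules(tree, conditions=()):
          if not isinstance(tree, dict):                      # leaf: emit one rule
              body = " AND ".join(f'{a} = "{v}"' for a, v in conditions) or "TRUE"
              print(f'IF {body} THEN class = "{tree}"')
              return
          (attr, branches), = tree.items()                    # one test attribute per internal node
          for value, subtree in branches.items():
              extract_rules(subtree, conditions + ((attr, value),))

      # Usage: extract_rules(build_tree(samples, labels, attrs)) with the earlier sketch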


    Avoid Overfitting in Classification

    • The generated tree may overfit the training data

      • Too many branches, some may reflect anomalies due to noise or outliers

      • Results in poor accuracy for unseen samples

    • Two approaches to avoid overfitting

      • Prepruning: Halt tree construction early—do not split a node if this would result in the goodness measure falling below a threshold

        • Difficult to choose an appropriate threshold

      • Postpruning: Remove branches from a “fully grown” tree—get a sequence of progressively pruned trees

        • Use a set of data different from the training data to decide which is the “best pruned tree”


    Approaches to Determine the Final Tree Size

    • Separate training (2/3) and testing (1/3) sets

    • Use cross validation, e.g., 10-fold cross validation (a sketch follows this slide)

    • Use all the data for training

      • but apply a statistical test (e.g., chi-square) to estimate whether expanding or pruning a node may improve the entire distribution

    • Use minimum description length (MDL) principle:

      • halting growth of the tree when the encoding is minimized
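
    One way to put the cross-validation idea into practice is sketched below, assuming scikit-learn is installed; it uses the bundled iris data only as a stand-in, since the slides' training set is not machine-readable here, and picks a cost-complexity pruning level by 10-fold cross-validation:

      from sklearn.datasets import load_iris
      from sklearn.model_selection import cross_val_score
      from sklearn.tree import DecisionTreeClassifier

      X, y = load_iris(return_X_y=True)

      # Larger ccp_alpha means more aggressive postpruning (a smaller final tree).
      best_alpha, best_score = 0.0, -1.0
      for alpha in (0.0, 0.005, 0.01, 0.02, 0.05):
          score = cross_val_score(
              DecisionTreeClassifier(ccp_alpha=alpha, random_state=0), X, y, cv=10
          ).mean()
          if score > best_score:
              best_alpha, best_score = alpha, score

      print(best_alpha, round(best_score, 3))   # pruning level with the best 10-fold accuracy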


    Enhancements to basic decision tree induction

    • Allow for continuous-valued attributes

      • Dynamically define new discrete-valued attributes that partition the continuous attribute value into a discrete set of intervals (a threshold-search sketch follows this slide)

    • Handle missing attribute values

      • Assign the most common value of the attribute

      • Assign probability to each of the possible values

    • Attribute construction

      • Create new attributes based on existing ones that are sparsely represented

      • This reduces fragmentation, repetition, and replication
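
    A minimal sketch of the first enhancement: for a continuous attribute, evaluate candidate thresholds at midpoints between adjacent sorted values and keep the one with the highest information gain (the values and labels below are illustrative only):

      import math
      from collections import Counter

      def entropy(labels):
          total = len(labels)
          return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

      def best_threshold(values, labels):
          pairs = sorted(zip(values, labels))
          base = entropy(labels)
          best = (None, -1.0)
          for i in range(1, len(pairs)):
              if pairs[i - 1][0] == pairs[i][0]:
                  continue
              t = (pairs[i - 1][0] + pairs[i][0]) / 2      # midpoint candidate split
              left = [l for v, l in pairs if v <= t]
              right = [l for v, l in pairs if v > t]
              gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
              if gain > best[1]:
                  best = (t, gain)
          return best                                       # (threshold, information gain)

      print(best_threshold([23, 25, 31, 35, 42, 46], ["no", "no", "yes", "yes", "yes", "no"]))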


    Classification in Large Databases

    • Classification—a classical problem extensively studied by statisticians and machine learning researchers

    • Scalability: Classifying data sets with millions of examples and hundreds of attributes with reasonable speed

    • Why decision tree induction in data mining?

      • relatively faster learning speed (than other classification methods)

      • convertible to simple and easy to understand classification rules

      • can use SQL queries for accessing databases

      • comparable classification accuracy with other methods


    Scalable Decision Tree Induction Methods in Data Mining Studies

    • SLIQ (EDBT’96 — Mehta et al.)

      • builds an index for each attribute; only the class list and the current attribute list reside in memory

    • SPRINT (VLDB’96 — J. Shafer et al.)

      • constructs an attribute list data structure

    • PUBLIC (VLDB’98 — Rastogi & Shim)

      • integrates tree splitting and tree pruning: stop growing the tree earlier

    • RainForest (VLDB’98 — Gehrke, Ramakrishnan & Ganti)

      • separates the scalability aspects from the criteria that determine the quality of the tree

      • builds an AVC-list (attribute, value, class label)


    Data Cube-Based Decision-Tree Induction

    • Integration of generalization with decision-tree induction (Kamber et al’97).

    • Classification at primitive concept levels

      • E.g., precise temperature, humidity, outlook, etc.

      • Low-level concepts, scattered classes, bushy classification-trees

      • Semantic interpretation problems.

    • Cube-based multi-level classification

      • Relevance analysis at multi-levels.

      • Information-gain analysis with dimension + level.


    Presentation of Classification Results


    References (I)

    • C. Apte and S. Weiss. Data mining with decision trees and decision rules. Future Generation Computer Systems, 13, 1997.

    • L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth International Group, 1984.

    • P. K. Chan and S. J. Stolfo. Learning arbiter and combiner trees from partitioned data for scaling machine learning. In Proc. 1st Int. Conf. Knowledge Discovery and Data Mining (KDD'95), pages 39-44, Montreal, Canada, August 1995.

    • U. M. Fayyad. Branching on attribute values in decision tree generation. In Proc. 1994 AAAI Conf., pages 601-606, AAAI Press, 1994.

    • J. Gehrke, R. Ramakrishnan, and V. Ganti. Rainforest: A framework for fast decision tree construction of large datasets. In Proc. 1998 Int. Conf. Very Large Data Bases, pages 416-427, New York, NY, August 1998.

    • M. Kamber, L. Winstone, W. Gong, S. Cheng, and J. Han. Generalization and decision tree induction: Efficient classification in data mining. In Proc. 1997 Int. Workshop Research Issues on Data Engineering (RIDE'97), pages 111-120, Birmingham, England, April 1997.


    References (II)

    • J. Magidson. The CHAID approach to segmentation modeling: Chi-squared automatic interaction detection. In R. P. Bagozzi, editor, Advanced Methods of Marketing Research, pages 118-159. Blackwell Business, Cambridge, Massachusetts, 1994.

    • M. Mehta, R. Agrawal, and J. Rissanen. SLIQ: A fast scalable classifier for data mining. In Proc. 1996 Int. Conf. Extending Database Technology (EDBT'96), Avignon, France, March 1996.

    • S. K. Murthy. Automatic construction of decision trees from data: A multi-disciplinary survey. Data Mining and Knowledge Discovery, 2(4):345-389, 1998.

    • J. R. Quinlan. Bagging, boosting, and C4.5. In Proc. 13th Natl. Conf. on Artificial Intelligence (AAAI'96), pages 725-730, Portland, OR, Aug. 1996.

    • R. Rastogi and K. Shim. PUBLIC: A decision tree classifier that integrates building and pruning. In Proc. 1998 Int. Conf. Very Large Data Bases, pages 404-415, New York, NY, August 1998.

    • J. Shafer, R. Agrawal, and M. Mehta. SPRINT: A scalable parallel classifier for data mining. In Proc. 1996 Int. Conf. Very Large Data Bases, pages 544-555, Bombay, India, Sept. 1996.

    • S. M. Weiss and C. A. Kulikowski. Computer Systems that Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems. Morgan Kaufmann, 1991.


    http://www.cs.sfu.ca/~han

    Thank you !!!

