Data Mining Chapter 3 Output: Knowledge Representation. Kirk Scott. A summary of ways of representing knowledge, the results of mining: rule sets, decision trees, regression equations, and clusters. Deciding what kind of output you want is the first step towards picking a mining algorithm.
For problems with numeric attributes, you can apply statistical methods
An instance of Iris setosa should give a value >0 (above/to the right of the line) and an instance of Iris versicolor should give a value <0
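This sign test can be sketched in code. The weights below are hypothetical placeholders for illustration, not the fitted values from the book's example:

```python
# Sketch: classifying an iris by the sign of a linear discriminant.
# The weights w0, w1, w2 are hypothetical, not the book's fitted values.
def discriminant(petal_length, petal_width, w0=2.0, w1=-0.5, w2=-0.8):
    """Value of the linear expression w0 + w1*petal_length + w2*petal_width."""
    return w0 + w1 * petal_length + w2 * petal_width

def classify(petal_length, petal_width):
    # > 0: above/to the right of the line -> Iris setosa
    # < 0: below/to the left of the line  -> Iris versicolor
    if discriminant(petal_length, petal_width) > 0:
        return "Iris setosa"
    return "Iris versicolor"
```

With these placeholder weights, a small-petaled instance lands on the setosa side and a large-petaled one on the versicolor side.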
The book summarizes the different kinds of decisions (< , =, etc.) that might be coded for a single attribute at each node in a decision tree
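The kinds of single-attribute node tests can be sketched as a small dispatch function (a hypothetical structure, not the book's implementation): a numeric threshold (<), an equality test (=), or membership in a subset of nominal values.

```python
# Sketch: the kinds of single-attribute decisions a tree node might encode.
def node_test(value, kind, target):
    if kind == "<":   # numeric comparison against a threshold
        return value < target
    if kind == "=":   # equality with a single nominal value
        return value == target
    if kind == "in":  # membership in a set of nominal values
        return value in target
    raise ValueError(f"unknown test kind: {kind}")
```

Each internal node stores one such test; an instance follows the branch matching the test's outcome.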
Figure 3.4, on the following overhead, shows (a) a linear model, (b) a regression tree, and (c) a model tree
A rule set may compactly represent a limited number of explicitly known cases
With 4 variables, a, b, c, and d, there can be up to 4 levels in the tree
The book states that “decision trees cannot easily express the disjunction implied among the different rules in a set.”
The rule set would be equally complex IF there were a rule for each branch of the tree
As soon as this and any other simplifying assumptions are relaxed, things become messier
This is because this rule is one of many association rules of the form (all non-class attributes) → (class attribute)
Accuracy = confidence = the proportion of cases to which the rule applies for which the prediction is correct
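The definition can be made concrete on a toy dataset. The rule and the data below are made up for illustration (IF outlook = sunny THEN play = no):

```python
# Sketch: computing a rule's confidence (accuracy) on a toy dataset.
# Hypothetical rule: IF outlook == "sunny" THEN play == "no".
data = [
    {"outlook": "sunny",    "play": "no"},
    {"outlook": "sunny",    "play": "yes"},
    {"outlook": "overcast", "play": "yes"},
    {"outlook": "sunny",    "play": "no"},
]

# Cases the rule applies to (antecedent holds).
applies = [inst for inst in data if inst["outlook"] == "sunny"]
# Of those, cases where the predicted class is correct.
correct = [inst for inst in applies if inst["play"] == "no"]

support = len(applies)                    # rule applies to 3 cases
confidence = len(correct) / len(applies)  # 2 of 3 predictions correct
```

Here the rule applies to three instances and is right on two, so its confidence is 2/3.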
The book observes that exceptions may be "psychologically", if not logically, preferable
For each new instance, find its nearest neighbor in the set and classify it accordingly
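A minimal 1-nearest-neighbor sketch, using Euclidean distance; the training instances and labels below are made-up illustration data:

```python
# Sketch: 1-nearest-neighbor classification with Euclidean distance.
import math

# Hypothetical training set: (feature vector, class label) pairs.
train = [((1.0, 1.0), "A"), ((5.0, 5.0), "B"), ((1.5, 2.0), "A")]

def classify_nn(x):
    """Label a new instance x with the label of its nearest training instance."""
    _, label = min(train, key=lambda pair: math.dist(pair[0], x))
    return label
```

An instance near (1, 1) is labeled "A"; one near (5, 5) is labeled "B".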
In effect, this forms a structural representation analogous to something seen before:
Figure 3.10, on the following overhead, illustrates some related ideas
Figure 3.10 (a): This shows the decision boundaries between two instances and the rest of the data set
Figure 3.10 (c): This shows that in practice the classification neighborhoods are simplified to rectangular regions of instance space
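A rectangular neighborhood reduces classification to a per-dimension bounds check. A minimal sketch, with hypothetical bounds given as one (low, high) pair per dimension:

```python
# Sketch: membership test for an axis-parallel rectangular region,
# the simplified form of a nearest-neighbor neighborhood.
def in_rectangle(point, bounds):
    """True if every coordinate of point lies within its (low, high) bounds."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(point, bounds))
```

For example, with bounds [(0, 5), (1, 4)], the point (2, 3) falls inside the region while (6, 3) does not.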