Chapter 7: Transformations

PowerPoint Slideshow about 'Chapter 7: Transformations' - lizina


Attribute Selection

  • Adding irrelevant attributes confuses learning algorithms, so such attributes should be avoided

  • Both divide-and-conquer and separate-and-conquer algorithms suffer from this; Naïve Bayes is largely unaffected by irrelevant attributes

  • So first choose the attributes to be considered and then proceed; this is a form of dimensionality reduction

  • Scheme independent selection:

    • Select just enough attributes to divide up the instance space in a way that separates all the training instances. For example, in Table 1, if we were to drop outlook, instances 1 and 4 would become inseparable, which is undesirable. Searching for such a subset is a very tedious procedure

  • Using machine learning algorithms for attribute selection

    • Decision tree: Apply a decision tree (DT) learner to all attributes and select only those that are actually used in the decisions; the selected attributes can then be used with another chosen learning algorithm

    • Linear SVM: Rank the attributes by the weights a linear SVM assigns to them and use the ranking to choose attributes (recursive feature elimination)

    • Using instance-based learning methods

      • Sample instances randomly from the training set

      • Check neighboring records of the same and different classes (near hits and near misses)

      • If a near hit has a different value for a certain attribute, that attribute appears to be irrelevant, so its weight should be reduced

      • If a near miss has a different value, the attribute appears to be relevant, so its weight should be increased

      • After repeating this procedure many times, selection takes place: only attributes with positive weights are chosen.

  • Searching the attribute space:

    • Fig 7.1

    • Forward selection (start with the empty set and keep adding attributes)

    • Backward elimination (start with all attributes and eliminate them one by one)

    • Bidirectional search: a combination of the above two

  • Scheme-specific selection

    • Cross-validation is used to measure the effectiveness of a subset of attributes
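The forward-selection search and the scheme-specific evaluation described above can be sketched together as a greedy loop. This is a minimal sketch under stated assumptions, not a definitive implementation: `forward_selection` and `toy_score` are hypothetical names, `toy_score` merely stands in for a cross-validated accuracy estimate of the chosen learning scheme on a given subset, and the attribute names other than outlook are assumed for illustration.

```python
def forward_selection(attributes, evaluate):
    """Greedy forward selection.

    Starts from the empty set and repeatedly adds the single attribute
    that most improves evaluate(subset); stops when no addition helps.
    """
    selected = []
    best_score = evaluate(selected)
    while True:
        best_candidate = None
        for attr in attributes:
            if attr in selected:
                continue
            score = evaluate(selected + [attr])
            if score > best_score:
                best_score = score
                best_candidate = attr
        if best_candidate is None:
            return selected, best_score
        selected.append(best_candidate)


def toy_score(subset):
    """Hypothetical stand-in for a cross-validated accuracy estimate.

    Two attributes are treated as useful; every attribute carries a
    small cost so that irrelevant ones never improve the score.
    """
    useful = {"outlook": 0.3, "humidity": 0.2}
    return 0.5 + sum(useful.get(a, 0.0) for a in subset) - 0.01 * len(subset)


attributes = ["outlook", "temperature", "humidity", "windy"]
subset, score = forward_selection(attributes, toy_score)
print(subset, round(score, 2))
```

Backward elimination follows the same pattern but starts from the full set and removes one attribute at a time.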

Discretizing Numeric Attributes

  • Global discretization: Used in the 1R learning scheme. Sort the instances by the attribute’s value and place range boundaries at the points where the class value changes, subject to a minimum number of instances per range

  • Local discretization: Used in decision trees. When a specific attribute is used to split a node, the value at which to split is decided at that node

  • Transforming a numeric attribute into k binary variables

  • Unsupervised discretization: Ignores the classes of the training set and breaks the value range into intervals, e.g., by equal-interval binning or equal-frequency binning. This runs the risk of destroying distinctions within an interval or bin

  • Supervised discretization: Takes the classes into account when forming the intervals

  • Proportional k-interval discretization: The number of bins is chosen in a data-dependent fashion by setting it to the square root of the number of instances and applying equal-frequency binning.

Example data (attribute value and class, sorted by value):

64 Y, 65 N, 68 Y, 69 Y, 70 Y, 71 N, 72 N, 72 Y, 75 Y, 75 Y, 80 N, 81 Y, 83 Y, 85 N

Proportional binning (number of bins = 4)

Bin    Range   Y   N
Bin1   64-68   2   1
Bin2   69-71   2   1
Bin3   72-75   3   1
Bin4   80-85   2   2

Equal-frequency binning (number of bins = 3)

Range   Y   N
64-70   4   1
71-75   3   2
80-85   2   2
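The unsupervised binning schemes above can be sketched on the same 14 values. This is a rough sketch under stated assumptions: the function names are hypothetical, the square root is rounded to the nearest integer, and leftover instances go to the earliest bins. With 3 bins this yields the equal-frequency counts listed above; with 4 bins the exact boundaries can differ from the ones listed, depending on how remainders and tied values are assigned.

```python
import math
from collections import Counter

# (value, class) pairs from the example, sorted by value.
DATA = [(64, "Y"), (65, "N"), (68, "Y"), (69, "Y"), (70, "Y"), (71, "N"),
        (72, "N"), (72, "Y"), (75, "Y"), (75, "Y"), (80, "N"), (81, "Y"),
        (83, "Y"), (85, "N")]


def equal_frequency_bins(data, k):
    """Split the sorted data into k bins of (nearly) equal size."""
    data = sorted(data)
    base, extra = divmod(len(data), k)
    bins, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # earliest bins take the remainder
        bins.append(data[start:start + size])
        start += size
    return bins


def equal_width_bin_index(value, low, high, k):
    """Index of the equal-interval bin holding value (assumes high > low)."""
    width = (high - low) / k
    return min(int((value - low) / width), k - 1)


def proportional_k(n_instances):
    """Number of bins for proportional k-interval discretization."""
    return round(math.sqrt(n_instances))


def class_counts(one_bin):
    """Count class labels inside one bin."""
    return dict(Counter(label for _, label in one_bin))


for one_bin in equal_frequency_bins(DATA, 3):
    print(one_bin[0][0], "-", one_bin[-1][0], class_counts(one_bin))
print("proportional k:", proportional_k(len(DATA)))
```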

Entropy-based Discretization

  • One example: Order the values of the attribute and, for each possible break-point, determine the information value (entropy) of the resulting split (p. 298-299). Split at the point where this value is smallest, which is also the point where the information gain is largest.

    • Over all candidate break-points, find the one with the smallest information value (A);

    • Repeat this procedure for each of the parts formed by breaking at A;

    • Continue recursively until a stopping criterion is met
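The recursive procedure above can be sketched as follows, choosing at each step the break-point that minimizes the weighted entropy of the two parts (equivalently, maximizes the information gain). This is a sketch under stated assumptions: the function names are hypothetical, the data are the 14 values from the binning example, and the `depth` limit is only a placeholder stopping rule, not the criterion used in the text.

```python
import math

DATA = [(64, "Y"), (65, "N"), (68, "Y"), (69, "Y"), (70, "Y"), (71, "N"),
        (72, "N"), (72, "Y"), (75, "Y"), (75, "Y"), (80, "N"), (81, "Y"),
        (83, "Y"), (85, "N")]


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    result = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        result -= p * math.log2(p)
    return result


def best_split(data):
    """Return (break_point, weighted_entropy) with the smallest entropy.

    Candidate break-points lie halfway between adjacent distinct values.
    Returns None when all values are identical.
    """
    data = sorted(data)
    n = len(data)
    best = None
    for i in range(1, n):
        if data[i - 1][0] == data[i][0]:
            continue  # cannot split between equal values
        left = [c for _, c in data[:i]]
        right = [c for _, c in data[i:]]
        info = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        point = (data[i - 1][0] + data[i][0]) / 2
        if best is None or info < best[1]:
            best = (point, info)
    return best


def discretize(data, depth=2):
    """Collect break-points recursively; depth is a placeholder stopping rule."""
    if depth == 0 or len({c for _, c in data}) < 2:
        return []
    split = best_split(data)
    if split is None:
        return []
    point = split[0]
    left = [d for d in data if d[0] < point]
    right = [d for d in data if d[0] > point]
    return sorted(discretize(left, depth - 1) + [point] + discretize(right, depth - 1))


point, info = best_split(DATA)
print(point, round(info, 3))  # 84.0 0.827
```

On these data the first break-point separates off only the final instance, which shows why a sensible stopping criterion matters.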

Some Useful Transformations

  • Examples:

    • Subtracting one date attribute from another to obtain a new age attribute

    • Converting two attributes A and B to A/B, a new attribute representing the ratio

    • Reduce several nominal attributes to one by concatenating their values; two attributes with k1 and k2 values produce a single attribute with k1 × k2 values

  • Principal component analysis: Use a special coordinate system that depends on the given cloud of points. Place the first axis in the direction of greatest variance of the points; place the 2nd axis perpendicular to it and, in the multi-dimensional case, choose among the perpendicular directions the one that maximizes the variance along it; and so on. Finally, keep the axes that account for the most variance: the principal components
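To illustrate the first step of principal component analysis, the sketch below finds the direction of greatest variance of a small point cloud. It is a sketch only: PCA is normally computed with an eigendecomposition of the covariance matrix, whereas this example swaps in power iteration, which recovers just the first axis. The function names and sample points are assumptions made for illustration.

```python
import math


def covariance_matrix(points):
    """Sample covariance matrix of a list of equal-length points."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - means[j] for j in range(d)] for p in points]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_principal_component(cov, iterations=200):
    """Power iteration: returns (unit direction, variance along it).

    Assumes cov is non-zero and the all-ones start vector is not
    orthogonal to the leading direction.
    """
    d = len(cov)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return v, variance


# Points spread mostly along the horizontal axis.
points = [(2.0, 0.0), (0.0, 0.0), (-2.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
axis, variance = first_principal_component(covariance_matrix(points))
print([round(a, 3) for a in axis], round(variance, 3))
```

Further axes could be found by removing the variance along the first axis from the covariance matrix and repeating.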


Random Projections

  • Since PCA is expensive (cubic in the number of dimensions), an alternative is to use a random projection of the data into a subspace with a predetermined number of dimensions
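A random projection can be sketched in a few lines. The version below is one common variant, assumed here for illustration: each entry of the projection matrix is drawn from a Gaussian and scaled by 1/sqrt(k), where k is the target number of dimensions. The function names and the seed parameter are hypothetical.

```python
import math
import random


def random_projection_matrix(n_dims, k, seed=0):
    """Build a k x n_dims matrix of scaled Gaussian random entries."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_dims)]
            for _ in range(k)]


def project(point, matrix):
    """Map an n_dims-dimensional point to k dimensions."""
    return [sum(m * x for m, x in zip(row, point)) for row in matrix]


matrix = random_projection_matrix(n_dims=5, k=2, seed=1)
print(project([1.0, 2.0, 3.0, 4.0, 5.0], matrix))
```

Unlike PCA, the matrix does not depend on the data, so building it costs only k × n_dims random draws.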

Text to attribute vector

  • Convert a document to a vector over the words that occur in it; each entry can be the frequency of the word or just its presence/absence

  • In other words, a document is characterized by the words that appear often in it.
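The conversion can be sketched with the standard library. This is a minimal sketch: the tokenizer (lowercase runs of letters, digits, and apostrophes) and the function names are assumptions, and real systems typically add steps such as stop-word removal.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabulary(documents):
    """Sorted list of all distinct words across the documents."""
    return sorted({word for doc in documents for word in tokenize(doc)})


def to_vector(document, vocabulary, binary=False):
    """Word-frequency vector, or presence/absence vector if binary=True."""
    counts = Counter(tokenize(document))
    if binary:
        return [1 if counts[word] else 0 for word in vocabulary]
    return [counts[word] for word in vocabulary]


docs = ["The cat sat on the mat", "The dog sat"]
vocab = build_vocabulary(docs)
print(vocab)                                 # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(to_vector(docs[0], vocab))             # [1, 0, 1, 1, 1, 2]
print(to_vector(docs[0], vocab, binary=True))  # [1, 0, 1, 1, 1, 1]
```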

Time series

  • In time series data, it is sometimes useful to replace the attribute values by the differences between successive values, etc.
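As a small illustration of differencing, the sketch below replaces a series by the differences between successive values. The function name and sample values are assumptions.

```python
def successive_differences(values):
    """Return the change from each value to the next one."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


print(successive_differences([10, 12, 15, 14]))  # [2, 3, -1]
```

The result is one element shorter than the input, since the first value has no predecessor.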

Automatic Data Cleansing

  • Data mining techniques themselves can sometimes help to solve the problem of cleansing the corrupted data

  • By discarding misclassified instances from the training set, relearning, and then repeating until there are no more misclassified instances, decision trees induced from data can be improved

  • Robust regression: Linear regression is improved by removing outliers
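The discard-and-relearn loop described above for decision trees can be sketched with any learner. To keep the example self-contained it uses a one-threshold classifier on a single numeric attribute instead of a decision tree; that substitution, the function names, the `max_rounds` limit, and the data are all assumptions made for illustration.

```python
def fit_stump(points):
    """Toy learner: pick the threshold t with the fewest training errors.

    points is a list of (x, label) with label 0 or 1; the model predicts
    1 when x > t.
    """
    xs = sorted({x for x, _ in points})
    candidates = ([xs[0] - 1.0]
                  + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
                  + [xs[-1] + 1.0])

    def errors(t):
        return sum((x > t) != bool(y) for x, y in points)

    return min(candidates, key=errors)


def predict_stump(threshold, x):
    """Predict class 1 when x lies above the threshold."""
    return int(x > threshold)


def cleanse(points, fit, predict, max_rounds=20):
    """Drop misclassified instances and relearn until none remain."""
    current = list(points)
    model = fit(current)
    for _ in range(max_rounds):
        kept = [(x, y) for x, y in current if predict(model, x) == y]
        if len(kept) == len(current):
            break  # every remaining instance is classified correctly
        current = kept
        model = fit(current)
    return model, current


# Mostly class 0 below 4.5 and class 1 above it, with two noisy instances.
points = [(1, 0), (2, 0), (2.5, 1), (3, 0), (4, 0),
          (5, 1), (6, 1), (6.5, 0), (7, 1), (8, 1)]
model, cleaned = cleanse(points, fit_stump, predict_stump)
print(model, len(cleaned))
```

This filtering can also discard correct but unusual instances, so it is best treated as a heuristic.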

Combining Multiple Models

  • Bagging, boosting, and stacking are prominent methods to combine multiple models

  • Bagging: Models receive equal weight; the combined output is, for example, the majority vote of the individual models.

  • Boosting: Similar to bagging, except that it assigns different weights to the outputs of different models

  • Option tree (Fig. 7.10) and Fig. 7.11 (a negative value means play=yes; a positive value means play=no)
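The difference between the equal-weight vote in bagging and the weighted vote in boosting can be sketched as follows. Only the step that combines predictions is shown, not how the individual models are trained; the function names, predictions, and weights are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def majority_vote(predictions):
    """Equal-weight vote: return the most frequently predicted class."""
    return Counter(predictions).most_common(1)[0][0]


def weighted_vote(predictions, weights):
    """Weighted vote: return the class with the largest total weight."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


predictions = ["yes", "no", "no"]
print(majority_vote(predictions))                   # no
print(weighted_vote(predictions, [0.9, 0.3, 0.4]))  # yes
```

The same three predictions give different answers because the single "yes" model carries more weight than the two "no" models combined.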