issues with data mining l.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Issues with Data Mining PowerPoint Presentation
Download Presentation
Issues with Data Mining

Loading in 2 Seconds...

play fullscreen
1 / 15

Issues with Data Mining - PowerPoint PPT Presentation


  • 264 Views
  • Uploaded on

Issues with Data Mining. Data Mining involves Generalization. Data mining learns generalizations of the instances in training data E.g. a decision tree learnt from weather data captures generalizations about the prediction of values for Play attribute

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Issues with Data Mining' - Faraday


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
data mining involves generalization
Data Mining involves Generalization
  • Data mining learns generalizations of the instances in training data
    • E.g. a decision tree learnt from weather data captures generalizations about the prediction of values for Play attribute
    • This means, generalizations predict (or describe) the behaviour of instances beyond the training data
    • This in turn means, knowledge is extracted from raw data using data mining
  • This knowledge drives end-user’s decision making process
generalization as search
Generalization as Search
  • The process of generalization can be viewed as searching a space of all possible patterns or models
    • For a pattern that fits the data
  • This view provides a standard framework for understanding all data mining techniques
  • E.g decision tree learning involves searching through all possible decision trees
    • Lecture 4 shows two example decision trees that fit the weather data
    • One of them is a better generalization than the other (Example 2)
slide4
Bias
  • Important choices made in a data mining system are
    • representation language,
    • search method and
    • model pruning method
  • This means, each data mining scheme involves
    • Language bias – the language chosen to represent the patterns or models
    • Search bias – the order in which the space is searched
    • Overfitting-avoidance bias – the way overfitting to the training data is avoided
language bias
Language Bias
  • Different languages used for representing patterns and models
    • E.g. rules and decision trees
  • A concept fits a subset of training data
    • That subset can be described as a disjunction of rules
    • E.g classifier for the weather data can be represented as a disjunction of rules
  • Languages differ in their ability to represent patterns and models
    • This means, when a language with lower representation ability is used, the data mining system may not achieve good performance
  • Domain knowledge helps to cut down search space
search bias
Search Bias
  • An exhaustive search over the search space is computationally expensive
  • Search is speeded up by using heuristics
    • Pure children nodes indicate good tree stumps in decision tree learning
  • By definition heuristics cannot guarantee optimum patterns or models
    • Using information gain may mislead us to select a suboptimal attribute at the root
  • Complex search strategies possible
    • Those that pursue several alternatives parallelly
    • Those that allow backtracking
  • A high-level search bias
    • General-to-specific: start with a root node and grow the decision tree to fit the specific data
    • Specific-to-general: choose specific examples in each class and then generalize the class by including k-nearest neighbour examples
overfitting avoidance bias
Overfitting-avoidance bias
  • We want to search for ‘best’ patterns and models
  • Simple models are the best
  • Two strategies
    • Start with the simplest model and stop building model when it starts to become complex
    • Start with a complex model and prune it to make it simpler
  • Each strategy biases search in a different way
  • Biases are unavoidable in practice
    • Each data mining scheme might involve a configuration of biases
    • These biases may serve some problems well
  • There is no universal best learning scheme!
    • We saw this in our practicals with Weka
combining multiple models
Combining Multiple Models
  • Because there is no ideal data mining scheme, it is useful to combine multiple models
    • Idea of democracy – decisions made based on collective wisdom
    • Each data mining scheme acts like an expert using its knowledge to make decisions
  • Three general approaches
    • Bagging
    • Boosting
    • Stacking
  • Bagging and boosting both follow the same approach
    • Take a vote on the class prediction from all the different schemes
    • Bagging uses a simple average of votes while boosting uses a weighted average
    • Boosting gives more weight to more knowledgeable experts
  • Boosting is generally considered the most effective
bias variance decomposition
Bias-Variance Decomposition
  • Assume
    • Infinite training data sets of the same size, n
    • Infinite number of classifiers trained on the above data sets
  • For any learning scheme
    • Bias = expected error of the classifier even after increasing training data infinitely
    • Variance = expected error due to the particular training set used
  • Total expected error = bias + variance
  • Combining multiple classifiers decreases the expected error by reducing the variance component
bagging
Bagging
  • Bagging stands for bootstrap aggregating
    • Combines equally weighted predictions from multiple models
  • Bagging exploits instability in learning schemes
    • Instability – small change in training data results in big change in model
  • Idealized version for classifier
    • Collect several independent training sets
    • Build a classifier from each training set
      • E.g learn a decision tree from each training set
    • The class of a test instance is the prediction that received most votes from all the classifiers
  • Practically it is not feasible to obtain several independent training sets
bagging algorithm
Bagging Algorithm
  • Model Generation
    • Let n be the number of instances in the training data
    • For each of t iterations
      • Sample n instances with replacement from training data
      • Apply the learning algorithm to the sample
      • Store the resulting model
  • Classification
    • For each of the t models:
      • Predict class of instance using model
    • Return class that has been predicted most often
boosting
Boosting
  • Multiple data mining methods might complement each other
    • Each method performing well on a subset of data
  • Boosting combines complementing models
    • Using weighted voting
  • Boosting is iterative
    • Each new model is built to overcome the deficiencies in the earlier models
  • Several variants of boosting
    • AdaBoost.M1 – based on the idea of giving weights to instances
  • Boosting involves two stages
    • Model generation
    • Classification
boosting13
Boosting
  • Model generation
    • Assign equal weight to each training instance
    • For each of t iterations:
      • Apply learning algorithm to weighted dataset and store resulting model
      • Compute error e of model on weighted dataset and store error
      • If e=0 or e>=0.5
        • Terminate model generation
      • For each instance in dataset:
        • If instance classified correctly by model:
        • Multiply weight of instance by e/(1-e)
      • Normalize weight of all instances
  • Classification
    • Assign weight of zero to all classes
    • For each of the t (or less) models:
      • Add –log(e/(1-e)) to weight of class predicted by model
    • Return class with highest weight
stacking
Stacking
  • Bagging and boosting combine models of the same type
    • E.g. a set of decision trees
  • Stacking is applied to models of different types
    • Because different models may not perform comparably well, voting may not work
    • Voting is problematic when two out of three classifiers perform poorly
  • Stacking uses a metalearner to combine different base learners
    • Base learners: level-0 models
    • Meta learner: level-1 model
    • Predictions of base learners fed as inputs to the meta learner
  • Base learner predictions on training data cannot be input to meta learner
    • Instead use cross-validation results on base learner
  • Because classification is done by base learners, meta learners use simple learning schemes
combining models using weka
Combining models using Weka
  • Weka offers methods to perform bagging, boosting and stacking over classifiers
  • In the Explorer, under the classify tab, expand the ‘meta’ section of the hierarchical menu
  • AdaboostM1 (one of the boosting methods) on Iris data classifies only 7 out of 150 incorrectly
  • You are encouraged to try these methods on your own