
Learning From Data



  1. Learning From Data Chichang Jou Tamkang University

  2. Chapter Objectives • Analyze the general model of inductive learning • Explain how to select an approximating function • Introduce risk functionals for regression and classification problems • Identify concepts in statistical learning theory • Discuss the differences between the inductive principles of empirical risk minimization and structural risk minimization • Discuss practical aspects of the VC dimension • Compare inductive-learning tasks graphically • Introduce methods for validating inductive-learning results

  3. Background • Biological systems learn to cope with an unknown, statistical environment in a data-driven fashion • Two phases of the predictive-learning process: • Learning or estimating unknown dependencies • Induction: progressing from particular cases to a model • Using the estimated dependencies to predict • Deduction: progressing from a model and given inputs to particular cases

  4. Induction, Deduction, Transduction • Transduction: local estimation directly from the training data to particular cases, as in association rules

  5. 4.1 Learning Machine • Machine-learning algorithms vary in their goals, in the available training data sets, and in their learning strategies and representations of data • Inductive machine learning • A generalized model is obtained from a set of samples

  6. Observational Setting of a Learning Machine • Real-world systems often have unmeasured inputs • The system produces output Y according to the conditional probability p(Y|X)

  7. Inductive Learning Machine • Tries to form generalizations from particular true facts (the training data set) • Formalized as a set of functions that approximate a system’s behavior • Given an input X, it implements a set of functions f(X, w), where w is a parameter of the function • Its solution requires a priori knowledge

  8. Inductive Learning Machine • The task of inductive inference • Given samples (xi, f(xi)), return a function h(x), called a hypothesis, that approximates f(x); the hypothesis may be linear or non-linear (a fitting sketch follows below)
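
To make the hypothesis-fitting idea concrete, here is a minimal Python sketch (not part of the original slides) that fits a linear hypothesis h(x) to a few samples (xi, f(xi)) by least squares; the sample values and the choice of numpy.polyfit are illustrative assumptions.

    import numpy as np

    # Illustrative training samples (x_i, f(x_i)); the values are made up
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

    # Fit a linear hypothesis h(x) = w1*x + w0 by least squares
    w1, w0 = np.polyfit(x, y, deg=1)

    def h(x_new):
        """Hypothesis approximating the unknown f(x)."""
        return w1 * x_new + w0

    print(h(2.5))  # prediction for an unseen input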

  9. Inductive Learning Machine • Statistical dependency vs. causality • Inductive-learning processes build models of dependencies, but these should not be automatically interpreted as causality relations • Example: people in Florida are on average older than people in other states, and married men live longer than single men

  10. Loss Function and Risk Functional • The loss L(y, f(X, w)) measures the difference between the output y and the model prediction f(X, w) • Inductive learning is the process of estimating f(X, wopt), where wopt minimizes the risk functional R(w)
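
For reference, the risk functional referred to above is usually written as the expected loss over the unknown joint distribution p(X, y), with wopt its minimizer (standard formulation from statistical learning theory):

    R(w) = \int L(y, f(X, w)) \, p(X, y) \, dX \, dy
    w_{opt} = \arg\min_{w} R(w)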

  11. Common Loss Functions
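
The slide’s figure is not reproduced here; the loss functions usually listed for the three basic learning problems are, in standard form:

    L(y, f(X, w)) = \begin{cases} 0, & y = f(X, w) \\ 1, & y \neq f(X, w) \end{cases}    (classification, 0/1 loss)
    L(y, f(X, w)) = (y - f(X, w))^2    (regression, squared error)
    L(p(X, w)) = -\ln p(X, w)    (density estimation)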

  12. Inductive Principle • An inductive principle is a general prescription (what to do with the data) for obtaining an estimate f(X, wopt*) • Human intervention in the learning algorithm includes: • Selection of input and output variables • Data encoding and representation • Incorporating a priori knowledge • Influence over the generator, e.g., its sampling rate or distribution

  13. 4.2 Statistical Learning Theory • A formalized theory for finite-sample inductive learning, mainly for classification and pattern recognition • Provides a quantitative description of the trade-off between model complexity and the available information • Also called VC (Vapnik-Chervonenkis) theory • Other approaches are more engineering-oriented, without proofs and formalizations

  14. Empirical risk minimization (ERM) Typically used when the model is given or approximated first, and then its parameters are estimated from the data
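
ERM replaces the true risk R(w), which depends on the unknown distribution, with the empirical risk computed from the n training samples; the standard definition is added here for reference:

    R_{emp}(w) = \frac{1}{n} \sum_{i=1}^{n} L(y_i, f(x_i, w)),    w_{emp} = \arg\min_{w} R_{emp}(w)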

  15. Empirical Risk Minimization (ERM) • The consistency property: as the number of training samples grows, minimizing the empirical risk also minimizes the true (expected) risk • Nontrivial consistency: the consistency requirement must hold for all functions in the set of approximating functions

  16. Behavior of the Growth Function G(n) • Sets of approximating functions whose growth function G(n) grows more slowly than linearly in n have the consistency property (see the bounds below)
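
The growth-function figure is not reproduced; as a reference, standard VC-theory results describe its behavior in terms of the VC dimension h of the set of approximating functions:

    G(n) = n \ln 2    for n \le h
    G(n) \le h \left( \ln \tfrac{n}{h} + 1 \right)    for n > h
    \lim_{n \to \infty} G(n)/n = 0    (condition for distribution-independent consistency)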

  17. Structural Risk Minimization (SRM) • ERM works well when the ratio n/h of the number of training samples n to the VC dimension h is large • When n/h < 20, use SRM • SRM involves two steps: • Selecting an element of a structure having optimal complexity • Estimating the model based on the set of approximating functions defined in the selected element of the structure
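
SRM is motivated by the VC generalization bound, which quantifies the trade-off between empirical risk and model complexity; in its common form for classification it holds with probability at least 1 - \eta:

    R(w) \le R_{emp}(w) + \sqrt{ \frac{ h \left( \ln \tfrac{2n}{h} + 1 \right) - \ln \tfrac{\eta}{4} }{ n } }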

  18. SRM in practice

  19. SRM • Applying SRM to non-linear approximating functions is difficult, and in many cases impossible • Practical implementations use heuristics, such as early-stopping rules and weight initialization • Three optimization approaches: • Stochastic approximation (gradient descent) • Iterative methods • Greedy optimization (a gradient-descent sketch follows below)
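
As an illustration of the first optimization approach above, the following Python sketch runs plain (batch) gradient descent on an empirical squared-error risk for a linear model; the data, learning rate, and number of iterations are illustrative assumptions, not values from the slides.

    import numpy as np

    # Illustrative data: y depends roughly linearly on x (values are made up)
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([1.0, 3.1, 4.9, 7.2, 9.0])

    w0, w1 = 0.0, 0.0          # initial parameters
    lr = 0.01                  # learning rate (assumed)
    for step in range(1000):   # number of iterations (assumed)
        err = (w0 + w1 * x) - y
        # Gradients of the empirical squared-error risk w.r.t. w0 and w1
        w0 -= lr * 2.0 * err.mean()
        w1 -= lr * 2.0 * (err * x).mean()

    print(w0, w1)  # parameters that approximately minimize the empirical risk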

  20. SRM • Problems with these optimization approaches: • Sensitivity to initial conditions • Sensitivity to stopping rules • Sensitivity to the presence of many local minima • Two useful guidelines: • Do not attempt to solve a problem by indirectly solving a harder, more general problem • Occam’s razor: the best performance is provided by a model of optimal complexity

  21. Requirement of any inductive-learning process

  22. Types of Learning Methods • Supervised learning (learning with a teacher): examples include logistic regression, the multilayer perceptron, decision rules, and decision trees • Unsupervised learning (learning without a teacher): the emphasis is on a task-independent measure of the quality of representation; examples include cluster analysis, artificial neural networks, and association rules

  23. Common Learning Tasks • Classification • Regression • Clustering • Summarization (Formalized Description) • Dependency-modeling • Deviation Detection (Outlier, Changes in time)

  24. Data-mining and Knowledge-discovery techniques • Statistical Methods • Cluster Analysis • Decision Trees and Decision Rules • Association Rules • Artificial Neural Network • Genetic Algorithms • Fuzzy Inference Systems • N-dimensional Visualization Methods

  25. 4.5 Model Estimation

  26. Testing

  27. Objective of Testing

  28. How to Split Samples

  29. Common Resampling Methods • Resubstitution Method • Holdout Method • Leave-one-out Method • Rotation Method • Bootstrap Method
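
As a sketch of two of the schemes named above (holdout and leave-one-out), the snippet below only builds the index splits; the sample count and the 70/30 ratio are illustrative assumptions.

    import numpy as np

    n = 100                              # number of available samples (assumed)
    rng = np.random.default_rng(0)
    idx = rng.permutation(n)

    # Holdout: e.g., 70% of the samples for training, 30% for testing
    split = int(0.7 * n)
    train_idx, test_idx = idx[:split], idx[split:]

    # Leave-one-out: n rounds, each holding out exactly one sample for testing
    for i in range(n):
        loo_test = [i]
        loo_train = [j for j in range(n) if j != i]
        # train on loo_train and evaluate on loo_test here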

  30. Error Rate, Accuracy • Error rate: R = E / S, where E is the number of errors and S is the number of samples in the testing set • Accuracy: A = 1 – R = (S – E) / S • Two classes: • False negative: false reject rate (FRR) • False positive: false acceptance rate (FAR) • More than two classes: • Confusion matrix
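
A minimal sketch of these definitions for a two-class problem, with class 1 taken as the positive class; the labels and predictions are made-up examples.

    # True labels and model predictions for a two-class problem (illustrative)
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

    S = len(y_true)                                    # number of testing samples
    E = sum(t != p for t, p in zip(y_true, y_pred))    # number of errors
    R = E / S                                          # error rate
    A = 1 - R                                          # accuracy

    positives = sum(t == 1 for t in y_true)
    negatives = S - positives
    FRR = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred)) / positives  # false reject rate
    FAR = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred)) / negatives  # false acceptance rate

    print(R, A, FRR, FAR)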

  31. Confusion matrix for three classes

  32. Receiver Operating Characteristic (ROC) Curve • Evaluates FAR and FRR at the same time • The ROC curve plots sensitivity (1 – FRR) against 1 – specificity (FAR)
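
The ROC figure itself is not reproduced; the sketch below shows one way the curve’s points (FAR, 1 – FRR) could be traced by sweeping a decision threshold over classifier scores; the scores, labels, and threshold grid are illustrative assumptions.

    import numpy as np

    # Illustrative classifier scores and true labels (made up)
    scores = np.array([0.9, 0.8, 0.7, 0.55, 0.5, 0.4, 0.3, 0.1])
    labels = np.array([1,   1,   0,   1,    0,   1,   0,   0])

    pos = (labels == 1).sum()
    neg = (labels == 0).sum()
    for thr in np.linspace(0.0, 1.0, 11):
        accept = scores >= thr
        far = ((labels == 0) & accept).sum() / neg          # x-axis: 1 - specificity
        sensitivity = ((labels == 1) & accept).sum() / pos  # y-axis: 1 - FRR
        print(thr, far, sensitivity)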
