- By **alice**

## Machine Learning


Presentation Transcript

### Machine Learning

Part I:

Classification and Bayesian Learning

Ref: E. Alpaydin, Intro to Machine Learning, MIT 2004

Machine Learning

- Machine learning is programming computers to optimize a performance criterion using example data or past experience
- Inference from samples
- There is a process that explains the data we observe, but we do not know the details of how the data are generated.
- Internet requests, failure events, etc.
- It is hard to identify (model) the process completely, but we can construct a good and useful approximation that detects certain patterns. Such patterns help us understand the process and make predictions about the future.

Types of Machine Learning

- Supervised learning creates a function from training data. The training data consist of pairs of input objects (typically vectors) and desired outputs.
- Classification: given an input, predict the class label of the input object; in the two-class case the output is Boolean (yes/no)
- Regression: the label is a numerical value; learn the function f(x) that best explains the input instances
- Unsupervised learning: manual labels of inputs are not used.
- Clustering: partition a data set into subsets (clusters), so that the data in each subset share some common trait
- Semi-supervised learning: make use of both labeled and unlabeled data for training
- Reinforcement Learning
- Learning a policy: A sequence of outputs; No supervised output but delayed reward
- Examples: game playing, robot navigation

Supervised Learning

- Use of Supervised Learning
- Classification
- Regression
- Evaluation Methodology
- Bayesian Learning for Classification

Why Supervised Learning?

- Prediction of future cases: use the rule to predict the output for future inputs
- Knowledge extraction: the rule is easy to understand
- Compression: the rule is simpler than the data it explains
- Outlier detection: exceptions that are not covered by the rule, e.g., fraud

E.g: Credit scoring

Differentiating between low-risk and high-risk customers from their income and savings

Rule-based prediction

Classification discriminant: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
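The discriminant above can be written as a two-threshold rule. The sketch below is only an illustration: the function name and the threshold values are hypothetical, since the slides do not give values for θ1 and θ2.

```python
def classify_customer(income, savings, theta1, theta2):
    """Apply the rule: IF income > theta1 AND savings > theta2 THEN low-risk ELSE high-risk."""
    if income > theta1 and savings > theta2:
        return "low-risk"
    return "high-risk"


# Hypothetical thresholds, chosen only for illustration (not from the slides).
THETA1 = 30_000  # income threshold
THETA2 = 5_000   # savings threshold

print(classify_customer(45_000, 8_000, THETA1, THETA2))  # both thresholds exceeded
print(classify_customer(45_000, 2_000, THETA1, THETA2))  # savings too low
```

In a learning setting the two thresholds would be fitted from labeled customers rather than set by hand.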

Learning a Class from Examples

- Given a set of examples of cars, with a label of “family car” or not according to a survey, class learning is to find a description that is shared by all positive examples.
- Use of the class info
- Prediction: Is car x a family car?
- Knowledge extraction: What do people expect from a family car?

Hypothesis Class: C

Most specific hypothesis, S

Most general hypothesis, G

Learning means finding a particular hypothesis h that approximates C

Hypothesis h and Empirical Error

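Error of h on the training set X: the number of training examples on which the prediction of h differs from the label, E(h | X) = Σ_t 1(h(x^t) ≠ r^t)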
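As a minimal sketch of the empirical error, assume the hypothesis is an axis-aligned rectangle over two car attributes. The attribute names (price, engine power), the rectangle bounds, and the data below are made up for illustration and are not taken from the slides.

```python
def make_rectangle(p1, p2, e1, e2):
    """Return a hypothesis h that labels x positive iff it lies inside the rectangle."""
    def h(x):
        price, power = x
        return 1 if (p1 <= price <= p2 and e1 <= power <= e2) else 0
    return h


def empirical_error(h, examples):
    """Count the training examples (x, r) on which h(x) differs from the label r."""
    return sum(1 for x, r in examples if h(x) != r)


# Hypothetical training set: ((price, engine power), label), label 1 = family car.
training_set = [
    ((15_000, 120), 1),
    ((18_000, 150), 1),
    ((40_000, 300), 0),
    ((8_000, 60), 0),
    ((17_000, 250), 0),
]

h = make_rectangle(12_000, 20_000, 100, 260)
print(empirical_error(h, training_set))  # number of misclassified examples
```

A tighter rectangle (for example, an upper power bound below 250) would classify all five hypothetical examples correctly.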

Model Selection & Generalization

- Learning is an ill-posed problem: the data are not sufficient to find a unique solution
- Limited number of sample data
- Some data might be noisy due to imprecision in recording or labeling, or due to hidden (latent, unobservable) attributes that affect the labels of instances
- The need for inductive bias: assumptions about the hypothesis class H
- Why a rectangle, and not a circle or an irregular shape?
- How tightly should the hypothesis fit the data?
- Generalization: how well a model performs on new data

Noise and Model Complexity

Simple model is preferred

- Easy to use (check): lower time complexity
- Easy to train: lower space complexity
- Easy to explain: more interpretable
- Easy to generalize: lower variance

Noise: any anomaly in the data that makes it infeasible to reach zero classification error with a simple hypothesis class

Probably Approximately Correct (PAC) Learning

- How many training examples N should we have, such that with probability at least 1 − δ, h has error at most ε?
- The error region between C and the tightest rectangle h consists of four strips; each strip should have probability at most ε/4
- Pr that a random instance misses a strip: 1 − ε/4
- Pr that N instances miss a strip: (1 − ε/4)^N
- Pr that N instances miss any of the 4 strips: at most 4(1 − ε/4)^N
- Require 4(1 − ε/4)^N ≤ δ, and use (1 − x) ≤ exp(−x)
- Then 4 exp(−εN/4) ≤ δ, which gives N ≥ (4/ε) log(4/δ)
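The final bound can be evaluated directly. The sketch below assumes the natural logarithm, which is what the exp() step in the derivation implies; the function name and the example values of ε and δ are illustrative.

```python
import math


def pac_sample_size(epsilon, delta):
    """Smallest integer N satisfying N >= (4 / epsilon) * ln(4 / delta)."""
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil((4.0 / epsilon) * math.log(4.0 / delta))


# Example values (assumed for illustration): error at most 0.1 with probability >= 0.95.
n = pac_sample_size(0.1, 0.05)
print(n)

# Sanity check against the intermediate inequality 4 * exp(-epsilon * N / 4) <= delta.
print(4 * math.exp(-0.1 * n / 4) <= 0.05)
```

Note that N grows linearly in 1/ε but only logarithmically in 1/δ, so demanding higher confidence is much cheaper than demanding lower error.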

2-Class vs K-Class

A K-class problem can be viewed as K two-class problems.

Train hypotheses h_i(x), i = 1, ..., K, one per class, where h_i separates class C_i from all the other classes.
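One way to set up the K two-class problems is to relabel the training set once per class. This is a sketch of that relabeling step only; the helper name and the class names are hypothetical.

```python
def one_vs_rest_labels(labels, classes):
    """For each class c, build binary labels: 1 where the label is c, 0 elsewhere."""
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


# Hypothetical 3-class labels for four training examples.
labels = ["family", "sports", "luxury", "family"]
binary = one_vs_rest_labels(labels, ["family", "sports", "luxury"])

for cls, targets in binary.items():
    print(cls, targets)
```

Each binary label list would then be used to train one hypothesis h_i with any two-class learner.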

Regression

- Examples
- Price of a used car
- Speed of Top500
- x: car attributes; y: price
- y = g(x | θ), where g() is the model and θ are the parameters

Linear regression

- y = wx + w0
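A minimal sketch of fitting y = wx + w0 with the standard closed-form least-squares solution for one input variable. The slides do not say which fitting procedure is used, so this choice is an assumption, and the used-car data are invented.

```python
def fit_line(xs, ys):
    """Least-squares estimates of slope w and intercept w0 for y = w*x + w0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    w0 = mean_y - w * mean_x
    return w, w0


def mean_squared_error(w, w0, xs, ys):
    """Average squared difference between the targets and the line's predictions."""
    return sum((y - (w * x + w0)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: car age in years vs. price in thousands.
ages = [1, 2, 3, 4, 5]
prices = [18.0, 16.0, 14.5, 12.0, 10.5]

w, w0 = fit_line(ages, prices)
print(w, w0, mean_squared_error(w, w0, ages, prices))
```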

Basic Concepts

- Interpolation
- Find a function that best fits a training set in the absence of noise
- r = f(x)
- Extrapolation
- Predict the output for an x that lies outside the range of the training set
- Regression
- The noise factor must be considered
- r = f(x) + ε, or there are hidden variables we cannot observe: r = f(x, z)

Regression

For a given training set, find g() that minimizes the empirical error

Underfitting vs Overfitting

- Underfitting: the hypothesis class (H) is less complex than the actual model (C)
- Example: using a line to fit data sampled from a 3rd-order polynomial
- Accuracy increases with more sample data, but more data may not be enough if the hypothesis is too complex
- Overfitting: H is more complex than C
- Having more training data helps, but only up to a certain point
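The underfitting example (a line fitted to data from a 3rd-order polynomial) can be sketched as follows. The generating function y = x³, the sampling interval [−2, 2], and the noise-free setting are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = w*x + w0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    return w, mean_y - w * mean_x


def line_fit_mse(n_points):
    """Training error of the best line on n_points noise-free samples of y = x**3."""
    xs = [-2 + 4 * i / (n_points - 1) for i in range(n_points)]
    ys = [x ** 3 for x in xs]
    w, w0 = fit_line(xs, ys)
    return sum((y - (w * x + w0)) ** 2 for x, y in zip(xs, ys)) / n_points


# Even without noise, and even with many more samples, the line cannot reach zero error.
for n in (5, 41, 401):
    print(n, line_fit_mse(n))
```

The error stays bounded away from zero no matter how many points are sampled, because the limitation lies in the hypothesis class and not in the amount of data.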

Triple Trade-Off

Trade-off between three factors:

- Complexity of the hypothesis class H, c(H): the capacity of the hypothesis class
- Training set size, N
- Generalization error, E, on new examples
- As N↑, E↓
- As c(H)↑, first E↓ and then E↑ (the error of an over-complex hypothesis can be kept in check by increasing the amount of training data, but only up to a point)

Cross-Validation

- To estimate generalization error, we need data unseen during training.
- Three types of data in cross-validation:
- Training set (50%)
- Validation set (25%)
- Test (publication) set (25%)
- Resampling when data are scarce
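A sketch of the 50% / 25% / 25% split listed above, using only the standard library. The shuffling step and the fixed seed are assumptions added so that the split is random but reproducible; the function name is illustrative.

```python
import random


def split_dataset(examples, seed=0):
    """Shuffle a copy of the data and split it 50% / 25% / 25%."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded generator for reproducibility
    n = len(shuffled)
    n_train = n // 2
    n_val = n // 4
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder, so no example is dropped
    return train, validation, test


train, validation, test = split_dataset(range(100))
print(len(train), len(validation), len(test))
```

The validation set is used to choose among models, while the test set is kept aside to report the final error.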

Dimensions of a Supervised Learner: Summary

- Model g(x | θ) and its parameters θ
- Loss function L(): the difference between the desired output and the approximation
- Optimization procedure: return the argument θ* that minimizes the total loss over the training set
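The three dimensions can be put together in a small sketch. Everything specific here is an assumption for illustration: the one-parameter model g(x | θ) = θx, the squared loss, the toy data, and the use of a plain search over a grid of candidate values as the optimization procedure.

```python
def g(x, theta):
    """Model: a one-parameter line through the origin (illustrative choice)."""
    return theta * x


def squared_loss(r, y):
    """Loss between the desired output r and the approximation y."""
    return (r - y) ** 2


def total_loss(theta, data):
    """Sum of losses over all training pairs (x, r)."""
    return sum(squared_loss(r, g(x, theta)) for x, r in data)


def argmin_theta(candidates, data):
    """Optimization by exhaustive search: the candidate with the smallest total loss."""
    return min(candidates, key=lambda theta: total_loss(theta, data))


# Hypothetical training pairs (x, r) and a grid of candidate parameters 0.0 .. 4.0.
data = [(1, 2.1), (2, 3.9), (3, 6.2)]
candidates = [i / 10 for i in range(41)]

best = argmin_theta(candidates, data)
print(best, total_loss(best, data))
```

Grid search is used only because it makes the "return the argument that minimizes" step explicit; closed-form or gradient-based procedures play the same role in practice.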
