
Tópicos Especiais em Aprendizagem

Reinaldo Bianchi

Centro Universitário da FEI

2012


1st Lecture

Part B


Objectives of this lecture

  • Present the basic concepts of Machine Learning:

    • Introduction.

    • Basic definitions.

    • Application areas.

  • Statistical Machine Learning.

  • Today's class: chapter 1 of Mitchell, chapter 1 of Nilsson, and chapters 1 and 2 of Hastie + Wikipedia.


Main Approaches according to Statistics

[Diagram: learning approaches arranged across three fields - AI, Statistics and Neural Networks]

  • Explanation-based Learning
  • Decision Trees
  • Case-Based Learning
  • Inductive Learning
  • Bayesian Learning
  • Nearest Neighbors
  • Neural Networks
  • Support Vector Machines
  • Genetic Algorithms
  • Regression
  • Clustering
  • Reinforcement Learning
  • Classification



First lecture, part B

  • Introduction to Statistical Machine Learning:

    • Basic definitions.

    • Regression.

    • Classification.


Textbook

  • The Elements of Statistical Learning

  • Data Mining, Inference, and Prediction


Why Statistical Learning?

  • “Statistical learning plays a key role in many areas of science, finance and industry.”

  • “The science of learning plays a key role in the fields of statistics, data mining and artificial intelligence, intersecting with areas of engineering and other disciplines.”


SML problems

Predict whether a patient, hospitalized due to a heart attack, will have a second heart attack. The prediction is to be based on demographic, diet and clinical measurements for that patient.

Predict the price of a stock 6 months from now, on the basis of company performance measures and economic data.


SML problems

Identify the numbers in a handwritten ZIP code, from a digitized image.

Estimate the amount of glucose in the blood of a diabetic person, from the infrared absorption spectrum of that person's blood.

Identify the risk factors for prostate cancer, based on clinical and demographic variables.


Examples of SML problems

Prostate Cancer

Study by Stamey et al. (1989) that examined the correlation between the level of prostate-specific antigen (PSA) and a number of clinical measures.

The goal is to predict the log of PSA (lpsa) from a number of measurements.


Examples of supervised learning problems

[Figure: examples of supervised-learning data sets]



Other examples of learning problems

DNA Microarrays

Expression matrix of 6830 genes (rows, only 100 shown) and 64 samples (columns) for the human tumor data.

The display is a heat map, ranging from bright green (negative, under-expressed) to bright red (positive, over-expressed). Missing values are grey.

  • Task: describe how the data are organised or clustered.

  • (unsupervised learning)


Overview of Supervised Learning

Chapter 2 of Hastie


Variable Types and Terminology

  • In the statistical literature the inputs are often called the predictors, inputs, and more classically the independent variables.

    • In the pattern recognition literature the term features is preferred, which we use as well.

  • The outputs are called the responses, or classically the dependent variables.


Variable Types and Terminology

  • The outputs vary in nature among the examples:

    • Prostate Cancer prediction example:

      • The output is a quantitative measurement.

    • Handwritten digit example:

      • The output is one of 10 different digit classes: G = {0,1,...,9}


Naming convention for the prediction task

  • The distinction in output type has led to a naming convention for the prediction tasks:

    • Regression when we predict quantitative outputs.

    • Classification when we predict qualitative outputs.

  • Both can be viewed as a task in function approximation.


Examples of SML problems

Prostate Cancer

Study by Stamey et al. (1989) that examined the correlation between the level of prostate-specific antigen (PSA) and a number of clinical measures.

The goal is to predict the log of PSA (lpsa) from a number of measurements.

  • Regression problem


Examples of supervised learning problems

  • Classification problem


Qualitative variables representation

  • Qualitative variables are represented numerically by codes:

    • Binary case: when there are only two classes or categories, such as “success” or “failure,” “survived” or “died.”

    • These are often represented by a single binary digit or bit as 0 or 1, or else by −1 and 1.


Qualitative variables representation

  • When there are more than two categories, the most commonly used coding is via dummy variables:

    • A K-level qualitative variable is represented by a vector of K binary variables or bits, only one of which is “on” at a time.

  • These numeric codes are sometimes referred to as targets.
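As a sketch of this dummy-variable coding (an illustrative Python/numpy example with made-up labels; the course's own code is Matlab):

```python
import numpy as np

# Hypothetical 3-level qualitative variable (K = 3 classes).
labels = ["ORANGE", "BLUE", "ORANGE", "GREEN"]
classes = sorted(set(labels))          # ['BLUE', 'GREEN', 'ORANGE']

# K-level variable -> vector of K bits, only one "on" at a time.
dummy = np.array([[1 if c == lab else 0 for c in classes] for lab in labels])
print(dummy)   # each row has exactly one 1 (the "target" coding)
```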


Variables

  • We will typically denote an input variable by the symbol X.

    • If X is a vector, its components can be accessed by subscripts Xj.

    • Observed values are written in lowercase: hence the ith observed value of X is written as xi.

  • Quantitative outputs will be denoted by Y and qualitative outputs will be denoted by G (for group).


Two Simple Approaches to Prediction:

Least Squares (método dos mínimos quadrados)

and Nearest Neighbors (método dos vizinhos mais próximos)


Linear Methods for Regression

  • “Linear models were largely developed in the pre-computer age of statistics, but even in today’s computer era there are still good reasons to study and use them.” (Hastie et al.)


Linear Methods for Regression

  • For prediction purposes they can sometimes outperform non-linear models, especially in situations with:

    • small sample size,

    • low signal-to-noise ratio, or

    • sparse data.

  • They can also be applied to transformations of the inputs.


Linear Models and Least Squares

The linear model has been a mainstay of statistics for the past 30 years and remains one of its most important tools.

Given a vector of inputs $X^T = (X_1, X_2, \ldots, X_p)$, we predict the output Y via the model:

$\hat{Y} = \hat\beta_0 + \sum_{j=1}^{p} X_j \hat\beta_j$


Linear Models

The term $\hat\beta_0$ is the intercept, also known as the bias in machine learning.

Often it is convenient to include the constant variable 1 in X, include $\hat\beta_0$ in the vector of coefficients $\hat\beta$, and then write the linear model in vector form as an inner product:

$\hat{Y} = X^T \hat\beta$


Positive Linear Relationship

[Figure: regression line for E(y) against x, with intercept b0 and positive slope b1]


Negative Linear Relationship

[Figure: regression line for E(y) against x, with intercept b0 and negative slope b1]


No Relationship

[Figure: flat regression line for E(y) against x, with intercept b0 and slope b1 = 0]


Fitting the data: Least Squares

  • How do we fit the linear model to a set of training data?

    • By far the most popular is the method of least squares.

  • Pick the coefficients β to minimize the Residual Sum of Squares:

$\mathrm{RSS}(\beta) = \sum_{i=1}^{N} (y_i - x_i^T \beta)^2$


Least Squares Method

  • Least Squares Criterion: $\min \sum_i (y_i - \hat{y}_i)^2$

  • where:

    • yi = observed value of the dependent variable for the ith observation

    • ŷi = estimated value of the dependent variable for the ith observation


Fitting the data: Least Squares

  • RSS(β) is a quadratic function of the parameters, and hence its minimum always exists, but may not be unique.

  • The solution is easiest to characterize in matrix notation:

$\mathrm{RSS}(\beta) = (\mathbf{y} - \mathbf{X}\beta)^T (\mathbf{y} - \mathbf{X}\beta)$

    • where X is an N × p matrix with each row an input vector,

    • and y is an N-vector of the outputs.


Fitting the data: Least Squares

  • Differentiating $\mathrm{RSS}(\beta)$ with respect to β we get:

$\frac{\partial \mathrm{RSS}}{\partial \beta} = -2\,\mathbf{X}^T (\mathbf{y} - \mathbf{X}\beta)$


Fitting the data: Least Squares

  • Assuming that X has full column rank, we set the first derivative to zero:

$\mathbf{X}^T (\mathbf{y} - \mathbf{X}\beta) = 0$

  • If XTX is nonsingular, then the unique solution is given by:

$\hat\beta = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$
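The closed-form solution can be checked numerically; a minimal Python/numpy sketch with made-up data (the course's own code is Matlab):

```python
import numpy as np

# Hypothetical small data set; X has full column rank.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])          # first column of 1s carries the intercept
y = np.array([3.0, 5.0, 7.0, 9.0])  # exactly y = 1 + 2x here

# beta_hat = (X^T X)^{-1} X^T y  -- the unique least-squares solution
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)   # -> approximately [1. 2.]
```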


Example: height x shoe size

  • We wanted to explore the relationship between a person's height and their shoe size.

    • We asked ten individuals their height and corresponding shoe size.

    • We believe that a person's shoe size depends upon their height.

  • Height is the independent variable, x.

  • Shoe size is the dependent variable, y.


Example: height x shoe size

The following data was collected:

            Height, x (inches)   Shoe size, y
Person 1          69                  9.5
Person 2          67                  8.5
Person 3          71                 11.5
Person 4          65                 10.5
Person 5          72                 11
Person 6          68                  7.5
Person 7          74                 12
Person 8          65                  7
Person 9          66                  7.5
Person 10         72                 13


Example: height x shoe size

[Figure: the height × shoe size data]


Least Squares Method (matrix form)

The unique solution is given by:

$\hat\beta = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$

Often it is convenient to include the constant variable 1 in X, and include $\hat\beta_0$ in the vector of coefficients $\hat\beta$.


X without Bias β0 / X with Bias β0 / XT, XTX, XTy

[Slides showing, for the height × shoe size data, the design matrix X with and without the column of 1s for the bias β0, and the matrices XT, XTX (whose top-left entry is n) and XTy]

Example: height x shoe size

Height, x   Shoe size, y      x²       xy
   69            9.5         4761     655.5
   67            8.5         4489     569.5
   71           11.5         5041     816.5
   65           10.5         4225     682.5
   72           11           5184     792
   68            7.5         4624     510
   74           12           5476     888
   65            7           4225     455
   66            7.5         4356     495
   72           13           5184     936
  689           98          47565    6800     (totals)
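The slope and intercept of the fitted line follow from these column totals via the usual closed-form formulas; a quick check in Python (numbers taken from the table above):

```python
n = 10
Sx, Sy, Sxx, Sxy = 689, 98, 47565, 6800   # column totals from the table

b1 = (n * Sxy - Sx * Sy) / (n * Sxx - Sx ** 2)   # slope
b0 = (Sy - b1 * Sx) / n                          # intercept
print(round(b1, 4), round(b0, 2))                # -> 0.5145 -25.65
```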


Scatter Plot

[Figure: scatter plot of the height × shoe size data]

Scatter Plot with Trend Line

[Figure: the same scatter plot with the fitted least-squares line]

Linear Models and Least Squares: Regression

Using the learned parameters β one can compute new outputs via regression.

At an arbitrary input x0 the prediction is:

$\hat{y}(x_0) = x_0^T \hat\beta$

Intuitively, it seems that we do not need a very large data set to fit such a model.


Example: Height x Shoe Size

  • Thus if a person is 5 feet tall (i.e. x = 60 inches), then I would estimate their shoe size to be $\hat{y} = \hat\beta_0 + \hat\beta_1 \cdot 60$.
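A sketch of the full fit and the 60-inch prediction in Python/numpy (the lecture does this in Matlab; the data are the ten observations above):

```python
import numpy as np

height = np.array([69, 67, 71, 65, 72, 68, 74, 65, 66, 72], dtype=float)
shoe   = np.array([9.5, 8.5, 11.5, 10.5, 11, 7.5, 12, 7, 7.5, 13])

X = np.column_stack([np.ones_like(height), height])  # add the bias column
beta, *_ = np.linalg.lstsq(X, shoe, rcond=None)      # least-squares fit

x0 = 60.0                                  # 5 feet = 60 inches
y0 = beta[0] + beta[1] * x0
print(round(y0, 2))                        # estimated shoe size, about 5.2
```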


Regression using LMS

[Figure: least-squares fit to the height × shoe size data]


Two Simple Approaches to Prediction:

Least Squares (método dos mínimos quadrados)

and Nearest Neighbors (método dos vizinhos mais próximos)


Nearest-Neighbor Method

  • Nearest-neighbor methods use those observations in the training set T closest in input space to x to form $\hat{Y}$.

  • Specifically, the k-nearest-neighbor fit for $\hat{Y}$ is defined as follows:

$\hat{Y}(x) = \frac{1}{k} \sum_{x_i \in N_k(x)} y_i$

    where Nk(x) is the neighborhood of x.


Nearest-Neighbor Method

  • Nk(x) is the neighborhood of x defined by the k closest points xi in the training sample.

  • In words: we find the k observations with xi closest to x in input space, and average their responses.

  • Closeness implies a metric:

    • we assume it is Euclidean distance.


Nearest-Neighbor Method

  • The Nearest-Neighbor Method can be used for regression or for classification.

  • Regression: just compute ŷ from a new x.

  • Classification: compute ŷ and then classify it.


Example

Using the same data from the height x shoe size problem.

Regression: just compute ŷ from x = 70.

Use kNN, with k = 5.


Example: height x shoe size

The following data was collected:

            Height, x (inches)   Shoe size, y
Person 1          69                  9.5
Person 2          67                  8.5
Person 3          71                 11.5
Person 4          65                 10.5
Person 5          72                 11
Person 6          68                  7.5
Person 7          74                 12
Person 8          65                  7
Person 9          66                  7.5
Person 10         72                 13




Example

Regression: just compute ŷ from x = 70.

Use kNN, with k = 5.
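A minimal Python sketch of this 5-NN prediction (data from the table above; the five nearest neighbors of x = 70 are uniquely determined here, so tie order does not change the answer):

```python
import numpy as np

height = np.array([69, 67, 71, 65, 72, 68, 74, 65, 66, 72], dtype=float)
shoe   = np.array([9.5, 8.5, 11.5, 10.5, 11, 7.5, 12, 7, 7.5, 13])

def knn_regress(x0, k=5):
    # average the responses of the k observations closest to x0
    idx = np.argsort(np.abs(height - x0))[:k]
    return shoe[idx].mean()

print(knn_regress(70.0, k=5))   # -> 10.5
```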


Regression using 5-NN

[Figure: the five nearest neighbors of x = 70 and their averaged response]


Two Simple Approaches to Classification

We can use both methods for classification:

- Least Squares and

- Nearest Neighbors


Least Squares for Classification

  • The entire fitted surface is characterized by the parameters β.

  • We can use this method for classification purposes:

    • Given the training data, fit a linear model to it.

    • The resulting line is used as a separation boundary.


Classification Example

Figure 2.1 shows a scatterplot of training data on a pair of inputs X1 and X2.

The data are simulated.

The output class variable G has the values BLUE or ORANGE.

The linear regression model was fit to these data, with the response Y coded as 0 for BLUE and 1 for ORANGE.


Linear Classification

[Figure 2.1: simulated two-class data with the linear-regression decision boundary]

Classification Example

  • The line is the decision boundary defined by:

$\{x : x^T \hat\beta = 0.5\}$

  • The orange shaded region denotes that part of input space classified as ORANGE, while the blue region is classified as BLUE.

  • There are several misclassifications on both sides of the decision boundary.
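A hedged Python sketch of this procedure on simulated two-class data (the point clouds below are made up for illustration, not Hastie's actual simulation): fit linear regression to the 0/1-coded response and classify by thresholding at 0.5.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical simulated data: two Gaussian clouds (BLUE = 0, ORANGE = 1)
blue   = rng.normal([0.0, 0.0], 1.0, size=(50, 2))
orange = rng.normal([2.0, 2.0], 1.0, size=(50, 2))
X = np.vstack([blue, orange])
y = np.array([0] * 50 + [1] * 50)           # response coded 0 / 1

Xb = np.column_stack([np.ones(len(X)), X])  # bias column
beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)

y_hat = Xb @ beta
pred = (y_hat > 0.5).astype(int)            # decision boundary x^T beta = 0.5
print("training accuracy:", (pred == y).mean())
```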


Nearest-Neighbor as classifier

Figure 2.2 uses the same training data as in Figure 2.1, and uses 15-nearest-neighbor averaging of the binary coded response as the method of fitting.

$\hat{Y}$ is the proportion of ORANGE's in the neighborhood, and so assigning class ORANGE if $\hat{Y} > 0.5$ amounts to a majority vote in the neighborhood.


15-nearest-neighbor classifier

[Figure 2.2: decision boundary of the 15-nearest-neighbor classifier]


Nearest-Neighbor as classifier

We see that the decision boundaries that separate the BLUE from the ORANGE regions are far more irregular, and respond to local clusters where one class dominates.

In Figure 2.2 we see that far fewer training observations are misclassified than in Figure 2.1.


1-nearest-neighbor classifier

[Figure 2.3: decision boundary of the 1-nearest-neighbor classifier]


Nearest-Neighbor as classifier

In Figure 2.3 none of the training data are misclassified.

For k-nearest-neighbor fits, the error on the training data should be approximately an increasing function of k, and will always be 0 for k = 1.

k-nearest-neighbor has a single parameter, the number of neighbors k.
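A small Python sketch of that last point: each training point is its own nearest neighbor, so the training error at k = 1 is 0 (made-up data; assumes distinct inputs):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))               # distinct hypothetical inputs
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # two classes

def knn_predict(x0, k):
    d = np.linalg.norm(X - x0, axis=1)
    idx = np.argsort(d)[:k]
    return int(y[idx].mean() > 0.5)        # majority vote

for k in (1, 5, 15):
    err = np.mean([knn_predict(X[i], k) != y[i] for i in range(30)])
    print(k, err)   # training error; 0.0 for k = 1
```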


From Least Squares to Nearest Neighbors

Least Squares:

The linear decision boundary from least squares is very smooth, and apparently stable to fit.

It does appear to rely heavily on the assumption that a linear decision boundary is appropriate.


From Least Squares to Nearest Neighbors

k-NN:

does not rely on any stringent assumptions about the underlying data, and can adapt to any situation.

Any particular subregion of the decision boundary depends on a handful of input points and their particular positions, and is thus wiggly and unstable.


Linear regression x kNN

Linear regression is more appropriate when each class is generated from bivariate Gaussian distributions with uncorrelated components and different means.

Nearest neighbors are more suitable when the training data in each class came from a mixture of Gaussian distributions.


How does this look in Matlab?


Least Squares with Matlab

The Curve Fitting Toolbox software uses the linear least-squares method to fit a linear model to data.

Fitting requires a parametric model that relates the response data to the predictor data with one or more coefficients.

The result of the fitting process is an estimate of the model coefficients.


Least Squares with Matlab

  • Use the MATLAB backslash operator (mldivide) to solve a system of simultaneous linear equations for unknown coefficients: beta = X\y

  • Because inverting XTX can lead to rounding errors, Matlab uses QR decomposition with pivoting, which is a numerically very stable algorithm.
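The two routes can be compared in a Python/numpy sketch (made-up data; numpy's `qr` below is a simplified QR solve without the column pivoting Matlab's backslash uses):

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])
y = X @ np.array([1.0, 2.0, -3.0]) + 0.01 * rng.normal(size=20)

# Normal equations (can be numerically unstable if X^T X is ill-conditioned)
beta_ne = np.linalg.solve(X.T @ X, X.T @ y)

# QR route, as stable solvers use: X = QR, then solve R beta = Q^T y
Q, R = np.linalg.qr(X)
beta_qr = np.linalg.solve(R, Q.T @ y)

print(np.allclose(beta_ne, beta_qr))   # both recover roughly [1, 2, -3]
```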


Least Squares with Matlab

  • In matrix form, linear models are given by the formula:

    • y = Xβ + ε

  • Where:

    • y is an n-by-1 vector of responses.

    • β is an m-by-1 vector of coefficients.

    • X is the n-by-m design matrix for the model.

    • ε is an n-by-1 vector of errors.


Example: LMS with Matlab

Note: ' means transpose in Matlab.

  • X = [1,2,3,4,5]'

  • y = [2,4,6,8,10]'

  • beta = (X'*X)\(X'*y)

    beta =

    2
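The same computation in Python/numpy, as an illustrative cross-check of the Matlab snippet above:

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5], dtype=float).reshape(-1, 1)
y = np.array([2, 4, 6, 8, 10], dtype=float)

# beta = (X'X) \ (X'y), exactly as in the Matlab snippet
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)   # -> [2.]  (y is exactly 2*x; no intercept column here)
```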


Plotting the scatter plot in Matlab

a = 0:0.1:5;

plot (x,y,'o',a, a*beta)

axis ([0,5,0,12])


Example: LMS with Matlab

[Figure: the data and the fitted line]


Matlab - doing Regression

Using the learned parameters β one can compute new outputs via regression.

At an arbitrary input x0 the prediction is $\hat{y}(x_0) = x_0^T \hat\beta$:

>> ynovo = beta * 10

ynovo =

20


Adding the linear Bias β0

(the lines below add one column with “1”)

  • x = [1,2,3,4,5]'

  • y = [2,4,6,8,10]'

  • one = ones(5,1)

  • X = [one, x]

  • v = (X'*X)\(X'*y)

    v =

    0

    2


Linear with Bias β0

(the lines below add one column with “1”)

  • x = [1,2,3,4,5]'

  • y = [3,5,7,9,11]'

  • one = ones(5,1)

  • X = [one, x]

  • v = (X'*X)\(X'*y)

    v =

    1

    2
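An equivalent Python/numpy cross-check of the Matlab snippet above (illustrative):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([3, 5, 7, 9, 11], dtype=float)   # y = 1 + 2x exactly

X = np.column_stack([np.ones(5), x])          # column of 1s carries beta_0
v = np.linalg.solve(X.T @ X, X.T @ y)
print(v)   # -> [1. 2.]  (intercept 1, slope 2)
```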


Linear WITHOUT Bias β0

[Figure: fit without the intercept term]

Linear with Bias β0

[Figure: fit including the intercept term]

LMS with 2 variables

  • The dataset contains 3 variables, books, attend and grade, and has 40 cases.

    • books represents the number of books read by students on a statistics course,

    • attend represents the number of lectures they attended, and

    • grade represents their final grade on the course.


Dataset books, attend and grade

[Table: the 40 cases of the books/attend/grade dataset]


LMS with 2 variables

load Books_attend_grade.dat

x1 = Books_attend_grade(:,1)

x2 = Books_attend_grade(:,2)

y = Books_attend_grade(:,3)

one = ones(40,1)

X = [one, x1,x2]


LMS with 2 variables

>> v = (X'*X)\(X'*y)

v =

37.3792

4.0369

1.2835


LMS with 2 variables

[x1,x2]=meshgrid(0:1:4,0:1:20)

surf(x1,x2,37.3792+4.0369*x1+1.2835*x2)


Linear Least Squares Fitting in ℝ²

[Figure: the fitted regression surface over the (books, attend) plane]


Conclusion

  • We saw what Machine Learning is.

  • We began to see what Statistical Machine Learning is:

    • Mathematical (statistical) prediction methods: regression and classification.

    • Linear regression and classification.

    • K-Nearest Neighbors.


Exercise: Boiling point in the Alps

  • Description: the boiling point of water at different barometric pressures.

  • There are 17 observations.

  • Variables:

    • BPt: the recorded boiling point of water in degrees F.

    • Pressure: the barometric pressure in inches of mercury.


Exercise: Boiling point in the Alps

Use Least Squares and compute beta.

What is the pressure value for a temperature of 200 F?


Exercise: Boiling point in the Alps

Use 5-NN and compute the same value.

What is the pressure value for a temperature of 200 F?


Next lecture:

  • Statistical Machine Learning:

    • Validation and selection methods.

    • More Matlab.

    • PCA.

    • Stronger, more mathematical methods.


End

