
Tópicos Especiais em Aprendizagem

Reinaldo Bianchi

Centro Universitário da FEI

2012


Lecture 1

Part B


Objectives of this lecture

  • Present the basic concepts of Machine Learning:

    • Introduction.

    • Basic definitions.

    • Application areas.

  • Statistical Machine Learning.

  • Today's lecture: chapter 1 of Mitchell, chapter 1 of Nilsson, chapters 1 and 2 of Hastie, plus Wikipedia.


Main Approaches according to Statistics

(Diagram: learning approaches positioned across the AI, Statistics and Neural Network communities.)

Explanation-Based Learning

Decision Trees

Case-Based Learning

Inductive Learning

Bayesian Learning

Nearest Neighbors

Neural Networks

Support Vector Machines

Genetic Algorithms

Regression

Clustering

Reinforcement Learning

Classification


Main Approaches according to Statistics

(The diagram is then narrowed, in two steps, to the approaches closest to Statistics: Nearest Neighbors, Support Vector Machines, Regression, Clustering and Classification.)


First lecture, part B

  • Introduction to Statistical Machine Learning:

    • Basic definitions.

    • Regression.

    • Classification.


Textbook

  • The Elements of Statistical Learning

  • Data Mining, Inference, and Prediction


Why Statistical Learning?

  • "Statistical learning plays a key role in many areas of science, finance and industry."

  • "The science of learning plays a key role in the fields of statistics, data mining and artificial intelligence, intersecting with areas of engineering and other disciplines."


SML problems

Predict whether a patient, hospitalized due to a heart attack, will have a second heart attack. The prediction is to be based on demographic, diet and clinical measurements for that patient.

Predict the price of a stock in 6 months from now, on the basis of company performance measures and economic data.


SML problems

Identify the numbers in a handwritten ZIP code, from a digitized image.

Estimate the amount of glucose in the blood of a diabetic person, from the infrared absorption spectrum of that person's blood.

Identify the risk factors for prostate cancer, based on clinical and demographic variables.


Examples of SML problems

Prostate Cancer

Study by Stamey et al. (1989) that examined the correlation between the level of prostate specific antigen (PSA) and a number of clinical measures.

The goal is to predict the log of PSA (lpsa) from a number of measurements.


Examples of supervised learning problems


Other examples of learning problems

DNA Microarrays

Expression matrix of 6830 genes (rows, only 100 shown) and 64 samples (columns) for the human tumor data.

The display is a heat map, ranging from bright green (negative, under expressed) to bright red (positive, over expressed). Missing values are grey.



  • Task: describe how the data are organised or clustered.

  • (unsupervised learning)


Overview of Supervised Learning

Chapter 2 of Hastie


Variable Types and Terminology

  • In the statistical literature the inputs are often called the predictors, inputs, and more classically the independent variables.

    • In the pattern recognition literature the term features is preferred, which we use as well.

  • The outputs are called the responses, or classically the dependent variables.


Variable Types and Terminology

  • The outputs vary in nature among the examples:

    • Prostate Cancer prediction example:

      • The output is a quantitative measurement.

    • Handwritten digit example:

      • The output is one of 10 different digit classes: G = {0, 1, ..., 9}


Naming convention for the prediction task

  • The distinction in output type has led to a naming convention for the prediction tasks:

    • Regression when we predict quantitative outputs.

    • Classification when we predict qualitative outputs.

  • Both can be viewed as a task in function approximation.


Examples of SML problems

Prostate Cancer

Study by Stamey et al. (1989) that examined the correlation between the level of prostate specific antigen (PSA) and a number of clinical measures.

The goal is to predict the log of PSA (lpsa) from a number of measurements.

  • Regression problem


Examples of supervised learning problems

  • Classification problem


Qualitative variables representation

  • Qualitative variables are represented numerically by codes:

    • Binary case: when there are only two classes or categories, such as "success" or "failure," "survived" or "died."

    • These are often represented by a single binary digit or bit as 0 or 1, or else by −1 and 1.


Qualitative variables representation

  • When there are more than two categories, the most commonly used coding is via dummy variables:

    • A K-level qualitative variable is represented by a vector of K binary variables or bits, only one of which is "on" at a time.

  • These numeric codes are sometimes referred to as targets.
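As an illustrative sketch (in Python rather than the course's MATLAB, and with a hypothetical color variable standing in for a generic K-level factor), dummy coding produces K bits with exactly one "on":

```python
def dummy_code(value, levels):
    """Encode a K-level qualitative variable as a vector of K binary
    indicators, only one of which is 'on' at a time."""
    return [1 if value == level else 0 for level in levels]

# Hypothetical 3-level qualitative variable.
levels = ["red", "green", "blue"]
print(dummy_code("green", levels))  # -> [0, 1, 0]
```

Note that in the binary case a single 0/1 bit (or −1/1) suffices, so the full dummy vector is only needed for K > 2.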


Variables

  • We will typically denote an input variable by the symbol X.

    • If X is a vector, its components can be accessed by subscripts X_j.

    • Observed values are written in lowercase: hence the i-th observed value of X is written as x_i.

  • Quantitative outputs will be denoted by Y and qualitative outputs will be denoted by G (for group).


Two Simple Approaches to Prediction:

Least Squares

and Nearest Neighbors


Linear Methods for Regression

  • “Linear models were largely developed in the pre-computer age of statistics, but even in today’s computer era there are still good reasons to study and use them.” (Hastie et al.)


Linear Methods for Regression

  • For prediction purposes they can sometimes outperform non-linear models, especially in situations with:

    • small sample size,

    • low signal-to-noise ratio, or

    • sparse data.

  • They can also be applied to transformations of the inputs.


Linear Models and Least Squares

The linear model has been a mainstay of statistics for the past 30 years and remains one of its most important tools.

Given a vector of inputs X^T = (X_1, X_2, \ldots, X_p),

we predict the output Y via the model:

\hat{Y} = \hat{\beta}_0 + \sum_{j=1}^{p} X_j \hat{\beta}_j

Linear Models

The term \hat{\beta}_0 is the intercept, also known as the bias in machine learning.

Often it is convenient to include the constant variable 1 in X, include \hat{\beta}_0 in the vector of coefficients \hat{\beta}, and then write the linear model in vector form as an inner product:

\hat{Y} = X^T \hat{\beta}


Positive Linear Relationship

(Figure: regression line E(y) = b0 + b1 x, with intercept b0 and positive slope b1.)


Negative Linear Relationship

(Figure: regression line E(y) = b0 + b1 x, with intercept b0 and negative slope b1.)


No Relationship

(Figure: regression line E(y) = b0 + b1 x, with intercept b0 and slope b1 equal to 0.)


Fitting the data: Least Squares

  • How do we fit the linear model to a set of training data?

    • By far the most popular method is that of least squares.

  • Pick the coefficients β to minimize the Residual Sum of Squares:

    RSS(\beta) = \sum_{i=1}^{N} (y_i - x_i^T \beta)^2


Least Squares Method

  • Least Squares Criterion: minimize \sum_i (y_i - \hat{y}_i)^2

  • where:

    • y_i = observed value of the dependent variable for the i-th observation

    • \hat{y}_i = estimated value of the dependent variable for the i-th observation


Fitting the data: Least Squares

  • RSS(β) is a quadratic function of the parameters, and hence its minimum always exists, but may not be unique.

  • The solution is easiest to characterize in matrix notation:

    RSS(\beta) = (y - X\beta)^T (y - X\beta)

    • where X is an N × p matrix with each row an input vector

    • y is an N-vector of the outputs

Fitting the data: Least Squares

  • Differentiating RSS(\beta) = (y - X\beta)^T (y - X\beta)

    with respect to β we get:

    \frac{\partial RSS}{\partial \beta} = -2 X^T (y - X\beta)


Fitting the data: Least Squares

  • Assuming that X has full column rank, we set the first derivative to zero:

    X^T (y - X\beta) = 0

  • If X^T X is nonsingular, then the unique solution is given by:

    \hat{\beta} = (X^T X)^{-1} X^T y
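As a minimal sketch of this closed form (in Python rather than MATLAB, for the simple case of an intercept plus one input, on made-up data), the normal equations (X^T X)β = X^T y reduce to a 2×2 system solvable by Cramer's rule:

```python
def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1*x: solve the 2x2 normal
    equations (X^T X) beta = X^T y by Cramer's rule."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx            # determinant of X^T X
    b0 = (sy * sxx - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1

# Made-up data lying exactly on y = 1 + 2x; the fit recovers it.
b0, b1 = fit_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(b0, b1)  # -> 1.0 2.0
```

For general p the same formula \hat{\beta} = (X^T X)^{-1} X^T y applies, with the 2×2 solve replaced by a general linear solve.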


Example: height x shoe size

  • We wanted to explore the relationship between a person's height and their shoe size.

    • We asked ten individuals their height and corresponding shoe size.

    • We believe that a person's shoe size depends upon their height.

  • The height is the independent variable, x.

  • Shoe size is the dependent variable, y.


Example: height x shoe size

The following data was collected:

            Height, x (inches)    Shoe size, y

Person 1           69                 9.5

Person 2           67                 8.5

Person 3           71                11.5

Person 4           65                10.5

Person 5           72                11

Person 6           68                 7.5

Person 7           74                12

Person 8           65                 7

Person 9           66                 7.5

Person 10          72                13


Example: height x shoe size


Least Squares Method (matrix form)

The unique solution is given by:

\hat{\beta} = (X^T X)^{-1} X^T y

Often it is convenient to include the constant variable 1 in X, and include \hat{\beta}_0 in the vector of coefficients \hat{\beta}.


X without Bias β0

(Slide: the design matrix X built from the heights alone.)


X with Bias β0

(Slide: the design matrix X with a leading column of 1s for the bias term.)


XT, XTX and XTy

(Slides: the transpose X^T, the matrix X^T X — whose top-left entry is n — and the vector X^T y, computed for the example data.)


Example: height x shoe size

Height, x    Shoe size, y      x²        xy

   69             9.5         4761      655.5

   67             8.5         4489      569.5

   71            11.5         5041      816.5

   65            10.5         4225      682.5

   72            11           5184      792

   68             7.5         4624      510

   74            12           5476      888

   65             7           4225      455

   66             7.5         4356      495

   72            13           5184      936

  689            98          47565     6800      (totals)


Scatter Plot


Scatter Plot with Trend Line




Linear Models and Least Squares: Regression

Using the learned parameters \hat{\beta}, one can compute new outputs via regression.

At an arbitrary input x_0 the prediction is:

\hat{y}(x_0) = x_0^T \hat{\beta}

Intuitively, it seems that we do not need a very large data set to fit such a model.


Example Height x Shoe Size

  • Thus if a person is 5 feet tall (i.e. x = 60 inches), then I would estimate their shoe size to be:

    \hat{y} = -25.65 + 0.5145 \times 60 \approx 5.2
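This estimate can be checked numerically. The following sketch (Python rather than MATLAB, not part of the original deck) reproduces the column sums from the table above and the prediction at x = 60:

```python
# Heights (inches) and shoe sizes from the slides' table.
x = [69, 67, 71, 65, 72, 68, 74, 65, 66, 72]
y = [9.5, 8.5, 11.5, 10.5, 11, 7.5, 12, 7, 7.5, 13]

n = len(x)
sx, sy = sum(x), sum(y)                  # 689 and 98, as in the table
sxx = sum(v * v for v in x)              # 47565
sxy = sum(a * b for a, b in zip(x, y))   # 6800

# Slope and intercept from the normal equations.
b1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b0 = (sy - b1 * sx) / n

pred = b0 + b1 * 60                      # estimate for a 5-foot person
print(round(pred, 2))  # -> 5.22
```

The fitted coefficients are b1 ≈ 0.5145 and b0 ≈ −25.65, giving roughly a size-5 shoe at 60 inches.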


Regression using LMS


Two Simple Approaches to Prediction:

Least Squares

and Nearest Neighbors


Nearest-Neighbor Method

  • Nearest-neighbor methods use those observations in the training set T closest in input space to x to form \hat{Y}.

  • Specifically, the k-nearest-neighbor fit for \hat{Y} is defined as follows:

    \hat{Y}(x) = \frac{1}{k} \sum_{x_i \in N_k(x)} y_i

    where N_k(x) is the neighborhood of x.


Nearest-Neighbor Method

  • N_k(x) is the neighborhood of x defined by the k closest points x_i in the training sample.

  • In words: we find the k observations with x_i closest to x in input space, and average their responses.

  • Closeness implies a metric:

    • we assume it is Euclidean distance.


Nearest-Neighbor Method

  • The Nearest-Neighbor Method can be used for regression or for classification.

  • Regression: just compute \hat{y} from a new x.

  • Classification: compute \hat{y} and then classify it.


Example

Using the same data from the height x shoe size problem.

Regression: just compute \hat{y} from x = 70.

Use kNN, with k = 5.


Example: height x shoe size

The following data was collected:

            Height, x (inches)    Shoe size, y

Person 1           69                 9.5

Person 2           67                 8.5

Person 3           71                11.5

Person 4           65                10.5

Person 5           72                11

Person 6           68                 7.5

Person 7           74                12

Person 8           65                 7

Person 9           66                 7.5

Person 10          72                13




Example

Regression: just compute \hat{y} from x = 70.

Use kNN, with k = 5.
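A minimal sketch of this 5-NN fit (Python rather than MATLAB, not from the original slides) on the height/shoe-size data:

```python
def knn_regress(x0, xs, ys, k):
    """k-nearest-neighbor fit: average the responses of the k training
    points closest to x0 (absolute distance in one dimension)."""
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x0))[:k]
    return sum(y for _, y in nearest) / k

x = [69, 67, 71, 65, 72, 68, 74, 65, 66, 72]
y = [9.5, 8.5, 11.5, 10.5, 11, 7.5, 12, 7, 7.5, 13]

# The 5 nearest neighbors of x = 70 are heights 69, 71, 72, 68, 72.
print(knn_regress(70, x, y, k=5))  # -> 10.5
```

The prediction is simply the average of the five neighboring shoe sizes (9.5, 11.5, 11, 7.5, 13).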


Regression using 5-NN




Two Simple Approaches to Classification

We can use both methods for classification:

- Least Squares and

- Nearest Neighbors


Least Squares for Classification

  • The entire fitted surface is characterized by the parameters \hat{\beta}.

  • We can use this method for classification purposes:

    • Given the training data, fit a model to it.

    • The resulting line is used as a separation boundary.


Classification Example

Figure 2.1 shows a scatterplot of training data on a pair of inputs X1 and X2.

The data are simulated.

The output class variable G has the values BLUE or ORANGE.

The linear regression model was fit to these data, with the response Y coded as 0 for BLUE and 1 for ORANGE.


Linear Classification


Classification Example

  • The line is the decision boundary defined by:

    \{ x : x^T \hat{\beta} = 0.5 \}

  • The orange shaded region denotes that part of input space classified as ORANGE, while the blue region is classified as BLUE.

  • There are several misclassifications on both sides of the decision boundary.
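A sketch of the idea in one dimension (Python, on made-up data; the book's example is two-dimensional): fit the 0/1-coded response by least squares, then classify by whether the fitted value exceeds 0.5.

```python
def fit_line(xs, ys):
    # Least-squares fit y = b0 + b1*x via the 2x2 normal equations.
    n, sx, sy = len(xs), sum(xs), sum(ys)
    sxx = sum(v * v for v in xs)
    sxy = sum(a * b for a, b in zip(xs, ys))
    det = n * sxx - sx * sx
    return (sy * sxx - sx * sxy) / det, (n * sxy - sx * sy) / det

# Made-up 1-D training data: class BLUE coded 0, ORANGE coded 1.
xs = [1.0, 1.5, 2.0, 6.0, 6.5, 7.0]
g = [0, 0, 0, 1, 1, 1]
b0, b1 = fit_line(xs, g)

def classify(x0):
    # The decision boundary is the point where b0 + b1*x = 0.5.
    return "ORANGE" if b0 + b1 * x0 > 0.5 else "BLUE"

print(classify(1.2), classify(6.8))  # -> BLUE ORANGE
```

In two dimensions the boundary {x : x^T β̂ = 0.5} is a line instead of a point, but the thresholding rule is identical.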


Nearest-Neighbor as classifier

Figure 2.2 uses the same training data as in Figure 2.1, and uses 15-nearest-neighbor averaging of the binary coded response as the method of fitting.

\hat{Y} is the proportion of ORANGE's in the neighborhood, and so assigning class ORANGE if \hat{Y} > 0.5 amounts to a majority vote in the neighborhood.
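The majority-vote reading can be sketched directly (Python, with made-up 1-D data standing in for the book's 2-D example):

```python
from collections import Counter

def knn_classify(x0, xs, labels, k):
    """Assign the majority class among the k nearest training points;
    with two classes this equals 'ORANGE if its proportion > 0.5'."""
    nearest = sorted(zip(xs, labels), key=lambda p: abs(p[0] - x0))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

xs = [1.0, 1.5, 2.0, 6.0, 6.5, 7.0]
labels = ["BLUE", "BLUE", "BLUE", "ORANGE", "ORANGE", "ORANGE"]
print(knn_classify(2.5, xs, labels, k=3))  # -> BLUE
print(knn_classify(5.5, xs, labels, k=3))  # -> ORANGE
```

Using an odd k avoids ties in the two-class vote.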


15-nearest-neighbor classifier


Nearest-Neighbor as classifier

We see that the decision boundaries that separate the BLUE from the ORANGE regions are far more irregular, and respond to local clusters where one class dominates.

In Figure 2.2 we see that far fewer training observations are misclassified than in Figure 2.1.


1-nearest-neighbor classifier


Nearest-Neighbor as classifier

In Figure 2.3 none of the training data are misclassified.

For k-nearest-neighbor fits, the error on the training data should be approximately an increasing function of k, and will always be 0 for k = 1.

k-nearest-neighbor has a single parameter, the number of neighbors k.


From Least Squares to Nearest Neighbors

Least Squares:

The linear decision boundary from least squares is very smooth, and apparently stable to fit.

It does appear to rely heavily on the assumption that a linear decision boundary is appropriate.


From Least Squares to Nearest Neighbors

k-NN:

Does not rely on any stringent assumptions about the underlying data, and can adapt to any situation.

Any particular subregion of the decision boundary depends on a handful of input points and their particular positions, and is thus wiggly and unstable.


Linear regression vs. kNN

Linear regression is more appropriate when each class is generated from bivariate Gaussian distributions with uncorrelated components and different means.

Nearest neighbors are more suitable when the training data in each class came from a mixture of Gaussian distributions.


What does this look like in Matlab?


Least Squares with Matlab

Curve Fitting Toolbox software uses the linear least-squares method to fit a linear model to data.

Fitting requires a parametric model that relates the response data to the predictor data with one or more coefficients.

The result of the fitting process is an estimate of the model coefficients.


Least Squares with Matlab

  • Use the MATLAB backslash operator (mldivide) to solve a system of simultaneous linear equations for unknown coefficients: beta = X\y

  • Because inverting X^T X can lead to rounding errors, Matlab uses QR decomposition with pivoting, which is a numerically very stable algorithm.


Least Squares with Matlab

  • In matrix form, linear models are given by the formula:

    • y = Xβ + ε

  • Where:

    • y is an n-by-1 vector of responses.

    • β is an m-by-1 vector of coefficients.

    • X is the n-by-m design matrix for the model.

    • ε is an n-by-1 vector of errors.


Example: LMS with Matlab

Note: ' means transpose in Matlab

  • X = [1,2,3,4,5]'

  • y = [2,4,6,8,10]'

  • beta = (X'*X)\(X'*y)

    beta =

    2


Plotting the scatter plot in Matlab

a = 0:0.1:5;

plot(X, y, 'o', a, a*beta)

axis([0, 5, 0, 12])


Example: LMS with Matlab


Matlab: doing Regression

Using the learned parameters beta, one can compute new outputs via regression.

At an arbitrary input x0 the prediction is \hat{y} = x_0^T \hat{\beta}:

>> ynovo = beta * 10

ynovo =

20


Adding the linear Bias β0

  • x = [1,2,3,4,5]'

  • y = [2,4,6,8,10]'

  • one = ones(5,1)   % this adds one column with "1"

  • X = [one, x]

  • v = (X'*X)\(X'*y)

    v =

    0

    2


Linear with Bias β0

  • x = [1,2,3,4,5]'

  • y = [3,5,7,9,11]'

  • one = ones(5,1)   % this adds one column with "1"

  • X = [one, x]

  • v = (X'*X)\(X'*y)

    v =

    1

    2


Linear WITHOUT Bias β0


Linear with Bias β0


LMS with 2 variables

  • The dataset contains 3 variables, books, attend and grade, and has 40 cases.

    • books represents the number of books read by students on a statistics course,

    • attend represents the number of lectures they attended, and

    • grade represents their final grade on the course.


Dataset: books, attend and grade


LMS with 2 variables


LMS with 2 variables

load Books_attend_grade.dat

x1 = Books_attend_grade(:,1)

x2 = Books_attend_grade(:,2)

y = Books_attend_grade(:,3)

one = ones(40,1)

X = [one, x1,x2]


LMS with 2 variables

>> v = (X'*X)\(X'*y)

v =

37.3792

4.0369

1.2835


LMS with 2 variables

[x1,x2]=meshgrid(0:1:4,0:1:20)

surf(x1,x2,37.3792+4.0369*x1+1.2835*x2)
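Since Books_attend_grade.dat is not reproduced here, a self-contained check (Python rather than MATLAB, on synthetic data generated exactly on the plane the slides report) can verify that the normal-equations machinery recovers the three coefficients:

```python
def lstsq(X, y):
    """Solve the normal equations (X^T X) b = X^T y by Gaussian
    elimination with partial pivoting (pure Python, no NumPy)."""
    p, n = len(X[0]), len(X)
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         for r in range(p)]
    b = [sum(X[i][r] * y[i] for i in range(n)) for r in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        s = sum(A[r][c] * beta[c] for c in range(r + 1, p))
        beta[r] = (b[r] - s) / A[r][r]
    return beta

# Synthetic stand-in for the 40-case dataset: grades generated exactly
# on the plane grade = 37.3792 + 4.0369*books + 1.2835*attend.
pts = [(books, attend) for books in range(5) for attend in range(0, 21, 5)]
X = [[1.0, bk, at] for bk, at in pts]   # leading 1s column = bias term
y = [37.3792 + 4.0369 * bk + 1.2835 * at for bk, at in pts]

beta = lstsq(X, y)
print([round(v, 4) for v in beta])  # -> [37.3792, 4.0369, 1.2835]
```

This mirrors Matlab's (X'*X)\(X'*y), except that Matlab's backslash uses a QR factorization rather than forming X^T X explicitly.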


Linear Least Squares Fitting in R²


Conclusion

  • We saw what Machine Learning is.

  • We started to see what Statistical Machine Learning is:

    • Mathematical (statistical) prediction methods: regression and classification.

    • Linear regression and classification.

    • K-Nearest Neighbor.


Exercise: Boiling point in the Alps

  • Description: The boiling point of water at different barometric pressures.

  • There are 17 observations.

  • Variables:

    • BPt: the recorded boiling point of water in degrees F

    • Pressure: the barometric pressure in inches of mercury.


Exercise: Boiling point in the Alps

Use least squares and compute beta.

What is the pressure for a temperature of 200 F?


Exercise: Boiling point in the Alps

Use 5-NN and compute the same value.

What is the pressure for a temperature of 200 F?


Next lecture:

  • Statistical Machine Learning:

    • Validation and selection methods.

    • More Matlab.

    • PCA.

    • Stronger, more mathematical methods.


End

