
Tópicos Especiais em Aprendizagem

Reinaldo Bianchi

Centro Universitário da FEI

2012


Lecture 1

Part B


Objectives of this lecture

  • Present the basic concepts of Machine Learning:

    • Introduction.

    • Basic definitions.

    • Application areas.

  • Statistical Machine Learning.

  • Today's class: chapter 1 of Mitchell, chapter 1 of Nilsson, and chapters 1 and 2 of Hastie, plus Wikipedia.


Main Approaches according to Statistics

Explanation-Based Learning

Decision Trees

Case-Based Learning

Inductive Learning

Bayesian Learning

Nearest Neighbors

Neural Networks

Support Vector Machines

Genetic Algorithms

Regression

Clustering

Reinforcement Learning

Classification

(Diagram: the approaches grouped under AI, Statistics, and Neural Networks.)


Main Approaches according to Statistics (cont.)

Narrowing to the approaches closest to statistics:

Nearest Neighbors

Support Vector Machines

Regression

Clustering

Classification

(Diagram: the approaches grouped under AI, Statistics, and Neural Networks.)


First Lecture, Part B

  • Introduction to Statistical Machine Learning:

    • Basic definitions.

    • Regression.

    • Classification.


Textbook

  • The Elements of Statistical Learning

  • Data Mining, Inference, and Prediction


Why Statistical Learning?

  • "Statistical learning plays a key role in many areas of science, finance and industry."

  • "The science of learning plays a key role in the fields of statistics, data mining and artificial intelligence, intersecting with areas of engineering and other disciplines."


SML problems

Predict whether a patient, hospitalized due to a heart attack, will have a second heart attack. The prediction is to be based on demographic, diet and clinical measurements for that patient.

Predict the price of a stock in 6 months from now, on the basis of company performance measures and economic data.


SML problems (cont.)

Identify the numbers in a handwritten ZIP code, from a digitized image.

Estimate the amount of glucose in the blood of a diabetic person, from the infrared absorption spectrum of that person's blood.

Identify the risk factors for prostate cancer, based on clinical and demographic variables.


Examples of SML problems

Prostate Cancer

Study by Stamey et al. (1989) that examined the correlation between the level of prostate specific antigen (PSA) and a number of clinical measures.

The goal is to predict the log of PSA (lpsa) from a number of measurements.


Examples of supervised learning problems


Other examples of learning problems

DNA Microarrays

Expression matrix of 6830 genes (rows, only 100 shown) and 64 samples (columns) for the human tumor data.

The display is a heat map, ranging from bright green (negative, under-expressed) to bright red (positive, over-expressed). Missing values are grey.

  • Task: describe how the data are organised or clustered.

  • (unsupervised learning)



Variable Types and Terminology

  • In the statistical literature the inputs are often called the predictors, inputs, or, more classically, the independent variables.

    • In the pattern recognition literature the term features is preferred, which we use as well.

  • The outputs are called the responses, or classically the dependent variables.


Variable Types and Terminology (cont.)

  • The outputs vary in nature among the examples:

    • Prostate cancer prediction example:

      • The output is a quantitative measurement.

    • Handwritten digit example:

      • The output is one of 10 different digit classes: G = {0, 1, ..., 9}.


Naming convention for the prediction task

  • The distinction in output type has led to a naming convention for the prediction tasks:

    • Regression, when we predict quantitative outputs.

    • Classification, when we predict qualitative outputs.

  • Both can be viewed as a task in function approximation.


Examples of SML problems

Prostate Cancer

Study by Stamey et al. (1989) that examined the correlation between the level of prostate specific antigen (PSA) and a number of clinical measures. The goal is to predict the log of PSA (lpsa) from a number of measurements.

  • Regression problem


Examples of supervised learning problems

  • Classification problem


Qualitative variables representation

  • Qualitative variables are represented numerically by codes:

    • Binary case: when there are only two classes or categories, such as "success" or "failure," "survived" or "died."

    • These are often represented by a single binary digit or bit as 0 or 1, or else by −1 and 1.

Qualitative variables representation (cont.)

  • When there are more than two categories, the most commonly used coding is via dummy variables:

    • a K-level qualitative variable is represented by a vector of K binary variables or bits, only one of which is "on" at a time.

  • These numeric codes are sometimes referred to as targets.
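As a sketch of this dummy-variable coding (not from the slides; the function name `dummy_code` and the example levels are made up for illustration), in Python:

```python
import numpy as np

def dummy_code(labels, levels):
    """K-level qualitative variable -> N x K matrix of dummy variables.

    Each row has exactly one bit "on", marking that observation's level.
    """
    M = np.zeros((len(labels), len(levels)), dtype=int)
    for i, lab in enumerate(labels):
        M[i, levels.index(lab)] = 1
    return M

# e.g. a 3-level variable with levels "low", "mid", "high":
codes = dummy_code(["low", "high", "mid"], ["low", "mid", "high"])
# every row of `codes` sums to 1: only one bit is "on" at a time
```

For the binary case, a single 0/1 column (or a ±1 coding) replaces the K columns.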


Variables

  • We will typically denote an input variable by the symbol X.

    • If X is a vector, its components can be accessed by subscripts Xj.

    • Observed values are written in lowercase: hence the i-th observed value of X is written as xi.

  • Quantitative outputs will be denoted by Y, and qualitative outputs by G (for group).


Two Simple Approaches to Prediction:

Least Squares (método dos mínimos quadrados)

and Nearest Neighbors (método dos vizinhos mais próximos)


Linear Methods for Regression

  • “Linear models were largely developed in the pre-computer age of statistics, but even in today’s computer era there are still good reasons to study and use them.” (Hastie et al.)


Linear Methods for Regression (cont.)

  • For prediction purposes they can sometimes outperform non-linear models, especially in situations with:

    • small sample size,

    • low signal-to-noise ratio, or

    • sparse data.

  • They can also be applied to transformations of the inputs.


Linear Models and Least Squares

The linear model has been a mainstay of statistics for the past 30 years and remains one of its most important tools.

Given a vector of inputs Xᵀ = (X₁, X₂, ..., Xₚ), we predict the output Y via the model:

    Ŷ = β̂₀ + Σⱼ₌₁ᵖ Xⱼ β̂ⱼ

Linear Models

The term β̂₀ is the intercept, also known as the bias in machine learning.

Often it is convenient to include the constant variable 1 in X, include β̂₀ in the vector of coefficients β̂, and then write the linear model in vector form as an inner product:

    Ŷ = Xᵀβ̂


Positive Linear Relationship

(Plot: regression line E(y) = b₀ + b₁x; intercept b₀, slope b₁ is positive.)


Negative Linear Relationship

(Plot: regression line E(y) = b₀ + b₁x; intercept b₀, slope b₁ is negative.)


No Relationship

(Plot: horizontal regression line E(y) = b₀; slope b₁ is 0.)


Fitting the data: Least Squares

  • How do we fit the linear model to a set of training data? By far the most popular method is least squares.

  • Pick the coefficients β to minimize the Residual Sum of Squares:

    RSS(β) = Σᵢ₌₁ᴺ (yᵢ − xᵢᵀβ)²


Least Squares Method

  • Least Squares Criterion: minimize Σᵢ (yᵢ − ŷᵢ)²

  • where:

    • yᵢ = observed value of the dependent variable for the i-th observation

    • ŷᵢ = estimated value of the dependent variable for the i-th observation


Fitting the data: Least Squares (cont.)

  • RSS(β) is a quadratic function of the parameters, and hence its minimum always exists, but may not be unique.

  • The solution is easiest to characterize in matrix notation:

    RSS(β) = (y − Xβ)ᵀ(y − Xβ)

    • where X is an N × p matrix with each row an input vector,

    • and y is an N-vector of the outputs.


Fitting the data: Least Squares (cont.)

  • Differentiating RSS(β) = (y − Xβ)ᵀ(y − Xβ) with respect to β we get:

    ∂RSS/∂β = −2Xᵀ(y − Xβ)

Fitting the data: Least Squares (cont.)

  • Assuming that X has full column rank, we set the first derivative to zero:

    Xᵀ(y − Xβ) = 0

  • If XᵀX is nonsingular, then the unique solution is given by:

    β̂ = (XᵀX)⁻¹Xᵀy
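The closed-form solution can be checked numerically. A minimal sketch in Python/NumPy (the course uses Matlab; this toy data set, with intercept 1 and slope 2, is made up and mirrors a later Matlab example):

```python
import numpy as np

# Toy data: 5 observations, a column of 1s for the intercept plus one input.
X = np.column_stack([np.ones(5), np.array([1.0, 2.0, 3.0, 4.0, 5.0])])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])   # exactly y = 1 + 2x

# beta_hat = (X^T X)^{-1} X^T y, via a linear solve instead of an explicit inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
# beta_hat recovers intercept 1 and slope 2
```

Solving the normal equations with a linear solver, rather than forming the inverse explicitly, is the numerically preferable way to evaluate this formula.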


Example: height x shoe size

  • We want to explore the relationship between a person's height and their shoe size.

    • We asked ten individuals their height and corresponding shoe size.

    • We believe that a person's shoe size depends upon their height.

  • Height is the independent variable, x.

  • Shoe size is the dependent variable, y.


Example: height x shoe size

The following data were collected:

              Height, x (inches)   Shoe size, y
  Person 1            69                9.5
  Person 2            67                8.5
  Person 3            71               11.5
  Person 4            65               10.5
  Person 5            72               11
  Person 6            68                7.5
  Person 7            74               12
  Person 8            65                7
  Person 9            66                7.5
  Person 10           72               13


Example: height x shoe size

(Scatter plot of the data.)


Least Squares Method (matrix form)

The unique solution is given by:

    β̂ = (XᵀX)⁻¹Xᵀy

Often it is convenient to include the constant variable 1 in X, and to include β₀ in the vector of coefficients β.


X without Bias β₀

    X = (x₁, x₂, ..., x_N)ᵀ   (one row per observation)


X with Bias β₀

    X = [ 1  x₁
          1  x₂
          ⋮   ⋮
          1  x_N ]   (a column of 1s prepended)


XᵀX (simple regression with intercept)

    XᵀX = [ n      Σxᵢ
            Σxᵢ    Σxᵢ² ]


Xᵀy

    Xᵀy = [ Σyᵢ
            Σxᵢyᵢ ]

Example: height x shoe size

  Height, x   Shoe size, y      x²       xy
     69           9.5         4761     655.5
     67           8.5         4489     569.5
     71          11.5         5041     816.5
     65          10.5         4225     682.5
     72          11           5184     792
     68           7.5         4624     510
     74          12           5476     888
     65           7           4225     455
     66           7.5         4356     495
     72          13           5184     936
  Σ: 689         98          47565    6800
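These column sums are all the closed-form solution needs. A Python check (a sketch, not from the slides) using the standard simple-regression formulas b₁ = (nΣxy − ΣxΣy)/(nΣx² − (Σx)²) and b₀ = (Σy − b₁Σx)/n, which are equivalent to (XᵀX)⁻¹Xᵀy with the intercept column:

```python
# Column sums from the table above (n = 10 observations).
n, Sx, Sy, Sxx, Sxy = 10, 689.0, 98.0, 47565.0, 6800.0

b1 = (n * Sxy - Sx * Sy) / (n * Sxx - Sx ** 2)   # slope
b0 = (Sy - b1 * Sx) / n                          # intercept

# Estimated shoe size for a 60-inch person, about 5.2:
pred_60 = b0 + b1 * 60
```

This gives b₁ ≈ 0.5145 and b₀ ≈ −25.65, matching the prediction for x = 60 worked a few slides later.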





Linear Models and Least Squares: Regression

Using the learned parameters β̂ one can compute new outputs via regression.

At an arbitrary input x₀ the prediction is:

    ŷ(x₀) = x₀ᵀβ̂

Intuitively, it seems that we do not need a very large data set to fit such a model.


Example: Height x Shoe Size

  • Thus if a person is 5 feet tall (i.e. x = 60 inches), then, using the fitted line ŷ ≈ −25.65 + 0.5145x, we would estimate their shoe size to be about 5.2.


Two Simple Approaches to Prediction:

Least Squares (método dos mínimos quadrados)

and Nearest Neighbors (método dos vizinhos mais próximos)


Nearest-Neighbor Method

  • Nearest-neighbor methods use those observations in the training set T closest in input space to x to form Ŷ.

  • Specifically, the k-nearest-neighbor fit for Ŷ is defined as follows:

    Ŷ(x) = (1/k) Σ yᵢ, summing over the xᵢ ∈ Nₖ(x),

    where Nₖ(x) is the neighborhood of x.


Nearest-Neighbor Method (cont.)

  • Nₖ(x) is the neighborhood of x defined by the k closest points xᵢ in the training sample.

  • In words: we find the k observations with xᵢ closest to x in input space, and average their responses.

  • Closeness implies a metric:

    • we assume it is Euclidean distance.


Nearest-Neighbor Method (cont.)

  • The nearest-neighbor method can be used for regression or for classification.

  • Regression: just compute ŷ from a new x.

  • Classification: compute ŷ and then classify it.


Example

Using the same data from the height x shoe size problem.

Regression: just compute ŷ for x = 70.

Use kNN, with k = 5.


Example: height x shoe size

The data collected earlier:

              Height, x (inches)   Shoe size, y
  Person 1            69                9.5
  Person 2            67                8.5
  Person 3            71               11.5
  Person 4            65               10.5
  Person 5            72               11
  Person 6            68                7.5
  Person 7            74               12
  Person 8            65                7
  Person 9            66                7.5
  Person 10           72               13

(Successive frames highlight the k = 5 points closest to x = 70: heights 69, 71, 72, 68 and 72.)


Example

Regression: compute ŷ for x = 70 using kNN with k = 5.

The 5 nearest heights are 69, 71, 72, 68 and 72, so:

    ŷ = (9.5 + 11.5 + 11 + 7.5 + 13) / 5 = 10.5
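This k-NN computation can be sketched in Python/NumPy (a stand-in for the course's Matlab; the helper name `knn_regress` is made up):

```python
import numpy as np

# The height x shoe size data from the slides.
heights = np.array([69, 67, 71, 65, 72, 68, 74, 65, 66, 72], dtype=float)
sizes   = np.array([9.5, 8.5, 11.5, 10.5, 11, 7.5, 12, 7, 7.5, 13])

def knn_regress(x0, X, y, k):
    # indices of the k training points closest to x0 (Euclidean distance in 1-D)
    nearest = np.argsort(np.abs(X - x0))[:k]
    # average the responses of those neighbors
    return y[nearest].mean()

y_hat = knn_regress(70.0, heights, sizes, k=5)
# the 5 nearest heights are 69, 71, 72, 68, 72, so y_hat = 52.5 / 5 = 10.5
```

Note that here the five nearest points are uniquely determined (two at distance 1 and three at distance 2), so ties do not affect the answer.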




Two Simple Approaches to Classification

We can use both methods for classification:

  - Least Squares, and

  - Nearest Neighbors


Least Squares for Classification

  • The entire fitted surface is characterized by the parameters β.

  • We can use this method for classification purposes:

    • Given the training data, fit a model to it.

    • The resulting line is used as a separation boundary.


Classification Example

Figure 2.1 shows a scatterplot of training data on a pair of inputs X₁ and X₂.

The data are simulated.

The output class variable G has the values BLUE or ORANGE.

The linear regression model was fit to these data, with the response Y coded as 0 for BLUE and 1 for ORANGE.



Classification Example (cont.)

  • The line is the decision boundary, defined by xᵀβ̂ = 0.5.

  • The orange shaded region denotes that part of input space classified as ORANGE, while the blue region is classified as BLUE.

  • There are several misclassifications on both sides of the decision boundary.
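The 0/1 coding plus the Ŷ > 0.5 decision rule can be sketched in Python. The two-Gaussian data below is made up for illustration (the book's Figure 2.1 data is simulated differently):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical simulated data: two well-separated Gaussian classes.
X_blue   = rng.normal(-1.0, 0.5, size=(50, 2))   # class coded Y = 0
X_orange = rng.normal(+1.0, 0.5, size=(50, 2))   # class coded Y = 1
X = np.vstack([X_blue, X_orange])
y = np.concatenate([np.zeros(50), np.ones(50)])

# Fit the linear model by least squares (intercept via a column of 1s).
Xd = np.column_stack([np.ones(len(X)), X])
beta = np.linalg.lstsq(Xd, y, rcond=None)[0]

# Decision rule: classify as ORANGE when the fitted value exceeds 0.5.
y_hat = (Xd @ beta > 0.5).astype(float)
accuracy = (y_hat == y).mean()
```

With well-separated classes the linear boundary classifies nearly all training points correctly; overlapping classes would produce the misclassifications seen in Figure 2.1.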


Nearest-Neighbor as classifier

Figure 2.2 uses the same training data as in Figure 2.1, with 15-nearest-neighbor averaging of the binary-coded response as the method of fitting.

Ŷ is the proportion of ORANGEs in the neighborhood, so assigning class ORANGE when Ŷ > 0.5 amounts to a majority vote in the neighborhood.


15-nearest-neighbor classifier

(Figure 2.2: decision boundary of the 15-NN classifier.)


Nearest-Neighbor as classifier (cont.)

We see that the decision boundaries that separate the BLUE from the ORANGE regions are far more irregular, and respond to local clusters where one class dominates.

In Figure 2.2 we see that far fewer training observations are misclassified than in Figure 2.1.


1-nearest-neighbor classifier

(Figure 2.3: decision boundary of the 1-NN classifier.)


Nearest-Neighbor as classifier (cont.)

In Figure 2.3 none of the training data are misclassified.

For k-nearest-neighbor fits, the error on the training data should be approximately an increasing function of k, and will always be 0 for k = 1.

k-nearest-neighbor has a single parameter: the number of neighbors, k.


From Least Squares to Nearest Neighbors

Least Squares:

The linear decision boundary from least squares is very smooth, and apparently stable to fit.

It does appear to rely heavily on the assumption that a linear decision boundary is appropriate.


From Least Squares to Nearest Neighbors (cont.)

k-NN:

Does not rely on any stringent assumptions about the underlying data, and can adapt to any situation.

However, any particular subregion of the decision boundary depends on a handful of input points and their particular positions, and is thus wiggly and unstable.


Linear regression x kNN

Linear regression is more appropriate when each class is generated from bivariate Gaussian distributions with uncorrelated components and different means.

Nearest neighbors are more suitable when the training data in each class come from a mixture of Gaussian distributions.


What does this look like in Matlab?


Least Squares with Matlab

The Curve Fitting Toolbox software uses the linear least-squares method to fit a linear model to data.

Fitting requires a parametric model that relates the response data to the predictor data with one or more coefficients.

The result of the fitting process is an estimate of the model coefficients.


Least Squares with Matlab (cont.)

  • Use the MATLAB backslash operator (mldivide) to solve a system of simultaneous linear equations for unknown coefficients: beta = X\y.

  • Because inverting XᵀX can lead to rounding errors, Matlab uses QR decomposition with pivoting, which is a numerically very stable algorithm.
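The same stability point can be illustrated in Python: NumPy's `lstsq` (SVD-based) likewise avoids explicitly forming and inverting XᵀX. A sketch with made-up toy data:

```python
import numpy as np

X = np.column_stack([np.ones(5), np.arange(1.0, 6.0)])
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])   # exactly y = 2x

beta_normal = np.linalg.solve(X.T @ X, X.T @ y)     # normal equations
beta_stable = np.linalg.lstsq(X, y, rcond=None)[0]  # orthogonal factorization (SVD)

# On well-conditioned data the two agree; on ill-conditioned X,
# the factorization route loses far less precision.
```

Both routes recover intercept 0 and slope 2 here; the difference only becomes visible when the columns of X are nearly collinear.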


Least Squares with Matlab (cont.)

  • In matrix form, linear models are given by the formula:

    • y = Xβ + ε

  • where:

    • y is an n-by-1 vector of responses,

    • β is an m-by-1 vector of coefficients,

    • X is the n-by-m design matrix for the model,

    • ε is an n-by-1 vector of errors.


Example: LMS with Matlab

Note: ' means transpose in Matlab.

  X = [1,2,3,4,5]'

  y = [2,4,6,8,10]'

  beta = (X'*X)\(X'*y)

    beta =

         2


Plotting the scatter plot in Matlab

  a = 0:0.1:5;

  plot(X, y, 'o', a, a*beta)

  axis([0, 5, 0, 12])


Example: LMS with Matlab

(Plot: the data points and the fitted line.)


Matlab – doing Regression

Using the learned parameters β̂ one can compute new outputs via regression. At an arbitrary input x₀ the prediction is ŷ(x₀) = x₀ᵀβ̂:

  >> ynovo = beta * 10

  ynovo =

      20


Adding the linear Bias β₀

  x = [1,2,3,4,5]'

  y = [2,4,6,8,10]'

  one = ones(5,1)

  X = [one, x]          % adds a column of 1s

  v = (X'*X)\(X'*y)

    v =

         0

         2


Linear with Bias β₀

  x = [1,2,3,4,5]'

  y = [3,5,7,9,11]'

  one = ones(5,1)

  X = [one, x]          % adds a column of 1s

  v = (X'*X)\(X'*y)

    v =

         1

         2



Linear with Bias β₀

(Plot: the data points and the fitted line with intercept 1 and slope 2.)


LMS with 2 variables

  • The dataset contains 3 variables, books, attend and grade, and has 40 cases.

    • books represents the number of books read by students on a statistics course,

    • attend represents the number of lectures they attended, and

    • grade represents their final grade on the course.


Dataset: books, attend and grade

(Plots of the dataset.)


LMS with 2 variables


LMS with 2 variables

  load Books_attend_grade.dat

  x1 = Books_attend_grade(:,1)

  x2 = Books_attend_grade(:,2)

  y  = Books_attend_grade(:,3)

  one = ones(40,1)

  X = [one, x1, x2]


LMS with 2 variables (cont.)

  >> v = (X'*X)\(X'*y)

  v =

     37.3792

      4.0369

      1.2835


LMS with 2 variables (cont.)

  [x1,x2] = meshgrid(0:1:4, 0:1:20)

  surf(x1, x2, 37.3792 + 4.0369*x1 + 1.2835*x2)
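With the coefficients fitted above, prediction is just the inner product with [1, books, attend]. A small Python sketch (the helper name `predict_grade` and the example student are made up for illustration):

```python
def predict_grade(books, attend):
    # Fitted model from the slides: grade = 37.3792 + 4.0369*books + 1.2835*attend
    return 37.3792 + 4.0369 * books + 1.2835 * attend

# e.g. a hypothetical student who read 2 books and attended 15 lectures:
g = predict_grade(2, 15)   # about 64.7
```

Each extra book is worth about 4 grade points here, and each attended lecture about 1.3, on top of a baseline of roughly 37.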



Conclusion

  • We saw what Machine Learning is.

  • We began to see what Statistical Machine Learning is:

    • Mathematical (statistical) prediction methods: regression and classification.

    • Linear regression and classification.

    • k-Nearest Neighbors.


Exercise: Boiling point in the Alps

  • Description: the boiling point of water at different barometric pressures.

  • There are 17 observations.

  • Variables:

    • BPt: the recorded boiling point of water in degrees F.

    • Pressure: the barometric pressure in inches of mercury.


Exercise: Boiling point in the Alps (cont.)

Use least squares and compute beta.

What is the pressure at a temperature of 200 °F?


Exercise: Boiling point in the Alps (cont.)

Use 5-NN and compute the same value.

What is the pressure at a temperature of 200 °F?


Next lecture:

  • Statistical Machine Learning:

    • Validation and selection methods.

    • More Matlab.

    • PCA.

    • Stronger, more mathematical methods.


