Tópicos Especiais em Aprendizagem

Presentation Transcript

### Lecture 1

Expression matrix of 6830 genes (rows, only 100 shown) and 64 samples (columns) for the human tumor data.

The display is a heat map, ranging from bright green (negative, under expressed) to bright red (positive, over expressed). Missing values are grey.

### Overview of Supervised Learning

### Two Simple Approaches to Prediction

Example: height x shoe size

### Two Simple Approaches to Classification

### How does this look in Matlab?

Part B

Goals of this lecture

- Present the basic concepts of Machine Learning:
- Introduction.
- Basic definitions.
- Application areas.

- Statistical Machine Learning.
- Today's lecture: Chapter 1 of Mitchell, Chapter 1 of Nilsson, and Chapters 1 and 2 of Hastie + Wikipedia.

Main Approaches according to Statistics

Explanation-based Learning

Decision trees

Case-based Learning

Inductive learning

Bayesian Learning

Nearest Neighbors

Neural Networks

Support Vector Machines

Genetic Algorithms

Regression

Clustering

Reinforcement Learning

Classification

AI / Statistics / Neural Networks


First lecture, part B

- Introduction to Statistical Machine Learning:
- Basic definitions.
- Regression.
- Classification.

Textbook

- The Elements of Statistical Learning
- Data Mining, Inference, and Prediction

Why Statistical Learning?

- "Statistical learning plays a key role in many areas of science, finance and industry."
- "The science of learning plays a key role in the fields of statistics, data mining and artificial intelligence, intersecting with areas of engineering and other disciplines."

SML problems

Predict whether a patient, hospitalized due to a heart attack, will have a second heart attack. The prediction is to be based on demographic, diet and clinical measurements for that patient.

Predict the price of a stock 6 months from now, on the basis of company performance measures and economic data.

SML problems

Identify the numbers in a handwritten ZIP code, from a digitized image.

Estimate the amount of glucose in the blood of a diabetic person, from the infrared absorption spectrum of that person's blood.

Identify the risk factors for prostate cancer, based on clinical and demographic variables.

Examples of SML problems

Prostate Cancer

Study by Stamey et al. (1989) that examined the correlation between the level of prostate-specific antigen (PSA) and a number of clinical measures.

The goal is to predict the log of PSA (lpsa) from a number of measurements.

Examples of supervised learning problems

Other examples of learning problems

DNA Microarrays

Expression matrix of 6830 genes (rows, only 100 shown) and 64 samples (columns) for the human tumor data.

The display is a heat map, ranging from bright green (negative, under expressed) to bright red (positive, over expressed). Missing values are grey.


- Task: describe how the data are organised or clustered.
- (unsupervised learning)

Chapter 2 of Hastie

Variable Types and Terminology

- In the statistical literature the inputs are often called the predictors, inputs, and more classically the independent variables.
- In the pattern recognition literature the term features is preferred, which we use as well.

- The outputs are called the responses, or classically the dependent variables.

Variable Types and Terminology

- The outputs vary in nature among the examples:
- Prostate Cancer prediction example:
- The output is a quantitative measurement.

- Handwritten digit example:
- The output is one of 10 different digit classes: G = {0,1,...,9}

Naming convention for the prediction task

- The distinction in output type has led to a naming convention for the prediction tasks:
- Regression when we predict quantitative outputs.
- Classification when we predict qualitative outputs.

- Both can be viewed as a task in function approximation.

Examples of SML problems

Prostate Cancer

Study by Stamey et al. (1989) that examined the correlation between the level of prostate-specific antigen (PSA) and a number of clinical measures.

The goal is to predict the log of PSA (lpsa) from a number of measurements.

- Regression problem

Examples of supervised learning problems

- Classification problem

Qualitative variables representation

- Qualitative variables are represented numerically by codes:
- Binary case: when there are only two classes or categories, such as "success" or "failure," "survived" or "died."
- These are often represented by a single binary digit or bit as 0 or 1, or else by −1 and 1.

Qualitative variables representation

- When there are more than two categories, the most commonly used coding is via dummy variables:
- A K-level qualitative variable is represented by a vector of K binary variables or bits, only one of which is "on" at a time.

- These numeric codes are sometimes referred to as targets.
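The dummy-variable coding above can be sketched in a few lines of Python (a minimal illustration; the helper name `dummy_code` is ours, not from the slides):

```python
import numpy as np

def dummy_code(labels, classes):
    """K-level qualitative variable -> a vector of K binary 'dummy'
    variables per observation, only one of which is 'on' at a time."""
    targets = np.zeros((len(labels), len(classes)), dtype=int)
    for i, lab in enumerate(labels):
        targets[i, classes.index(lab)] = 1
    return targets

# Digit classes G = {0,1,...,9}; three observed labels
T = dummy_code(["0", "1", "9"], classes=[str(d) for d in range(10)])
print(T.shape)          # (3, 10)
print(T.sum(axis=1))    # each row has exactly one bit on
```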

Variables

- We will typically denote an input variable by the symbol X.
- If X is a vector, its components can be accessed by subscripts Xj.
- Observed values are written in lowercase: hence the ith observed value of X is written as xi.

- Quantitative outputs will be denoted by Y and qualitative outputs will be denoted by G (for group).

Least Squares (método dos mínimos quadrados)

and Nearest Neighbors (método dos vizinhos mais próximos)

Linear Methods for Regression

- "Linear models were largely developed in the pre-computer age of statistics, but even in today's computer era there are still good reasons to study and use them." (Hastie et al.)

Linear Methods for Regression

- For prediction purposes they can sometimes outperform non-linear models, especially in situations with:
- small sample size
- low signal-to-noise ratio
- sparse data

- Transformation of the inputs

Linear Models and Least Squares

The linear model has been a mainstay of statistics for the past 30 years and remains one of its most important tools.

Given a vector of inputs $X^T = (X_1, X_2, \ldots, X_p)$,

we predict the output Y via the model:

$\hat{Y} = \hat\beta_0 + \sum_{j=1}^{p} X_j \hat\beta_j$

Linear Models

The term $\hat\beta_0$ is the intercept, also known as the bias in machine learning.

Often it is convenient to include the constant variable 1 in X, include $\hat\beta_0$ in the vector of coefficients $\hat\beta$, and then write the linear model in vector form as an inner product:

$\hat{Y} = X^T \hat\beta$

Fitting the data: Least Squares

- How do we fit the linear model to a set of training data?
- By far the most popular method is least squares.

- Pick the coefficients β to minimize the Residual Sum of Squares:

$\mathrm{RSS}(\beta) = \sum_{i=1}^{N} (y_i - x_i^T\beta)^2$

Least Squares Method

- Least Squares Criterion: minimize $\sum_i (y_i - \hat{y}_i)^2$
- where:
- $y_i$ = observed value of the dependent variable for the ith observation
- $\hat{y}_i$ = estimated value of the dependent variable for the ith observation

Fitting the data: Least Squares

- RSS(β) is a quadratic function of the parameters, and hence its minimum always exists, but may not be unique.
- The solution is easiest to characterize in matrix notation:

$\mathrm{RSS}(\beta) = (\mathbf{y} - \mathbf{X}\beta)^T(\mathbf{y} - \mathbf{X}\beta)$

- where X is an N × p matrix with each row an input vector
- y is an N-vector of the outputs

Fitting the data: Least Squares

- Differentiating with respect to β we get:

$\mathbf{X}^T(\mathbf{y} - \mathbf{X}\beta) = 0$

Fitting the data: Least Squares

- Assuming that X has full column rank, we set the first derivative to zero.
- If $\mathbf{X}^T\mathbf{X}$ is nonsingular, then the unique solution is given by:

$\hat\beta = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$
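The closed-form solution can be checked numerically in a few lines of Python/NumPy (our own sketch, with synthetic data), comparing the normal-equations answer against a stable library solver:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 50, 3
X = rng.normal(size=(N, p))              # full column rank (almost surely)
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=N)

# Unique least-squares solution: beta = (X^T X)^{-1} X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The same answer from an SVD-based solver, as a sanity check
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))
```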

Example: height x shoe size

- We wanted to explore the relationship between a person's height and their shoe size.
- We asked ten individuals their height and corresponding shoe size.
- We believe that a person's shoe size depends upon their height.

- The height is the independent variable, x.
- Shoe size is the dependent variable, y.

Example: height x shoe size

The following data was collected:

| | Height, x (inches) | Shoe size, y |
| --- | --- | --- |
| Person 1 | 69 | 9.5 |
| Person 2 | 67 | 8.5 |
| Person 3 | 71 | 11.5 |
| Person 4 | 65 | 10.5 |
| Person 5 | 72 | 11 |
| Person 6 | 68 | 7.5 |
| Person 7 | 74 | 12 |
| Person 8 | 65 | 7 |
| Person 9 | 66 | 7.5 |
| Person 10 | 72 | 13 |

Example: height x shoe size

Least Squares Method (matrix form)

The unique solution is given by: $\hat\beta = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$

Often it is convenient to include the constant variable 1 in X, and include the bias β0 in the vector of coefficients. With the bias column, $\mathbf{X}^T\mathbf{X}$ collects n, Σxi and Σxi², while $\mathbf{X}^T\mathbf{y}$ collects Σyi and Σxiyi.

[Slide shows the design matrix X without and with the bias column of 1's, together with XT, XTX and XTy.]

Example: height x shoe size

| Height, x | Shoe size, y | x² | xy |
| --- | --- | --- | --- |
| 69 | 9.5 | 4761 | 655.5 |
| 67 | 8.5 | 4489 | 569.5 |
| 71 | 11.5 | 5041 | 816.5 |
| 65 | 10.5 | 4225 | 682.5 |
| 72 | 11 | 5184 | 792 |
| 68 | 7.5 | 4624 | 510 |
| 74 | 12 | 5476 | 888 |
| 65 | 7 | 4225 | 455 |
| 66 | 7.5 | 4356 | 495 |
| 72 | 13 | 5184 | 936 |
| Σ = 689 | 98 | 47565 | 6800 |
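From these column sums, the normal equations for simple linear regression give (a worked check of the slide's arithmetic):

$$\hat\beta_1 = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2} = \frac{10 \cdot 6800 - 689 \cdot 98}{10 \cdot 47565 - 689^2} = \frac{478}{929} \approx 0.515$$

$$\hat\beta_0 = \bar{y} - \hat\beta_1\,\bar{x} \approx 9.8 - 0.515 \cdot 68.9 \approx -25.65$$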

Linear Models and Least Squares: Regression

Using the learned parameters $\hat\beta$, one can compute new outputs via regression.

At an arbitrary input $x_0$ the prediction is: $\hat{y}(x_0) = x_0^T\hat\beta$

Intuitively, it seems that we do not need a very large data set to fit such a model.

Example Height x Shoe Size

- Thus if a person is 5 feet tall (i.e. x=60 inches), then I would estimate their shoe size to be:
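The estimate can be reproduced with a short NumPy sketch (our own code, using the ten data points collected above):

```python
import numpy as np

# Height (inches) and shoe size for the ten people in the example
x = np.array([69, 67, 71, 65, 72, 68, 74, 65, 66, 72], dtype=float)
y = np.array([9.5, 8.5, 11.5, 10.5, 11, 7.5, 12, 7, 7.5, 13])

X = np.column_stack([np.ones_like(x), x])      # bias column + height
beta = np.linalg.solve(X.T @ X, X.T @ y)       # (X^T X)^{-1} X^T y

y_hat = beta[0] + beta[1] * 60                 # predict at x = 60 inches
print(round(y_hat, 1))                         # -> 5.2
```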

Least Squares (método dos mínimos quadrados)

and Nearest Neighbors (método dos vizinhos mais próximos)

Nearest-Neighbor Method

- Nearest-neighbor methods use those observations in the training set T closest in input space to x to form $\hat{Y}$.
- Specifically, the k-nearest-neighbor fit for $\hat{Y}$ is defined as follows:

$\hat{Y}(x) = \frac{1}{k}\sum_{x_i \in N_k(x)} y_i$

where $N_k(x)$ is the neighborhood of x.

Nearest-Neighbor Method

- $N_k(x)$ is the neighborhood of x defined by the k closest points $x_i$ in the training sample.
- In words: we find the k observations with $x_i$ closest to x in input space, and average their responses.
- Closeness implies a metric:
- we assume it is Euclidean distance.

Nearest-Neighbor Method

- The Nearest-Neighbor Method can be used for regression or for classification.
- Regression: just compute ŷ from a new x.
- Classification: compute ŷ and then classify it.

Example

Using the same data from the height x shoe size problem.

Regression: just compute ŷ from x = 70.

Use kNN, with k = 5.
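This 5-NN computation can be sketched in Python (the function name `knn_regress` is ours):

```python
import numpy as np

def knn_regress(x0, x, y, k):
    """k-nearest-neighbor fit: average the responses of the k
    training points closest to x0 (Euclidean distance in 1-D)."""
    nearest = np.argsort(np.abs(x - x0))[:k]
    return y[nearest].mean()

x = np.array([69, 67, 71, 65, 72, 68, 74, 65, 66, 72], dtype=float)
y = np.array([9.5, 8.5, 11.5, 10.5, 11, 7.5, 12, 7, 7.5, 13])

# The 5 neighbors of x = 70 are the heights 69, 71, 72, 68, 72
print(knn_regress(70, x, y, k=5))   # -> 10.5
```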

Example: height x shoe size

The following data was collected:

| | Height, x (inches) | Shoe size, y |
| --- | --- | --- |
| Person 1 | 69 | 9.5 |
| Person 2 | 67 | 8.5 |
| Person 3 | 71 | 11.5 |
| Person 4 | 65 | 10.5 |
| Person 5 | 72 | 11 |
| Person 6 | 68 | 7.5 |
| Person 7 | 74 | 12 |
| Person 8 | 65 | 7 |
| Person 9 | 66 | 7.5 |
| Person 10 | 72 | 13 |


We can use both methods for classification:

- Least Squares and

- Nearest Neighbors

Least Squares for Classification

- The entire fitted surface is characterized by the parameters β.
- We can use this method for classification purposes:
- Given the training data, fit a model to it.
- The resulting line is used as a separation boundary.

Classification Example

Figure 2.1 shows a scatterplot of training data on a pair of inputs X1 and X2.

The data are simulated.

The output class variable G has the values BLUE or ORANGE.

The linear regression model was fit to these data, with the response Y coded as 0 for BLUE and 1 for ORANGE.

Classification Example

- The line is the decision boundary defined by $x^T\hat\beta = 0.5$.
- The orange shaded region denotes that part of input space classified as ORANGE, while the blue region is classified as BLUE.
- There are several misclassifications on both sides of the decision boundary.

Nearest-Neighbor as classifier

Figure 2.2 uses the same training data as in Figure 2.1, and uses 15-nearest-neighbor averaging of the binary-coded response as the method of fitting.

$\hat{Y}$ is the proportion of ORANGE's in the neighborhood, and so assigning class ORANGE if $\hat{Y} > 0.5$ amounts to a majority vote in the neighborhood.

15-nearest-neighbor classifier
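The majority-vote rule can be sketched as follows (toy 2-D data of our own, not the book's simulated dataset; names are ours):

```python
import numpy as np

def knn_classify(x0, X, g, k):
    """Average the 0/1-coded responses of the k nearest training
    points; assign ORANGE (1) when the local proportion exceeds 0.5."""
    d = np.linalg.norm(X - x0, axis=1)   # Euclidean distances to x0
    nearest = np.argsort(d)[:k]
    return int(g[nearest].mean() > 0.5)

# Toy stand-in for the BLUE (0) / ORANGE (1) training data
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
g = np.array([0, 0, 0, 1, 1, 1])

print(knn_classify(np.array([5.5, 5.5]), X, g, k=3))   # -> 1 (ORANGE)
```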

Nearest-Neighbor as classifier

We see that the decision boundaries that separate the BLUE from the ORANGE regions are far more irregular, and respond to local clusters where one class dominates.

In Figure 2.2 we see that far fewer training observations are misclassified than in Figure 2.1.

1-nearest-neighbor classifier

Nearest-Neighbor as classifier

In Figure 2.3 none of the training data are misclassified.

For k-nearest-neighbor fits, the error on the training data should be approximately an increasing function of k, and will always be 0 for k = 1.

k-nearest-neighbor has a single parameter, the number of neighbors k.

From Least Squares to Nearest Neighbors

Least Squares:

The linear decision boundary from least squares is very smooth, and apparently stable to fit.

It does appear to rely heavily on the assumption that a linear decision boundary is appropriate.

From Least Squares to Nearest Neighbors

k-NN:

k-NN does not rely on any stringent assumptions about the underlying data, and can adapt to any situation.

Any particular subregion of the decision boundary depends on a handful of input points and their particular positions, and is thus wiggly and unstable.

Linear regression x kNN

Linear regression is more appropriate when each class is generated from bivariate Gaussian distributions with uncorrelated components and different means.

Nearest neighbors are more suitable when the training data in each class came from a mixture of Gaussian distributions.

Least Squares with Matlab

The Curve Fitting Toolbox software uses the linear least-squares method to fit a linear model to data.

Fitting requires a parametric model that relates the response data to the predictor data with one or more coefficients.

The result of the fitting process is an estimate of the model coefficients.

Least Squares with Matlab

- Use the MATLAB backslash operator (mldivide) to solve a system of simultaneous linear equations for unknown coefficients.
- Because inverting XTX can lead to rounding errors, Matlab uses QR decomposition with pivoting, which is a numerically very stable algorithm.
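A rough Python analogue of the QR route, using NumPy's `qr` (a sketch only; Matlab's mldivide additionally uses column pivoting):

```python
import numpy as np

x = np.array([1., 2., 3., 4., 5.])
y = np.array([2., 4., 6., 8., 10.])
X = x.reshape(-1, 1)                 # single-column design matrix

# Factor X = QR, then solve the triangular system R beta = Q^T y,
# avoiding the explicit (and less stable) inversion of X^T X
Q, R = np.linalg.qr(X)
beta = np.linalg.solve(R, Q.T @ y)
print(beta)                          # beta[0] is 2 (to numerical precision)
```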

Least Squares with Matlab

- In matrix form, linear models are given by the formula:
- y = Xβ + ε

- Where:
- y is an n-by-1 vector of responses.
- β is an m-by-1 vector of coefficients.
- X is the n-by-m design matrix for the model.
- ε is an n-by-1 vector of errors.

Example: LMS with Matlab

Note: ' denotes transpose in Matlab.

- X = [1,2,3,4,5]'
- y = [2,4,6,8,10]'
- beta = (X'*X)\(X'*y)

beta =

    2

Example: LMS with Matlab

Matlab: doing Regression

Using the learned parameters $\hat\beta$, one can compute new outputs via regression.

At an arbitrary input $x_0$ the prediction is: $\hat{y}(x_0) = x_0^T\hat\beta$

>> ynovo = beta * 10

ynovo =

    20

Adding the linear Bias β0

THIS ADDS ONE COLUMN WITH "1"

- x = [1,2,3,4,5]'
- y = [2,4,6,8,10]'
- one = ones(5,1)
- X = [one, x]
- v = (X'*X)\(X'*y)

v =

    0
    2

Linear with Bias β0

THIS ADDS ONE COLUMN WITH "1"

- x = [1,2,3,4,5]'
- y = [3,5,7,9,11]'
- one = ones(5,1)
- X = [one, x]
- v = (X'*X)\(X'*y)

v =

    1
    2

Linear with Bias β0

LMS with 2 variables

- The dataset contains 3 variables, books, attend and grade, and has 40 cases.
- books represents the number of books read by students on a statistics course,
- attend represents the number of lectures they attended, and
- grade represents their final grade on the course.

Dataset books, attend and grade

LMS with 2 variables

load Books_attend_grade.dat

x1 = Books_attend_grade(:,1)

x2 = Books_attend_grade(:,2)

y = Books_attend_grade(:,3)

one = ones(40,1)

X = [one, x1, x2]

v = (X'*X)\(X'*y)

Conclusion

- We saw what Machine Learning is.
- We began to see what Statistical Machine Learning is:
- Mathematical (statistical) prediction methods: regression and classification.
- Linear regression and classification.
- k-Nearest Neighbors

Exercise: Boiling point in the Alps

- Description: The boiling point of water at different barometric pressures.
- There are 17 observations.
- Variables:
- BPt: the recorded boiling point of water in degrees F
- Pressure: the barometric pressure in inches of mercury.

Exercise: Boiling point in the Alps

Use least squares and compute beta.

What is the pressure value for a temperature of 200 F?

Exercise: Boiling point in the Alps

Use 5-NN and compute the same value.

What is the pressure value for a temperature of 200 F?

Next lecture:

- Statistical Machine Learning:
- Validation and selection methods.
- More Matlab.
- PCA.
- Stronger, more mathematical methods.
