## Special Topics in Learning (Tópicos Especiais em Aprendizagem)

**Special Topics in Learning (Tópicos Especiais em Aprendizagem)**
Reinaldo Bianchi, Centro Universitário da FEI, 2012

**Lecture 1, Part B**

**Objectives of this lecture**
• Present the basic concepts of Machine Learning:
  • Introduction.
  • Basic definitions.
  • Application areas.
  • Statistical Machine Learning.
• Today's lecture: Chapter 1 of Mitchell, Chapter 1 of Nilsson, and Chapters 1 and 2 of Hastie, plus Wikipedia.

**Main Approaches according to Statistics**
Diagram labels: Explanation-based Learning, Decision trees, Case-Based Learning, Inductive learning, Bayesian Learning, Nearest Neighbors, Neural Networks, Support Vector Machines, Genetic Algorithms, Regression, Clustering, Reinforcement Learning, Classification, AI, Statistics, Neural Network.

**Main Approaches according to Statistics**
Diagram labels: Nearest Neighbors, Support Vector Machines, Regression, Clustering, Classification, AI, Statistics, Neural Network.

**Main Approaches according to Statistics**
Diagram labels: Nearest Neighbors, Regression, Clustering, Classification, AI, Statistics, Neural Network.

**Lecture 1, Part B**
• Introduction to Statistical Machine Learning:
  • Basic definitions.
  • Regression.
  • Classification.

**Textbook**
• The Elements of Statistical Learning: Data Mining, Inference, and Prediction

**Why Statistical Learning?**
• “Statistical learning plays a key role in many areas of science, finance and industry.”
• “The science of learning plays a key role in the fields of statistics, data mining and artificial intelligence, intersecting with areas of engineering and other disciplines.”

**SML problems**
Predict whether a patient, hospitalized due to a heart attack, will have a second heart attack. The prediction is to be based on demographic, diet and clinical measurements for that patient.
Predict the price of a stock 6 months from now, on the basis of company performance measures and economic data.

**SML problems**
Identify the numbers in a handwritten ZIP code, from a digitized image.
Estimate the amount of glucose in the blood of a diabetic person, from the infrared absorption spectrum of that person's blood.
Identify the risk factors for prostate cancer, based on clinical and demographic variables.

**Examples of SML problems**
Prostate Cancer: study by Stamey et al. (1989) that examined the correlation between the level of prostate specific antigen (PSA) and a number of clinical measures. The goal is to predict the log of PSA (lpsa) from a number of measurements.

**Other examples of learning problems**
DNA Microarrays: expression matrix of 6830 genes (rows, only 100 shown) and 64 samples (columns) for the human tumor data. The display is a heat map, ranging from bright green (negative, under expressed) to bright red (positive, over expressed). Missing values are grey.
• Task: describe how the data are organised or clustered.
• (unsupervised learning)

**Overview of Supervised Learning**
Chapter 2 of Hastie

**Variable Types and Terminology**
• In the statistical literature the inputs are often called the predictors and, more classically, the independent variables.
• In the pattern recognition literature the term features is preferred, which we use as well.
• The outputs are called the responses, or classically the dependent variables.

**Variable Types and Terminology**
• The outputs vary in nature among the examples:
  • Prostate cancer prediction example:
    • The output is a quantitative measurement.
  • Handwritten digit example:
    • The output is one of 10 different digit classes: G = {0, 1, ..., 9}

**Naming convention for the prediction task**
• The distinction in output type has led to a naming convention for the prediction tasks:
  • Regression when we predict quantitative outputs.
  • Classification when we predict qualitative outputs.
• Both can be viewed as a task in function approximation.

**Examples of SML problems**
Prostate Cancer: study by Stamey et al. (1989) that examined the correlation between the level of prostate specific antigen (PSA) and a number of clinical measures. The goal is to predict the log of PSA (lpsa) from a number of measurements.
• Regression problem

**Examples of supervised learning problems**
• Classification problem

**Qualitative variables representation**
• Qualitative variables are represented numerically by codes:
  • Binary case: there are only two classes or categories, such as “success” or “failure,” “survived” or “died.”
  • These are often represented by a single binary digit or bit as 0 or 1, or else by −1 and 1.

**Qualitative variables representation**
• When there are more than two categories, the most commonly used coding is via dummy variables:
  • A K-level qualitative variable is represented by a vector of K binary variables or bits, only one of which is “on” at a time.
• These numeric codes are sometimes referred to as targets.

**Variables**
• We will typically denote an input variable by the symbol X.
• If X is a vector, its components can be accessed by subscripts Xj.
• Observed values are written in lowercase: hence the ith observed value of X is written as xi.
• Quantitative outputs will be denoted by Y and qualitative outputs will be denoted by G (for group).

**Two Simple Approaches to Prediction:**
Least Squares and Nearest Neighbors

**Linear Methods for Regression**
• “Linear models were largely developed in the pre-computer age of statistics, but even in today’s computer era there are still good reasons to study and use them.” (Hastie et al.)

**Linear Methods for Regression**
• For prediction purposes they can sometimes outperform non-linear models, especially in situations with:
  • small sample size
  • low signal-to-noise ratio
  • sparse data
• Linear methods can also be applied to transformations of the inputs.

**Linear Models and Least Squares**
The linear model has been a mainstay of statistics for the past 30 years and remains one of its most important tools. Given a vector of inputs $X^T = (X_1, X_2, \ldots, X_p)$, we predict the output $Y$ via the model:
$\hat{Y} = \hat{\beta}_0 + \sum_{j=1}^{p} X_j \hat{\beta}_j$

**Linear Models**
The term $\hat{\beta}_0$ is the intercept, also known as the bias in machine learning. Often it is convenient to include the constant variable 1 in $X$, include $\hat{\beta}_0$ in the vector of coefficients $\hat{\beta}$, and then write the linear model in vector form as an inner product:
$\hat{Y} = X^T \hat{\beta}$

**Positive Linear Relationship**
[Figure: regression line of E(y) against x, with intercept b0 and positive slope b1.]

**Negative Linear Relationship**
[Figure: regression line of E(y) against x, with intercept b0 and negative slope b1.]

**No Relationship**
[Figure: regression line of E(y) against x, with intercept b0 and slope b1 equal to 0.]

**Fitting the data: Least Squares**
• How do we fit the linear model to a set of training data?
• By far the most popular method is the method of least squares.
• Pick the coefficients β to minimize the Residual Sum of Squares:
$RSS(\beta) = \sum_{i=1}^{N} (y_i - x_i^T \beta)^2$

**Least Squares Method**
• Least Squares Criterion: $\min \sum_{i} (y_i - \hat{y}_i)^2$
• where:
  • $y_i$ = observed value of the dependent variable for the ith observation
  • $\hat{y}_i$ = estimated value of the dependent variable for the ith observation

**Fitting the data: Least Squares**
• RSS(β) is a quadratic function of the parameters, and hence its minimum always exists, but may not be unique.
• The solution is easiest to characterize in matrix notation:
$RSS(\beta) = (y - X\beta)^T (y - X\beta)$
• where X is an N × p matrix with each row an input vector
• y is an N-vector of the outputs

**Fitting the data: Least Squares**
• Differentiating with respect to β we get:
$\frac{\partial RSS}{\partial \beta} = -2 X^T (y - X\beta)$

**Fitting the data: Least Squares**
• Assuming that X has full column rank, we set the first derivative to zero:
$X^T (y - X\beta) = 0$
• If $X^T X$ is nonsingular, then the unique solution is given by:
$\hat{\beta} = (X^T X)^{-1} X^T y$

**Example: height x shoe size**
• We wanted to explore the relationship between a person’s height and their shoe size.
• We asked ten individuals for their height and corresponding shoe size.
• We believe that a person’s shoe size depends upon their height.
• Height is the independent variable, x.
• Shoe size is the dependent variable, y.

**Example: height x shoe size**
The following data were collected:

| Person | Height, x (inches) | Shoe size, y |
| --- | --- | --- |
| 1 | 69 | 9.5 |
| 2 | 67 | 8.5 |
| 3 | 71 | 11.5 |
| 4 | 65 | 10.5 |
| 5 | 72 | 11 |
| 6 | 68 | 7.5 |
| 7 | 74 | 12 |
| 8 | 65 | 7 |
| 9 | 66 | 7.5 |
| 10 | 72 | 13 |

**Least Squares Method (matrix form)**
The unique solution is given by:
$\hat{\beta} = (X^T X)^{-1} X^T y$
Often it is convenient to include the constant variable 1 in $X$ and include $\hat{\beta}_0$ in the vector of coefficients $\hat{\beta}$.

**$X^T X$**
n
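
The matrix-form solution can be tried on the height and shoe-size data listed above. The following is a minimal sketch, not part of the original slides. It assumes a single input plus the constant variable 1, so that XᵀX is a 2 × 2 matrix that can be inverted in closed form. The function names `fit_least_squares` and `rss` are illustrative choices.

```python
# Height (inches) and shoe size for the ten people in the example data.
heights = [69, 67, 71, 65, 72, 68, 74, 65, 66, 72]
shoe_sizes = [9.5, 8.5, 11.5, 10.5, 11, 7.5, 12, 7, 7.5, 13]


def fit_least_squares(xs, ys):
    """Fit y ~ b0 + b1 * x by solving the normal equations (X^T X) b = X^T y.

    Each row of X is assumed to be (1, x_i), i.e. the constant variable 1
    is included so that the intercept b0 is part of the coefficient vector.
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_xx = sum(x * x for x in xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # With rows (1, x_i), X^T X is this 2x2 matrix and X^T y this 2-vector.
    xtx = [[n, sum_x], [sum_x, sum_xx]]
    xty = [sum_y, sum_xy]

    # X^T X is singular when all x values are identical; then the
    # minimizer of RSS exists but is not unique.
    det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
    if det == 0:
        raise ValueError("X^T X is singular; the solution is not unique")

    # Closed-form inverse of a 2x2 matrix applied to X^T y.
    b0 = (xtx[1][1] * xty[0] - xtx[0][1] * xty[1]) / det
    b1 = (xtx[0][0] * xty[1] - xtx[1][0] * xty[0]) / det
    return b0, b1, xtx


def rss(xs, ys, b0, b1):
    """Residual sum of squares for the line y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


intercept, slope, xtx = fit_least_squares(heights, shoe_sizes)
print("X^T X =", xtx)
print("intercept b0 =", round(intercept, 4))
print("slope b1 =", round(slope, 4))
print("RSS at the fit =", round(rss(heights, shoe_sizes, intercept, slope), 4))
```

The top-left entry of XᵀX in this setting is simply the number of observations. For more than one input, a linear algebra routine would normally replace the hand-written 2 × 2 inverse.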