Logistic Regression in R | Machine Learning Algorithms | Data Science Training | Edureka

This Logistic Regression tutorial gives you a clear understanding of how the logistic regression machine learning algorithm works in R. Towards the end, in our demo, we predict which patients have diabetes using logistic regression. In this tutorial you will learn:
1) The 5 Questions asked in Data Science
2) What is Regression?
3) Logistic Regression - What and Why?
4) How does Logistic Regression Work?
5) Demo in R: Diabetes Use Case
6) Logistic Regression: Use Cases

EdurekaIN

Presentation Transcript


  1. Logistic Regression www.edureka.co/data-science Edureka’s Data Science Certification Training

  2. What Will You Learn Today? (1) The 5 Questions asked in Data Science (2) What is Regression? (3) Logistic Regression: What and Why? (4) How does Logistic Regression work? (5) Demo in R: Diabetes Use Case (6) Logistic Regression: Use Cases

  3. The 5 Questions Asked In Data Science

  4. The 5 Questions Asked In Data Science In data science, there are basically five kinds of problems:
     Q1. Is this A or B? (Classification algorithms)
     Q2. Is this weird? (Anomaly detection algorithms)
     Q3. How much or how many? (Regression algorithms)
     Q4. How is this organized? (Clustering algorithms)
     Q5. What should I do next? (Reinforcement learning)

  6. What Is Regression?

  7. What Is Regression? ➢ Regression analysis is a predictive modelling technique. ➢ It estimates the relationship between a dependent (target) variable and an independent (predictor) variable. [Figure: fitted line with the input on the X-axis and the outcome on the Y-axis; example point: input value = 7.00, predicted outcome = 123.9]

  8. Types Of Regression

  9. Types Of Regression
     • Linear Regression: when there is a linear relationship between the independent and dependent variables.
     • Polynomial Regression: when the power of the independent variable is more than 1.
     • Logistic Regression: when the dependent variable is categorical (0/1, True/False, Yes/No, A/B/C) in nature.

  11. Why Logistic Regression?

  12. Why Logistic Regression? Whenever the outcome of the dependent variable (Y) is discrete, like 0 or 1, Yes or No, A, B or C, we use logistic regression.

  13. Why Not Linear Regression? Why can't we use linear regression when the outcome is discrete?

  14. Why Not Linear Regression? Since our value of Y must lie between 0 and 1, the straight line has to be clipped at 0 and 1. [Figure: straight line clipped at Y = 0 and Y = 1]

  15. Why Not Linear Regression? Once clipped, the resulting curve cannot be expressed as a single formula. We need a new way to solve this kind of problem, and that is logistic regression.
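
The clipping problem can be seen directly in R. The following is a minimal sketch, not part of the original slides: it assumes the built-in mtcars data (used later in the deck) and treats the 0/1 column vs as the outcome with disp as the only predictor, a choice made purely for illustration.

```r
# Fit an ordinary straight line to a 0/1 outcome (illustrative choice of columns).
linear_fit  <- lm(vs ~ disp, data = mtcars)
linear_pred <- predict(linear_fit)

# Some fitted values fall outside the valid probability range [0, 1].
outside <- linear_pred < 0 | linear_pred > 1
print(range(linear_pred))
print(sum(outside))

# A logistic fit on the same data keeps every fitted value strictly inside (0, 1).
logit_fit  <- glm(vs ~ disp, data = mtcars, family = binomial)
logit_pred <- predict(logit_fit, type = "response")
print(range(logit_pred))
```

The straight line has no way of knowing that Y is bounded, which is why its fitted values would have to be clipped by hand.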

  16. Logistic Regression Curve [Figure: "The S Curve" of logistic regression, with the y-axis running from 0 to 1.2 and the x-axis from 0 to 10]

  20. Why Logistic Regression? The equation for a straight line is $Y = C + B_1X_1 + B_2X_2 + \dots$, where the range of Y is from $-\infty$ to $\infty$. Let's try to derive the logistic regression equation from this equation. In logistic regression, Y can only be between 0 and 1. To get a range between 0 and $\infty$, we transform Y to $\frac{Y}{1-Y}$: at $Y = 0$ this is 0, and as Y approaches 1 it goes to $\infty$. Let us transform it further, to get a range between $-\infty$ and $\infty$: $\log\left(\frac{Y}{1-Y}\right) = C + B_1X_1 + B_2X_2 + \dots$
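
The two transformations above can be checked numerically. This is a small sketch with made-up values of Y (the vector y is an assumption, not data from the slides); it relies only on base R, where qlogis() is the logit and plogis() is its inverse.

```r
# Y treated as a probability strictly between 0 and 1 (illustrative values).
y <- c(0.01, 0.25, 0.5, 0.75, 0.99)

odds     <- y / (1 - y)   # first transform: range (0, Inf)
log_odds <- log(odds)     # second transform: range (-Inf, Inf)

# Base R ships the same pair of transforms.
stopifnot(isTRUE(all.equal(log_odds, qlogis(y))))
stopifnot(isTRUE(all.equal(plogis(log_odds), y)))

print(data.frame(y, odds, log_odds))
```

Because the log-odds are unbounded in both directions, they can be set equal to the straight-line expression without any clipping.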

  21. What Is Logistic Regression?

  22. What Is Logistic Regression? Logistic regression, also called logit regression or the logit model, is a regression model where the dependent variable (DV) is categorical. Categorical: variables that can take only fixed values such as A, B or C, or Yes or No. Dependent: Y = f(X), i.e. Y is dependent on X.

  23. What Is Logistic Regression? Therefore, whenever the outcome of the dependent variable (Y) is categorical, like 0 or 1, Yes or No, A, B or C, we use logistic regression. [Figure: curve with the y-axis running from 0.0 to 1.0]

  24. How Does Logistic Regression Work?

  25. How does Logistic Regression Work? Let us take an example to understand this. The model is given two groups of values. Selected: 147, 120, 121, 128, 110, 119, 133. Not selected: 107, 89, 92, 106, 104, 114.

  27. How Does Logistic Regression Work? Let's take a sample dataset in R, which is called mtcars. Our aim is to predict whether a car will have a V-engine or a straight engine based on our inputs. Key:
     mpg: Miles/US gallon
     cyl: Number of cylinders
     disp: Displacement (cu. in.)
     hp: Gross horsepower
     drat: Rear axle ratio
     wt: Weight (1000 lbs)
     qsec: 1/4 mile time
     vs: Engine shape (0 = V-shaped, 1 = straight)
     am: Transmission type (0 = automatic, 1 = manual)
     gear: Number of forward gears
     carb: Number of carburetors

  28. How Does Logistic Regression Work? For now, let's take disp and wt as our primary independent variables. Why? We'll be discussing it in our next section.

  29. How Does Logistic Regression Work? Since our aim is to know which engine type a car has, the outcome is binary: vs is either 1 or 0. Therefore, vs is our dependent variable Y.

  30. How Does Logistic Regression Work? Before creating the model, we divide our dataset into a training dataset (80%) and a testing dataset (20%).

  31. How Does Logistic Regression Work? We use the training dataset (80%) to create our model and the testing dataset to validate it.

  32. How Does Logistic Regression Work? Once the model is created we get the following outputs, which are calculated using MLE*: $\beta_0$, $\beta_1$ and $\beta_2$. *Maximum Likelihood Estimation is a method of estimating the parameters.
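
The split-and-fit steps can be sketched in base R as follows. The slides do not show the exact commands in this transcript, so the seed, the use of sample() for the split, and the predictor order in the formula are all assumptions made for illustration.

```r
set.seed(42)  # arbitrary seed, only so the split is repeatable

# 80/20 split of the built-in mtcars data (32 rows).
n         <- nrow(mtcars)
train_idx <- sample(n, size = floor(0.8 * n))
training  <- mtcars[train_idx, ]
testing   <- mtcars[-train_idx, ]

# glm() with family = binomial fits logistic regression by maximum likelihood.
model <- glm(vs ~ wt + disp, data = training, family = binomial)

# Three estimates: the intercept plays the role of beta0, and the
# coefficients of the two predictors play the roles of beta1 and beta2.
print(coef(model))
```

With only 25 training rows, R may warn about fitted probabilities close to 0 or 1. The estimates also change with the seed, so they are not expected to match the values quoted on the slides.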

  33. Estimated Regression Equation The estimated regression equation is $\text{Logit}(Y) = \log\left(\frac{Y}{1-Y}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$, which gives $P(Y) = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}}$. Here, $\beta_0$ = constant coefficient, $\beta_1$ = coefficient of $x_1$, $\beta_2$ = coefficient of $x_2$, $x_1$ and $x_2$ = independent variables, e = Euler's number, and P(Y) = probability that Y equals 1.
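
The step from the logit form to the probability form is plain algebra. The derivation below is a sketch that assumes the Y inside the logit stands for the probability P(Y) defined on the slide.

```latex
\begin{aligned}
\log\left(\frac{P(Y)}{1 - P(Y)}\right) &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 \\
\frac{P(Y)}{1 - P(Y)} &= e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2} \\
P(Y) &= \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}}
      = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2)}}
\end{aligned}
```

The last form is the S-shaped logistic function: large positive values of the linear term push P(Y) towards 1, large negative values push it towards 0.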

  34. How Does Logistic Regression Work? Let's take a value from the test dataset: $\beta_0 = 1.83010$, $\beta_1 = 1.09428$, $\beta_2 = -0.02529$, $x_1 = 120.3$, $x_2 = 2.140$, with $e = 2.7183$. Substituting values: $P(Y) = \frac{0.9849}{1.9849} = 0.4962$, so the probability of 'vs' being '1' is 0.4962. We will assume the threshold to be 0.5. Since 0.4962 is below the threshold, the model predicts vs = 0. In R's mtcars documentation, vs = 0 denotes a V-shaped engine and vs = 1 a straight engine.
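
The same substitution can be done in R. This sketch fits the model on all of mtcars purely to stay short (an assumption; the slides fit on a training split), so its coefficients and probability will differ from the figures quoted above. The new_car values are the disp and wt figures from the slide.

```r
# Fit on the full data set only to keep the example self-contained.
model <- glm(vs ~ wt + disp, data = mtcars, family = binomial)

new_car <- data.frame(wt = 2.140, disp = 120.3)

# Manual computation: linear predictor z, then the logistic function.
b <- coef(model)
z <- b[["(Intercept)"]] + b[["wt"]] * new_car$wt + b[["disp"]] * new_car$disp
p_manual <- exp(z) / (1 + exp(z))

# predict(type = "response") applies the same formula.
p_predict <- unname(predict(model, newdata = new_car, type = "response"))
stopifnot(isTRUE(all.equal(p_manual, p_predict)))

# Classify with the 0.5 threshold used on the slide.
predicted_vs <- as.integer(p_predict > 0.5)
print(c(probability = p_predict, predicted_vs = predicted_vs))
```

Doing the calculation by hand next to predict() is a quick way to confirm which coefficient is paired with which variable.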

  36. Logistic Regression Demo In R

  37. Logistic Regression Demo In R Our aim is to predict whether a patient is diabetic or not based on the following values. Key:
     npreg: number of pregnancies
     glu: plasma glucose concentration
     bp: diastolic blood pressure
     skin: triceps skin fold thickness
     bmi: body mass index
     ped: diabetes pedigree function
     age: age in years
     type: 1 for diabetic (yes) and 0 for not diabetic (no)

  40. Logistic Regression Demo In R First, we read the data from our CSV file. Then, we split our dataset into training and testing sets with the ratio 8:2. After that, we create our model using the training dataset.
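
These three steps might look like the sketch below. The CSV file itself is not part of this transcript, so the code substitutes Pima.tr from the MASS package, which happens to have the same column names as the key above. That substitution, the file name in the comment, the seed and the use of sample() are all assumptions.

```r
library(MASS)  # provides Pima.tr, a stand-in for the CSV used in the demo

# With the real file this step would be something like:
#   data <- read.csv("diabetes.csv")   # hypothetical file name
data <- Pima.tr
data$type <- as.integer(data$type == "Yes")  # 1 = diabetic, 0 = not diabetic

# 8:2 split into training and testing rows.
set.seed(123)
train_idx <- sample(nrow(data), size = floor(0.8 * nrow(data)))
training  <- data[train_idx, ]
testing   <- data[-train_idx, ]

# Logistic regression of type on all other columns.
model <- glm(type ~ ., data = training, family = binomial)
print(summary(model))
```

The formula `type ~ .` means "use every remaining column as a predictor", which matches a first model that includes all fields.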

  42. Logistic Regression Demo In R The summary of the model marks each coefficient with a significance code: *** 99.9% confident (p < 0.001), ** 99% confident (p < 0.01), * 95% confident (p < 0.05), . 90% confident (p < 0.1).

  43. Logistic Regression Demo In R This is the model summary that we get after improving our model. The insignificant field is skin. Null deviance shows how well the response variable is predicted by a model that includes only the intercept (grand mean). Residual deviance shows how well the response variable is predicted once the independent variables are included.
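
Null and residual deviance can be read straight off a fitted glm object. The sketch below again uses MASS::Pima.tr as an assumed stand-in for the demo's CSV and compares a model with all fields against one without skin. Whether skin is insignificant in this stand-in data is not asserted here.

```r
library(MASS)  # Pima.tr is an assumed stand-in for the demo's CSV

data <- Pima.tr
data$type <- as.integer(data$type == "Yes")

full    <- glm(type ~ ., data = data, family = binomial)
reduced <- glm(type ~ . - skin, data = data, family = binomial)  # drop skin

deviances <- c(
  null    = full$null.deviance,  # intercept-only model
  full    = deviance(full),      # all predictors
  reduced = deviance(reduced)    # all predictors except skin
)
print(deviances)
```

A residual deviance well below the null deviance indicates that the predictors add information. A reduced model whose deviance barely rises suggests the dropped field contributed little.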

  46. Logistic Regression Demo In R We will then predict the values for the test dataset and categorize them according to the threshold, which is 0.5. Next, we create the confusion matrix for the training dataset and then find the accuracy.
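
Prediction, thresholding, the confusion matrix and accuracy can be sketched as below. As before, MASS::Pima.tr, the seed and the split method are assumptions. The matrix is built on the held-out test rows here; replacing testing with training gives the training-set version.

```r
library(MASS)  # Pima.tr is an assumed stand-in for the demo's CSV

data <- Pima.tr
data$type <- as.integer(data$type == "Yes")

set.seed(123)
train_idx <- sample(nrow(data), size = floor(0.8 * nrow(data)))
training  <- data[train_idx, ]
testing   <- data[-train_idx, ]

model <- glm(type ~ . - skin, data = training, family = binomial)

# Predicted probabilities, then classes using the 0.5 threshold.
res       <- predict(model, newdata = testing, type = "response")
predicted <- as.integer(res > 0.5)

# Fixing the factor levels keeps the table 2 x 2 even if one class is never predicted.
conf_mat <- table(
  Actual    = factor(testing$type, levels = 0:1),
  Predicted = factor(predicted, levels = 0:1)
)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

print(conf_mat)
print(accuracy)
```

Accuracy is the share of rows on the diagonal of the confusion matrix, i.e. the cases where the predicted class equals the actual class.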

  47. How To Find The Threshold?

  50. Logistic Regression Demo In R Let us take a sample for our significant fields and see whether our patient is diabetic or not based on the model that we created. Store the predicted values for the training dataset in the 'res' variable, import the ROCR package, and define the 'ROCRPred' and 'ROCRPerf' variables.
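
A sketch of these steps follows, keeping the variable names from the slide. The data (MASS::Pima.tr), the seed, the model formula and the plotting options are assumptions. The ROCR calls are guarded so that the rest of the script still runs if the package is not installed.

```r
library(MASS)  # Pima.tr is an assumed stand-in for the demo's CSV

data <- Pima.tr
data$type <- as.integer(data$type == "Yes")

set.seed(123)
train_idx <- sample(nrow(data), size = floor(0.8 * nrow(data)))
training  <- data[train_idx, ]

model <- glm(type ~ . - skin, data = training, family = binomial)

# Predicted probabilities for the training dataset.
res <- predict(model, newdata = training, type = "response")

if (requireNamespace("ROCR", quietly = TRUE)) {
  library(ROCR)
  # Pair the predictions with the true labels, then compute the ROC curve.
  ROCRPred <- prediction(res, training$type)
  ROCRPerf <- performance(ROCRPred, "tpr", "fpr")
  # Colour the curve by cutoff and print cutoffs along it to help pick a threshold.
  plot(ROCRPerf, colorize = TRUE, print.cutoffs.at = seq(0.1, 0.9, by = 0.1))
}
```

Reading the curve, a lower threshold raises the true positive rate at the cost of more false positives, so the choice depends on which kind of error matters more for the diabetes use case.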
