Algorithms for Data Analytics

Presentation Transcript


  1. Algorithms for Data Analytics • B. Ramamurthy

  2. Data Analytics (Data Science) • [Diagram; surviving labels: Data, EDA, Statistical Inference, Stats/Algs, Big-data analytics, Intuition/understanding, Discoveries/intelligence, Decisions/Answers/Results]

  3. Three Types of Data Science Algorithms • Pipelines to prepare data • Three types: • Data preparation algorithms, such as sorting and workflows • Optimization algorithms, such as stochastic gradient descent and least squares… • Machine learning algorithms…

  4. Machine Learning Algorithms • Come from Artificial Intelligence • No underlying generative process • Built to predict or classify something • Three basic algorithms: • linear regression, k-NN, k-means • We already looked at linear regression as a case study for R/RStudio • We will start with k-means…

  5. K-means • K-means is unsupervised: there is no prior knowledge of the “right answer” • The goal of the algorithm is to determine the definition of the right answer by finding clusters in the data • Kinds of data: satisfaction survey data, incident report data • Assume data {age, gender, income, state, household, size}; your goal is to segment the users • K-means is the simplest of the clustering algorithms • Let's understand k-means using an example.

  6. Let's examine an example • {Age, income range, education, skills, social, paid work} • Let's take just the age: {23, 25, 24, 23, 21, 31, 32, 30, 31, 30, 37, 35, 38, 37, 39, 42, 43, 45, 43, 45} • Cluster this data using K-means • Let's assume K = 3, i.e., 3 groups • Give me a guess of the centroids. Let's assume the initial centroids are {21, 30, 40} • First let's calculate by hand and then use R-Studio
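
The hand calculation on this slide can be checked with a short script. The following is a minimal sketch, not the presenter's code: it is written in Python rather than R, and the function name `kmeans_1d`, the tie-breaking rule (a point equidistant from two centroids goes to the lower-indexed one), and the stopping rule (stop when the centroids no longer change) are assumptions made for illustration. The data and the initial centroids {21, 30, 40} are taken from the slide.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """Basic k-means on a list of numbers; returns (centroids, labels)."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        # Assumption: ties go to the lower-indexed centroid (min keeps the first).
        labels = [
            min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
            for v in values
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centroids.append(sum(members) / len(members) if members else old)
        # Assumed stopping rule: centroids did not move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Ages and initial centroids from the slide.
ages = [23, 25, 24, 23, 21, 31, 32, 30, 31, 30,
        37, 35, 38, 37, 39, 42, 43, 45, 43, 45]
centroids, labels = kmeans_1d(ages, [21, 30, 40])
print(centroids)  # [23.2, 31.5, 41.0]
```

With these starting centroids the groups settle after one update: {21, 23, 23, 24, 25}, {30, 30, 31, 31, 32, 35}, and {37, 37, 38, 39, 42, 43, 43, 45, 45}. The age 35 is equidistant from 30 and 40 in the first pass; either tie-breaking choice leads to the same final centroids here.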

  7. K-NN • Supervised ML • You know the “right answers”, or at least have data that is “labeled”: the training set • One set of objects has already been classified or labeled (training set) • Another set of objects is yet to be labeled or classified (test set) • Your goal is to automate the process of labeling the test set • The intuition behind k-NN is to consider the most similar items, with similarity defined by their attributes, look at their existing labels, and assign the object a label.
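
The labeling idea described on this slide can be sketched in a few lines. This is an illustrative sketch only, not the presenter's example: it uses Python rather than R, the function name `knn_predict` is hypothetical, similarity is assumed to be Euclidean distance, the label is assumed to be chosen by majority vote among the k nearest training items, and the tiny (age, income in thousands) training set with labels "A" and "B" is made up for the demonstration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training items nearest to query.

    train is a list of (features, label) pairs; features and query are
    equal-length tuples of numbers.
    """
    # Sort training items by Euclidean distance to the query, keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote over the labels of those neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up labeled training set: (age, income in thousands) -> label.
train = [
    ((23, 30), "A"), ((25, 35), "A"), ((24, 28), "A"),
    ((42, 80), "B"), ((45, 90), "B"), ((43, 85), "B"),
]

# Unlabeled "test set" objects to be labeled automatically.
print(knn_predict(train, (26, 32)))  # A
print(knn_predict(train, (44, 88)))  # B
```

In practice the attributes are usually rescaled before computing distances, since an attribute with a large numeric range (income here) would otherwise dominate the similarity.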

  8. Let's look at an example
