Welcome (back) to IST 380 !


PowerPoint Slideshow about ' Welcome (back) to IST 380 !' - jacqui



Welcome (back) to IST 380 !

Today: the old and the new

modeling trends from Twitter data

the most traditional approach to modeling data

This picture may soon become part of the OLD, if trends continue…


Assignments…

Homework #1 is complete! (2/5)

Getting started with R (tutorial + "quiz" + text)

Make sure you can submit to our submission site!

Zac & Suleng

Homework #2 is due tomorrow (2/12)

Pr #1: text, Chapters 6-9

Pr #2: Monty Hall challenge

Pr #3: writing a predictive model by hand…

Homework #3 is due next Tuesday (2/20)

Pr #1: text, Chapter 10

Pr #2: the envelope, please!

Things are heating up here!

Pr #3: linear models for prediction


The age of data?

I prefer my data well-aged!


R path!

1

2

3

… R's toolset and its capabilities…

Programming Skills

data collection

descriptive vs. generative vs. predictive statistics

Subject Expertise

predictions using linear regression

I predict we'll get here, but not necessarily in a straight line!…


packages

library

lapply

order

diff

Descriptive statistics: Twitter data

Tweet "diffs" for a certain hashtag…

Chapter 10 introduces access to Twitter data and statistical descriptions using these data


packages:

bitops

RCurl

RJSONIO

twitteR

later:

UsingR

Some R: library

Once you have installed these packages

You can ensure they're present with

library(bitops)

and so on…

Chapter 10 will have you write a function to automate this process…

What if I don't have hands?!

Caution! Some of these may have to be installed by hand…
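One possible shape for the automating function mentioned above, as a minimal sketch. The name ensure_package is hypothetical (it is not from the slides or the text), and packages that need a manual install may still fail here.

```r
# Hypothetical helper (the name ensure_package is not from the slides):
# load a package, installing it first if it is missing.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # may still fail for packages that need a manual install
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# Usage: loop over the packages named above.
# for (p in c("bitops", "RCurl", "RJSONIO", "twitteR")) ensure_package(p)
ensure_package("stats")  # ships with R, so nothing is downloaded
```

The character.only = TRUE argument is what lets library accept the package name as a string held in a variable.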


Some R: style…

I have NO COMMENT about this function!


Some R: style…

better, but not ideal


Some R: style…

use variables to hold intermediate values!
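A small illustration of the style advice above. This is a made-up function, not the one shown on the slides: each intermediate value gets its own name and a comment.

```r
# Hypothetical example (not the function from the slides): average gap
# between consecutive values, written with named intermediate steps.
mean_gap <- function(values) {
  sorted_values <- sort(values)   # put the values in increasing order
  gaps <- diff(sorted_values)     # differences between neighbors
  mean(gaps)                      # average gap
}

mean_gap(c(10, 1, 4))  # sorted: 1 4 10; gaps: 3 6; mean: 4.5
```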


Some R: lapply and vapply


Allow you to apply a function to every element of a list or a vector:

> L <- list(8,9,10)

> lapply( L, add1 )

[[1]]

[1] 9

[[2]]

[1] 10

[[3]]

[1] 11

lapply(X, FUN, ...)

> V <- 8:10

> vapply( V, add1, FUN.VALUE=42 )

[1] 9 10 11

vapply(X, FUN, FUN.VALUE, ...)
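The calls above use add1, which the slide does not define. A runnable version, assuming add1 simply adds one to its argument:

```r
# add1 is used on the slide but not defined there; this definition is assumed.
add1 <- function(x) x + 1

L <- list(8, 9, 10)
as_list <- lapply(L, add1)   # always returns a list

V <- 8:10
# FUN.VALUE is a template for each result: one number here.
# The slide's FUN.VALUE=42 works the same way, since 42 is also a single number.
as_vector <- vapply(V, add1, FUN.VALUE = numeric(1))

print(as_list)
print(as_vector)
```

vapply stops with an error if any result does not match the template, which is why it is often preferred when a plain vector is expected back.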


UTC?

Clock in Bristol, UK

coordinated universal time

since before the railroads…

red minute hand: Bristol

black minute hand: London (Greenwich)



UTC?

can be plotted as-is

take differences via as.numeric

- so that "2013-02-11 20:55:03 UTC"

becomes 1360616103
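The conversion above can be checked directly. In this sketch the second timestamp is invented purely to show a difference in seconds.

```r
# Parse the timestamp from the slide as UTC and convert it to seconds
# since 1970-01-01 00:00:00 UTC.
tweet_time <- as.POSIXct("2013-02-11 20:55:03", tz = "UTC")
seconds <- as.numeric(tweet_time)
print(seconds)  # 1360616103, matching the slide

# A second, made-up timestamp 90 seconds later, to show a difference.
later_time <- as.POSIXct("2013-02-11 20:56:33", tz = "UTC")
gap_seconds <- as.numeric(later_time) - as.numeric(tweet_time)
print(gap_seconds)  # 90
```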




Some R: order and diff

> V <- c(3,4,2,1)

> V

[1] 3 4 2 1

> order(V)

[1] 4 3 1 2

> V[order(V)]

[1] 1 2 3 4

order(..., na.last = TRUE, decreasing = FALSE)

order returns a permutation of its input…

What do these numbers mean?

Why not just use sort?

You can, but this lets you order anything in the same way!

diff ?
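A short sketch of what the numbers from order mean, why that is more flexible than sort, and what diff does. The labels vector is made up for illustration.

```r
V <- c(3, 4, 2, 1)
idx <- order(V)        # 4 3 1 2: positions of the smallest, next smallest, ...
print(V[idx])          # 1 2 3 4, the same as sort(V)

# Reusing the ordering on a parallel vector (labels are made up for illustration).
labels <- c("c", "d", "b", "a")
print(labels[idx])     # "a" "b" "c" "d"

# diff returns the differences between consecutive elements.
print(diff(c(1, 4, 9, 16)))  # 3 5 7
```

For tweet times, the same two steps would apply: put the numeric timestamps in increasing order, then take diff to get the gaps between consecutive tweets.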




Comparing tags...

Next week: we will quantify these differences more carefully…

#losangeles

#sanfrancisco

Which is which?




Generative statistics

rgeom

runif

rnorm …

sample

replicate

Monte Carlo method: run a process many times to gain insights into it…

distribution of samples of state populations

Chapter 7 reviews repeated sampling and the resulting distribution of means
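A minimal Monte Carlo sketch of repeated sampling, using the state.x77 matrix that ships with R. The sample size of 16, the 1000 repetitions, and the seed are arbitrary choices made here, not values from the slides or the text.

```r
set.seed(380)  # arbitrary seed, only so the run is repeatable

# state.x77 ships with R; its Population column holds 1975 estimates in thousands.
pops <- state.x77[, "Population"]

# One trial: the mean of 16 states drawn at random (with replacement).
one_sample_mean <- function() mean(sample(pops, size = 16, replace = TRUE))

# Monte Carlo: repeat the trial many times and look at the results.
sample_means <- replicate(1000, one_sample_mean())

print(mean(pops))          # mean of all 50 states
print(mean(sample_means))  # should land close to the value above
# hist(sample_means)       # uncomment to see the distribution of means
```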




Hw3 pr2: A second Monte Carlo example :

Switch!

Both envelopes hold some positive amount of money (in a check or IOU), but one of these two envelopes holds twice as much money as the other.

Should you switch or stay?

but, then, should you switch back?



This week ~ write a function to model this process…



Hw3 pr2

Write a Mystery Envelope function:

ME_once <- function( amount_found=1.0, sors="switch", verbose=TRUE)

… that runs one envelope trial

… and returns the amount of $ "earned"

Another to run it N times:

ME_ntimes <- function( n=100 )

And another to run it N times:

sample_ME <- function( run_me=100 )
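A rough sketch of how the first two functions might look. The slides do not say how the other envelope's amount is chosen, so this version assumes it is double or half of amount_found with equal probability. The sors argument on ME_ntimes is an addition to the signature shown above, and sample_ME is not sketched because its intended behavior is not described.

```r
# One trial. ASSUMPTION (not stated in the slides): the unopened envelope holds
# either double or half of amount_found, each with probability 1/2.
ME_once <- function(amount_found = 1.0, sors = "switch", verbose = TRUE) {
  other_amount <- amount_found * sample(c(0.5, 2), size = 1)
  earned <- if (sors == "switch") other_amount else amount_found
  if (verbose) cat("found", amount_found, "-", sors, "- earned", earned, "\n")
  earned
}

# Run n trials and return the vector of amounts earned.
# The sors argument is an addition to the slide's signature.
ME_ntimes <- function(n = 100, sors = "switch") {
  replicate(n, ME_once(sors = sors, verbose = FALSE))
}

set.seed(380)  # arbitrary seed for repeatability
print(mean(ME_ntimes(1000, "switch")))
print(mean(ME_ntimes(1000, "stay")))   # always 1 under this model
```

Under this particular assumption, switching averages about 1.25 times amount_found. A different setup, such as fixing the two amounts first and then picking an envelope at random, would behave differently, so the modeling choice is worth stating explicitly.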




Big Ideas:

Predictive modeling

Linear regression

The human role… !




So, what is Machine Learning?

The goal of machine learning, also known as predictive statistics/analytics, is to find a function that yields outputs for previously-unseen inputs…

passenger details → function → prediction: did the passenger survive?

For Hw2, you are building this function by hand.


R is for Regression!

The oldest and (still) most popular technique for automatically generating a model from data.

problem 3 this week…


Regression

What is it?


Regression ~ predictive modeling

this week: making an assumption of linear dependence on the inputs
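For a single input, the assumption of linear dependence can be written as below. The symbols are notation chosen here, not taken from the slides; the second expression is the least-squares criterion that R's lm uses to choose the coefficients.

```latex
\hat{y} = \beta_0 + \beta_1 x,
\qquad
(\hat{\beta}_0, \hat{\beta}_1)
  = \arg\min_{\beta_0,\,\beta_1} \sum_{i=1}^{n} \bigl( y_i - \beta_0 - \beta_1 x_i \bigr)^2
```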


But why is it called regression?

1877: "reversion" (peas)

1885: "regression" (people)



Let's look at lm1


pr3 this week: temperatures…


Temperature anomalies


The data…

deviations from the 1950-1980 global average of 14°C ~ 57.2°F

averaged (worldwide) and presented in units of 0.01°C
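A small worked conversion for these units, as a sketch. The function names are hypothetical, and the anomaly value of 50 is only an example, not a data point.

```r
baseline_c <- 14  # baseline average from the slide, in degrees C

# Convert an anomaly in units of 0.01 degrees C to an absolute temperature in degrees C.
anomaly_to_celsius <- function(anomaly) baseline_c + anomaly / 100

celsius_to_fahrenheit <- function(temp_c) temp_c * 9 / 5 + 32

print(celsius_to_fahrenheit(baseline_c))  # 57.2, as on the slide
print(anomaly_to_celsius(50))             # an anomaly of 50 means 14.5 degrees C
```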


Your task…

  • follow an analysis plan similar to the Galton data in the previous slides

  • fit a linear model to the yearly average data and to each month's average data

  • use your model to predict what the average temperature will be for 2012 and 2013

  • is the linear model a reasonable one?

  • we'll check (or you can…) the prediction for 2012 (but not 2013, yet)
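The fit-and-predict steps in the list above might be sketched as follows. The data here are placeholders generated from a made-up line plus noise, not real temperature anomalies, and the column names year and anomaly are assumptions about how the assignment's data might be laid out.

```r
# PLACEHOLDER data, generated from a made-up straight line plus noise.
# These are NOT real temperature anomalies; swap in the assignment's data.
set.seed(380)
temps <- data.frame(year = 1980:2011)
temps$anomaly <- 20 + 1.5 * (temps$year - 1980) + rnorm(nrow(temps), sd = 5)

# Fit anomaly as a linear function of year.
fit <- lm(anomaly ~ year, data = temps)
print(coef(fit))  # intercept and slope (slope in units of 0.01 deg C per year)

# Predict for years outside the data.
new_years <- data.frame(year = c(2012, 2013))
predictions <- predict(fit, newdata = new_years)
print(predictions)

# One quick check on "is the linear model reasonable?": residuals should
# show no obvious pattern when plotted against year.
# plot(temps$year, resid(fit))
```

The same pattern could be repeated once per month column to get the monthly models.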


Try it!

Help is available with either hw#2 (Monty Hall and Titanic using R's functions) or hw#3 (Twitter, envelopes, and temperatures) this evening during lab time…

Good luck with everything this week!


Lab !


The Titanic

April 15, 1912

1502 out of the 2224 passengers and crew died in the sinking

What characteristics did the survivors share?


The Data

here are the 11 columns

There are 742 rows and 11 columns in the training data.


Our goal

… is to write a function that takes in a row of new data and outputs whether that passenger would survive (1) or not (0).



A second predictor

Does the data match the famous emergency cry?
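If the emergency cry in question is "women and children first" (an interpretation, not stated in the slides), a hand-written predictor could look like this sketch. The column names sex and age and the age cutoff of 13 are assumptions, and the three passengers are made up for illustration.

```r
# Column names sex and age are assumptions about the data's layout.
# Rule: predict survival (1) for women and for children under 13, else 0.
# The age cutoff of 13 is arbitrary.
predict_survival <- function(row) {
  is_female <- row$sex == "female"
  is_child <- !is.na(row$age) && row$age < 13
  if (is_female || is_child) 1 else 0
}

# Made-up passengers for illustration, not rows from the real data.
p1 <- data.frame(sex = "male", age = 40)
p2 <- data.frame(sex = "female", age = 30)
p3 <- data.frame(sex = "male", age = 5)
print(c(predict_survival(p1), predict_survival(p2), predict_survival(p3)))  # 0 1 1
```

Comparing these predictions against the survived column of the training data would show how well the rule matches what actually happened.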



CS vs. IS and IT ?

greater integration system-wide issues

smaller details

machine specifics

www.acm.org/education/curric_vols/CC2005_Final_Report2.pdf


CS vs. IS and IT ?

Where will IS go?



IT ?

Where will IT go?




The bigger picture

Weeks 10-12: Objects

Week 10: classes vs. objects

Week 11: methods and data

Week 12: inheritance

Weeks 13-15: Final Projects

Week 13: final projects

Week 14: final projects

Week 15: final exam




Data!

  • Neighbor's name: Zachary Dodds

  • A place they consider home: Pittsburgh, PA

  • Are they working at a company now? Where? Harvey Mudd

  • How many U.S. states have they visited? 44

  • Their favorite unhealthy food… ? M&Ms

  • Do they have any "Data Science" (statistics, machine learning, CS) background? mostly CS for me…

This class is truly seminar-style: we're developing expertise in this field together.

be sure to set up your login + profile for the submission site…

