Welcome (back) to IST 380 !

Today: the old and the new

modeling trends from Twitter data

the most traditional approach to modeling data

This picture may soon become part of the OLD, if trends continue…


Assignments…

Homework #1 is complete! (2/5)

Getting started with R (tutorial + "quiz" + text)

Make sure you can submit to our submission site!

Zac & Suleng

Homework #2 is due tomorrow (2/12)

Pr #1: text, Chapters 6-9

Pr #2: Monty Hall challenge

Pr #3: writing a predictive model by hand…

Homework #3 is due next Tuesday (2/20)

Pr #1: text, Chapter 10

Pr #2: the envelope, please!

Things are heating up here!

Pr #3: linear models for prediction


The age of data?

I prefer my data well-aged!


R path!

… R's toolset and its capabilities…

Programming Skills

data collection

descriptive vs. generative vs. predictive statistics

Subject Expertise

predictions using linear regression

I predict we'll get here, but not necessarily in a straight line!…


packages

library

lapply

order

diff

Descriptive statistics: Twitter data

Tweet "diffs" for a certain hashtag…

Chapter 10 introduces access to Twitter data and statistical descriptions using these data


packages:

bitops

RCurl

RJSONIO

twitteR

later:

UsingR

Some R: library

Once you have installed these packages, you can load them (which also confirms they're present) with

library(bitops)

and so on…

Chapter 10 will have you write a function to automate this process…

What if I don't have hands?!

Caution! Some of these may have to be installed by hand…
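Chapter 10's own function isn't reproduced in this transcript. As a rough sketch of the idea, assuming CRAN is reachable, a helper (the name `ensure_package` is hypothetical, not the textbook's) could load a package and install it first only when it is missing:

```r
# ensure_package: hypothetical helper (not the textbook's version).
# Loads a package, installing it from CRAN first if it is not yet installed.
ensure_package <- function(pkg) {
  pkg <- as.character(pkg)
  # requireNamespace() returns FALSE (instead of failing) when pkg is absent
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = "https://cloud.r-project.org")
  }
  # character.only = TRUE makes library() treat pkg as a string variable
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# Usage (needs network access the first time a package is installed):
# for (p in c("bitops", "RCurl", "RJSONIO", "twitteR")) ensure_package(p)
```

Packages that fail to install this way are the ones that may still need to be installed by hand.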


Some R: style…

I have NO COMMENT about this function!


Some R: style…

better, but not ideal


Some R: style…

use variables to hold intermediate values!
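The functions pictured on these style slides aren't in the transcript. As a hypothetical illustration of the same advice, here is one small function written two ways: a terse one-liner, and a version with a descriptive comment and named intermediate values (the names `mean_gap` and `mean_gap_terse` are made up for this example):

```r
# Terse version: correct, but hard to read and hard to debug.
mean_gap_terse <- function(t) mean(diff(sort(as.numeric(t))))

# mean_gap: average time between consecutive events, in seconds.
# times: a vector of POSIXct timestamps (or plain numbers of seconds).
mean_gap <- function(times) {
  secs   <- as.numeric(times)  # timestamps -> seconds since 1970-01-01 UTC
  sorted <- sort(secs)         # put the events in chronological order
  gaps   <- diff(sorted)       # seconds between consecutive events
  mean(gaps)                   # summarize the gaps by their average
}

mean_gap(c(10, 4, 1))   # 4.5
```

Each intermediate variable can be printed on its own while debugging, which the one-liner doesn't allow.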


Some R: lapply and vapply

Both let you apply a function to every element of a list or a vector:

> L <- list(8,9,10)

> lapply( L, add1 )

[[1]]

[1] 9

[[2]]

[1] 10

[[3]]

[1] 11

lapply(X, FUN, ...)

> V <- 8:10

> vapply( V, add1, FUN.VALUE=42 )

[1] 9 10 11

vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE)
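The difference between the two: lapply always returns a list, while vapply returns a plain vector and checks every result against the template given in FUN.VALUE. The slide's 42 works because it is a single number; numeric(1) says the same thing more explicitly. A short sketch, assuming add1 (not shown in the transcript) simply adds 1 to its input:

```r
add1 <- function(x) x + 1   # assumed definition of add1

L <- list(8, 9, 10)
as_list <- lapply(L, add1)                             # always a list
as_vec  <- vapply(8:10, add1, FUN.VALUE = numeric(1))  # numeric vector: 9 10 11

# vapply checks each result against FUN.VALUE (here: one number), so a
# function that returns two numbers per element triggers an error:
outcome <- tryCatch(
  vapply(8:10, function(x) c(x, x), FUN.VALUE = numeric(1)),
  error = function(e) "error: result does not match FUN.VALUE"
)
```

That check is why vapply is the safer choice inside functions: a surprise result shape fails loudly instead of silently producing a list.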


UTC?

Clock in Bristol, UK

Coordinated Universal Time

since before the railroads…

red minute hand: Bristol

black minute hand: London (Greenwich)


Looking at the data…


UTC?

can be plotted as-is

take differences via as.numeric, so that "2013-02-11 20:55:03 UTC" becomes 1360616103 (seconds since 1970-01-01 00:00:00 UTC)
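A minimal sketch of that conversion, followed by the "diffs" between tweets. The extra tweet times are hypothetical offsets from the slide's timestamp, not real Twitter data:

```r
# Parse the slide's example timestamp as UTC and convert it to seconds
t1 <- as.POSIXct("2013-02-11 20:55:03", tz = "UTC")
as.numeric(t1)   # 1360616103 seconds since 1970-01-01 00:00:00 UTC

# Hypothetical tweet times for one hashtag (not real data), out of order
tweets <- t1 + c(0, 95, 30, 200)

# Sort into chronological order, then take consecutive differences
gaps <- diff(sort(as.numeric(tweets)))
gaps             # 30 65 105 (seconds between consecutive tweets)
```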


Some R: order and diff

> V <- c(3,4,2,1)

> V

[1] 3 4 2 1

> order(V)

[1] 4 3 1 2

> V[order(V)]

[1] 1 2 3 4

order(..., na.last = TRUE, decreasing = FALSE)

order returns a permutation of its input's indices…

What do these numbers mean?

Why not just use sort?

You can, but this lets you order anything in the same way!

diff ?
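In short, order(V)[k] is the position in V of the k-th smallest element, so the index vector can reorder any other vector of the same length, which sort alone cannot do. diff returns the differences between consecutive elements. A small sketch (the second vector is hypothetical):

```r
V <- c(3, 4, 2, 1)
idx <- order(V)   # 4 3 1 2: V[4] is smallest, then V[3], V[1], V[2]
V[idx]            # 1 2 3 4, the same result as sort(V)

# A hypothetical vector that belongs with V, element by element:
names_for_V <- c("three", "four", "two", "one")
names_for_V[idx]  # "one" "two" "three" "four": reordered to match V[idx]

# diff: differences between consecutive elements
diff(V)           # 1 -2 -1
diff(V[idx])      # 1 1 1
```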


Comparing tags...

Next week: we will quantify these differences more carefully…

#losangeles

#sanfrancisco

Which is which?


Generative statistics

rgeom

runif

rnorm …

sample

replicate

Monte Carlo method: run a process many times to gain insights into it…

distribution of samples of state populations

Chapter 7 reviews repeated sampling and the resulting distribution of means
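The Chapter 7 state-population data isn't included here. As a stand-in, this sketch uses R's built-in `state.x77` matrix (1975 population estimates, in thousands) to build a distribution of sample means with `sample` and `replicate`; the sample size of 16 and the 1000 repetitions are arbitrary choices:

```r
set.seed(380)                                # reproducible random draws
pops <- datasets::state.x77[, "Population"]  # 50 states, thousands of people

# One trial: mean population of 16 randomly chosen states (no replacement)
one_sample_mean <- function(n = 16) mean(sample(pops, size = n))

# Monte Carlo: repeat the trial many times and keep every result
means <- replicate(1000, one_sample_mean())

summary(means)   # the center should land close to mean(pops)
# hist(means)    # uncomment to see the shape of the distribution
```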


Hw3 pr2: A second Monte Carlo example :

Switch!

Both envelopes hold some positive amount of money (in a check or IOU), but one of these two envelopes holds twice as much money as the other.

Should you switch or stay?

but, then, should you switch back?


This week ~ write a function to model this process…



Hw3 pr2

Write a Mystery Envelope function:

ME_once <- function( amount_found=1.0, sors="switch", verbose=TRUE)

… that runs one envelope trial

… and returns the amount of $ "earned"

Another to run it N times:

ME_ntimes <- function( n=100 )

And another to run it N times:

sample_ME <- function( run_me=100 )
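The ME_* functions take the amount found as given. A different, and instructive, way to simulate the puzzle is sketched below as an assumption rather than the intended homework solution: fill both envelopes first (here the smaller amount is drawn uniformly between $1 and $100, an arbitrary choice), open one at random, and compare the average payoff of staying versus switching. The name `envelope_trials` is hypothetical:

```r
set.seed(380)   # reproducible random draws

# envelope_trials: hypothetical helper, not the homework's ME_* functions.
# Plays n games; in each, both envelopes are filled first, then one is
# opened at random. Returns the average payoff of staying vs. switching.
envelope_trials <- function(n = 100000) {
  small      <- runif(n, min = 1, max = 100)    # smaller amount in each game
  big        <- 2 * small                       # the other holds twice as much
  picked_big <- runif(n) < 0.5                  # did we open the bigger one?
  stay_pay   <- ifelse(picked_big, big, small)  # keep the opened envelope
  switch_pay <- ifelse(picked_big, small, big)  # take the other envelope
  c(stay = mean(stay_pay), switch = mean(switch_pay))
}

result <- envelope_trials()
result   # both averages land near 1.5 * 50.5 = 75.75
```

Under this setup neither strategy wins on average, which is one way to probe the "should you switch back?" question.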


Big Ideas:

Predictive modeling

Linear regression

The human role… !


So, what is Machine Learning?

The goal of machine learning, also known as predictive statistics/analytics, is to find a function that yields outputs for previously unseen inputs…

passenger details → function → prediction: did the passenger survive?

For Hw2, you are building this function by hand.


R is for Regression!

The oldest and (still) most popular technique for automatically generating a model from data.

problem 3 this week…


Regression

What is it?


Regression ~ predictive modeling

this week: making an assumption of linear dependence on the inputs
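In symbols (standard notation, not taken from the slides), assuming linear dependence on inputs x_1, …, x_p means the prediction has the form below, with the coefficients chosen from the data:

```latex
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p
```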


But why is it called regression?

1877: "reversion" (peas)

1885: "regression" (people)


make this sum of squared errors (residuals) as small as possible
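For a single input, least squares picks b0 and b1 to minimize SSE = Σ (y_i − b0 − b1·x_i)², and the minimizer has a closed form: b1 = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)² and b0 = ȳ − b1·x̄. A sketch on made-up toy data, checking the formula against R's lm:

```r
# Toy data (made up for illustration only)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Closed-form least-squares estimates for y ~ b0 + b1 * x
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

fit <- lm(y ~ x)      # R's linear-model function finds the same line
coef(fit)             # matches c(b0, b1)

sse <- sum(residuals(fit)^2)   # the quantity least squares minimizes
# Nudging the slope away from b1 can only make the SSE larger:
sse_nudged <- sum((y - (b0 + (b1 + 0.1) * x))^2)
```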


Let's look at lm1


pr3 this week: temperatures…


Temperature anomalies


The data…

deviations from the 1951-1980 global average of 14°C (57.2°F)

averaged (worldwide) and presented in units of 0.01°C


Your task…

  • follow an analysis plan similar to the one used for the Galton data in the previous slides

  • fit a linear model to the yearly average data and to each month's average data

  • use your model to predict what the average temperature will be for 2012 and 2013

  • is the linear model a reasonable one?

  • we'll check (or you can…) the prediction for 2012 (but not 2013, yet)
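A sketch of the fit-and-predict steps with lm and predict. The numbers are synthetic placeholders generated at random, NOT the real temperature record, and the column names year and anomaly are assumptions. Anomalies are generated in the data's units of 0.01°C, then converted to °C and added to the 14°C baseline:

```r
set.seed(380)

# Synthetic stand-in data (NOT real measurements): one anomaly per year,
# in units of 0.01 degrees C, following a made-up upward trend plus noise.
temps <- data.frame(year = 1950:2011)
temps$anomaly <- 1.2 * (temps$year - 1950) + rnorm(nrow(temps), sd = 10)

fit <- lm(anomaly ~ year, data = temps)   # straight-line model

# Predict the anomaly for the two years in the assignment
future <- data.frame(year = c(2012, 2013))
pred_hundredths <- predict(fit, newdata = future)

pred_celsius  <- pred_hundredths / 100    # 0.01 C units -> degrees C
pred_absolute <- 14 + pred_celsius        # add back the 14 C baseline
```

To judge whether a straight line is reasonable, plot(temps$year, residuals(fit)) and look for leftover curvature or trends.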


Try it!

Help is available either with hw#2 (Monty Hall and Titanic using R's functions)

or hw#3 (Twitter, envelopes, and temperatures)

this evening during lab time…

Good luck with everything this week!


Lab !


The Titanic

April 15, 1912

1502 out of the 2224 passengers and crew died in the sinking

What characteristics did the survivors share?


The Data

here are the 11 columns

There are 742 rows and 11 columns in the training data.


Our goal

… is to write a function that takes in a row of new data and outputs whether that passenger would survive (1) or not (0).


A first predictor


A second predictor

Does the data match the famous emergency cry?
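The famous cry is presumably "women and children first!". A hand-written predictor built on that idea might look like the sketch below. The column names sex and age, their coding, and the age cutoff of 12 are all assumptions, since the 11 columns aren't listed in this transcript:

```r
# Hypothetical predictor: 1 (survived) for women and children, else 0.
# Assumes columns `sex` ("female"/"male") and `age` (in years; may be NA).
women_and_children <- function(row) {
  is_female <- row$sex == "female"
  is_child  <- !is.na(row$age) && row$age < 12   # unknown age: not a child
  if (is_female || is_child) 1 else 0
}

# Two made-up passengers to show the call:
p1 <- data.frame(sex = "male", age = 8)
p2 <- data.frame(sex = "male", age = 40)
women_and_children(p1)   # 1
women_and_children(p2)   # 0
```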


Testing our functions…
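One simple way to test a predictor, sketched here with hypothetical names, is to apply it to every row of a data set whose outcomes are known and report the fraction it gets right. The 0/1 column survived and the tiny data frame are assumptions for illustration:

```r
# accuracy: fraction of rows where predict_fn matches the known outcome.
# Assumes `data` has a 0/1 column named `survived` (hypothetical name).
accuracy <- function(predict_fn, data) {
  rows  <- split(data, seq_len(nrow(data)))   # one single-row data frame each
  preds <- vapply(rows, predict_fn, FUN.VALUE = numeric(1))
  mean(preds == data$survived)
}

# A deliberately simple baseline predictor: everyone survives.
all_survive <- function(row) 1

toy <- data.frame(sex = c("female", "male", "male", "female"),
                  survived = c(1, 0, 1, 1))
accuracy(all_survive, toy)   # 0.75
```

Comparing any hand-built predictor against a trivial baseline like all_survive shows whether it actually learned something from the data.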


CS vs. IS and IT ?

greater integration system-wide issues

smaller details

machine specifics

www.acm.org/education/curric_vols/CC2005_Final_Report2.pdf


CS vs. IS and IT ?

Where will IS go?


IT ?

Where will IT go?


The bigger picture

Weeks 10-12: Objects

  • Week 10: classes vs. objects

  • Week 11: methods and data

  • Week 12: inheritance

Weeks 13-15: Final Projects

  • Week 13: final projects

  • Week 14: final projects

  • Week 15: final exam


state reminders…


Data!

  • Neighbor's name

  • A place they consider home

  • Are they working at a company now?

  • How many U.S. states have they visited?

  • Their favorite unhealthy food… ?

  • Do they have any "Data Science" (statistics, machine learning, CS) background?

Zachary Dodds

Pittsburgh, PA

Harvey Mudd

Where?

44

M&Ms

This class is truly seminar-style: we're developing expertise in this field together.

mostly CS for me…

be sure to set up your login + profile for the submission site…

