Welcome (back) to IST 380 !

Today: the old and the new

modeling trends from Twitter data

the most traditional approach to modeling data

This picture may soon become part of the OLD, if trends continue…

Assignments…

Homework #1 is complete! (2/5)

Getting started with R (tutorial + "quiz" + text)

Make sure you can submit to our submission site!

Zac & Suleng

Homework #2 is due tomorrow (2/12)

Pr #1: text, Chapters 6-9

Pr #2: Monty Hall challenge

Pr #3: writing a predictive model by hand…

Homework #3 is due next Tuesday (2/20)

Pr #1: text, Chapter 10

Pr #2: the envelope, please!

Things are heating up here!

Pr #3: linear models for prediction

The age of data?

I prefer my data well-aged!

R path!

… R's toolset and its capabilities…

Programming Skills

data collection

descriptive vs. generative vs. predictive statistics

Subject Expertise

predictions using linear regression

I predict we'll get here, but not necessarily in a straight line!…

packages

library

lapply

order

diff

Descriptive statistics: Twitter data

Tweet "diffs" for a certain hashtag…

Chapter 10 introduces access to Twitter data and statistical descriptions using these data

packages:

bitops

RCurl

RJSONIO

later:

UsingR

Some R: library

Once you have installed these packages

You can ensure they're present with

library(bitops)

and so on…

Chapter 10 will have you write a function to automate this process…

What if I don't have hands?!

Caution! Some of these may have to be installed by hand…
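
A minimal sketch of the kind of helper described above, assuming the goal is "install the package if it is missing, then load it". The name ensure_package and the repos default are illustrative choices here, not the textbook's code.

```r
# Hypothetical helper: install a package only if it is missing, then load it.
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  # requireNamespace() returns FALSE (rather than stopping) when pkg is absent
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  # character.only = TRUE lets library() take the package name as a string
  library(pkg, character.only = TRUE)
}

# The packages named on the slide; lapply calls the helper once per name.
# Uncomment to run (this may download packages):
# invisible(lapply(c("bitops", "RCurl", "RJSONIO"), ensure_package))
```

If an automatic install fails, the caution above still applies: install that package by hand and call library() afterwards.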

Some R: style…

I have NO COMMENT about this function!

better, but not ideal

use variables to hold intermediate values!
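
The functions shown on these slides are not part of this text, so here is a made-up illustration of the same progression: first a function with no comments and cryptic names, then the same computation with a comment, clearer names, and variables holding the intermediate values. Both function names are hypothetical.

```r
# Version 1: no comments, cryptic names, everything in one expression
f <- function(v) sum(v > sum(v) / length(v)) / length(v)

# Version 2: same result, easier to read and to debug
fraction_above_mean <- function(values) {
  # Returns the fraction of the entries in values that exceed their mean
  n <- length(values)
  average <- sum(values) / n
  count_above <- sum(values > average)
  count_above / n
}

fraction_above_mean(c(1, 2, 3, 10))   # 0.25: only 10 exceeds the mean of 4
```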

Some R: lapply and vapply

These let you apply a function to every element of a list or a vector:

> L <- list(8,9,10)

> lapply( L, add1 )

[[1]]

[1] 9

[[2]]

[1] 10

[[3]]

[1] 11

lapply(X, FUN, ...)

> V <- 8:10

> vapply( V, add1, FUN.VALUE=42 )

[1] 9 10 11

vapply(X, FUN, FUN.VALUE, ...)
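
The session above relies on a function add1 whose definition does not appear in this text. A plausible definition is sketched below as an assumption. It is worth defining explicitly: without it, R would find the unrelated add1 from the stats package.

```r
# Assumed definition: add 1 to the input
add1 <- function(x) {
  x + 1
}

L <- list(8, 9, 10)
as_list <- lapply(L, add1)    # lapply always returns a list

V <- 8:10
# FUN.VALUE is a template for one result; numeric(1) means "a single number".
# The slide's FUN.VALUE=42 works the same way, since 42 is also a single number.
as_vector <- vapply(V, add1, FUN.VALUE = numeric(1))
print(as_vector)
```

vapply stops with an error if any result does not match the template, which makes it a safer choice than sapply inside functions.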

UTC?

Clock in Bristol, UK

coordinated universal time

since before the railroads…

red minute hand: Bristol

black minute hand: London (Greenwich)

Looking at the data…

UTC?

can be plotted as-is

take differences via as.numeric, so that "2013-02-11 20:55:03 UTC" becomes 1360616103
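
A short sketch of that conversion. The first timestamp is the one on the slide; the other two tweet times are made up for illustration.

```r
# Parse the timestamp as UTC, then convert to seconds since 1970-01-01 UTC
stamp <- as.POSIXct("2013-02-11 20:55:03", tz = "UTC")
as.numeric(stamp)   # 1360616103

# Hypothetical arrival times for three tweets
tweet_times <- as.POSIXct(c("2013-02-11 20:55:03",
                            "2013-02-11 20:55:10",
                            "2013-02-11 20:56:03"), tz = "UTC")

# Seconds between consecutive tweets
gaps <- diff(as.numeric(tweet_times))
print(gaps)         # 7 53
```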

Some R: order and diff

> V <- c(3,4,2,1)

> V

[1] 3 4 2 1

> order(V)

[1] 4 3 1 2

> V[order(V)]

[1] 1 2 3 4

order(..., na.last = TRUE, decreasing = FALSE)

order returns a permutation of its input's indices…

What do these numbers mean?

Why not just use sort?

You can, but this lets you order anything in the same way!

diff ?
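
A small sketch tying the two functions together. The vector V is from the slide; the tags vector and the numbers passed to diff are made up for illustration.

```r
V <- c(3, 4, 2, 1)
idx <- order(V)   # 4 3 1 2: the positions of the smallest, next smallest, ...
V[idx]            # 1 2 3 4, the same as sort(V)

# The same index vector reorders any parallel data "in the same way"
tags <- c("c", "d", "b", "a")
tags[idx]         # "a" "b" "c" "d"

# diff returns the differences between consecutive elements
diff(c(1, 4, 9, 16))   # 3 5 7
```

This is the answer to "Why not just use sort?": sort(V) gives the sorted values, but only order(V) gives the positions needed to rearrange other vectors, or the rows of a data frame, to match.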

Comparing tags...

Next week: we will quantify these differences more carefully…

#losangeles

#sanfrancisco

Which is which?

Generative statistics

rgeom

runif

rnorm …

sample

replicate

Monte Carlo method: run a process many times to gain insights into it…

distribution of samples of state populations

Chapter 7 reviews repeated sampling and the resulting distribution of means
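
A minimal Monte Carlo sketch of repeated sampling, using sample and replicate. The state population data are not included in this text, so the populations vector below is a made-up stand-in, and the function name is hypothetical.

```r
set.seed(380)   # arbitrary seed so the run is repeatable

# Hypothetical stand-in for 50 state populations (in millions)
populations <- runif(50, min = 0.5, max = 40)

# One trial: draw a small sample and record its mean
one_sample_mean <- function(n = 5) {
  mean(sample(populations, size = n, replace = TRUE))
}

# Monte Carlo: repeat the trial many times and study the results
sample_means <- replicate(1000, one_sample_mean())

summary(sample_means)
# hist(sample_means)   # uncomment to see the distribution of means
```

The mean of sample_means should land close to mean(populations), while the individual sample means scatter around it.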

Hw3 pr2: A second Monte Carlo example:

Switch!

Both envelopes hold some positive amount of money (in a check or IOU), but one of these two envelopes holds twice as much money as the other.

Should you switch or stay?

but, then, should you switch back?

This week ~ write a function to model this process…

Hw3 pr2

Write a Mystery Envelope function:

ME_once <- function( amount_found=1.0, sors="switch", verbose=TRUE)

… that runs one envelope trial

… and returns the amount of $ "earned"

Another to run it N times:

ME_ntimes <- function( n=100 )

And another to run it N times:

sample_ME <- function( run_me=100 )
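
One possible shape for the first two functions, offered as a sketch rather than the intended solution. It rests on a modeling assumption the slides do not state: the unopened envelope is equally likely to hold half or double the amount found. The sors argument on ME_ntimes is also an addition here; the slide's signature has only n.

```r
# One envelope trial.
# Assumption: the other envelope holds half or double amount_found, 50/50.
ME_once <- function(amount_found = 1.0, sors = "switch", verbose = TRUE) {
  other_amount <- amount_found * sample(c(0.5, 2), size = 1)
  earned <- if (sors == "switch") other_amount else amount_found
  if (verbose) {
    cat("found", amount_found, "| choice:", sors, "| earned", earned, "\n")
  }
  earned
}

# Average earnings over n trials (sors is an added, hypothetical parameter)
ME_ntimes <- function(n = 100, sors = "switch") {
  mean(replicate(n, ME_once(sors = sors, verbose = FALSE)))
}

set.seed(380)
ME_ntimes(10000, sors = "switch")
ME_ntimes(10000, sors = "stay")
```

Under this assumption, staying always earns exactly amount_found, while switching averages toward 0.5 * 0.5 + 0.5 * 2 = 1.25 times it. Whether that assumption is a fair model of the game is the heart of the puzzle.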

Big Ideas:

Predictive modeling

Linear regression

The human role… !

So, what is Machine Learning?

The goal of machine learning, also known as predictive statistics/analytics, is to find a function that yields outputs for previously-unseen inputs…

passenger details → function → prediction: did the passenger survive?

For Hw2, you are building this function by hand.

R is for Regression!

The oldest and (still) most popular technique for automatically generating a model from data.

problem 3 this week…

Regression

What is it?

Regression ~ predictive modeling

this week: making an assumption of linear dependence on the inputs

But why is it called regression?

1877: "reversion" (peas)

1885: "regression" (people)

make this sum of squared errors (residuals) as small as possible

Let's look at lm1
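
A sketch of what fitting a model like lm1 involves, using R's lm function. The data are synthetic parent and child heights generated for illustration (not the Galton data), and the sse helper is hypothetical. The last lines check the least-squares idea directly: nudging the fitted slope makes the sum of squared errors larger.

```r
set.seed(1885)   # arbitrary seed so the run is repeatable

# Synthetic parent/child heights in inches (NOT the Galton data)
parent <- runif(200, min = 64, max = 73)
child  <- 24 + 0.65 * parent + rnorm(200, sd = 2)
heights <- data.frame(parent, child)

# Fit child = intercept + slope * parent by least squares
lm1 <- lm(child ~ parent, data = heights)
print(coef(lm1))

# Sum of squared errors (residuals) for any candidate line
sse <- function(intercept, slope) {
  predicted <- intercept + slope * heights$parent
  sum((heights$child - predicted)^2)
}

b <- unname(coef(lm1))
best_sse   <- sse(b[1], b[2])          # the fitted line
nudged_sse <- sse(b[1], b[2] + 0.01)   # a slightly different slope
nudged_sse > best_sse                  # TRUE: the fit minimizes this sum
```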

pr3 this week: temperatures…

Temperature anomalies

The data…

deviations from the 1950-1980 global average of 14°C ~ 57.2°F

averaged (worldwide) and presented in units of 0.01°C

Your task…

- follow an analysis plan similar to the Galton data in the previous slides
- fit a linear model to the yearly average data and to each month's average data
- use your model to predict what the average temperature will be for 2012 and 2013
- is the linear model a reasonable one?
- we'll check (or you can…) the prediction for 2012 (but not 2013, yet)
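
The fit-then-predict steps in the task list can be sketched as follows. The data frame is a synthetic stand-in for the yearly anomaly data, and the column names year and anomaly are assumptions; substitute the names in the actual file.

```r
set.seed(2013)   # arbitrary seed so the run is repeatable

# Synthetic stand-in for yearly anomalies, in units of 0.01 degrees C
years <- 1880:2011
temps <- data.frame(
  year    = years,
  anomaly = -30 + 0.6 * (years - 1880) + rnorm(length(years), sd = 10)
)

# Fit a linear model of anomaly against year
temp_model <- lm(anomaly ~ year, data = temps)

# predict() needs a data frame whose column name matches the model's input
future <- data.frame(year = c(2012, 2013))
predicted <- predict(temp_model, newdata = future)

# Convert anomalies back to temperatures using the 14 degree C baseline
predicted_celsius <- 14 + predicted / 100
print(predicted_celsius)

# One way to judge whether a line is reasonable: look for patterns in
# plot(temps$year, residuals(temp_model))
```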

Try it!

Help is available for either hw#2 (Monty Hall and Titanic using R's functions) or hw#3 (Twitter, envelopes, and temperatures) this evening during lab time…

Good luck with everything this week!

Lab !

The Titanic

April 15, 1912

1502 out of the 2224 passengers and crew died in the sinking

What characteristics did the survivors share?

The Data

here are the 11 columns

There are 742 rows and 11 columns in the training data.

Our goal

… is to write a function that takes in a row of new data and outputs whether that passenger would survive (1) or not (0).

A first predictor

A second predictor

Does the data match the famous emergency cry?

Testing our functions…
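
The predictors shown on these slides are not part of this text. As a sketch of the idea, here are two hand-written predictors and a way to test them. The rule in each predictor, the column names sex and survived, and the toy data are all assumptions made for illustration.

```r
# Hypothetical first predictor: guess that nobody survives
predict_all_die <- function(row) {
  0
}

# Hypothetical second predictor: "women first", guess that women survive
predict_by_sex <- function(row) {
  if (row$sex == "female") 1 else 0
}

# Made-up rows for testing; the real training data are not shown here
toy <- data.frame(
  sex      = c("female", "male", "female", "male", "female"),
  survived = c(1, 0, 1, 0, 0),
  stringsAsFactors = FALSE
)

# Fraction of rows where the predictor's guess matches the recorded outcome
accuracy <- function(predictor, data) {
  guesses <- vapply(seq_len(nrow(data)),
                    function(i) predictor(data[i, ]),
                    FUN.VALUE = numeric(1))
  mean(guesses == data$survived)
}

accuracy(predict_all_die, toy)   # 0.6 on this toy data
accuracy(predict_by_sex, toy)    # 0.8 on this toy data
```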

CS vs. IS and IT ?

greater integration, system-wide issues

smaller details, machine specifics

www.acm.org/education/curric_vols/CC2005_Final_Report2.pdf

Where will IS go?

Where will IT go?

The bigger picture

Weeks 10-12: Objects

- Week 10: classes vs. objects
- Week 11: methods and data
- Week 12: inheritance

Weeks 13-15: Final Projects

- Week 13: final projects
- Week 14: final projects
- Week 15: final exam

Data?!

- Neighbor's name
- A place they consider home
- Are they working at a company now?
- How many U.S. states have they visited?
- Their favorite unhealthy food… ?
- Do they have any "Data Science" (statistics, machine learning, CS) background?

Where?

state reminders…

Data!

- Neighbor's name: Zachary Dodds
- A place they consider home: Pittsburgh, PA
- Are they working at a company now? Harvey Mudd
- How many U.S. states have they visited? 44
- Their favorite unhealthy food… ? M&Ms
- Do they have any "Data Science" (statistics, machine learning, CS) background? mostly CS for me…

This class is truly seminar-style: we're developing expertise in this field together.

be sure to set up your login + profile for the submission site…