
Statistics and Data Analysis

Professor William Greene

Stern School of Business

IOMS Department

Department of Economics


Statistics and Data Analysis

Part 1 – Data Presentation


Data Presentation Agenda

  • Data and Data Types

  • Representing Data: pie chart, bar chart.

  • Summarizing Data: box plot, histogram

    • Central tendency

    • Spread

    • Distribution (shape)


Data = A Set of Facts: A picture of some aspect of the world

Pizza Sales by Type

What do the data tell you?

How can you use the information?

What additional information would make these data more informative?


Data Types and Measurement

  • Quantitative

    • Discrete = count: Number of car accidents by city by time

    • Continuous = measurement: Housing prices

  • Qualitative

    • Categorical: Shopping mall, car brand, trip mode

    • Ordinal: Survey data on attitudes; “How do you feel about…?”

      Strongly disagree  Disagree  Neutral  Agree  Strongly agree

      Moody’s bond ratings: Aaa, Aa, A, Baa, Ba, B, and so on.

  • Frameworks

    • Cross section

    • Time series


Discrete Data – US Crime Statistics; Counts of Occurrences.


Continuous Data: Housing Prices and Incomes


Unordered Qualitative Data: Travel Mode Between Sydney and Melbourne by 210 Travelers


Ordered Qualitative Data: German Health Satisfaction Survey; 27,326 individuals. On a scale from 0 to 10, how do you feel about your health?


Ordered Qualitative Outcomes

Bond Ratings; Movie Ratings


Problem with Ordered Survey Response Data

61 Stern Students’ Ranking of Subway Safety (1994)*

Very Unsatisfactory

Unsatisfactory

OK

Satisfactory

Very Satisfactory

Is there an objective meaning to “3” on some standard scale? Does everyone’s “1” or “2” or “3” … mean the same thing?

* Jeff Simonoff: Data Presentation and Summary, pp. 3-4


Quantitative vs. Qualitative Data

Qualitative Data:

No units of measurement

Arithmetic manipulation is usually meaningless. The average of Air and Bus is not Train.

Quantitative Data:

Units of measurement make sense. Arithmetic computations make sense.
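
As a rough illustration of this distinction, the sketch below uses made-up travel-mode labels and made-up house prices (neither data set comes from the slides). Counting the labels is meaningful, and so is averaging the prices, but trying to average the labels simply fails.

```python
from collections import Counter
from statistics import mean

# Hypothetical data for illustration only (not from the slides).
travel_modes = ["Air", "Train", "Bus", "Car", "Car", "Air", "Car"]
house_prices = [250_000, 310_000, 275_000, 420_000]  # units: dollars

# Qualitative data: counting how often each category occurs is meaningful.
mode_counts = Counter(travel_modes)
most_common_mode, most_common_count = mode_counts.most_common(1)[0]

# Quantitative data: arithmetic is meaningful because the values carry units.
average_price = mean(house_prices)

def try_average(values):
    """Return the arithmetic mean, or None if the values cannot be added."""
    try:
        return sum(values) / len(values)
    except TypeError:
        return None

print(mode_counts)
print(most_common_mode, most_common_count)
print(average_price)
print(try_average(travel_modes))  # labels have no arithmetic, so this is None
```

The most frequent category (the mode) is the natural summary for the labels; the mean is reserved for the prices.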


Cross Section Data: Housing Prices and Incomes


Time Series Data: Car Thefts


Representing Data

  • In raw form

  • Transformed to a visual form

  • Summarized graphically

  • Summarized statistically


Pie Chart

Pizza Pies Sold, by Type


Data Representation

BAR CHART PIE CHART

Same data. Which is easier to understand?
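
Both charts are drawn from the same category counts: a pie chart shows each count as a share of the total, while a bar chart shows the counts side by side. The sketch below makes that concrete with invented pizza counts (the slide's actual figures are not in this transcript), and the helper names `shares` and `text_bar_chart` are illustrative only.

```python
# Hypothetical counts of pizzas sold by type, for illustration only.
pizza_sales = {"Pepperoni": 40, "Plain": 30, "Mushroom": 20, "Sausage": 10}

def shares(counts):
    """Convert category counts to fractions of the total (pie-chart slices)."""
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}

def text_bar_chart(counts, width=40):
    """Render one text bar per category; the largest count fills `width`."""
    largest = max(counts.values())
    lines = []
    for name, count in counts.items():
        bar = "#" * round(width * count / largest)
        lines.append(f"{name:<10} {bar} {count}")
    return "\n".join(lines)

pizza_shares = shares(pizza_sales)
print(pizza_shares)
print(text_bar_chart(pizza_sales))
```

In the bar version, differences between categories are compared as lengths on a common baseline, which is one reason bars are often easier to read than slices.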



Raw Data on Housing Prices and Incomes


A Box Plot Describes the Distribution of Values in a Set of Data

Hawaii

Box and Whisker Plot for House Price Listings


Making a Box Plot for Per Capita Income

Maximum = 31136

3rd Quartile = 24933

Interquartile Range = IQR = 24933 - 21677 = 3256

Median = 22610

1st Quartile = 21677

Minimum = 17043
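
The five numbers above can be computed from any list of values. The sketch below is one way to do it with Python's standard library; the function name `five_number_summary` and the small sample are hypothetical, and the quartile method chosen here (`"inclusive"`) is an assumption, since software packages differ slightly in how they interpolate quartiles. The IQR arithmetic is also checked against the quartiles quoted on the slide.

```python
from statistics import median, quantiles

def five_number_summary(values):
    """Return min, quartiles, median, max, and IQR for a list of numbers."""
    data = sorted(values)
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return {
        "min": data[0],
        "q1": q1,
        "median": median(data),
        "q3": q3,
        "max": data[-1],
        "iqr": q3 - q1,
    }

# Quartiles quoted on the slide for per capita income.
slide_q1, slide_q3 = 21677, 24933
slide_iqr = slide_q3 - slide_q1  # matches the slide's IQR of 3256

# Hypothetical sample, for illustration only.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary = five_number_summary(sample)

print(slide_iqr)
print(summary)
```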


Box and Whisker Plot

What is an outlier? Why do we believe a particular point is an outlier?

Outliers

Smaller of (Maximum, Median + 1.5 IQR)

75th Percentile

Interquartile range = IQR

Median

25th Percentile

Larger of (Minimum, Median – 1.5 IQR)

HOG, pp. 39-43
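
The whisker rule on this slide can be written out directly. The sketch below applies it, as stated, to the per capita income figures from the box plot slide; the function names are hypothetical. Note one assumption worth flagging: this follows the slide in measuring 1.5 IQR from the median, whereas many software packages measure it from the quartiles instead, so their whiskers may land elsewhere.

```python
def whisker_ends(minimum, median, maximum, iqr, k=1.5):
    """Whisker ends per the slide: clip median +/- k*IQR to the data range."""
    upper = min(maximum, median + k * iqr)
    lower = max(minimum, median - k * iqr)
    return lower, upper

def beyond_whiskers(value, lower, upper):
    """A point outside the whiskers is flagged as a candidate outlier."""
    return value < lower or value > upper

# Per capita income values quoted on the box plot slide.
lower, upper = whisker_ends(minimum=17043, median=22610, maximum=31136, iqr=3256)

print(lower, upper)
print(beyond_whiskers(31136, lower, upper))  # the maximum
print(beyond_whiskers(17043, lower, upper))  # the minimum
print(beyond_whiskers(22610, lower, upper))  # the median
```

Under this rule both the maximum and the minimum fall outside the whiskers, which is what flags them for a closer look; whether they are genuinely errant is a judgment the rule cannot make.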


A Frequency Distribution


Histogram for House Price Listings

A histogram describes the sample data and suggests the nature of the underlying data generating process. Note the “skewness” of the distribution of listings.

HOG, pp. 16-18
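
A histogram is a picture of a frequency distribution: the values are sorted into equal-width bins and each bin's count becomes a bar height. The sketch below builds such a table for invented listing prices (in thousands of dollars; not the data behind the slide), with `frequency_distribution` as a hypothetical helper name.

```python
from collections import Counter

def frequency_distribution(values, bin_width, start=0):
    """Count values in bins [edge, edge + bin_width), keyed by left edge.

    Empty bins between the first and last occupied bin are kept with a
    count of zero so that gaps in the data remain visible.
    """
    bins = Counter((value - start) // bin_width for value in values)
    first, last = int(min(bins)), int(max(bins))
    return {start + k * bin_width: bins[k] for k in range(first, last + 1)}

# Hypothetical listing prices in thousands of dollars.
listings = [120, 135, 150, 155, 160, 180, 210, 240, 320, 560]
freq = frequency_distribution(listings, bin_width=100)

# A crude text histogram: one row per bin, one star per listing.
for left_edge, count in freq.items():
    print(f"{left_edge:>4} to {left_edge + 100:<4} {'*' * count} ({count})")
```

Most of the invented listings pile up in the lowest bin and a single value sits far to the right, the same kind of skewness the slide points out.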


Distribution of House Price Listings

Asymmetry (skewness) in the histogram of listing prices shows up in the box and whisker plot. Note the long whisker at the top of the figure.


A Caution About Graphical Data Summaries

Graphical tools can be very badly behaved when:

(1) The data have only a few observations.

(2) There are wild observations in the data set.

The box and whisker plot is distorted (and dominated) by one wildly errant observation.
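
A small numerical sketch shows why one wild observation dominates a plot. The data are invented for illustration and `spread_summary` is a hypothetical helper: adding a single errant value barely moves the median but inflates the mean and the range, and the range is what sets the plot's axis.

```python
from statistics import mean, median

def spread_summary(values):
    """Summaries that react very differently to one extreme value."""
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
    }

# Hypothetical data: five ordinary values, then one wildly errant one.
clean = [21, 22, 23, 24, 25]
with_wild = clean + [250]

before = spread_summary(clean)
after = spread_summary(with_wild)

print(before)
print(after)
```

Because the axis must stretch to cover the full range, the box for the ordinary values is squeezed into a sliver at one end of the figure.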


Summary

  • What story does the data presentation tell?

    • Data in raw form tell no story.

    • Visual representation of data tells something about the data

  • Data reduction and summary representation: What do we learn?

    • Location

    • Spread

    • Shape of the distribution

  • What tool is most informative?

    • Reduction to a small number of features

    • Visual displays of data

      • Pie chart

      • Box and whisker plots

      • Histograms

      • Time series plots

“There are lies, damned lies and statistics.” (Benjamin Disraeli)


The Visual Data Do Tell the Story: Napoleon’s March to Moscow




Probability of Survival to Age 50, Female at Birth: U.S. and 20 Other Wealthy Countries

