Lecture 2

strom

Presentation Transcript


  1. Lecture 2 Descriptive statistics (Part I)

  2. Lecture 2: Descriptive statistics
     • Data in raw form are usually not easy to use for decision making
     • Some type of organization is needed
       • Table
       • Graph
     • Techniques reviewed here:
       • Bar charts and pie charts
       • Ordered array
       • Stem-and-leaf display
       • Frequency distributions, histograms
       • Cumulative distributions
       • Contingency tables

  3. Tabulating and Graphing Univariate Categorical Data
     Categorical data:
     • Tabulating data: summary table
     • Graphing data: bar charts, pie charts

  4. Summary Table (for an Investor’s Portfolio)

     Investment Category   Amount (in thousands $)   Percentage
     Stocks                46.5                      42.27
     Bonds                 32                        29.09
     CD                    15.5                      14.09
     Savings               16                        14.55
     Total                 110                       100

     Variables are categorical
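The percentage column of the summary table can be reproduced with a few lines of Python. This is only a sketch: the dictionary name `portfolio` and the two-decimal rounding are illustrative choices, while the amounts are copied from the slide.

```python
# Amounts (in thousands of dollars) copied from the summary table.
portfolio = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}

total = sum(portfolio.values())

# Percentage share of each category, rounded to two decimals as in the table.
percentages = {
    name: round(100 * amount / total, 2) for name, amount in portfolio.items()
}

for name, amount in portfolio.items():
    print(f"{name:<8} {amount:>6.1f} {percentages[name]:>7.2f}")
print(f"{'Total':<8} {total:>6.1f} {100:>7.2f}")
```

Rounding the same shares to whole percents gives the labels used on the pie chart.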

  5. Bar Chart (for an Investor’s Portfolio)

  6. Pie Chart (for an Investor’s Portfolio)
     Amount invested in K$: Stocks 42%, Bonds 29%, Savings 15%, CD 14%
     (Percentages are rounded to the nearest percent)

  7. Organizing Numerical Data
     Numerical data: 41, 24, 32, 26, 27, 27, 30, 24, 38, 21
     • Ordered array: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
     • Stem-and-leaf display:
         2 | 1 4 4 6 7 7
         3 | 0 2 8
         4 | 1
     • Frequency distributions and cumulative distributions: tables, histograms

  8. The Ordered Array
     • Data in raw form (as collected): 24, 26, 24, 21, 27, 27, 30, 41, 32, 38
     • Data in ordered array from smallest to largest: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
     • Shows range (min to max)
     • May help identify outliers (unusual observations)
     • If the data set is large, the ordered array is less useful

  9. Stem-and-Leaf Display
     • A simple way to see distribution details in a data set
     • Method: separate the sorted data series into leading digits (the stems) and trailing digits (the leaves)

  10. Example
      • Data in raw form (as collected): 24, 26, 24, 21, 27, 27, 30, 41, 32, 38
      • Data in ordered array from smallest to largest: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
      • Stem-and-leaf display:
          2 | 1 4 4 6 7 7
          3 | 0 2 8
          4 | 1
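The stem-and-leaf method can be sketched in Python as follows. The function name `stem_and_leaf` is hypothetical, and the sketch assumes non-negative integers whose stem is the tens part and whose leaf is the units digit, as in the example data.

```python
from collections import defaultdict

data = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]


def stem_and_leaf(values):
    """Group sorted values by tens part (stem) and units digit (leaf)."""
    groups = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        groups[stem].append(leaf)
    return dict(groups)


display = stem_and_leaf(data)
for stem in sorted(display):
    print(stem, "|", " ".join(str(leaf) for leaf in display[stem]))
```

Sorting first guarantees that the leaves in each row appear in increasing order, which is what makes the display readable as a sideways histogram.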

  11. Tabulating Numerical Data: Frequency Distributions
      What is a frequency distribution?
      • A frequency distribution is a list or a table …
      • containing class groupings (ranges within which the data fall) ...
      • and the corresponding frequencies with which data fall within each grouping or category
      • It allows for a quick visual interpretation of the data

  12. Tabulating Numerical Data: Frequency Distributions
      Example: A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature:
      24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27

  13. Steps for building the frequency distribution
      • Sort the raw data in ascending order: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
      • Find the range: 58 - 12 = 46
      • Select the number of classes: 5 (usually between 5 and 15)
      • Compute the class interval (width): 10 (46/5 = 9.2, then round up)
      • Determine the class boundaries (limits): 10, 20, 30, 40, 50, 60
      • Count observations and assign them to classes

  14. Frequency Distributions, Relative Frequency Distributions and Percentage Distributions
      Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

      Class      Frequency   Relative Frequency   Percentage
      [10, 20)   3           .15                  15
      [20, 30)   6           .30                  30
      [30, 40)   5           .25                  25
      [40, 50)   4           .20                  20
      [50, 60)   2           .10                  10
      Total      20          1                    100
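The tabulation for the temperature example can be sketched in Python. Two choices here are assumptions rather than part of the slides: "round up" is taken to mean rounding 9.2 up to the next integer, and the first class is assumed to start at the largest multiple of the width that does not exceed the minimum (which gives 10).

```python
import math

temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

num_classes = 5
data_range = max(temps) - min(temps)           # 58 - 12 = 46
width = math.ceil(data_range / num_classes)    # 46 / 5 = 9.2, rounded up to 10

# Assumption: the first class starts at a multiple of the width.
lower = (min(temps) // width) * width
boundaries = [lower + k * width for k in range(num_classes + 1)]

# Count observations per half-open class [low, high).
frequencies = [0] * num_classes
for t in temps:
    frequencies[(t - lower) // width] += 1

n = len(temps)
for k, freq in enumerate(frequencies):
    label = f"[{boundaries[k]}, {boundaries[k + 1]})"
    print(f"{label:<9} {freq:>3} {freq / n:>5.2f} {100 * freq / n:>4.0f}")
```

Because the classes are half-open, a value that falls exactly on a boundary, such as 30, is counted in the class that starts at that boundary.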

  15. Graphing Numerical Data: The Histogram
      • A graph of the data in a frequency distribution is called a histogram
      • The class boundaries (or class midpoints) are shown on the horizontal axis
      • The vertical axis is either frequency, relative frequency, or percentage
      • Bars of the appropriate heights are used to represent the number of observations within each class

  16. Histogram Example

      Class      Class Midpoint   Frequency
      [10, 20)   15               3
      [20, 30)   25               6
      [30, 40)   35               5
      [40, 50)   45               4
      [50, 60)   55               2

      Histogram: class midpoints on the horizontal axis, no gaps between bars
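As a rough illustration, the histogram can be approximated in plain text, one row per class. This sketch turns the chart on its side (classes run down the page) and uses `#` characters for bar length; both are presentation choices of the sketch, not of the slides.

```python
boundaries = [10, 20, 30, 40, 50, 60]
frequencies = [3, 6, 5, 4, 2]

# Class midpoint = average of the lower and upper class boundary.
midpoints = [(low + high) // 2 for low, high in zip(boundaries, boundaries[1:])]

for midpoint, freq in zip(midpoints, frequencies):
    print(f"{midpoint:>3} | {'#' * freq} ({freq})")
```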

  17. Tabulating Numerical Data: Cumulative Frequency
      Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

      Upper Limit   Cumulative Frequency   Cumulative % Frequency
      10            0                      0
      20            3                      15
      30            9                      45
      40            14                     70
      50            18                     90
      60            20                     100
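The cumulative columns are running totals of the class frequencies. A minimal Python sketch, assuming the class boundaries and frequencies from the temperature example:

```python
from itertools import accumulate

boundaries = [10, 20, 30, 40, 50, 60]
frequencies = [3, 6, 5, 4, 2]

# Nothing lies below the first boundary, so the running total starts at 0.
cumulative = [0] + list(accumulate(frequencies))
n = cumulative[-1]
cumulative_pct = [100 * c / n for c in cumulative]

for limit, count, pct in zip(boundaries, cumulative, cumulative_pct):
    print(f"{limit:>3} {count:>3} {pct:>4.0f}")
```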

  18. Two categorical variables (contingency table)
      • The following data represent the responses to two questions asked in a survey of 20 college students majoring in business:
        • What is your gender? (Male = M; Female = F)
        • What is your major? (Accountancy = A; Information System = I; Market = M)
      Gender: M M M F M F F M F M F M M M M F F M F F
      Major:  A I I M A I A A I I A A A M I M A A A I

  19. Contingency table (cont’d)
      • Raw data set:
      Gender: M M M F M F F M F M F M M M M F F M F F
      Major:  A I I M A I A A I I A A A M I M A A A I
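A contingency table counts how often each (gender, major) pair occurs. The sketch below tallies the raw data with `collections.Counter`; the row and column order and the print layout are arbitrary choices, not taken from the slides.

```python
from collections import Counter

gender = "M M M F M F F M F M F M M M M F F M F F".split()
major = "A I I M A I A A I I A A A M I M A A A I".split()

# Each student contributes one (gender, major) pair.
counts = Counter(zip(gender, major))

majors = ["A", "I", "M"]
print("Gender  " + "  ".join(f"{m:>3}" for m in majors) + "  Total")
for g in ["F", "M"]:
    row = [counts[(g, m)] for m in majors]
    print(f"{g:<8}" + "  ".join(f"{c:>3}" for c in row) + f"  {sum(row):>5}")

column_totals = [sum(counts[(g, m)] for g in ["F", "M"]) for m in majors]
print(f"{'Total':<8}" + "  ".join(f"{c:>3}" for c in column_totals)
      + f"  {sum(column_totals):>5}")
```

For these data the tally gives 9 female and 11 male students, with 10, 7, and 3 students in majors A, I, and M respectively.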

  20. Graphical methods are:
      • Good in presenting data
      • Not easy for comparison
      • Difficult to use for statistical inference

  21. Numerical description
      Summary measures:
      • Central tendency (location measures): mean, median, mode
      • Quartiles
      • Variation: range, interquartile range, variance, standard deviation

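  22. Mean
      • Mean (arithmetic mean) of data values
      • Sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, where $n$ is the sample size
      • Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$, where $N$ is the population size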

  23. An example
      • TV watching hours/week: 5, 7, 3, 38, 7
      • Mean = (5 + 7 + 3 + 38 + 7)/5 = 60/5 = 12
      • If the correct time for the 4th subject is 8 (not 38):
      • Mean = (5 + 7 + 3 + 8 + 7)/5 = 30/5 = 6
      Figure: two number lines, one marking 3, 5, 6, 7, 8 (Mean = 6) and one marking 3, 5, 7, 12, 38 (Mean = 12)
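The effect of the single extreme value on the mean can be checked directly. A minimal sketch; the list names `recorded` and `corrected` are illustrative.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)


recorded = [5, 7, 3, 38, 7]    # includes the suspicious value 38
corrected = [5, 7, 3, 8, 7]    # 4th subject corrected to 8

print(mean(recorded))    # 12.0
print(mean(corrected))   # 6.0
```

Changing one observation out of five halves the mean, which is the sensitivity to outliers that the slide illustrates.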

  24. Mean (Cont’d)
      • The most common measure of central tendency, especially when n is large, due to its good theoretical properties
      • Affected by extreme values (outliers)

  25. Median
      • Robust measure of central tendency
      • Not affected by extreme values
      • In an ordered array, the median is the ‘middle’ number
        • If n is odd, the median is the middle number (i.e., the (n+1)/2-th measurement)
        • If n is even, the median is the average of the n/2-th and (n/2 + 1)-th measurements
      Figure: two number lines, one with points at 3, 5, 7, 8 and one with points at 3, 5, 7, 38; Median = 7 in both
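The odd and even rules for the median translate into a short function. This is a sketch with a hypothetical helper name; it reuses the TV-watching data from the mean example and the ten-value data set from the ordered-array slide.

```python
def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # the (n+1)/2-th value, 1-based
    return (ordered[mid - 1] + ordered[mid]) / 2   # average of n/2-th and (n/2 + 1)-th


print(median([5, 7, 3, 38, 7]))   # 7
print(median([5, 7, 3, 8, 7]))    # 7
print(median([24, 26, 24, 21, 27, 27, 30, 41, 32, 38]))   # 27.0
```

The first two calls show the robustness claim: replacing 38 by 8 leaves the median at 7.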

  26. Mode
      • A measure of central tendency
      • Value that occurs most often
      • Not affected by extreme values
      • There may not be a mode
      • There may be several modes
      • Used for either numerical or categorical data
      Figure: two dot plots, one with no mode and one with Mode = 9
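A small sketch of finding the mode or modes with `collections.Counter`. Returning an empty list when every value occurs exactly once is a convention assumed here to represent "no mode"; the function name `modes` is hypothetical.

```python
from collections import Counter


def modes(values):
    """Return all most frequent values; an empty list means no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:            # every value appears once: no mode
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([5, 7, 3, 38, 7]))                              # one mode
print(modes([24, 26, 24, 21, 27, 27, 30, 41, 32, 38]))      # several modes
print(modes([3, 5, 8]))                                     # no mode
print(modes("A I I M A I A A I I A A A M I M A A A I".split()))  # categorical data
```

The last call uses the survey majors from the contingency-table slides to show that the mode also applies to categorical data.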

  27. Quartiles
      • Split ordered data into 4 quarters (25% | 25% | 25% | 25%)
      • Position of the i-th quartile: position of $Q_i = \frac{i(n+1)}{4}$
      • $Q_1$ (1st quartile) and $Q_3$ (3rd quartile) are measures of noncentral location
      • $Q_1$, $Q_2$ (the median), and $Q_3$ are called the 25th, 50th, and 75th percentiles, respectively. A pth percentile is the value of X such that p% of the measurements are less than X and (100-p)% are greater than X.

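  28. Quartiles (example)
      Data in ordered array: 3 6 6 12 12 12 15 15 18 21
      • Position of first quartile is $\frac{1(10+1)}{4} = 2.75$, so $Q_1 = 6 + 0.75(6 - 6) = 6$
      • Position of third quartile is $\frac{3(10+1)}{4} = 8.25$, so $Q_3 = 15 + 0.25(18 - 15) = 15.75$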

  29. 5-number summary
      • Box-and-whisker plot: graphical display of data using 5 numbers
      Data in ordered array: 3 6 6 12 12 12 15 15 18 21
      $X_{\text{smallest}} = 3$, $Q_1 = 6$, Median ($Q_2$) = 12, $Q_3 = 15.75$, $X_{\text{largest}} = 21$
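The five numbers can be computed with a short Python sketch. It assumes the position rule i(n+1)/4 with linear interpolation between neighbouring values, which reproduces the 6 and 15.75 shown for this data set; other textbooks and libraries use different quartile conventions, and the helper names are hypothetical.

```python
def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional position, interpolating linearly."""
    lower = int(position)           # whole part of the position
    fraction = position - lower
    if fraction == 0:
        return ordered[lower - 1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


def quartile(values, i):
    """i-th quartile (i = 1, 2, 3) using the assumed position rule i(n+1)/4."""
    ordered = sorted(values)
    position = i * (len(ordered) + 1) / 4
    return value_at_position(ordered, position)


data = [3, 6, 6, 12, 12, 12, 15, 15, 18, 21]
five_numbers = (min(data), quartile(data, 1), quartile(data, 2),
                quartile(data, 3), max(data))
print(five_numbers)   # (3, 6.0, 12.0, 15.75, 21)
```

For very small samples the computed position can fall outside the data, so this sketch is only meant for data sets of roughly the size used in the lecture.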
