
Organizing & Reporting Data: An Intro

This text provides an introduction to statistical analysis and how it works with data sets. It covers topics such as data organization, data management tasks, transforming data, and representing data distributions graphically.

joemichel

Presentation Transcript


  1. Organizing & Reporting Data: An Intro
     • Statistical analysis works with data sets
       • A data set is a collection of data values on some variables, recorded for a number of cases (records)
     • For example, the student data from last week

  2. Organizing & Reporting Data (cont.):
     • Structure of most data sets = “rectangular”
       • Columns = Variables
       • Rows = Cases
       • Cells = individual values
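The rectangular layout can be sketched in a few lines of Python. This is only an illustration: the variable names (`id`, `age`, `gpa`) and the values are hypothetical and are not the student data the slides refer to.

```python
# Hypothetical data set; names and values are illustrative only.
variables = ["id", "age", "gpa"]   # columns = variables
cases = [                          # rows = cases
    [1, 19, 3.2],
    [2, 22, 3.8],
    [3, 20, 2.9],
]


def cell(case_index, variable_name):
    """Return one cell: the value of one variable for one case."""
    return cases[case_index][variables.index(variable_name)]


def column(variable_name):
    """Return every case's value on one variable."""
    j = variables.index(variable_name)
    return [row[j] for row in cases]


# "Rectangular" means every case has a slot for every variable.
is_rectangular = all(len(row) == len(variables) for row in cases)

print(len(cases), "cases x", len(variables), "variables")
print("age of the second case:", cell(1, "age"))
print("all gpa values:", column("gpa"))
```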

  3. Managing Data: Basic Tasks
     • NOTE: Reliance on Codebook for Data Set
       • Specify information about variables in the data set
       • Indicate Variable Names & Labels
       • Indicate Variable Values (codes) & Value Labels
       • Indicate “missing values”
     • Can Modify Overall Arrangement of Data Set
       • Sorting → change the order of the cases in the file
       • Selecting → identify a subset of cases to work on
       • Transforming → modify the values of a variable
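A minimal Python sketch of a codebook and the three arrangement tasks follows. The codebook layout, the variable names, and the missing-value codes (`9`, `-1`) are assumptions made for illustration; real codebooks define their own.

```python
# Hypothetical codebook: variable labels, value labels, missing-value codes.
codebook = {
    "sex": {"label": "Sex of respondent",
            "values": {1: "Female", 2: "Male"},
            "missing": {9}},
    "age": {"label": "Age in years", "values": {}, "missing": {-1}},
}

# Hypothetical cases, stored one dictionary per case.
cases = [
    {"id": 1, "sex": 2, "age": 34},
    {"id": 2, "sex": 1, "age": 19},
    {"id": 3, "sex": 9, "age": 51},
    {"id": 4, "sex": 1, "age": -1},
]


def is_missing(variable, value):
    """Check a value against the codebook's missing-value codes."""
    return value in codebook[variable]["missing"]


# Sorting: change the order of the cases (here, by id, descending).
by_id_desc = sorted(cases, key=lambda c: c["id"], reverse=True)

# Selecting: identify a subset of cases (here, those coded Female).
females = [c for c in cases if c["sex"] == 1]

# Transforming: modify the values of a variable
# (here, missing codes on age become None; originals are left untouched).
cleaned = [
    {**c, "age": None if is_missing("age", c["age"]) else c["age"]}
    for c in cases
]

print([c["id"] for c in by_id_desc])
print([c["id"] for c in females])
print([c["age"] for c in cleaned])
```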

  4. Organizing & Reporting Data (cont.): Where do the data values come from?
     • Raw Data: recorded from responses, records, or observations
       • In their (more-or-less) original form
       • Some coding (or editing) operations usually involved
       • Usually coded into numerical values (for ease of use)
     • Transformed Data: modified from original values
       • Computed values (e.g., rates, %, sums, “imputations”)
       • Recoded values (into more correct, meaningful, or useful values)
     • Created Data: values are “made up”
       • Simulated values
       • Demonstration values

  5. Managing Data: Basic Tasks
     • Transforming Data: Variable Transformations
       • Computing new variables from prior ones
         • Index = Q1 + Q2 + Q3 + Q4
         • Utility = probability * outcome
       • Recode Variable by changing its values
         • Change missing values (“blanks”) to “0”
       • Recode Variable into a New Variable
         • Age (yrs) → Child (1-11); Juvenile (12-17); Adult (18-over)
         • Age (yrs) → 10-19 yrs; 20-29 yrs; 30-39 yrs; 40-49 yrs; 50-59 yrs; 60-69 yrs; 70-79 yrs; 80-89 yrs; 90-99 yrs
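The transformations on this slide can be sketched as small Python functions. The function names are hypothetical, ages are assumed to be whole years, and a blank is represented as `None`; ages outside the ranges listed on the slide are returned as `None` rather than guessed.

```python
def compute_index(q1, q2, q3, q4):
    """Compute a new variable as the sum of four prior ones."""
    return q1 + q2 + q3 + q4


def utility(probability, outcome):
    """Compute a new variable as probability * outcome."""
    return probability * outcome


def blank_to_zero(value):
    """Recode a variable by changing its values: blank (None) becomes 0."""
    return 0 if value is None else value


def age_group(age):
    """Recode age (whole years) into the slide's three categories."""
    if 1 <= age <= 11:
        return "Child"
    if 12 <= age <= 17:
        return "Juvenile"
    if age >= 18:
        return "Adult"
    return None  # below the ranges given on the slide


def age_decade(age):
    """Recode age (whole years) into the slide's ten-year groups."""
    if 10 <= age <= 99:
        low = (age // 10) * 10
        return f"{low}-{low + 9} yrs"
    return None  # outside the groups given on the slide


ages = [7, 15, 42, 68]
print([age_group(a) for a in ages])
print([age_decade(a) for a in ages])
```

Writing the recode as a function that returns a new value, instead of overwriting the input, mirrors "recode into a new variable": the original ages stay available.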

  6. Computed Data: Some Useful Forms
     • Rates – numbers divided by populations
     • Ratios – one number divided by another
     • Indexes – new variable = a sum (or other combination) of multiple prior variables
     • Rescaled Data – a raw score modified by some mathematical function (e.g., logarithm)
     • Standardized scores – rescaled to standard units → e.g., Z-scores
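These computed forms can be sketched with Python's standard library. Two choices here are assumptions, not something the slides specify: rates are expressed per 1,000 by default, and Z-scores use the population standard deviation (`statistics.pstdev`); the sample standard deviation (`statistics.stdev`) is an equally common choice.

```python
import math
import statistics


def rate(count, population, per=1000):
    """Rate: a number divided by a population, scaled per `per` people."""
    return count / population * per


def ratio(a, b):
    """Ratio: one number divided by another."""
    return a / b


def log_rescale(x):
    """Rescaled data: a raw score passed through a function (base-10 log)."""
    return math.log10(x)


def z_scores(values):
    """Standardized scores: (value - mean) / standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # assumption: population SD
    return [(v - mean) / sd for v in values]


scores = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative values only
print(rate(50, 10_000))
print(ratio(30, 20))
print(log_rescale(1000))
print(z_scores(scores))
```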

  7. Recoded Data: Some Useful Forms
     • Collapsed (& abbreviated) scores
     • Grouped scores – recoding a numeric variable into a discrete (numeric or ordinal) variable
       • Uniform (or fixed-width) groupings → widths of groups are all the same [Note the standard rules for forming grouped variables]
       • Non-uniform (variable or flexible) groupings → widths of groups are not all the same
       • Normed groupings → grouped by proportions of cases → e.g., percentiles, quartiles, median-splits [a special form of non-uniform grouping]
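A rough Python sketch of fixed-width and normed groupings, assuming whole-number scores. The function names are hypothetical, and the quartile cut points come from `statistics.quantiles` with its default method; other software may place the cut points slightly differently.

```python
import statistics


def fixed_width_group(value, width, start=0):
    """Uniform grouping: every group spans the same number of units."""
    low = start + ((value - start) // width) * width
    return (low, low + width - 1)


def quartile_group(value, values):
    """Normed grouping: groups defined by proportions of cases (quartiles)."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    if value <= q1:
        return 1
    if value <= q2:
        return 2
    if value <= q3:
        return 3
    return 4


def median_split(value, values):
    """Normed grouping into two halves around the median."""
    return "low" if value <= statistics.median(values) else "high"


scores = [1, 2, 3, 4, 5, 6, 7, 8]   # illustrative values only
print([fixed_width_group(s, 5) for s in scores])
print([quartile_group(s, scores) for s in scores])
print([median_split(s, scores) for s in scores])
```

Fixed-width groups have equal widths but may hold very different numbers of cases; normed groups hold roughly equal shares of cases but usually have unequal widths.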

  8. How to recode variables in SPSS?
     • Use the Transform option on the top menu bar to change the data (see Appendix B in Kirkpatrick/Feeney for details)
       • Compute → allows for computing a new variable from prior variables
       • Recode → allows for modifying how a variable is coded
         • ‘Into same variables’ (change original variable)
         • ‘Into different variables’ (create new variable with different codes & leave original variable as is)

  9. Representing Data Distributions:
     • In statistics, we are working with a collection of many data points → our focus is on the distribution of the whole set of points
     • Three forms of presentation for summarizing distributions of data points:
       • Tabular → tables and lists of numbers
       • Graphical → pictures, shapes, and lines (in charts, graphs, and diagrams)
       • Verbal → words and phrases

  10. Tabular Presentations: Basic Formats
      • Data Listing: simple inventory of points in the data set
      • Ordered Data Listing: inventory of data sorted into groups or arranged in increasing or decreasing order
      • Frequency Table: summary showing each value and the number of cases having that value (most relevant for discrete variables)
      • Percentage Table: table with percentages of total cases given rather than (or in addition to) numerical counts
      • Cumulative Percentage Table: table reporting, for each value, the percentage of total cases that have that value or lower
      • Cross-Tab Table: a “bivariate” frequency distribution of the values of one variable across the values of another variable
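The frequency, percentage, and cumulative percentage tables can be built in one pass. The sketch below uses `collections.Counter` from the standard library; the function name and the scores are hypothetical.

```python
from collections import Counter


def frequency_table(values):
    """Return one row per distinct value: count, percent, cumulative percent."""
    counts = Counter(values)
    total = len(values)
    rows = []
    cumulative = 0
    for value in sorted(counts):          # ordered from lowest to highest
        cumulative += counts[value]       # cases with this value or lower
        rows.append({
            "value": value,
            "count": counts[value],
            "percent": 100 * counts[value] / total,
            "cum_percent": 100 * cumulative / total,
        })
    return rows


scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]   # illustrative values only
table = frequency_table(scores)

print(f"{'value':>5} {'count':>5} {'%':>6} {'cum %':>6}")
for row in table:
    print(f"{row['value']:>5} {row['count']:>5} "
          f"{row['percent']:>6.1f} {row['cum_percent']:>6.1f}")
```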

  11. Cross-Tabulations (cont.)
      • What are the parts of a cross-tab?
        • Cells
        • Rows and columns
        • Marginals
        • Grand total
      • How to set up a cross-tab?
        • Which variables are in the rows and columns?
        • Use Percentages or Frequencies?
        • How to percentage a cross-tab?
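The parts of a cross-tab can be sketched in Python as follows. The data (an opinion by a group code) are hypothetical, and percentaging within columns is only one of the possible choices; the slides leave the direction of percentaging as an open question.

```python
from collections import Counter


def crosstab(pairs):
    """Build cells, row marginals, column marginals, and the grand total
    from a list of (row_value, column_value) pairs, one pair per case."""
    cells = Counter(pairs)
    row_totals = Counter(r for r, _ in pairs)
    col_totals = Counter(c for _, c in pairs)
    return cells, row_totals, col_totals, len(pairs)


def column_percentages(cells, col_totals):
    """Percentage each cell on its column total (one possible choice)."""
    return {(r, c): 100 * n / col_totals[c] for (r, c), n in cells.items()}


# Hypothetical cases: (opinion, group). Values are illustrative only.
pairs = [
    ("yes", "F"), ("yes", "F"), ("no", "F"), ("yes", "F"),
    ("yes", "M"), ("no", "M"), ("no", "M"), ("no", "M"),
]

cells, row_totals, col_totals, grand_total = crosstab(pairs)
col_pct = column_percentages(cells, col_totals)

print("cells:", dict(cells))
print("row marginals:", dict(row_totals))
print("column marginals:", dict(col_totals))
print("grand total:", grand_total)
print("column %:", col_pct)
```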

  12. Representing Distributions Graphically: Basic Formats
      • Pie Charts
      • Bar Charts
        • Vertical or Horizontal
        • Simple or Grouped
        • Stacked
      • Histograms
      • Line Charts
        • Frequency polygons
        • Time (Trend) plots
        • Relationship plots

  13. Representing Distributions Graphically: Basic Formats
      • Other Charts (to be dealt with later):
        • Box Plots (aka “Box-and-Whiskers”)
        • Scatter Plots
