Analysis of Data

Analysis of Data. Conducting Analyses. The first thing you should do is make sure your variables are coded correctly. Remember levels of measurement: They determine statistical analyses you choose, and they are affected by your coding decisions!

vevay

Presentation Transcript


  1. Analysis of Data

  2. Conducting Analyses • The first thing you should do is make sure your variables are coded correctly. • Remember levels of measurement: They determine statistical analyses you choose, and they are affected by your coding decisions! • Dichotomous dependent variables should have 0/1 coding, where “1” represents the presence of something and “0” the absence. • Nominal variables should have 3 – 5 categories only (to aid interpretation), so you may need to combine or collapse categories. • Interval-ratio variables must have a coding scheme where increasing code values reflect increasing values of your measured concepts. • Interval-ratio variables should not as a matter of routine be dichotomized or collapsed to 3 – 5 categories unless necessary for particular analyses.

  3. Conducting Analyses • To recode in SPSS, we’ll create a new variable, thus preserving the original data in case we screw up. When recoding this way, you MUST account for all previous codes when creating new variable codes. • SPSS Commands: • <Transform> → <Recode> → <Into Different Variables…> → • [Highlight the variable on the left and click the arrow to put it into the “Input Variable -> Output Variable:” box] → • [Write in a new variable name and click <Change>] → • <Old and New Values…> → [code by code, enter the old code on the left and the corresponding new code on the right, then click <Add>, until all old codes are accounted for] → • <Continue> → <OK> • Your new variable name will appear as the last row at the end of your variables list, and the data will appear as the last column at the end of the data columns.

  4. Conducting Analyses • Example of Recoding: Dichotomous • Data file has favor or oppose capital punishment, cappun, coded as follows: • 0 = NAP • 1 = Favor • 2 = Oppose • 8 = DK • 9 = NA • Favor should equal “1” BUT Oppose should equal “0” • Of course, NAP, DK, and NA are useless and should be treated as missing. • To Recode, SPSS Commands: • <Transform> → <Recode> → <Into Different Variables…> → • [Highlight cappun and click the arrow to put it into the “Input Variable -> Output Variable:” box] → • [Write in a new variable name, “newcappun,” and click <Change>] → • <Old and New Values…> → • [0 left, Missing right, click <Add>; 8 left, Missing right, click <Add>; 9 left, Missing right, click <Add>; 1 left, 1 right, click <Add>; 2 left, 0 right, click <Add>] → • <Continue> → <OK> • Newcappun coding • 0 = Oppose • 1 = Favor
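
The same recode can be sketched outside SPSS. This is only an illustration in Python, not part of the original slides: the sample responses and the `recode` helper are made up, and `None` stands in for SPSS's system-missing value. The point it shows is the rule from the previous slide: every old code must be accounted for, and the original variable is left untouched.

```python
# Hypothetical raw responses using the cappun codes listed on the slide.
cappun = [1, 2, 0, 8, 1, 9, 2, 1]

# Old code -> new code. None stands in for "missing" (NAP, DK, NA).
RECODE = {0: None, 8: None, 9: None, 1: 1, 2: 0}

def recode(values, mapping):
    """Return a new list of recoded values; the original list is not modified."""
    unknown = set(values) - set(mapping)
    if unknown:
        # Mirrors the rule that all previous codes must be accounted for.
        raise ValueError(f"old codes not accounted for: {sorted(unknown)}")
    return [mapping[v] for v in values]

newcappun = recode(cappun, RECODE)
print(newcappun)  # [1, 0, None, None, 1, None, 0, 1]
```

Raising an error on an unlisted code is a design choice for this sketch; it makes a forgotten code visible instead of silently carrying it into the new variable.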

  5. Conducting Analyses • Example of Recoding: Nominal • Data file has religion, relig, coded as follows: • 0 = NAP • 1 = Protestant • 2 = Catholic • 3 = Jewish • 4 = None • 5 = Other (specify) • 6 = Buddhism • 7 = Hinduism • 8 = Other Eastern • 9 = Moslem/Islam • 10 = Orthodox-Christian • 11 = Christian • 12 = Native American • 13 = Inter-Nondenominational • 98 = DK • 99 = NA • You might just want Christian, Other, and None • Of course, NAP, DK, and NA are useless and should be treated as missing. • To Recode, SPSS Commands: • <Transform> → <Recode> → <Into Different Variables…> → • [Highlight relig and click the arrow to put it into the “Input Variable -> Output Variable:” box] → • [Write in a new variable name, “newrelig,” and click <Change>] → • <Old and New Values…> → • [0 left, Missing right, click <Add>; 98 left, Missing right, click <Add>; 99 left, Missing right, click <Add>; 1 through 2 left, 1 right, click <Add>; 10 through 11 left, 1 right, click <Add>; 3 left, 2 right, click <Add>; 5 through 9 left, 2 right, click <Add>; 12 through 13 left, 2 right, click <Add>; 4 left, 3 right, click <Add>] → • <Continue> → <OK> • Newrelig coding • 1 = All Christian groups • 2 = All other religious groups • 3 = No religion
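
When a recode uses ranges ("1 through 2", "5 through 9"), it is easy to leave a code out or to put one code in two ranges. A minimal Python sketch of the relig recode, assuming made-up responses and hypothetical helper names (`RULES`, `build_mapping`), shows one way to check both problems before recoding:

```python
# Each rule is (lowest old code, highest old code, new code); None means missing.
RULES = [
    (0, 0, None), (98, 98, None), (99, 99, None),   # NAP, DK, NA
    (1, 2, 1), (10, 11, 1),                         # Christian groups
    (3, 3, 2), (5, 9, 2), (12, 13, 2),              # other religious groups
    (4, 4, 3),                                      # no religion
]

def build_mapping(rules):
    """Expand range rules into an old-code -> new-code dictionary."""
    mapping = {}
    for low, high, new in rules:
        for old in range(low, high + 1):
            if old in mapping:
                raise ValueError(f"old code {old} appears in two rules")
            mapping[old] = new
    return mapping

RELIG_MAP = build_mapping(RULES)

# Every old code in the codebook (0-13, 98, 99) should be covered exactly once.
all_old_codes = set(range(0, 14)) | {98, 99}
assert set(RELIG_MAP) == all_old_codes

relig = [1, 4, 9, 98, 11, 3]                 # hypothetical responses
newrelig = [RELIG_MAP[v] for v in relig]
print(newrelig)  # [1, 3, 2, None, 1, 2]
```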

  6. Conducting Analyses • Example of Recoding: Interval-Ratio • Data file has political party affiliation, partyid, coded as follows: • 0 = Strong Democrat • 1 = Not Very Strong Democrat • 2 = Independent, Close to Democrat • 3 = Independent (Neither, No Response) • 4 = Independent, Close to Republican • 5 = Not Very Strong Republican • 6 = Strong Republican • 7 = Other Party, Refused to Say • 8 = DK • 9 = NA • You could make this a scale of party identification, but Other Party wouldn’t be a good pole item. I would treat it as missing. • Of course, DK and NA are useless and should be treated as missing. • To Recode, SPSS Commands: • <Transform> → <Recode> → <Into Different Variables…> → • [Highlight partyid and click the arrow to put it into the “Input Variable -> Output Variable:” box] → • [Write in a new variable name, “newpartyid,” and click <Change>] → • <Old and New Values…> → • [7 through 9 left, Missing right, click <Add>; 0 through 6 left, “Copy” right, click <Add>] → • <Continue> → <OK> • Newpartyid coding • 0 = Strong Democrat • 1 = Not Very Strong Democrat • 2 = Independent, Close to Democrat • 3 = Independent (Neither, No Response) • 4 = Independent, Close to Republican • 5 = Not Very Strong Republican • 6 = Strong Republican

  7. Conducting Analyses • The second thing you should do is conduct descriptive statistics for your variables. • Means and standard deviations for interval and ratio (and I-R Type ordinal) variables • Proportions or percents for those people answering “one” in dichotomous variables • Proportions or percents in each category of nominal (and Nominal Type ordinal) variables

  8. Conducting Analyses • The way you use SPSS to get descriptive statistics is: • For nominal variables create frequencies • For interval or ratio variables, request descriptive statistics • Enter SPSS and open data file (make sure variables are coded properly) • For nominal variables, use commands (click on menu items): • Analyze • Descriptive Statistics • Frequencies • Place your variables in the “Variable(s):” box by highlighting them and then clicking on the right-pointing arrow. • Click “OK” • For interval or ratio variables, do the steps above, but click on these other items before clicking “OK.” • Statistics… • Check off: • Mean • Standard deviation • Click “Continue” • Click “OK”
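
For readers who want to see what those menu choices compute, here is a small Python sketch. The data are invented, the variable names simply echo the recoding examples, and `None` marks a missing value. It is an illustration of the arithmetic, not a reproduction of SPSS output.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical data; None marks a missing value.
educ = [12, 16, 14, 12, 18, None, 10]      # interval-ratio: years of education
newcappun = [1, 0, 1, 1, None, 0, 1]       # dichotomous, coded 0/1
newrelig = [1, 1, 2, 3, 1, None, 3]        # nominal, three categories

def valid(values):
    """Drop missing values before computing anything."""
    return [v for v in values if v is not None]

# Interval-ratio variable: mean and (sample, n - 1) standard deviation.
educ_mean = mean(valid(educ))
educ_sd = stdev(valid(educ))

# Dichotomous variable: the mean of a 0/1 variable is the proportion coded 1.
prop_favor = mean(valid(newcappun))

# Nominal variable: percent of valid cases in each category.
counts = Counter(valid(newrelig))
n = sum(counts.values())
relig_pct = {code: 100 * k / n for code, k in sorted(counts.items())}

print(educ_mean, educ_sd, prop_favor, relig_pct)
```

The 0/1 coding recommended earlier is what makes the mean of a dichotomous variable readable as a proportion.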

  9. Conducting Analyses • SPSS results will make you feel something like this: http://gawker.com/5726703/this-baby+swinging-yoga-video-cant-be-real-right But they will look like this…

  10. Conducting Analyses Note how ugly and confusing this table is… You’ll want to help your reader out by making professional-looking tables!

  11. Conducting Analyses Elements of a good table: Title, Headers, aligned columns, order, lack of clutter.

  12. Conducting Analyses • Sadly:  • You may put SPSS output into your results section to substitute for professional tables. • Remember, however, that a professional table is much easier to understand than SPSS output. • You would NEVER use SPSS output in a professional presentation of results! • Those with professional tables will get extra credit.

  13. Conducting Analyses • The next thing you want to do is test your hypotheses. • Let’s recall what the point of statistical analysis is… • Descriptive statistics tell us about the persons from whom we collected data—a sample or a census • Inferential statistics allow us to infer something about a population using a sample’s characteristics

  14. Conducting Analyses [Diagram: Population (persons in US households) → Sampling → Sample (GSS); Inferences run from the sample back to the population.]

  15. Conducting Analyses Sampling error is also built into the process of Random Sampling. Researchers typically get one sample, but there are many, many possible samples.

  16. Conducting Analyses • Variation from sample to sample would result in variations in statistics. For example, average IQ would be different for each sample: 110 104 98 107 101

  17. Conducting Analyses Let’s create a sampling distribution of means… Take a sample of size 1,500 from the US. Record the mean income. Our census said the mean is $30K. $30K = a possible sample’s mean

  18. Conducting Analyses Let’s create a sampling distribution of means… Take another sample of size 1,500 from the US. Record the mean income. Our census said the mean is $30K. $30K = a possible sample’s mean

  21. Conducting Analyses Let’s create a sampling distribution of means… Let’s repeat sampling of sizes 1,500 from the US. Record the mean incomes. Our census said the mean is $30K. $30K = a possible sample’s mean

  23. Conducting Analyses Let’s create a sampling distribution of means… Let’s repeat sampling of sizes 1,500 from the US. Record the mean incomes. Our census said the mean is $30K. The sample means would stack up in a normal curve. A normal sampling distribution. $30K = a possible sample’s mean
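
The "stacking up" of sample means can be simulated. The sketch below is a rough illustration under stated assumptions: the population is synthetic (100,000 made-up incomes drawn from a skewed distribution with a mean near $30K), and the number of repeated samples (500) is arbitrary. Only the sample size of 1,500 comes from the slides.

```python
import random
from statistics import mean, stdev

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical skewed "population" of incomes with a mean near $30K.
population = [random.expovariate(1 / 30_000) for _ in range(100_000)]
pop_mean = mean(population)

# Draw many samples of size 1,500 and record each sample's mean.
sample_means = [mean(random.sample(population, 1_500)) for _ in range(500)]

center = mean(sample_means)                      # where the sample means pile up
spread = stdev(sample_means)                     # how much they jump around
expected_se = stdev(population) / 1_500 ** 0.5   # textbook standard error of the mean

print(round(pop_mean), round(center), round(spread), round(expected_se))
```

Even though the made-up incomes are skewed, the recorded means cluster around the population mean, and their spread is close to the standard error formula. A histogram of `sample_means` would show the roughly normal stack the slides describe.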

  24. Conducting Analyses We take the “stacking” knowledge and can guess values of a statistic… If our guess is correct, our sample’s statistic should be among the common samples that would have been drawn from a population with that number. If it is not, it is likely that the sample did not come from such a population. What if my sample’s statistic were here?  guess

  25. Conducting Analyses Knowing how sample statistics would stack up allows us to obtain a metric for how close a sample’s information might be to the population’s (accuracy of sample). This permits what we call inferential statistics. Inferential statistics allow the researcher to come to conclusions about a population on the basis of descriptive statistics about a sample.

  26. Conducting Analyses The statistics you see in quantitative research that employ significance tests are premised on inferential statistics. These statistics are typically reported as responses to tests of null hypotheses, although you rarely see the nulls reported in articles. A null hypothesis typically states that a statistic, such as a measure of the relationship between two variables, is zero in the population. The significance test results typically report the probability (the p-value) of drawing a sample with a statistic at least as far from zero as yours if the sample came from a population where the statistic is zero. Researchers typically reject the null hypothesis when there is less than a 5% chance of getting such a sample statistic from a population where the null is true.

  27. Conducting Analyses We take the “stacking” knowledge and can guess values of a statistic… What if my sample’s statistic were here? [Figure: a sampling distribution centered on the guessed value for a statistic about a relationship: 0, no relationship.]

  28. Conducting Analyses Significance tests ask whether the sample is likely to have come from a population with a particular relationship between variables. [Figure: a sampling distribution centered on 0 (no relationship, the null), marking the spread of 95% of possible sample statistics, which is the likely variation from sample to sample. A sample statistic that falls inside this spread could well have come from a population with no relationship.]

  29. Conducting Analyses • Inferential statistics typically test a null hypothesis that there is no relationship between variables in the population. • We reject the null if our statistics show evidence that our sample’s data are unlikely to have come from a population where the null (no relationship) is true. This allows us to believe the variables are related in the population. • Each statistical test will indicate the likelihood or probability of getting results like our sample’s if some null is true for the population. • Another way to think about this: each statistical test tells us the likelihood or probability that results like our sample’s would be produced by the chance variation that occurs from sample to sample, using samples the same size as ours.

  30. Conducting Analyses • Since numbers will vary naturally from sample to sample, we use inferential statistics to tell us how much our statistics describing the relationships between variables (association) should “jump around by chance.” • We use a term, “significance,” to refer to whether our statistics could represent real evidence about the population or whether they reflect natural “jumping around.”

  31. Conducting Analyses You will use an inferential statistics technique to test the NULL version of your four hypotheses: • Inferential statistics null: that there is no relationship between variables in the population • The technique will give 1) a statistic that describes the relationship between variables in the sample, and 2) information on whether the null is REJECTED or not. • Look at the p-value (probability, or in SPSS, “Sig.”) for the statistic to find out how likely it is that a relationship statistic at least as large as yours would be produced by a population where the null is true. • If your p-value is less than .05, you can reject the null because there is little chance that a sample like yours would come from a population where there is no relationship. • If your p-value is greater than .05 (such as .26), you fail to reject the null. There is a high chance that a sample like yours could come from a population where the null is true, where there is no relationship between variables.
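
One way to see what a p-value measures is to let the computer create the "jumping around by chance" directly. The Python sketch below is a permutation test, which is a swapped-in technique and not one of the SPSS procedures in these slides. The two groups of scores are made up, and the helper name `permutation_p_value` is hypothetical. It shuffles the group labels many times (so that the null of no relationship is true by construction) and counts how often chance alone produces a difference at least as large as the observed one.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical scores for two groups.
group_a = [12, 14, 16, 12, 13, 15, 16, 14]
group_b = [11, 12, 10, 13, 11, 10, 12, 11]
observed = abs(mean(group_a) - mean(group_b))

def permutation_p_value(a, b, observed_diff, reps=5_000):
    """Share of random relabelings whose mean difference is at least as large as observed."""
    pooled = a + b                      # new list, so a and b are left untouched
    hits = 0
    for _ in range(reps):
        random.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed_diff:
            hits += 1
    return hits / reps

p_value = permutation_p_value(group_a, group_b, observed)
decision = "reject the null" if p_value < 0.05 else "fail to reject the null"
print(observed, p_value, decision)
```

The decision rule at the end is the same .05 cutoff described above; only the way the probability is obtained differs from what SPSS does.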

  32. Conducting Analyses The statistical test you choose depends on your variables’ levels of measurement:
      Dependent Variable | Independent Variable | Statistical Test
      Interval-Ratio or Dichotomous | Dichotomous | Independent Samples t-test
      Interval-Ratio or Dichotomous | Nominal | ANOVA
      Interval-Ratio or Dichotomous | Interval-Ratio | Correlation
      Nominal or Dichotomous | Nominal* | Cross Tabs
      *An interval-ratio variable may be used as an independent variable if it is collapsed into about five or fewer categories, essentially making it nominal.

  33. Conducting Analyses The statistical test you choose depends on your variables’ levels of measurement:
      Dependent Variable | Independent Variable | Statistical Test
      Education; Marijuana Legal, YN | Sex | Independent Samples t-test
      Income; Police May Hit, YN | Region of Country | ANOVA
      Agree with Abortion; College Degree, YN | Age | Correlation
      Marital Status; Sex | Race, Age Groups* | Cross Tabs
      *An interval-ratio variable may be used as an independent variable if it is collapsed into about five or fewer categories, essentially making it nominal.
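
The test-selection table on slide 32 can be written as a small lookup. This Python sketch assumes one reading of that table (dependent levels in the first column, independent levels in the second); the level labels and the function name `candidate_tests` are made up for the illustration. It returns every test whose row matches, because some combinations fit more than one row.

```python
IR, DICH, NOM = "interval-ratio", "dichotomous", "nominal"

# (test name, allowed dependent levels, allowed independent levels),
# following one reading of the table on slide 32.
RULES = [
    ("Independent Samples t-test", {IR, DICH}, {DICH}),
    ("ANOVA", {IR, DICH}, {NOM}),
    ("Correlation", {IR, DICH}, {IR}),
    ("Cross Tabs", {NOM, DICH}, {NOM}),
]

def candidate_tests(dependent, independent):
    """Return the tests whose table row matches both levels of measurement."""
    return [name for name, dv_levels, iv_levels in RULES
            if dependent in dv_levels and independent in iv_levels]

print(candidate_tests(IR, DICH))   # ['Independent Samples t-test']
print(candidate_tests(DICH, NOM))  # ['ANOVA', 'Cross Tabs']
print(candidate_tests(NOM, IR))    # []
```

An empty result for a nominal dependent variable with an interval-ratio independent variable corresponds to the table's footnote: collapse the interval-ratio variable into about five or fewer categories, then treat it as nominal.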

  34. Conducting Analyses • Testing hypotheses… Independent Samples t-Test • You must select the test that is appropriate for your variables. • If your independent variable is dichotomous and your dependent variable is interval, ratio or dichotomous, you should use an independent samples t-test. • A t-test will tell you whether the means on the dependent variable are different for each group in the population. Null: Group 1’s Mean = Group 2’s Mean • Example: Sex → Education

  35. Conducting Analyses Steps in SPSS to conduct an independent samples t-test: • Enter SPSS and open data file (make sure variables are coded properly) • Use commands (click on menu items): • Analyze • Compare Means • Independent Samples T-Test • Highlight dependent variable and click arrow to place into “Test Variable(s):” box. • Highlight independent variable and click arrow to place into “Grouping Variable:” box. • Click on “Define Groups…” • Enter the code number for each of the two groups (e.g., “0” for men in the box next to “Group 1” and “1” for women in the box next to “Group 2”). Click “Continue.” • Click “OK”
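
To show what the t-test computes, here is a hand calculation in Python on made-up education data. It is a sketch under assumptions: it uses the equal-variance (pooled) formula, the helper name `pooled_t` is hypothetical, and the p-value is only a normal-curve approximation, because Python's standard library has no t distribution. SPSS works from the t distribution, so its "Sig." value will differ somewhat for small samples like these.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

# Hypothetical years of education, grouped by the 0/1 codes of a sex variable.
men = [12, 14, 16, 12, 13, 15, 16, 14]      # Group 1 (code 0)
women = [12, 13, 11, 14, 12, 16, 13, 15]    # Group 2 (code 1)

def pooled_t(a, b):
    """Equal-variance independent samples t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df

t, df = pooled_t(men, women)

# Rough two-tailed p-value from the normal curve (an approximation, see above).
approx_p = 2 * (1 - NormalDist().cdf(abs(t)))

print(mean(men), mean(women), t, df, approx_p)
```

With these invented numbers the group means differ by less than a year and the approximate p-value is well above .05, so the null of equal means would not be rejected.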

  36. Conducting Analyses • SPSS results will make you look something like this: But they will look like this…

  37. Conducting Analyses Mean education for men and for women. Probability that this sample could have come from a population where men and women have equal education. P > .05, that’s high! I’d say on average men and women have same education in the US.

  38. Conducting Analyses • Testing hypotheses… ANOVA • You must select the test that is appropriate for your variables. • If your independent variable is nominal or dichotomous and your dependent variable is interval, ratio, or dichotomous, you should use ANOVA. • An ANOVA test will tell you whether the mean levels on the dependent variable are different for each group of the independent variable in the population. Null: Each group’s mean is equal to the others’. • Example: Region of the Country → Income

  39. Conducting Analyses Steps in SPSS to conduct ANOVA: • Enter SPSS and open data file (make sure variables are coded properly) • Use commands (click on menu items): • Analyze • Compare Means • One-Way ANOVA… • Highlight dependent variable and click arrow to place into “Dependent List:” box. • Highlight independent variable and click arrow to place into “Factor:” box. • Click on “Options…” at the bottom • Check the “Descriptive” box • Click “Continue.” • Click “OK”
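
The F statistic behind a one-way ANOVA compares how far the group means are from each other with how much scores vary inside each group. The Python sketch below works this out by hand for invented income-scale scores in three regions; the data and the function name `one_way_anova` are assumptions for illustration. It stops at the F statistic and its degrees of freedom, since the p-value needs an F distribution that the standard library does not provide.

```python
from statistics import mean

# Hypothetical income-scale scores grouped by region.
groups = {
    "Northeast": [12, 14, 13, 15],
    "South": [9, 10, 11, 10],
    "West": [13, 12, 14, 13],
}

def one_way_anova(groups):
    """Return the F statistic with its between- and within-group degrees of freedom."""
    scores = [x for g in groups.values() for x in g]
    grand_mean = mean(scores)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Variation of individual scores around their own group's mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

f_stat, df_b, df_w = one_way_anova(groups)
group_means = {name: mean(g) for name, g in groups.items()}

print(group_means, f_stat, df_b, df_w)
```

The `group_means` dictionary plays the role of the "Descriptive" box in the SPSS steps: it shows which regions are higher or lower, which the F statistic alone does not.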

  40. Conducting Analyses • SPSS results will make you look something like this: But they will look like this…

  41. Conducting Analyses Mean Income Scale score for each region of the country Probability that this sample could have come from a population where each region has the same Income Scale mean. P<.05, that’s low! I’d say that some regions in the US have lower average income than others.

  42. Conducting Analyses • Testing hypotheses… Correlation • You must select the test that is appropriate for your variables. • If your independent variable is interval, ratio or dichotomous and your dependent variable is interval or ratio, you should use correlations or multiple regression. • A correlation test will tell you whether values of the dependent variable are linearly related to values of the independent variable in the population. It will also tell you the strength and direction of any linear relationship between the two variables (positive or negative). Null: The correlation between the two variables is zero. • Example: Age → Agreement with Abortion

  43. Conducting Analyses Steps in SPSS to conduct a correlation test: • Enter SPSS and open data file (make sure variables are coded properly) • Use commands (click on menu items): • Analyze • Correlate • Bivariate… • Highlight dependent variable and click arrow to place into “Variables:” box. • Highlight independent variable and click arrow to place into “Variables:” box. • Click “OK”
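
The number SPSS reports for a bivariate correlation is Pearson's r. A short Python sketch, using invented paired observations and a hypothetical helper `pearson_r`, shows the calculation and the t statistic that is commonly used to test the null that the correlation is zero.

```python
from math import sqrt
from statistics import mean

# Hypothetical paired observations.
age = [22, 35, 47, 51, 63, 70]
abortion_scale = [3, 4, 2, 5, 3, 4]

def pearson_r(x, y):
    """Pearson correlation: co-variation divided by the product of the spreads."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

r = pearson_r(age, abortion_scale)
n = len(age)

# Test statistic for the null "correlation is zero"; it is compared against a
# t distribution with n - 2 degrees of freedom.
t_stat = r * sqrt((n - 2) / (1 - r ** 2))

print(r, t_stat, n - 2)
```

Because r does not depend on which variable is listed first, it makes no difference that SPSS puts both variables into the same “Variables:” box.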

  44. Conducting Analyses • SPSS results will make you look something like this: But they will look like this…

  45. Conducting Analyses Description of strength of linear relationship between Age and Agreement with Abortion… In the sample, as age goes up 1 standard deviation, abortion scale increases .065 standard deviations. Probability that this sample could have come from a population where the linear relationship between age and abortion attitude is equal to zero. P>.05, just barely but this is higher than I’d want to reject a null. I’d say that age is not linearly related to abortion attitude in the US.

  46. Conducting Analyses • Testing hypotheses… Chi-Squared Test • You must select the test that is appropriate for your variables. • If your independent variable is nominal or dichotomous and your dependent variable is nominal or dichotomous, you should use a chi-squared test. • A chi-squared test will tell you whether the categorical assignment on the dependent variable is related to the categorical assignment on the independent variable in the population. Null: The two variables are independent of each other. • Example: Race → Marital Status

  47. Conducting Analyses Steps in SPSS to conduct a chi-squared test: • Enter SPSS and open data file (make sure variables are coded properly) • Use commands (click on menu items): • Analyze • Descriptive Statistics • Crosstabs… • Highlight dependent variable and click arrow to place into “Column(s):” box. • Highlight independent variable and click arrow to place into “Row(s):” box. • Click on “Statistics…” at the bottom • Check the “Chi-square” box • Click “Continue.” • Click on “Cells…” at the bottom • Check the “Percentages-Row” box • Click “Continue.” • Click “OK”
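
The chi-squared statistic compares the counts in a crosstab with the counts you would expect if the two variables were independent. Here is a minimal Python sketch on an invented 2 × 3 table (rows for the independent variable, columns for the dependent variable, matching the SPSS layout above). The function name `chi_square` is hypothetical, and the p-value shortcut used at the end is valid only when there are exactly 2 degrees of freedom.

```python
from math import exp

# Hypothetical counts: rows are two groups of the independent variable,
# columns are three categories of the dependent variable.
observed = [
    [30, 10, 20],
    [20, 20, 20],
]

def chi_square(table):
    """Return the chi-squared statistic and degrees of freedom for a crosstab."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # count expected under independence
            stat += (count - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

stat, df = chi_square(observed)

# Row percentages, as requested with the "Percentages-Row" box.
row_pcts = [[100 * c / sum(row) for c in row] for row in observed]

# With exactly 2 degrees of freedom the chi-squared p-value has a closed form.
p_value = exp(-stat / 2) if df == 2 else None

print(stat, df, p_value, row_pcts)
```

For these made-up counts the p-value lands just above .05, so the null of independence would not be rejected even though the row percentages look different.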

  48. Conducting Analyses • SPSS results will make you look something like this: But they will look like this…

  49. Conducting Analyses Counts and percentages of each racial group in each marital category. Probability that this sample could have come from a population where all racial groups have the same marital patterns. P < .05, that’s low! I’d say that different races have different marital patterns in the US.
