Analysis of Data

Conducting Analyses • The first thing you should do is make sure your variables are coded correctly. • Remember levels of measurement: they determine the statistical analyses you choose, and they are affected by your coding decisions! • Dichotomous dependent variables should have 0/1 coding, where “1” represents the presence of something and “0” the absence. • Nominal variables should have only 3 – 5 categories (to aid interpretation), so you may need to combine or collapse categories. • Interval-ratio variables must have a coding scheme where increasing code values reflect increasing values of your measured concepts. • Interval-ratio variables should not be routinely dichotomized or collapsed to 3 – 5 categories unless necessary for particular analyses.

Conducting Analyses • To recode in SPSS, we’ll create a new variable, thus preserving the original data in case we screw up. When recoding this way, you MUST account for all previous codes when creating new variable codes. • SPSS Commands: • <Transform> -> <Recode> -> <Into Different Variables…> -> • [Highlight the variable on the left and click on the arrow to put it into the “Input Variable -> Output Variable:” box] -> • [Write in a new variable name and click <Change>] -> • <Old and New Values…> -> [code by code, enter the old code on the left and the corresponding new code on the right, and click <Add>, until all old codes are accounted for] -> • <Continue> -> <OK> • Your new variable name will appear as the last row at the end of your variables list, and the data will appear as the last column at the end of the data columns.

Conducting Analyses • Example of Recoding: Dichotomous • Data file has favor or oppose capital punishment, cappun, coded as follows: • 0 = NAP • 1 = Favor • 2 = Oppose • 8 = DK • 9 = NA • Favor should equal “1” BUT Oppose should equal “0” • Of course, NAP, DK, and NA are useless and should be treated as missing. • To Recode, SPSS Commands: • <Transform> -> <Recode> -> <Into Different Variables…> -> • [Highlight cappun and click on arrow to put it into “Input Variable -> Output Variable:” box] -> • [Write in a new variable name, “newcappun,” and click <Change>] -> • <Old and New Values…> -> • [0 left, Missing right, click <Add>; 8 left, Missing right, click <Add>; 9 left, Missing right, click <Add>; 1 left, 1 right, click <Add>; 2 left, 0 right, click <Add>] -> • <Continue> -> <OK> • Newcappun coding • 0 = Oppose • 1 = Favor
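
The same recode can be sketched outside SPSS. This is a minimal Python sketch, not SPSS syntax, assuming the cappun codes listed above. The response list is made up for illustration, the function name `recode_cappun` is hypothetical, and `None` stands in for a missing value. Raising an error on any unlisted code mirrors the rule that every old code must be accounted for.

```python
# Hypothetical sketch of the cappun -> newcappun recode described above.
# None stands in for a missing value.
RECODE_CAPPUN = {
    0: None,  # NAP    -> missing
    8: None,  # DK     -> missing
    9: None,  # NA     -> missing
    1: 1,     # Favor  -> 1
    2: 0,     # Oppose -> 0
}


def recode_cappun(old_code):
    """Return the new code; raise if an old code was not accounted for."""
    if old_code not in RECODE_CAPPUN:
        raise ValueError(f"old code not accounted for: {old_code!r}")
    return RECODE_CAPPUN[old_code]


cappun = [1, 2, 8, 1, 0, 9, 2]  # made-up responses
# Build a NEW list so the original data stay untouched.
newcappun = [recode_cappun(code) for code in cappun]
print(newcappun)
```

Keeping the mapping in one table makes it easy to compare against the codebook before running the recode.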

Conducting Analyses • Example of Recoding: Nominal • Data file has religion, relig, coded as follows: • 0 = NAP • 1 = Protestant • 2 = Catholic • 3 = Jewish • 4 = None • 5 = Other (specify) • 6 = Buddhism • 7 = Hinduism • 8 = Other Eastern • 9 = Moslem/Islam • 10 = Orthodox-Christian • 11 = Christian • 12 = Native American • 13 = Inter-Nondenominational • 98 = DK • 99 = NA • You might just want Christian, Other, and None • Of course, NAP, DK, and NA are useless and should be treated as missing. • To Recode, SPSS Commands: • <Transform> -> <Recode> -> <Into Different Variables…> -> • [Highlight relig and click on arrow to put it into “Input Variable -> Output Variable:” box] -> • [Write in a new variable name, “newrelig,” and click <Change>] -> • <Old and New Values…> -> • [0 left, Missing right, click <Add>; 98 left, Missing right, click <Add>; 99 left, Missing right, click <Add>; 1 through 2 left, 1 right, click <Add>; 10 through 11 left, 1 right, click <Add>; 3 left, 2 right, click <Add>; 5 through 9 left, 2 right, click <Add>; 12 through 13 left, 2 right, click <Add>; 4 left, 3 right, click <Add>] -> • <Continue> -> <OK> • Newrelig coding • 1 = All Christian groups • 2 = All other religious groups • 3 = No religion
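
One way to check a collapsing recode like this is to cross-tabulate old codes against new codes. The following is a hedged Python sketch, assuming the relig codes and the grouping listed above. The function name `recode_relig` and the responses are hypothetical, and `None` stands in for a missing value.

```python
from collections import Counter

MISSING = None  # stand-in for a missing value


def recode_relig(old_code):
    """Collapse relig into 1 = Christian, 2 = Other, 3 = None."""
    if old_code in (0, 98, 99):                  # NAP, DK, NA
        return MISSING
    if old_code in (1, 2, 10, 11):               # Christian groups
        return 1
    if old_code in (3, 5, 6, 7, 8, 9, 12, 13):   # all other religious groups
        return 2
    if old_code == 4:                            # no religion
        return 3
    raise ValueError(f"old code not accounted for: {old_code!r}")


relig = [1, 2, 4, 3, 11, 9, 98, 1, 13, 0]        # made-up responses
newrelig = [recode_relig(code) for code in relig]

# Cross-tabulate old against new codes to confirm that every old code
# landed in the intended new category.
crosstab = Counter(zip(relig, newrelig))
for (old, new), count in sorted(crosstab.items()):
    print(f"old {old:>2} -> new {new}: {count}")
```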

Conducting Analyses • Example of Recoding: Interval-Ratio • Data file has political party affiliation, partyid, coded as follows: • 0 = Strong Democrat • 1 = Not Very Strong Democrat • 2 = Independent, Close to Democrat • 3 = Independent (Neither, No Response) • 4 = Independent, Close to Republican • 5 = Not Very Strong Republican • 6 = Strong Republican • 7 = Other Party, Refused to Say • 8 = DK • 9 = NA • You could make this a scale of party identification, but Other Party wouldn’t be a good pole item. I would treat it as missing. • Of course, DK and NA are useless and should be treated as missing. • To Recode, SPSS Commands: • <Transform> -> <Recode> -> <Into Different Variables…> -> • [Highlight partyid and click on arrow to put it into “Input Variable -> Output Variable:” box] -> • [Write in a new variable name, “newpartyid,” and click <Change>] -> • <Old and New Values…> -> • [7 through 9 left, Missing right, click <Add>; 0 through 6 left, “Copy” right, click <Add>] -> • <Continue> -> <OK> • Newpartyid coding • 0 = Strong Democrat • 1 = Not Very Strong Democrat • 2 = Independent, Close to Democrat • 3 = Independent (Neither, No Response) • 4 = Independent, Close to Republican • 5 = Not Very Strong Republican • 6 = Strong Republican

Conducting Analyses • The second thing you should do is conduct descriptive statistics for your variables. • Means and standard deviations for interval and ratio (and I-R Type ordinal) variables • Proportions or percents for those people answering “one” in dichotomous variables • Proportions or percents in each category of nominal (and Nominal Type ordinal) variables

Conducting Analyses • The way you use SPSS to get descriptive statistics is: • For nominal variables create frequencies • For interval or ratio variables, request descriptive statistics • Enter SPSS and open data file (make sure variables are coded properly) • For nominal variables, use commands (click on menu items): • Analyze • Descriptive Statistics • Frequencies • Place your variables in the “Variable(s):” box by highlighting them and then clicking on the right-pointing arrow. • Click “OK” • For interval or ratio variables, do the steps above, but click on these other items before clicking “OK.” • Statistics… • Check off: • Mean • Standard deviation • Click “Continue” • Click “OK”
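
For comparison with the SPSS output, the three kinds of descriptive statistics named above can be sketched in Python. This is a minimal sketch using only the standard library; all data values are made up, and `None` stands in for a missing value.

```python
from collections import Counter
from statistics import mean, stdev

educ = [12, 16, 14, 12, 18, 10, 16, 12]   # interval-ratio (made-up years)
newcappun = [1, 0, 1, 1, 0, 1, None, 1]   # dichotomous, None = missing
newrelig = [1, 1, 3, 2, 1, 2, None, 1]    # nominal, None = missing

# Interval-ratio: mean and sample standard deviation (n - 1 denominator).
educ_mean = mean(educ)
educ_sd = stdev(educ)

# Dichotomous: proportion answering "1" among valid (non-missing) cases.
valid_cappun = [v for v in newcappun if v is not None]
prop_favor = sum(valid_cappun) / len(valid_cappun)

# Nominal: count and percent in each category among valid cases.
valid_relig = [v for v in newrelig if v is not None]
freq = Counter(valid_relig)
percents = {cat: 100 * n / len(valid_relig) for cat, n in sorted(freq.items())}

print(f"Education: mean = {educ_mean:.2f}, SD = {educ_sd:.2f}")
print(f"Favor capital punishment: {prop_favor:.1%}")
for cat, pct in percents.items():
    print(f"Religion category {cat}: n = {freq[cat]}, {pct:.1f}%")
```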

Conducting Analyses • SPSS results will make you feel something like this: http://gawker.com/5726703/this-baby+swinging-yoga-video-cant-be-real-right But they will look like this…

Conducting Analyses Note how ugly and confusing this table is… You’ll want to help your reader out by making professional-looking tables!

Conducting Analyses Elements of a good table: Title, Headers, aligned columns, order, lack of clutter.

Conducting Analyses • Sadly:  • You may put SPSS output into your results section to substitute for professional tables. • Remember, however, that a professional table is much easier to understand than SPSS output. • You would NEVER use SPSS output in a professional presentation of results! • Those with professional tables will get extra credit.

Conducting Analyses • The next thing you want to do is test your hypotheses. • Let’s recall what the point of statistical analysis is… • Descriptive statistics tell us about the persons from whom we collected data—a sample or a census • Inferential statistics allow us to infer something about a population using a sample’s characteristics

Conducting Analyses [Diagram: Sampling draws a Sample (GSS) from the Population (Persons in US Households); Inferences run from the Sample back to the Population.]

Conducting Analyses Sampling error is also built into the process of Random Sampling. Researchers typically get one sample, but there are many, many possible samples.

Conducting Analyses • Variation from sample to sample would result in variations in statistics. For example, average IQ would be different for each sample: 110 104 98 107 101

Conducting Analyses Let’s create a sampling distribution of means… Take a sample of size 1,500 from the US. Record the mean income. Our census said the mean is $30K. $30K = a possible sample’s mean

Conducting Analyses Let’s create a sampling distribution of means… Take another sample of size 1,500 from the US. Record the mean income. Our census said the mean is $30K. $30K = a possible sample’s mean

Conducting Analyses Let’s create a sampling distribution of means… Let’s repeat sampling of sizes 1,500 from the US. Record the mean incomes. Our census said the mean is $30K. $30K = a possible sample’s mean

Conducting Analyses Let’s create a sampling distribution of means… Let’s repeat sampling of sizes 1,500 from the US. Record the mean incomes. Our census said the mean is $30K. The sample means would stack up in a normal curve. A normal sampling distribution. $30K = a possible sample’s mean
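
The "stacking up" of sample means can be simulated. The sketch below is an illustration under stated assumptions, not real census data: it invents a skewed population of 100,000 incomes with a mean near $30K, draws 500 samples of size 1,500, and records each sample's mean. The population shape, the number of samples, and the seed are all arbitrary choices.

```python
import random
from math import sqrt
from statistics import fmean, pstdev, stdev

rng = random.Random(42)  # fixed seed so the run is repeatable

# Made-up, skewed "population" of incomes with a mean near $30K.
population = [rng.expovariate(1 / 30_000) for _ in range(100_000)]
pop_mean = fmean(population)
pop_sd = pstdev(population)

SAMPLE_SIZE = 1_500
N_SAMPLES = 500

# Draw many samples without replacement and record each sample's mean.
sample_means = [
    fmean(rng.sample(population, SAMPLE_SIZE)) for _ in range(N_SAMPLES)
]

# Theory says the sample means should spread around the population mean
# with a standard error of roughly SD / sqrt(n).
expected_se = pop_sd / sqrt(SAMPLE_SIZE)
within = sum(abs(m - pop_mean) <= 1.96 * expected_se for m in sample_means)
share_within = within / N_SAMPLES

print(f"Population mean:          {pop_mean:,.0f}")
print(f"Mean of sample means:     {fmean(sample_means):,.0f}")
print(f"SD of sample means:       {stdev(sample_means):,.0f}")
print(f"Expected standard error:  {expected_se:,.0f}")
print(f"Share within 1.96 SE:     {share_within:.1%}")
```

Even though the made-up incomes are skewed, the sample means should cluster tightly and roughly symmetrically around the population mean, which is the normal sampling distribution the slide describes.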

Conducting Analyses We take the “stacking” knowledge and can guess values of a statistic… If our guess is correct, our sample’s statistic should be among the common samples that would have been drawn from a population with that number. If it is not, it is likely that the sample did not come from such a population. What if my sample’s statistic were here?  guess

Conducting Analyses Knowing how sample statistics would stack up allows us to obtain a metric for how close a sample’s information might be to the population’s (accuracy of sample). This permits what we call inferential statistics. Inferential statistics allow the researcher to come to conclusions about a population on the basis of descriptive statistics about a sample.

Conducting Analyses The statistics you see in quantitative research that employ significance tests are premised on inferential statistics. These statistics are typically reported as responses to tests of null hypotheses, although you rarely see the nulls reported in articles. A null hypothesis typically asks if a statistic such as a measure of the relationship between two variables is zero in the population. The significance test results typically report the probability that a sample statistic at least as far from zero as ours could have come from a population where the statistic is zero. Researchers typically reject the null hypothesis when that probability is less than 5%.

Conducting Analyses We take the “stacking” knowledge and can guess values of a statistic… What if my sample’s statistic were here? 0 Guess value for statistic about relationship: 0, no relationship.

Conducting Analyses Significance tests ask whether the sample is likely to have come from a population with a particular relationship between variables. [Figure: sampling distribution centered on 0 (No Relationship, null), showing the spread of 95% of possible sample statistics, the likely variation from sample to sample.] Your sample’s statistic here: it could have come from a population with no relationship! Has high percent chance of that.

Conducting Analyses • Inferential statistics typically test a null hypothesis that there is no relationship between variables in the population. • We reject the null if our statistics show evidence that our sample’s data are unlikely to have come from a population where the null (no relationship) is true. This allows us to believe the variables are related in the population. • Each statistical test will indicate the likelihood or probability of getting results like our sample’s if some null is true for the population. • Another way to think about this: each statistical test tells us the likelihood or probability that results like our sample’s would be produced by the chance variation that occurs from sample to sample using samples the same size as ours.

Conducting Analyses • Since numbers will vary naturally from sample to sample, we use inferential statistics to tell us how much our statistics describing the relationships between variables (association) should “jump around by chance.” • We use a term, “significance,” to refer to whether our statistics could represent real evidence about the population or whether they reflect natural “jumping around.”

Conducting Analyses You will use an inferential statistics technique to test the NULL version of your four hypotheses: • Inferential statistics null: that there is no relationship between variables in the population • The technique will give you 1) a statistic that describes the relationship between variables in the sample, and 2) information on whether the null is REJECTED or not. • Look at the p-value (probability, or in SPSS, “Sig.”) for the statistic to find out how likely it is that a relationship statistic like yours could have been produced by a sample from a population where the null is true. • If your p-value is less than .05, you can reject the null because there is little chance that a sample like yours would come from a population where there is no relationship. • If your p-value is greater than .05 (such as .26), you fail to reject the null. There is a high chance that a sample like yours could come from a population where the null is true, where there is no relationship between variables.
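
The idea of a p-value as "how often chance alone produces a result like ours" can be illustrated with a shuffle (permutation) test. Note that this is a different technique from the tests SPSS runs below; it is swapped in here only because it makes the logic visible. The function names, data, number of shuffles, and seed are all hypothetical.

```python
import random


def permutation_p_value(group_a, group_b, n_shuffles=5000, seed=0):
    """Share of random relabelings whose mean difference is at least as
    large (in absolute value) as the observed one."""
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # pretend group labels are unrelated to scores
        first, second = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(first) / len(first) - sum(second) / len(second))
        if diff >= observed:
            hits += 1
    return hits / n_shuffles


def decide(p_value, alpha=0.05):
    """Apply the .05 decision rule described above."""
    return "reject the null" if p_value < alpha else "fail to reject the null"


# Made-up scores: two clearly different groups, then two identical groups.
p_different = permutation_p_value(
    [10, 11, 12, 13, 14, 15, 16, 17], [20, 21, 22, 23, 24, 25, 26, 27]
)
p_same = permutation_p_value([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6])

print(p_different, decide(p_different))
print(p_same, decide(p_same))
```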

Conducting Analyses The statistical test you choose depends on your variables’ levels of measurement:
Dependent Variable              Independent Variable   Statistical Test
Interval-Ratio or Dichotomous   Dichotomous            Independent Samples t-test
Interval-Ratio or Dichotomous   Nominal                ANOVA
Interval-Ratio or Dichotomous   Interval-Ratio         Correlation
Nominal or Dichotomous          Nominal*               Cross Tabs
*An interval-ratio variable may be used as an independent variable if it is collapsed into about five or fewer categories, essentially making it nominal.

Conducting Analyses The statistical test you choose depends on your variables’ levels of measurement:
Dependent Variable                        Independent Variable   Statistical Test
Education; Marijuana Legal, YN            Sex                    Independent Samples t-test
Income; Police May Hit, YN                Region of Country      ANOVA
Agree with Abortion; College Degree, YN   Age                    Correlation
Marital Status; Sex                       Race, Age Groups*      Cross Tabs
*An interval-ratio variable may be used as an independent variable if it is collapsed into about five or fewer categories, essentially making it nominal.

Conducting Analyses • Testing hypotheses… Independent Samples t-Test • You must select the test that is appropriate for your variables. • If your independent variable is dichotomous and your dependent variable is interval, ratio, or dichotomous, you should use an independent samples t-test. • A t-test will tell you whether the means on the dependent variable are different for each group in the population. Null: Group 1’s Mean = Group 2’s Mean • Example: Sex -> Education

Conducting Analyses Steps in SPSS to conduct an independent samples t-test: • Enter SPSS and open data file (make sure variables are coded properly) • Use commands (click on menu items): • Analyze • Compare Means • Independent Samples T-Test • Highlight dependent variable and click arrow to place into “Test Variable(s):” box. • Highlight independent variable and click arrow to place into “Grouping Variable:” box. • Click on “Define Groups…” • Enter the code number for each of the two groups (e.g., “0” for men in the box next to “Group 1” and “1” for women in the box next to “Group 2”). Click “Continue.” • Click “OK”

Conducting Analyses • SPSS results will make you look something like this: But they will look like this…

Conducting Analyses Mean education for men and for women. Probability that this sample could have come from a population where men and women have equal education. P > .05, that’s high! I’d say on average men and women have same education in the US.
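
To see what goes into the t statistic, here is a minimal Python sketch of the pooled-variance ("equal variances assumed") version. The education values are made up, with group code 0 for men and 1 for women as in the Define Groups step; the function name `pooled_t` is hypothetical. The sketch stops at t and its degrees of freedom; the p-value requires the t distribution, which SPSS (or a statistics library) supplies.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_1, group_2):
    """Return (t, df) for an independent samples t-test, pooled variance."""
    n1, n2 = len(group_1), len(group_2)
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their df.
    pooled_var = (
        (n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)
    ) / df
    std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(group_1) - mean(group_2)) / std_error
    return t, df


# Made-up years of education keyed by group code (0 = men, 1 = women).
educ_by_sex = {
    0: [12, 14, 16, 12, 13],
    1: [12, 15, 16, 14, 13],
}

t_stat, df = pooled_t(educ_by_sex[0], educ_by_sex[1])
print(f"Mean for men:   {mean(educ_by_sex[0]):.2f}")
print(f"Mean for women: {mean(educ_by_sex[1]):.2f}")
print(f"t = {t_stat:.3f}, df = {df}")
```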

Conducting Analyses • Testing hypotheses… ANOVA • You must select the test that is appropriate for your variables. • If your independent variable is nominal or dichotomous and your dependent variable is interval, ratio, or dichotomous, you should use ANOVA. • An ANOVA test will tell you whether the mean levels on the dependent variable are different for each group of the independent variable in the population. Null: Each group’s mean is equal to the others’. • Example: Region of the Country -> Income

Conducting Analyses Steps in SPSS to conduct ANOVA: • Enter SPSS and open data file (make sure variables are coded properly) • Use commands (click on menu items): • Analyze • Compare Means • One-Way ANOVA… • Highlight dependent variable and click arrow to place into “Dependent List:” box. • Highlight independent variable and click arrow to place into “Factor:” box. • Click on “Options…” at the bottom • Check the “Descriptive” box • Click “Continue.” • Click “OK”

Conducting Analyses Mean Income Scale score for each region of the country Probability that this sample could have come from a population where each region has the same Income Scale mean. P<.05, that’s low! I’d say that some regions in the US have lower average income than others.
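
The F statistic behind a one-way ANOVA compares variation between group means to variation within groups. Below is a small Python sketch of that calculation, assuming made-up income scale scores for three hypothetical regions; the function name `one_way_anova` is an illustration, and the p-value (which needs the F distribution) is left to SPSS.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Between-groups sum of squares: how far group means sit from the
    # grand mean, weighted by group size.
    ss_between = sum(
        len(group) * (g_mean - grand_mean) ** 2
        for group, g_mean in zip(groups, group_means)
    )
    # Within-groups sum of squares: spread of cases around their own mean.
    ss_within = sum(
        (value - g_mean) ** 2
        for group, g_mean in zip(groups, group_means)
        for value in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up income scale scores for three hypothetical regions.
income_by_region = {
    "Region A": [1, 2, 3],
    "Region B": [2, 3, 4],
    "Region C": [5, 6, 7],
}

f_stat, df_between, df_within = one_way_anova(list(income_by_region.values()))
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```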

Conducting Analyses • Testing hypotheses… Correlation • You must select the test that is appropriate for your variables. • If your independent variable is interval, ratio, or dichotomous and your dependent variable is interval or ratio, you should use correlations or multiple regression. • A correlation test will tell you whether values on the dependent variable are linearly related to values on the independent variable in the population. It will also tell you the strength and direction of any linear relationship between the two variables (positive or negative). Null: The correlation between the two variables is zero. • Example: Age -> Agreement with Abortion

Conducting Analyses Steps in SPSS to conduct a correlation test: • Enter SPSS and open data file (make sure variables are coded properly) • Use commands (click on menu items): • Analyze • Correlate • Bivariate… • Highlight dependent variable and click arrow to place into “Variables:” box. • Highlight independent variable and click arrow to place into “Variables:” box. • Click “OK”

Conducting Analyses Description of strength of linear relationship between Age and Agreement with Abortion… In the sample, as age goes up 1 standard deviation, abortion scale increases .065 standard deviations. Probability that this sample could have come from a population where the linear relationship between age and abortion attitude is equal to zero. P>.05, just barely but this is higher than I’d want to reject a null. I’d say that age is not linearly related to abortion attitude in the US.
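
The correlation coefficient itself is simple to compute by hand. The following Python sketch shows the Pearson r calculation on made-up age and abortion-scale values; the function name `pearson_r` and the data are hypothetical, and the significance test is left to SPSS.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of cases")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


age = [20, 30, 40, 50, 60]        # made-up ages
abortion_scale = [2, 3, 5, 4, 6]  # made-up scale scores

r = pearson_r(age, abortion_scale)
print(f"r = {r:.3f}")
```

As in the slide's reading of the output, r can be read as the number of standard deviations the dependent variable changes, on average, when the independent variable goes up one standard deviation.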

Conducting Analyses • Testing hypotheses… Chi-Squared Test • You must select the test that is appropriate for your variables. • If your independent variable is nominal or dichotomous and your dependent variable is nominal or dichotomous, you should use a chi-squared test. • A chi-squared test will tell you whether the categorical assignment on the dependent variable is related to the categorical assignment on the independent variable in the population. Null: The two variables are independent of each other. • Example: Race -> Marital Status

Conducting Analyses Steps in SPSS to conduct a chi-squared test: • Enter SPSS and open data file (make sure variables are coded properly) • Use commands (click on menu items): • Analyze • Descriptive Statistics • Crosstabs… • Highlight dependent variable and click arrow to place into “Column(s):” box. • Highlight independent variable and click arrow to place into “Row(s):” box. • Click on “Statistics…” at the bottom • Check the “Chi-square” box • Click “Continue.” • Click on “Cells…” at the bottom • Check the “Percentages-Row” box • Click “Continue.” • Click “OK”

Conducting Analyses Counts and percentages of each racial group in each marital category. Probability that this sample could have come from a population where all racial groups have the same marital patterns. P < .05, that’s low! I’d say that different races have different marital patterns in the US.
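
The chi-squared statistic compares the observed counts in a crosstab to the counts expected if the two variables were independent. Here is a minimal Python sketch with rows as the independent variable and columns as the dependent variable, matching the Crosstabs setup above. The counts and category labels are made up, and the function names are hypothetical; the p-value comes from the chi-squared distribution, which SPSS supplies.

```python
def chi_squared(table):
    """Return (chi2, df) for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


def row_percents(table):
    """Percent of each row falling in each column (the 'Row' cell option)."""
    return [[100 * count / sum(row) for count in row] for row in table]


# Made-up counts: rows = two groups, columns = married / not married.
table = [
    [10, 20],
    [20, 10],
]

chi2, df = chi_squared(table)
print(f"chi-squared = {chi2:.3f}, df = {df}")
for row in row_percents(table):
    print([f"{pct:.1f}%" for pct in row])
```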
