
Introduction to SAS Essentials Mastering SAS for Data Analytics

Learn how to use PROC FREQ in SAS to create one-way and two-way frequency tables, calculate relative risk measures, and assess inter-rater reliability using Cohen's kappa.


Presentation Transcript


  1. Introduction to SAS Essentials Mastering SAS for Data Analytics. Alan Elliott and Wayne Woodward. SAS ESSENTIALS -- Elliott & Woodward

  2. Chapter 10: ANALYZING COUNTS AND TABLES

  3. LEARNING OBJECTIVES
     • To be able to use PROC FREQ to create one-way frequency tables
     • To be able to use PROC FREQ to create two-way (cross-tabulation) tables
     • To be able to use two-by-two contingency tables to calculate relative risk measures
     • To be able to use Cohen's kappa to calculate inter-rater reliability

  4. 10.1 USING PROC FREQ
     • PROC FREQ is a multipurpose SAS procedure for analyzing count data. It can be used to obtain frequency counts for one or more individual variables or to create two-way tables (cross-tabulations) from two variables.
     • A simplified syntax is:
         PROC FREQ <option(s)>;
           <statements>
           TABLES requests </options>;


  7. The TABLES Statement
     • The TABLES statement is required for all of the examples in this chapter. Its format is:
         TABLES <variable-combinations/options>;
       where variable-combinations specifies frequency or cross-tabulation tables. Options for the TABLES statement follow a slash (/).
     • For example,
         TABLES A*B / CHISQ;
       requests that the chi-square and related statistics be reported for the cross-tabulation A*B.

  8. More about TABLES
     • To obtain counts of the number of subjects observed in each category of group (GP), use the following:
         PROC FREQ; TABLES GP; RUN;
     • To produce a cross-tabulation of GENDER by treatment GP:
         PROC FREQ; TABLES GENDER*GP; RUN;
     • The variables specified in the TABLES statement can be either categorical/character or numeric.
     • To request chi-square statistics for a table, include the option /CHISQ at the end of the TABLES statement. For example,
         PROC FREQ; TABLES GENDER*GP / CHISQ; RUN;
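
The three requests above can be tried end to end with a small data set. The following is a minimal sketch only: the data set name WORK.TRIAL and all of its values are invented for illustration and do not come from the book's sample data.

```sas
/* Hypothetical data: GENDER and GP values are invented for illustration. */
DATA WORK.TRIAL;
  INPUT GENDER $ GP $;
  DATALINES;
F A
F B
M A
M B
F A
M B
M A
F B
F A
M B
;
RUN;

/* One-way table: counts and percentages for each level of GP. */
PROC FREQ DATA=WORK.TRIAL;
  TABLES GP;
RUN;

/* Two-way table of GENDER by GP, with chi-square statistics. */
PROC FREQ DATA=WORK.TRIAL;
  TABLES GENDER*GP / CHISQ;
RUN;
```

With only ten observations, every expected cell count is below 5, so SAS will warn that the chi-square test may not be valid. The sketch is meant to show the syntax, not to support a conclusion.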


  11. 10.2 ANALYZING ONE-WAY FREQUENCY TABLES
      • When count data are collected, you can use PROC FREQ to produce tables of the counts by category as well as to perform statistical analyses on the counts.
      • This section describes how to create tables of counts by category and how to perform a goodness-of-fit test.
      • Do Hands On Example p 246 (AFREQ1.SAS) (Frequencies)

  12. ORDER= Option
      • The ORDER=FORMATTED option for PROC FREQ displays the categories in the table in order of their formatted values.
      • You must first create a custom format in a PROC FORMAT step whose labels sort into the order that you want in your output table. For example:
          PROC FORMAT;
            VALUE $FMTRACE "AA"="African American"
                           "H"="Hispanic"
                           "OTH"="Other"
                           "C"="White";
          RUN;

  13. Apply Your Created Format
      • To cause PROC FREQ to display categories in the formatted order, apply your created format:
          PROC FREQ ORDER=FORMATTED DATA="C:\SASDATA\SURVEY";
            TABLES RACE;
            FORMAT RACE $FMTRACE.;
          RUN;
        The ORDER= option specifies the order in which the categories are displayed. In this case, they are displayed in FORMATTED order. You must also apply the format to the variable for it to be correctly used.
      • Do Hands On Exercise p 248. (AFREQ2.SAS)
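
A self-contained version of the ORDER=FORMATTED idea is sketched below. It assumes a small stand-in data set, WORK.SURVEY_DEMO, with invented RACE codes, because the book's SURVEY data set is not reproduced in these slides. The format is the one defined on the previous slide.

```sas
/* Hypothetical stand-in for the SURVEY data set: only RACE is included. */
DATA WORK.SURVEY_DEMO;
  INPUT RACE $;
  DATALINES;
C
AA
H
OTH
C
AA
C
H
;
RUN;

PROC FORMAT;
  VALUE $FMTRACE "AA"  = "African American"
                 "H"   = "Hispanic"
                 "OTH" = "Other"
                 "C"   = "White";
RUN;

/* Categories are listed by formatted label, not by the raw codes. */
PROC FREQ ORDER=FORMATTED DATA=WORK.SURVEY_DEMO;
  TABLES RACE;
  FORMAT RACE $FMTRACE.;
RUN;
```

Sorted by raw code the categories would appear as AA, C, H, OTH. With ORDER=FORMATTED they appear as African American, Hispanic, Other, White, because the labels are what get sorted.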

  14. 10.3 CREATING ONE-WAY FREQUENCY TABLES FROM SUMMARIZED DATA
      • The following example illustrates how to summarize counts from a data set into a frequency table.
      • Suppose your data are in this summarized form:
          CENTS     152
          CENTS     100
          NICKELS    49
          DIMES      59
          QUARTERS   21
          HALF       44
          DOLLARS    21
        For example, the NICKELS row means there are 49 nickels.
      • Do the Hands On Exercise p 250 (AFREQ3.SAS)
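
One way to turn the summarized rows above into a frequency table is to read them with a count variable and name that variable in a WEIGHT statement. This is a sketch under assumptions: the variable names CATEGORY and NUMBER and the data set name WORK.COINS are chosen here for illustration and may differ from those in AFREQ3.SAS. The counts are the ones shown on the slide.

```sas
/* CATEGORY and NUMBER are assumed names; the counts come from the slide. */
DATA WORK.COINS;
  INPUT CATEGORY $ NUMBER;
  DATALINES;
CENTS 152
CENTS 100
NICKELS 49
DIMES 59
QUARTERS 21
HALF 44
DOLLARS 21
;
RUN;

PROC FREQ DATA=WORK.COINS;
  WEIGHT NUMBER;      /* each row stands for NUMBER coins, not one coin */
  TABLES CATEGORY;
RUN;
```

Because WEIGHT adds up the counts within each category, the two CENTS rows are combined into a single CENTS category with a frequency of 252.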

  15. Testing Goodness of Fit in a One-Way Table
      • A goodness-of-fit test of a single population is a test to determine whether the distribution of observed frequencies in the sample data closely matches the expected number of occurrences under a hypothetical distribution for the population.
      • The hypotheses being tested are as follows:
          H0: The population follows the hypothesized distribution.
          Ha: The population does not follow the hypothesized distribution.

  16. Goodness-of-fit using PROC FREQ
      • A chi-square statistic is calculated, and a decision can be made based on the p-value associated with that statistic. A low p-value indicates that the data do not follow the hypothesized, or theoretical, distribution. If the p-value is sufficiently low (usually <0.05), you will reject the null hypothesis. The syntax to perform a goodness-of-fit test is as follows:
          PROC FREQ;
            TABLES variable / CHISQ TESTP=(list of ratios);

  17. Goodness-of-fit Example
      • As an example, we will use data from an experiment conducted by the nineteenth-century monk Gregor Mendel. According to a genetic theory, crossbred pea plants show a 9:3:3:1 ratio. From 556 plants, you expect
          (9/16) x 556 = 312.75 yellow smooth peas (56.25%)
          (3/16) x 556 = 104.25 yellow wrinkled peas (18.75%)
          (3/16) x 556 = 104.25 green smooth peas (18.75%)
          (1/16) x 556 = 34.75 green wrinkled peas (6.25%)

  18. Actual Observed Data
      • After growing 556 of these pea plants, Mendel observed the following:
          315 have yellow smooth peas
          108 have yellow wrinkled peas
          101 have green smooth peas
          32 have green wrinkled peas

  19. The Goodness-of-Fit Code
      • Hypothesizing a 9:3:3:1 ratio:
          PROC FREQ ORDER=DATA;
            WEIGHT NUMBER;
            TITLE 'GOODNESS OF FIT ANALYSIS';
            TABLES COLORTYPE / NOCUM CHISQ TESTP=(0.5625 0.1875 0.1875 0.0625);
          RUN;
        Note that these proportions are in a 9:3:3:1 ratio.
      • Do Hands on Example p 252 (AFREQ4.SAS)
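
The PROC FREQ step above needs a data set holding Mendel's observed counts. A possible DATA step is sketched here. The variable names COLORTYPE and NUMBER match the slide, but the data set name WORK.PEAS and the category labels (YSMOOTH and so on) are assumptions made for this example. The counts are those on the "Actual Observed Data" slide.

```sas
/* Category labels and the data set name are assumed; counts are from the slide. */
DATA WORK.PEAS;
  INPUT COLORTYPE $ NUMBER;
  DATALINES;
YSMOOTH 315
YWRINKLE 108
GSMOOTH 101
GWRINKLE 32
;
RUN;

/* ORDER=DATA keeps the categories in input order so that they line up */
/* with the TESTP= proportions 9/16, 3/16, 3/16, 1/16.                 */
PROC FREQ DATA=WORK.PEAS ORDER=DATA;
  WEIGHT NUMBER;
  TITLE 'GOODNESS OF FIT ANALYSIS';
  TABLES COLORTYPE / NOCUM CHISQ TESTP=(0.5625 0.1875 0.1875 0.0625);
RUN;
```

As a hand check, summing (observed - expected)^2 / expected over the four categories, using the expected counts 312.75, 104.25, 104.25, and 34.75, gives a chi-square of about 0.47 on 3 degrees of freedom. A value that small gives no reason to reject the 9:3:3:1 hypothesis.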

  20. 10.4 ANALYZING TWO-WAY TABLES
      • To create a cross-tabulation table using PROC FREQ for relating two variables, use the TABLES statement with both variables listed and separated by an asterisk (*), for example, A*B.
      • A cross-tabulation table is formed by counting the number of occurrences in a sample across two grouping variables.
      • The number of columns in a table is usually denoted by c and the number of rows by r. Thus, a table is said to be an r x c table; that is, it has r x c cells.

  21. Test of Independence
      • The hypotheses associated with a test of independence are as follows:
          H0: The variables are independent (no association between them).
          Ha: The variables are not independent.
        For example, a null hypothesis could be that there is no association between handedness (left- or right-handed) and hair color.

  22. Test of Homogeneity
      • The null hypothesis is that the populations have the same distribution (they are homogeneous). In this case, the hypotheses are as follows:
          H0: The populations are homogeneous.
          Ha: The populations are not homogeneous.
        For example, a null hypothesis could be that the distribution of handedness is the same in two populations (male and female).

  23. Testing These Hypotheses
      • The chi-square test of independence or homogeneity is reported by PROC FREQ (the tests are mathematically equivalent) by the use of the /CHISQ option in the TABLES statement.
      • For example,
          PROC FREQ; TABLES GENDER*GP / CHISQ; RUN;
        Use the same code to test for either independence or homogeneity.
      • Do Hands On Exercise p 254. (AFREQ5.SAS)

  24. Example code to perform a Chi-Square Test on an r x c contingency table (Crime Example)
          PROC FREQ DATA=DRINKERS;
            WEIGHT COUNT;
            TABLES CRIME*DRINKER / CHISQ;
            TITLE 'Chi Square Analysis of a Contingency Table';
          RUN;
      Note that WEIGHT COUNT; is needed since the data are in summary form.

  25. Summarized Data for the Crime Case
          CRIME     DRINKER  COUNT
          Arson     1         50
          Arson     0         43
          Rape      1         88
          Rape      0         62
          Violence  1        155
          Violence  0        110
          Stealing  1        379
          Stealing  0        300
          Coining   1         18
          Coining   0         14
          Fraud     1         63
          Fraud     0        144
      Notice how the data are in summarized form. For Arson, there were 50 "Drinkers" (DRINKER=1) and 43 "Non-Drinkers" (DRINKER=0).
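
Putting the summarized crime data together with the PROC FREQ step from the previous slide gives a complete program. This is a sketch: the INPUT statement is an assumption about how the data might be read, while the variable names, data set name, and counts are taken from the slides.

```sas
/* Counts are from the slide; the INPUT layout is assumed. */
DATA DRINKERS;
  INPUT CRIME $ DRINKER COUNT;
  DATALINES;
Arson 1 50
Arson 0 43
Rape 1 88
Rape 0 62
Violence 1 155
Violence 0 110
Stealing 1 379
Stealing 0 300
Coining 1 18
Coining 0 14
Fraud 1 63
Fraud 0 144
;
RUN;

PROC FREQ DATA=DRINKERS;
  WEIGHT COUNT;                    /* data are counts, not one row per subject */
  TABLES CRIME*DRINKER / CHISQ;    /* 6 x 2 table, so 5 degrees of freedom     */
  TITLE 'Chi Square Analysis of a Contingency Table';
RUN;
```

Leaving out the WEIGHT statement would make PROC FREQ treat each of the 12 rows as a single observation, which would give a table of ones instead of the intended counts.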

  26. Results of Chi Square Analysis
      • Observe the statistics table. The Chi-Square value is 49.73 and the p-value is p < 0.0001.
      • Thus, you reject the null hypothesis of no association (independence) and conclude that there is evidence of a relationship between drinking status and type of crime committed.

  27. Creating a Contingency Table from Raw Data, the 2 x 2 Case
      • In the previous example (CRIME) the data were in summary form, and you needed to use the WEIGHT COUNT; statement to reflect that.
      • If your data are in raw form, with one record per observation, you do not need the WEIGHT statement.
      • Do Hands on Example p 257 (AFREQ6.SAS). For these data, each subject has one record; thus you have one record per observation.

  28. Output from 2x2 Chi-Square Analysis
      The resulting table of counts:
      Statistical results: note in particular the Chi-Square and Fisher Two-sided Pr <= P values.
      The chi-square statistic, 8.29, p = 0.004, indicates an association between CLEANER and RASH (rejects the null hypothesis). The two-sided Fisher result, p = 0.0095, leads to the same decision.

  29. Tables with Small Counts in Cells
      • When you summarize counts in tables, and there are small numbers in one or more cells, a typical chi-square statistical analysis may not be valid.
      • Do Hands On Example p 259 (AFREQ7.SAS)
      • Observe the warning message "WARNING: 50% of the cells have expected counts <5. Chi-square may not be a valid test."
      • In this case, Fisher's Exact test (given in Table 10.13) is the more reliable test and should be used instead of the Chi-Square test.

  30. 10.5 GOING DEEPER: CALCULATING RELATIVE RISK MEASURES
      • Two-by-two contingency tables are often used when examining a measure of risk.
      • A measure of this risk in a retrospective (case-control) study is called the odds ratio (OR). In a case-control study, a researcher takes a sample of subjects and looks back in time for exposure (or nonexposure).
      • If the data are collected prospectively, where subjects are selected by presence or absence of a risk and then observed over time to see if they develop an outcome, the measure of risk is called relative risk (RR).
      • Either way, RR=1 or OR=1 means that no increased or decreased risk was observed.

  31. Testing Relative Risk in PROC FREQ
      • In PROC FREQ, the option to calculate the values for OR or RR is RELRISK, and it appears as an option to the TABLES statement as shown here (for the RASH data):
          TABLES CLEANER*RASH / RELRISK;
      • In the results, a risk measure >1 indicates that exposure is harmful and a risk measure <1 implies that exposure is beneficial.
      • Do Hands On Example p 261 (AFREQ6.SAS)
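
To see where the odds ratio comes from, it can help to run RELRISK on a 2 x 2 table small enough to check by hand. The counts below are hypothetical; they are not the book's RASH data, and the data set name WORK.RASH_DEMO is invented for this sketch.

```sas
/* Hypothetical counts, chosen only so the arithmetic is easy to follow. */
DATA WORK.RASH_DEMO;
  INPUT CLEANER RASH COUNT;
  DATALINES;
1 1 5
1 2 20
2 1 15
2 2 10
;
RUN;

PROC FREQ DATA=WORK.RASH_DEMO;
  WEIGHT COUNT;
  TABLES CLEANER*RASH / CHISQ RELRISK;
RUN;
```

For these made-up counts, the odds ratio for row 1 versus row 2 is (5 x 10) / (20 x 15), or about 0.167. A value below 1 means the odds of the column 1 outcome are lower in row 1 than in row 2.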

  32. Results of Risk Analysis
      The OR = 0.1346 specifies the odds of Row1/Row2, that is, for cleaner 1 versus cleaner 2. Because OR is <1, the odds of having a rash are lower for a person using cleaner 1 than for a person using cleaner 2.
      Typically, the Odds Ratio is the statistic of interest.

  33. 10.6 GOING DEEPER: INTER-RATER RELIABILITY (KAPPA)
      • A method for assessing the degree of agreement between two raters is Cohen's kappa coefficient.
      • For example, kappa is useful for analyzing the consistency of two raters who evaluate subjects on the basis of a categorical measurement.
      Two Raters A and B compared…

  34. Code used to Calculate Kappa
          PROC FREQ;
            WEIGHT WT;
            TABLE RATER1*RATER2 / AGREE;
            TEST KAPPA;
            TITLE 'KAPPA EXAMPLE FROM FLEISS';
          RUN;
      The AGREE option and the TEST KAPPA; statement provide the Kappa test results.
      • Do Hands on Exercise p 262 (AKAPPA1.SAS)
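
The kappa code above can be tried on a small table whose answer is easy to verify. The counts below are hypothetical and are not the Fleiss data used in AKAPPA1.SAS; the data set name WORK.KAPPA_DEMO is likewise invented for this sketch.

```sas
/* Hypothetical ratings: WT is the number of subjects in each cell. */
DATA WORK.KAPPA_DEMO;
  INPUT RATER1 RATER2 WT;
  DATALINES;
1 1 20
1 2 5
2 1 10
2 2 15
;
RUN;

PROC FREQ DATA=WORK.KAPPA_DEMO;
  WEIGHT WT;
  TABLE RATER1*RATER2 / AGREE;   /* AGREE requests kappa for the square table */
  TEST KAPPA;                    /* adds a test of H0: kappa = 0              */
RUN;
```

By hand, the raters agree on 35 of 50 subjects (observed agreement 0.70), and agreement expected by chance is 0.5 x 0.6 + 0.5 x 0.4 = 0.50, so kappa = (0.70 - 0.50) / (1 - 0.50) = 0.40 for these made-up counts.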

  35. Results of Kappa Analysis
      This is Kappa, the primary statistic of interest.
      How to interpret kappa.
      See text for additional explanation of these results.

  36. Calculating Weighted Kappa
      • For the case in which rated categories are ordinal (that is, the categories of interest are in a meaningful order), it is appropriate to use the weighted kappa statistic, because it is designed to give partial credit to ratings that are close to but not on the diagonal.
      • For example, in a test of recognition of potentially dangerous airline passengers, suppose a procedure is devised that classifies passengers into three categories:
          1 = No threat/Pass
          2 = Concern/Recheck
          3 = Potential threat/Detain
        Note how these 3 categories are ascending in danger: they have a definite order (they are ordinal).
      • Do Hands on Exercise p 265 (AKAPPA2.SAS)

  37. Weighted Kappa Results
      • The example uses this code to calculate the weighted kappa:
          TABLE RATER1*RATER2 / AGREE;
          TEST WTKAP;
      • For this analysis, report the "Weighted Kappa" value, kappa=0.7413.
      • Use the same interpretation of kappa table as before.
      • See text for the interpretation of other results.
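
A complete weighted-kappa program might look like the sketch below. The 3 x 3 counts are hypothetical and are not the data from AKAPPA2.SAS, so the result will not match the 0.7413 reported on the slide. The categories 1 to 3 follow the Pass/Recheck/Detain coding described earlier.

```sas
/* Hypothetical counts: rows are RATER1, columns are RATER2, WT is the cell count. */
DATA WORK.WTKAPPA_DEMO;
  INPUT RATER1 RATER2 WT;
  DATALINES;
1 1 20
1 2 3
1 3 1
2 1 4
2 2 12
2 3 3
3 1 1
3 2 2
3 3 10
;
RUN;

PROC FREQ DATA=WORK.WTKAPPA_DEMO;
  WEIGHT WT;
  TABLE RATER1*RATER2 / AGREE;   /* reports simple and weighted kappa  */
  TEST WTKAP;                    /* adds a test for the weighted kappa */
RUN;
```

Every category is used by both raters in this example, so the table is square, which the agreement statistics require.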

  38. 10.7 SUMMARY
      • This chapter discusses the capabilities of PROC FREQ for creating one- and two-way frequency tables, analyzing contingency tables, calculating measures of risk, and measuring inter-rater reliability (using KAPPA).
      • Continue to Chapter 11: COMPARING MEANS USING T-TESTS

  39. These slides are based on the book: Introduction to SAS Essentials Mastering SAS for Data Analytics, 2nd Edition, by Alan C. Elliott and Wayne A. Woodward. Paperback: 512 pages. Publisher: Wiley; 2 edition (August 3, 2015). Language: English. ISBN-10: 111904216X. ISBN-13: 978-1119042167.
      These slides are provided for you to use to teach SAS using this book. Feel free to modify them for your own needs. Please send comments about errors in the slides (or suggestions for improvements) to acelliott@smu.edu. Thanks.
