
Elang 273: Statistics

Elang 273: Statistics. September 15, 2008.


Presentation Transcript


  1. Elang 273: Statistics September 15, 2008

  2. Statistics: The scientific method is defined by three criteria: 1. the research question is empirical; 2. the data we collect is public; 3. the data is falsifiable. But also with this. Statistics helps most with this.

  3. Statistics: Research question: Is the word glistening used more often in one register (as shown in COCA) than in another? How different do these frequencies have to be before we can say they really are different?

  4. Statistics: Researchers have agreed on a convention: if a difference between two groups at least as large as the one we observed would arise by chance alone less often than a set threshold, we consider the difference statistically significant. The usual threshold is one in twenty: a difference is significant if it would happen by chance less than 5% of the time (p < .05). A difference that is not significant may simply be the result of random chance.
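To make the one-in-twenty convention concrete, here is a minimal Python simulation, a sketch that is not part of the original slides. It assumes a world with no real difference at all: 40 word tokens are scattered at random over four equally likely categories, thousands of times. For each random dataset it computes a chi-square statistic and checks whether it reaches 7.815, the standard cutoff for p < .05 with 3 degrees of freedom. Because only chance is at work, roughly one dataset in twenty should cross that line. The function names, the 40 tokens, and the four categories are illustrative choices.

```python
import random

def chi_square(observed, expected):
    """Goodness-of-fit chi-square: sum of (O - E)^2 / E over the categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chance_significance_rate(n_tokens=40, n_categories=4, critical_value=7.815,
                             trials=10_000, seed=1):
    """Share of purely random datasets whose chi-square reaches the cutoff.

    The default cutoff, 7.815, is the p < .05 critical value for
    3 degrees of freedom, which matches the default of 4 categories.
    """
    rng = random.Random(seed)
    expected = [n_tokens / n_categories] * n_categories
    hits = 0
    for _ in range(trials):
        counts = [0] * n_categories
        for _ in range(n_tokens):
            # Every token lands in a category purely at random.
            counts[rng.randrange(n_categories)] += 1
        if chi_square(counts, expected) >= critical_value:
            hits += 1
    return hits / trials

print(chance_significance_rate())  # a value near 0.05
```

In other words, with p < .05 as the threshold, about 5% of comparisons where nothing is really going on will still come out "significant".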

  5. Two types of statistics
     1. Descriptive: a. nominal (categorical), b. ordinal (rank order), c. continuous
     2. Inferential: a. chi-square, b. t-tests/ANOVA, c. correlations, d. varbrul

  6. 1. Descriptive Statistics: These are the kinds of statistics you are already familiar with: means, percentages, and quartiles, usually shown in bar charts, pie charts, and other graphs.

  7. 1. Descriptive Statistics: Three types of data
     • Nominal (categorical): sex, race, national origin, native-speaker status, how often you choose one thing over another, how often a word occurs in one register versus another
     • Continuous: height, weight, age, scores on a language test, IQ, working memory span
     • Ordinal (rank order): no fixed interval between values (first, second, third place in a race), e.g. the order in which people rank their favorite dialects
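The summaries named on these slides can be computed in a few lines with Python's standard `statistics` module. In this sketch the nominal counts are the coffee responses that appear later in this lecture; the test scores and the ranks are invented for illustration only.

```python
import statistics

# Nominal data: counts per category, summarized as percentages.
coffee_counts = {"A": 186, "B": 113, "C": 70}
total = sum(coffee_counts.values())
percentages = {k: round(100 * v / total, 1) for k, v in coffee_counts.items()}

# Continuous data: hypothetical language-test scores (invented).
scores = [62, 70, 71, 75, 80, 84, 90]
mean_score = statistics.mean(scores)
quartiles = statistics.quantiles(scores, n=4)  # cut points: Q1, median, Q3

# Ordinal data: hypothetical ranks (1 = favorite) five listeners gave one dialect.
# Ranks have no fixed interval, so the median is a safer summary than the mean.
ranks = [1, 2, 2, 3, 1]
median_rank = statistics.median(ranks)

print(percentages)  # {'A': 50.4, 'B': 30.6, 'C': 19.0}
print(mean_score)
print(quartiles)    # [70.0, 75.0, 84.0]
print(median_rank)  # 2
```

Percentages suit bar or pie charts, quartiles suit box plots, and median ranks suit a simple ordered list.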

  8. 1. Descriptive Statistics: How could you depict the data for each of these types?
     • Nominal
     • Continuous
     • Ordinal (rank order)

  9. 1. Nominal (Categorical): Answers to “Where is this speaker from?” (native listeners)

  10. 1. Nominal (Categorical): Correct dialect identification by American English speakers

  11. 2. Continuous

  12. Native listeners: status vs. solidarity (figure: two panels, Status and Solidarity, each showing RP, Birmingham, Network, New York, West Yorkshire, and Alabama)

  13. 3. Ordinal (Rank Order) Coupland & Bishop, 2007

  14. 2. Inferential Statistics
     • Chi-square
     • ANOVA/t-test
     • Correlations (rank-order correlations)
     • Logistic regression
     • Varbrul

  15. 2. Inferential Statistics: For each type of statistic we need to know
     • the test statistic (chi-square value, F statistic, t statistic)
     • the probability value (p value)
     • the degrees of freedom (df)
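Of these three values, the degrees of freedom are the easiest to work out by hand. The Python sketch below collects the standard textbook formulas for the tests named in this lecture. The function names are illustrative, and the t-test formula assumes the classic two-sample test with equal variances (Student's t).

```python
def df_chi_square_goodness_of_fit(n_categories):
    """One-way chi-square: number of categories minus one."""
    return n_categories - 1

def df_chi_square_contingency(n_rows, n_columns):
    """Chi-square on a contingency table: (rows - 1) * (columns - 1)."""
    return (n_rows - 1) * (n_columns - 1)

def df_independent_t_test(n_group1, n_group2):
    """Student's two-sample t-test (equal variances assumed): n1 + n2 - 2."""
    return n_group1 + n_group2 - 2

def df_one_way_anova(n_groups, n_total):
    """One-way ANOVA reports two df values: between groups and within groups."""
    return n_groups - 1, n_total - n_groups

# Four varieties of English (US, NZ, AU, UK) in a one-way chi-square:
print(df_chi_square_goodness_of_fit(4))  # 3
```

By the same formula, the df = 4 reported for the glistening practice corresponds to five categories (5 - 1 = 4).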

  16. 2. Inferential Statistics: Research question: Is the word glistening used more often in one register (as shown in COCA) than in another?

  17. 2. Inferential Statistics: Research question: Is the word glistening used more often in one register (as shown in COCA) than in another? What kind of data is this? Nominal (categorical). For this kind of data we use a chi-square test.

  18. a. Chi-square: Tells us whether something happened more often than chance would predict. http://www-user.uni-bremen.de/~anatol/qnt/qnt_chi.html Use it with multiple-choice questions (how often respondents pick a specific choice) and with corpus or other frequency data.

  19. a. Chi-square: The question the chi-square statistic answers: is the distribution into categories random or not? (It uses counts of nominal data.) Example, a multiple-choice question: “Jill loves the taste of coffee.” A - c[æ]fi - 186, B - c[ʌ]fi - 113, C - c[a]fi - 70. Are 186, 113, and 70 really different from what random choice would give?
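The question on this slide can be answered with a one-way (goodness-of-fit) chi-square. Here is a minimal Python sketch, assuming that "random choice" means each of the three pronunciations is equally likely, so each expected count is 369 / 3 = 123. Rather than computing an exact p value, it compares the statistic with 5.991, the standard p < .05 critical value for 2 degrees of freedom.

```python
def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [186, 113, 70]                              # A, B, C responses
expected = [sum(observed) / len(observed)] * len(observed)  # 123 each if random
statistic = chi_square(observed, expected)
df = len(observed) - 1                                 # 3 choices -> df = 2
critical_value_05 = 5.991                              # cutoff for p < .05, df = 2

print(round(statistic, 2), df)        # 55.92 2
print(statistic > critical_value_05)  # True
```

A statistic of about 55.9 is far above the cutoff, so under this assumption the answers are very unlikely to reflect random choice.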

  20. a. Chi-square: To compute chi-square, you need the observed frequencies (the responses you got from your survey or corpus) and the expected frequencies. To calculate the expected frequencies, add up all the observed frequencies and divide by the number of categories (this assumes every category is equally likely under random chance). [Table: rows Observed and Expected; columns Data point 1, Data point 2]

  21. a. Chi-square: (Invented) frequency of use of dude in four million-word spoken corpora
     Observed (what they actually did): US 15, NZ 9, AU 11, UK 5
     Expected, i.e. a random distribution (what you would expect by random chance): US 10, NZ 10, AU 10, UK 10
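The steps above translate directly into code. Below is a minimal Python sketch of a one-way chi-square on the invented dude counts, using only the standard library; the p-value helper is an illustrative function that uses the closed-form upper tail of the chi-square distribution for whole-number degrees of freedom. Comparing observed with expected counts directly gives chi-square = 5.2 (df = 3, p ≈ .16). That is larger than the 2.77 reported on the next slide, because the linked calculator was given the observed and expected rows as a two-row contingency table, which is a different test. Both results are well above .05, so the conclusion (no significant difference) is the same.

```python
import math

def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_p_value(x, df):
    """P(chi-square >= x) for a whole-number df, via the closed-form upper tail."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: e^(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * e^(-x/2) * sum of x^i / (1*3*...*(2i+1))
    tail = math.erfc(math.sqrt(x / 2))
    if df == 1:
        return tail
    term, total = 1.0, 1.0
    for i in range(1, (df - 1) // 2):
        term *= x / (2 * i + 1)
        total += term
    return tail + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total

# Invented counts of "dude" in four spoken corpora of equal size (from the slide).
observed = {"US": 15, "NZ": 9, "AU": 11, "UK": 5}
counts = list(observed.values())
expected = [sum(counts) / len(counts)] * len(counts)  # 40 / 4 = 10 per corpus

statistic = chi_square(counts, expected)
df = len(counts) - 1
p = chi_square_p_value(statistic, df)
print(f"chi-square = {statistic:.2f}, df = {df}, p = {p:.3f}")
# chi-square = 5.20, df = 3, p = 0.158
```

Statistics packages provide the same calculation (for example `scipy.stats.chisquare`), but the hand-rolled version shows every step.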

  22. a. Chi-square: http://www.physics.csbsju.edu/stats/contingency_NROW_NCOLUMN_form.html
     chi-square = 2.77 (we want this to be large)
     degrees of freedom = 3
     probability = 0.429 (we want this to be small)
     The larger the chi-square value and the smaller the p value, the less likely it is that the difference between the observed and the expected frequencies occurred by chance. Here p = 0.429 is far above .05, so the difference is not statistically significant.
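Where does 2.77 come from? A short Python sketch that treats the observed and expected frequencies as the two rows of a 2 × 4 contingency table, which appears to be how the values were entered into the linked calculator, reproduces the reported chi-square and degrees of freedom. This is offered as a check on the slide under that assumption; the function name is illustrative.

```python
def contingency_chi_square(table):
    """Pearson chi-square for a contingency table given as a list of rows.

    Each cell's expected count is row total * column total / grand total.
    Returns the statistic and the degrees of freedom (rows - 1) * (columns - 1).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for row, r_total in zip(table, row_totals):
        for cell, c_total in zip(row, col_totals):
            expected = r_total * c_total / grand_total
            statistic += (cell - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df

# Rows: observed and "expected" dude counts for US, NZ, AU, UK, entered as two samples.
table = [[15, 9, 11, 5],
         [10, 10, 10, 10]]
statistic, df = contingency_chi_square(table)
print(f"chi-square = {statistic:.2f}, df = {df}")  # chi-square = 2.77, df = 3
```

A contingency table is the right tool for comparing two observed samples (for example, dude counts from two different sets of corpora); for comparing observed counts with expected ones, the one-way goodness-of-fit test is the usual choice.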

  23. a. Chi-square: Practice: Is the word glistening used more often in one register (as shown in COCA) than in another? To do this, multiply each frequency by 10 so that you use only whole numbers.

  24. a. Chi-square: Results: chi-square = 97.2, degrees of freedom = 4, probability = 0.000 (that is, p < .001, so the difference between registers is statistically significant).

  25. a. Chi-square: More practice
     1. Multiple-choice question: “Jill loves the taste of coffee.” A - c[æ]fi - 186, B - c[ʌ]fi - 113, C - c[a]fi - 70. Did respondents choose A more often than the other two choices?
     2. Identification: American listeners chose the following answers when asked “Where is this speaker from?” (he was from Birmingham, UK): London 45%, England 25%, Scotland 25%, Ireland 5%.

  26. Chi-square Homework
