Embrace the cozy spirit with Glögg and power analyses! Learn about the importance of data integrity, sample size, effect size, and variation in statistical analysis projects. From making graphs to dealing with outliers, ensure your models are robust for accurate interpretations. Explore regression, Anova, logistic regression, and more while checking assumptions and addressing outliers. Delve into non-parametric tests, Poisson distribution, and the significance of assumptions in parametric tests. Discover the utility of confidence intervals, bootstrap methods, and practical considerations in statistical analyses.
Cold shitty weather. Sorry! Get warm with Glögg!
Power recap • It is good to fake data • BUT • The p-value from a single fake data set is crap!
Power recap • A power analysis based on 1000 simulations is not crap! • BUT • Power depends on: • Sample size • Effect size • Variation
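A minimal sketch of such a simulation-based power analysis, assuming a two-group design analysed with a t-test. The function name `simulate_power` and all parameter values are made up for illustration; they are not from the lecture.

```r
# Power = proportion of simulated (fake) data sets that give p < alpha.
# simulate_power() is a hypothetical helper; all values below are assumptions.
simulate_power <- function(n_per_group, effect, sd, n_sim = 1000, alpha = 0.05) {
  p_values <- replicate(n_sim, {
    control   <- rnorm(n_per_group, mean = 0,      sd = sd)  # fake data, group 1
    treatment <- rnorm(n_per_group, mean = effect, sd = sd)  # fake data, group 2
    t.test(control, treatment)$p.value
  })
  mean(p_values < alpha)
}

set.seed(1)
power_small_n <- simulate_power(n_per_group = 10, effect = 1, sd = 1)
power_large_n <- simulate_power(n_per_group = 40, effect = 1, sd = 1)  # larger sample
power_noisy   <- simulate_power(n_per_group = 40, effect = 1, sd = 3)  # more variation
c(small_n = power_small_n, large_n = power_large_n, noisy = power_noisy)
```

Changing one argument at a time shows how power responds to sample size, effect size, and variation.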
Project considerations I • Make graphs • Check for outliers • Check assumptions • Decide if you want to transform y and/or x • Check VIF • Are your assumptions still f**k*d up? • Well, that’s for today.
Project considerations II • Interpret interactions first! • If they are significant: Are main effects still interpretable? • Distinguish between: y ~ x1 and y ~ x1 given x2 • Simplify your models!
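For the VIF check, a rough base-R sketch follows. It assumes the usual definition VIF = 1 / (1 - R²), where R² comes from regressing each explanatory variable on the others. The helper `vif_by_hand` and the simulated variables are hypothetical; in practice a packaged function would normally be used.

```r
set.seed(2)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)  # strongly correlated with x1 (made-up data)
x3 <- rnorm(n)                 # unrelated to x1 and x2
d  <- data.frame(x1, x2, x3)

# Hypothetical helper: regress each explanatory variable on all the others
vif_by_hand <- function(data) {
  sapply(names(data), function(v) {
    others <- setdiff(names(data), v)
    r2 <- summary(lm(reformulate(others, response = v), data = data))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_by_hand(d)
vifs  # large values for x1 and x2 flag collinearity
```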
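The difference between y ~ x1 and y ~ x1 given x2 can be illustrated with simulated data. This is only a sketch: the variables and effect sizes are invented, and x1 and x2 are deliberately correlated.

```r
set.seed(3)
n  <- 200
x2 <- rnorm(n)
x1 <- 0.8 * x2 + rnorm(n, sd = 0.6)   # x1 and x2 are correlated (made-up data)
y  <- 1 * x1 + 2 * x2 + rnorm(n)

m_alone <- lm(y ~ x1)        # y ~ x1: slope also absorbs the effect of x2
m_given <- lm(y ~ x1 + x2)   # y ~ x1 given x2
m_inter <- lm(y ~ x1 * x2)   # adds the x1:x2 interaction

coef(m_alone)["x1"]
coef(m_given)["x1"]
anova(m_given, m_inter)      # look at the interaction first; drop it if not needed
```

Comparing the two x1 slopes shows how much of the "effect" of x1 alone is really due to x2.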
[Figure: which analysis for which variables. Categoric response variable: logistic regression (continuous explanatory variable; example plot: prob. of choosing Melica rather than Luzula against ant size) or 2×2 tables (categoric explanatory variable). Continuous response variable (seed size): regression (continuous explanatory variable) or Anova (categoric explanatory variable; red ants vs black ants).]
[Figure: continuous response variable (seed size). Regression for a continuous explanatory variable; Anova for a categoric explanatory variable (red ants vs black ants).]
Assumptions for parametric tests with continuous response (i.e., also linear models!!) • About the same variation in all groups, along a continuous variable, or along fitted values • Pretty normal residuals (= noise)
The residuals… • … are the noise that is not explained by the explanatory variable(s) • In a regression the residuals are the distances from the data points to the regression line • In an Anova the residuals are the distances to the group mean • In a linear model the residuals are the distances from the data points to the fitted values.
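A small sketch of what this means in R, using simulated data (the variables are made up). It computes the residuals by hand as data minus fitted values and draws two common diagnostic plots for the assumptions above.

```r
set.seed(4)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50)   # made-up data
m <- lm(y ~ x)

# Residuals = distance from the data points to the fitted values
res_by_hand <- y - fitted(m)
all.equal(unname(res_by_hand), unname(residuals(m)))

# About the same variation along the fitted values?
plot(fitted(m), residuals(m))
abline(h = 0, lty = 2)

# Pretty normal residuals?
qqnorm(residuals(m))
qqline(residuals(m))
```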
Solutions • Poisson for counts (generalized linear model) • Non-parametric tests • Resampling methods • Permutation • Bootstrap • Binarize your response
Poisson distribution • Response = counts (not truly continuous) • Examples • Are there more maple seedlings close to a maple? • Response = number per square • m1<-glm(number~distance,family=poisson)
Poisson distribution • Usually log(y) also works fine. • Poisson excels with: • small means • many zeroes • Many zeroes: hurdle models
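A runnable sketch of the maple seedling example with simulated counts. The data and the true effect are invented for illustration. Note that the family object in R is spelled in lower case, `poisson`.

```r
set.seed(5)
distance <- runif(100, 0, 20)                                # distance to nearest maple (made up)
number   <- rpois(100, lambda = exp(1.5 - 0.15 * distance))  # seedlings per square (made up)

m1 <- glm(number ~ distance, family = poisson)
summary(m1)$coefficients

# The model works on the log scale, so exp() of the slope is the
# multiplicative change in expected count per unit of distance
exp(coef(m1)["distance"])
```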
Non-parametric tests • Based on ranked values instead of actual data.
Non-parametric tests • Still often in use. • Questionable with modern computers. • In principle: permutations of ranked values • But worse (than "real" permutations) because information about actual data values is discarded. • BENEFIT: Calm down outliers!
[Figure: continuous response variable (seed size). Regression for a continuous explanatory variable; Anova for a categoric explanatory variable (red ants vs black ants; 2 groups: also t-test).]
[Figure: non-parametric alternatives for a continuous response variable (seed size). Continuous explanatory variable: Kendall rank correlation (also: Spearman rank). Categoric explanatory variable (red ants vs black ants): Kruskal-Wallis (also: Mann-Whitney U-test; paired: sign test = binomial).]
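The tests named in the figure are all available in base R. The sketch below runs them on simulated data; the variable names and values are assumptions for illustration only.

```r
set.seed(6)
ant_size  <- runif(30, 4.5, 7.5)                      # made-up data
seed_size <- 2 * ant_size + rnorm(30)
species   <- factor(rep(c("red", "black"), each = 15))
seed_by_species <- c(rnorm(15, 10, 2), rnorm(15, 12, 2))

# Continuous explanatory variable: rank correlations
kendall  <- cor.test(ant_size, seed_size, method = "kendall")
spearman <- cor.test(ant_size, seed_size, method = "spearman")

# Categoric explanatory variable
kw <- kruskal.test(seed_by_species ~ species)
mw <- wilcox.test(seed_by_species ~ species)   # Mann-Whitney U-test (2 groups)

# Paired data: sign test, i.e. a binomial test on the signs of the differences
before <- rnorm(20, mean = 10)
after  <- before + rnorm(20, mean = 0.5)
sign_test <- binom.test(sum(after > before), n = sum(after != before))

c(kendall = kendall$p.value, spearman = spearman$p.value,
  kruskal = kw$p.value, mann_whitney = mw$p.value, sign = sign_test$p.value)
```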
Permutations • Do not require a normal distribution • BUT, do require the distributions to be equal if your hypothesis is not true. • Example: • If the lichens are equally large in the city as they are at campus, they must also have the same variation and e.g., skewness (cf. non-parametric tests!) • In principle a test of whether the distributions differ.
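A minimal sketch of a permutation test for the lichen example, assuming the test statistic is the difference in mean size between the two sites. The data are simulated and the number of permutations (5000) is an arbitrary choice.

```r
set.seed(7)
city   <- rnorm(20, mean = 10, sd = 3)   # made-up lichen sizes
campus <- rnorm(20, mean = 13, sd = 3)
size <- c(city, campus)
site <- rep(c("city", "campus"), each = 20)

obs_diff <- mean(size[site == "campus"]) - mean(size[site == "city"])

# Shuffle the site labels: this is what the data would look like
# if the two sites had the same distribution
perm_diff <- replicate(5000, {
  shuffled <- sample(site)
  mean(size[shuffled == "campus"]) - mean(size[shuffled == "city"])
})

# Two-sided p-value: how often is a shuffled difference at least as extreme?
p_value <- (sum(abs(perm_diff) >= abs(obs_diff)) + 1) / (length(perm_diff) + 1)
p_value
```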
bootstrap • to pull oneself up by one's bootstraps • to succeed only on one's own effort or abilities.
[Figure: two panels, Rumex crispus and Rumex longifolius]
Confidence intervals • …show how sure we are of a group mean. • The confidence interval will contain the "true" mean 95 % of the time. • The larger our sample size, the more sure (= confident!) we are of our sample mean: the confidence interval decreases • And (of course…), the more variation within groups, the less sure we get: the confidence interval increases
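Both effects can be seen directly in the formula for a t-based confidence interval of a mean. This is a sketch assuming the classical interval, mean ± t × sd / √n; the helper names `ci_mean` and `ci_half_width` are hypothetical.

```r
# 95 % confidence interval for a mean (classical t-based interval)
ci_mean <- function(x, level = 0.95) {
  n  <- length(x)
  se <- sd(x) / sqrt(n)
  mean(x) + c(-1, 1) * qt(1 - (1 - level) / 2, df = n - 1) * se
}

# Half-width of the interval as a function of variation and sample size
ci_half_width <- function(sd, n, level = 0.95) {
  qt(1 - (1 - level) / 2, df = n - 1) * sd / sqrt(n)
}

ci_half_width(sd = 2, n = 10)
ci_half_width(sd = 2, n = 100)  # larger sample: narrower interval
ci_half_width(sd = 4, n = 10)   # more variation: wider interval
```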
Bootstrap for tests [Figure: histogram of boot.difference (x-axis) against No. boot-samples (y-axis)]
Bootstrap • Does not require normal distribution of residuals. • Does not require the same variation. • Only requirement is that what you bootstrap (e.g., means) is the same if your hypothesis is not correct. • And, in practice, a large, representative sample
moss.shoot ~ forest type [Figure: histogram of the bootstrapped difference in moss shoot length]
Bootstrap • We use the function sample(row.names(d), replace=TRUE) • More advanced (and better): library(boot), ?boot, ?boot.ci
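A sketch of the simple approach with `sample(row.names(d), replace = TRUE)`, applied to a made-up data frame `d` with a moss shoot length and two hypothetical forest types. The number of bootstrap samples (2000) and the percentile interval are assumptions, not the lecture's exact recipe.

```r
set.seed(8)
d <- data.frame(
  forest     = rep(c("spruce", "pine"), each = 25),    # hypothetical forest types
  moss.shoot = c(rnorm(25, 30, 8), rnorm(25, 38, 8))   # made-up shoot lengths
)

obs_difference <- mean(d$moss.shoot[d$forest == "pine"]) -
  mean(d$moss.shoot[d$forest == "spruce"])

boot.difference <- replicate(2000, {
  rows <- sample(row.names(d), replace = TRUE)  # resample rows with replacement
  b <- d[rows, ]
  mean(b$moss.shoot[b$forest == "pine"]) - mean(b$moss.shoot[b$forest == "spruce"])
})

boot_ci <- quantile(boot.difference, c(0.025, 0.975))  # percentile 95 % interval
boot_ci
hist(boot.difference)
```

If the interval does not include 0, the bootstrapped difference between the forest types is unlikely to be just noise.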
Binarize your response • If all other efforts suck: • Binarize your response • Nothing vs Something • Above the median vs Below the median • bin.y<-ifelse(y < median(y),0,1) • bin.y<-factor(bin.y) • Then do a logistic regression, a 2×2 table, or a generalized linear model
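The two lines of code above can be followed by a logistic regression like this. The sketch uses a simulated, strongly skewed response; `x`, `y`, and the model name `m_bin` are invented for illustration.

```r
set.seed(9)
x <- runif(80, 0, 10)
y <- rexp(80, rate = 1 / (1 + 0.5 * x))   # made-up skewed response with extreme values

# Below the median = 0, at or above the median = 1
bin.y <- ifelse(y < median(y), 0, 1)
bin.y <- factor(bin.y)
table(bin.y)

# Logistic regression on the binarized response
m_bin <- glm(bin.y ~ x, family = binomial)
summary(m_bin)$coefficients
```

The price is lost information: every value on the same side of the median is treated as equal.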
Mail me and your "opponent"! • Your handout/abstract • Before 14.30. • One page. Uno. Odjin. • Mail me your Powerpoint before 17.00 or bring it on USB memory stick. • Compress images to reduce size!
Max 15 min presentation Before you make your powerpoint: Watch this film: http://www.davidairey.com/how-not-to-use-powerpoint/
Mail me your data! • excel file • Help option booking list
Computer exercise • Use your own data (if continuous response!). • Or old data. • Use either a continuous or categorical explanatory variable. • Possible also for many explanatories? • Non-parametric: Well, usually not • Permutation: Yes, but hard • Bootstrap: Yes, easy • Binarizing: Yes, easy
Exam • Read Learning goals • Read book in relation to learning goals • E.g., no GAM, Survival, Bayesian • Check lecture powerpoints in relation to learning goals • Practice on understanding the exercises (they ARE in the learning goals)
Lunch? or