Cold shitty weather. Sorry! Get warm with Glögg!

Cold shitty weather. Sorry! Get warm with Glögg!

Power recap

Power recap • It is good to fake data • BUT • p-values of 1 fake data is crap!

Power recap • A 1000 simulation Power analyses is not crap! • BUT • Power depends on: • Sample size • Effect size • Variation

Project considerations I • Make graphs • Check for outliers • Check assumptions • Decide if you want to transform y and or x • Check VIF • Are your assumtions still f**k*d up. •  Well, that’s for today.

Project considerations II • Interpret interactions first! • If they are significant: Are main effects still interpretable? • Distinguish between: y ~ x1 and y ~ x1 given x2 • – Simplify your models!

16 14 12 10 8 6 4 Red ants Black ants Logistic regression 2  2 tables Categoric 1.0 Melica 0.8 0.6 Prob. of choosing Melica 0.4 0.2 0.0 Response variable Luzula 4.5 5.5 6.5 7.5 Ant size Regression Anova Continuous - - Seed size Continuous Categoric Explanatory variable

16 14 12 10 8 6 4 Red ants Black ants Response variable Regression Anova Continuous - - Seed size Continuous Categoric Explanatory variable

Assumptions for parametric tests with continuous response i.e., also linear models!! About the same variation in all groups or along a continuous variable or along fitted values Pretty normal residuals (= noice)

The residuals… … are the noice that is not explained by the explanatory variable(s) In a regression the residuals are the distance from the data points to the regression line In an Anova the residual are the distance to the group mean In a linear model the residuals are the distance from the data points to the fitted values.

Residuals

Assumption check

Solutions • Poisson for counts (generalized linear model) • Non-parametric tests • Resampling methods • Permutation • Bootstrap • Binarize your response

quasipoisson fit on Xanthoria

Log Xanthoria apothecia

Poisson distribution • Response = Numbers (not true continuous) • Examples • Are there more maple seedlings close to a maple? • Response = number per square • m1<-glm(number~distance,family=Poisson)

Poisson distribution • Usually log(y) also works fine. • Poisson excells: • small means • many zeroes • Many zeroes  Hurdle models

break?

Non-parametric tests • Based on ranked values instead of actual data.

Non-parametric tests • Still often in use. • Questionable with modern computers. •  In principle permutions of ranked values •  But worse than ”real” permutations, because information about actual data values is discarded.

Non-parametric tests • Still often in use. • Questionable with modern computers. •  In principle: permutions of ranked values •  But worse (than ”real” permutations) because information about actual data values is discarded. BENEFIT: Calm dow outliers! 

16 14 12 10 8 6 4 Red ants Black ants Response variable Regression Anova 2 groups: also t-test Continuous - - Seed size Continuous Categoric Explanatory variable

16 14 12 10 8 6 4 Red ants Black ants Response variable Kendall rank correlation also: Spearman rank Kruskal-Wallis also: Mann-Whitney U-test Paired: Sign test(=binomial) Continuous - - Seed size Continuous Categoric Explanatory variable

Permutations • Does not require normal distribution • BUT, does require distributions to be equal if your hypothesis is not true. •  Example: • If the lichens are equally large in the city as they are at campus, they must have the same variation and e.g., skewness. >(cf. non-par!) • In principle a test of if the distributions differ.

Ash seed dispersal

Acer twigs – plasticity

Birch – cost of reproduction

break?

bootstrap • to pull oneself up by one's bootstraps • to succeed only on one's own effort or abilities.

shrimp-booting...

Rumex crispus Rumex longifolius 300 250 250 200 200 150 150 100 100 50 50 0 0 1.0 1.1 1.2 1.3 1.4 1.5 1.25 1.30 1.35 1.40

Confidence intervals • …shows how sure we are of a group mean. • The confidence interval will contain the ”true” mean in 95 % of the time. • The larger our sample size the more sure (= confident!) we are of our sample mean  the confidence interval decreases • And (of course…), the more variation within groups, the less sure we get  confidence interval increases

Bootstrap for tests 120 80 No. boot-samples 60 40 20 0 -5 0 5 10 15 20 25 boot.difference

Bootstrap • Does not require normal distribution of residuals. • Does not require the same variation. • Only requirement is that what you bootstrap (e.g., means) are the same if your hypothesis is not correct. • And, in practice, a large, representative sample

moss.shoot ~ forest type 2000 1500 1000 500 0 0 5 10 15 Bootstrapped difference in moss shoot length

Bootstrap • We use the functionsample(row.names(d),replace=T) • More advanced (and better):library(boot)?boot?boot.ci

Binarize your response • If all other efforts sucks: • Binarize your response • Nothing vs Something • Above the median vs Below the median • bin.y<-ifelse(y < median(y),0,1) • bin.y<-factor(bin.y) •  Then do a logistic regression, 2×2, or a generalized linear model

Friday Morning 08.00

Friday Lunch 10.30

Friday Afternoon 14.00

Mail me and your "opponent"! • Your handout/abstract • Before 14.30. • One page. Uno. Odjin. • Mail me your Powerpoint before 17.00 or bring it on USB memory stick. • Compress images to reduce size!

Max 15 min presentation Before you make your powerpoint: Watch this film: http://www.davidairey.com/how-not-to-use-powerpoint/

Mail me your data! • excel file • Help option booking list

Computer exercise • Use yor own data (if cont resp!). • Or old data. • Use either a continuous or categorical explanatory. • Possible also for many explanatories? • Non-parametric  Well, usually not • Permutation  Yes, but hard • Bootstrap  Yes, easy • Binarizing  Yes, easy

Exam • Read Learning goals • Read book in relation to learning goals • E.g., no GAM, Survival, Bayesian • Check lecture powerpoints in relation to learning goals • Practice on understanding the excercises (they ARE in the learning goals)

Lunch? or

Cold shitty weather. Sorry! Get warm with Glögg!