1 / 60

Email: mscardi@mclink.it URL: michele.scardi.name

The Fourth BIOSAFENET Seminar – January 12-15, 2009. International Centre for Genetic Engineering and Biosafety (ICGEB) Ca’ Tron di Roncade, Italy. Permutation tests in multivariate analysis Michele Scardi Department of Biology ‘Tor Vergata’ University Rome, Italy.

bill
Download Presentation

Email: mscardi@mclink.it URL: michele.scardi.name

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. The Fourth BIOSAFENET Seminar– January 12-15, 2009 International Centre for Genetic Engineering and Biosafety (ICGEB) Ca’ Tron di Roncade, Italy Permutation tests in multivariate analysisMichele ScardiDepartment of Biology‘Tor Vergata’ UniversityRome, Italy Email: mscardi@mclink.itURL: http://www.michele.scardi.name

  2. Ecology and agriculture A few reasons why studyingGMO effects on NTO is relevant to ecologists

  3. Agriculture and biodiveristy High Nature Value farmland Intensive farmland Intermediatedisturbancehypothesis! Modified from EEA Report 2/2006

  4. CAP and community ecology • In the past, Common Agricultural Policy mainly promoted the expansion of agricultural production. • A complete re-arrangement of the funding scheme now puts the environment at the center of farming policy. • Preserving biodiversity (and therefore monitoring agroecosystems) is one of the CAP main goals. • Biodiversity can only be monitored through community ecology studies. • Similar targets have been set in other fields (e.g. Water Framework Directive).

  5. EU and GMOs • In December 2008, the Council of the EU adopted the following conclusions on GMOs: “GMOs, … , give rise to discussion and questions, within the scientific community and society at large regarding their possible impact on health, environment and ecosystems.” • Therefore, there is a growing need for ecological research on agroecosystems.

  6. Permutation tests Because bugs don’t know they’re supposed to be normally distributed…

  7. A basic (univariate) permutation test • Sample A: 28, 32, 45 (mA=35.00) • Sample B: 22, 25, 29 (mB=25.33) • H0: no difference between means • Six data can be arranged in two groups of three in 20 different combinations:

  8. A basic (univariate) permutation test

  9. A basic (univariate) permutation test • Difference between Sample A and Sample B means ismA-mB=9.67 • By permuting our data we obtained an empirical distribution of the values of this difference acting as ifH0: equal meanswere true • How large is this original difference relative to the empirical distribution? • Is it so large that H0 is probably false?

  10. P-value cannot be less than: A basic (univariate) permutation test In 2+1casesout of 19+1:|mA-mB|  9.67 P(H0: mA=mB) = = (2+1)/(19+1) = = 0.15 > 0.05 We cannot rejectH0: mA=mB mA-mB

  11. A basic (univariate) permutation test

  12. What permutation tests do ecologist use? • ANOSIM, MRPP, NPMANOVA, Mantel Test and Indicator Species Analysis are quite popular • Other permutation have been developed to suit specific needs (e.g. testing significance of relationships between envrionmental variables and species in CCA) • If you have to solve a particular ecological problem, chances are you can develop your own permutation test

  13. Treatment A Treatment B Treatment C Treatment D 16 14 12 10 species 2 8 6 4 2 0 0 2 4 6 8 10 12 14 16 18 20 22 24 species 1 A toy multivariate problem (4 treatments and a community with only 2 species)…

  14. …and a toy permutation test • Differences between samples are measured as Euclidean distances • The test statistics is the the average distance between group centroids (i.e. average responses to treatments) • If the test statistics is large enough with respect to those obtained by data permutation, then differences between groups are significant.

  15. Species 2 Distances between average responses to treatments (between centroids) Species 1

  16. Mantel test X matrixe.g. geographical distances Y matrixe.g. species dissimilarity Problem: are X and Y (in)dependent of each other?

  17. Mantel statistics Range: [0,) Range: [-1,1] i.e. correlation between xij and yij

  18. Mantel statistics X Y Z=8.867

  19. Z=8.867 Mantel statistics X Y NB: Mantel statistics is maximum when large xij are multiplied by large yij, i.e. when the two matrices have the same structure

  20. Mantel test X actual value Z = 8.867 ZP1 = 8.365 ZP2 = 7.834 ZP3 = 6.897 ZP4 = 8.531 ZP5= 8.885 … ZPn = 7.852 permutations Y Z=8.867 ZP1=8.365

  21. Frequency Z Mantel test actual value actual value H0: X and Y are independent Z = 8.867 ZP1 = 8.365 ZP2 = 7.834 ZP3 = 6.897 ZP4 = 8.531 ZP5= 8.885 … ZPn = 7.852 permutations permutations 3% Z=8.867 p(Z)=0.03 Therefore, we reject H0

  22. Mantel test • The test can be performed on any couple of distance or similarity matrices, for instance: • two different groups of organisms • environmental data and species composition • etc. • In case the matrices are large enough (D>30) the test statistics is approximately distributed like Student’s t

  23. W and E basins are separated by two shallow sills Mantel test Years: 1969-1970Samples: 9 (W), 12 (E)Depth: 570-2550 m H0: structure ofdeep copepod assemblages is independent of basin W E But… More about this at the endof the presentation! Jaccard similarity Same basin? (1=yes, 0=no) W E W E W W H0 was rejected, concluding that (according to common opinion) Eastern and Western basins actually had differentdeep copepod assemblages R=0.37 p<0.001 E E

  24. A few more facts about Mantel test • Mantel test assumes that relationships between data in the two matrices are linear • Therefore, it is sensitive to non-linearity (as well as to outliers) • A rank-based Mantel test can be also performed, although it is not very popular • Partial Mantel correlation (e.g. using 3 matrices) can be computed • Mantel test can be performed on non-symmetrical matrices

  25. i.e. they can be used to arrange objects, samples, etc. in an Euclidean space Measuring ecological resemblance • Similarity coefficients • Range: [0,1] • S=1 when similarity between species lists is maximum • Dissimilarity coefficients • D=1-S • Range: [0,1] • D=0 when similarity between species lists is maximum • Some of them are metric • Distance coefficients • Range: [0,] or [0,Dmax] • D=0 when two species lists are identical (In some cases proportional) • Most of them are metric

  26. Measuring ecological resemblance • Legendre & Legendre (1983,1998) list 25 similarity coefficients and 14 metric distances, but many others have been used. • Different measures of ecological resemblance may lead to different results. • Selection of an optimal way of measuring ecological resemblance is inherently subjective.

  27. A smart pick: asymmetric similarity coefficients ignore absence data (they rely upon the lowest level of information, i.e. presence) A few examples… Jaccard Euclidean Sørensen Manhattan Steinhaus(Bray-Curtis) Canberra

  28. quantitative Manhattan Canberra treatmentor impact control treatmentor impact qualitative similaritieswithin groups Jaccard similaritiesbetween groups control

  29. ANOSIM(ANalysisOfSIMilarities) H0: mean rank of similarites between groups is equal to mean rank of similarities within groups, i.e. rb = rw Problem: are differences between groups large enough as to say that groups are different from each other?

  30. ANOSIM(ANalysisOfSIMilarities) N=6 sort n=6 n=9

  31. rw= 9.17 rw= 7.17 rw= 7.08 rw= 5.75 rb=9.50 rb=7.22 rb=8.56 rb=8.61 p(R)=0.10 R= 0.50 R= 0.20 R= 0.19 R= -0.26 ... n=6 n=9 n=6 n=9 n=6 n=9 n=6 n=9

  32. Final remarks about ANOSIM • In case of a posteriori pairwise comparisons, don’t forget the Bonferroni correction:p(R) = p(R) * n. of pairwise comparisons • Two-way analyses can also be performed • The most particular feature of ANOSIM is that it takes into account ranks of similarities (not sensitive to outliers, but maybe too sensitive to small differences)

  33. INSIDE OUTSIDE Problem: is the fish assemblage composition in a Marine Protected Area different from theneighbouring sites?

  34. Other approaches • ANOSIM is loosely related to ANOVA-like tests, but it can be easily integrated into explorative analyses based on ordination or clustering • Other approaches are more similar to ANOVA and to ANOVA users’ way of thinking: MRPP and NPMANOVA • Exactly like ANOVA, they support more complex experimental designs

  35. MRPP(MultiResponse Permutation Procedure) • It is based on actual distances (usually Euclidean), not ranks • The test statistic is Δ, a weighted average within group distance • Significance of Δ is assessed by the empirical distribution of permuted Δs, but an approximation to Student’s t is also available (quite handy for very large data sets) • There are different methods for weighting Δ according to group size

  36. MRPP • PRO: not using ranks, it can detect differences in mean values as well as differences in data dispersion • CON: biases due to distance metric are more pronounced than in ANOSIM • Output • observed Δ • significance of observed Δ • expected Δ • a ratio between observed and expected Δ

  37. MRPP treatment control DT=(8.2+5.4+5.7)/3=6.43 DC=(5.6+6.9+5.8)/3=6.10 Weight = ni/sum(ni) Expected D = overall average distance (i.e. between and within groups) = 11.32 D=6.43*3/6+6.10*3/6=6.26 distances within groups distances between groups Euclidean distances

  38. MRPP output ------------------------------------------------- MRPP -- Multi-Response Permutation Procedures [Michele Scardi, 1999] ------------------------------------------------- Input data: 6 objects, 10 variables, 2 groupsWeighting option: C(I) = n(I)/sum(n(I))Distance measure: Euclidean Group # 1, n = 3, avg(d) = 6.43 Group # 2, n = 3, avg(d) = 6.10 Test statistic, T = -2.95 Observed delta = 6.26 Expected delta = 11.32 Variance of delta = 2.94 Skewness of delta = -2.52 Within group agreement, R = 0.447 P-value of a <= delta = 0.022 groups are significantly more homegenous than expected (therefore, differences between groups are significant) R1 when D0 i.e. R is large when within group distance is small

  39. Arpaia et al. (2007). Composition of Arthropod Species Assemblages inBt-expressing and Near Isogenic Eggplants in Experimental Fields.Environ. Entomol. 36(1): 213-227

  40. NPMANOVA • Non-Parametric Multivariate ANalysis Of VAriance • Test statistics • SStotal = sum of squared distances between all observations and the overall centroid • SSwithin = sum of squared differences between group observations and group centroid • SSamong = SStotal-SSwithin • Pseudo-F = ratio of SSamong to SSwithin

  41. NPMANOVA SSamong SSwithin Species 2 Species 1

  42. NPMANOVA • Test is based on permutation of a distance matrix and computation of new Pseudo-F • Original Pseudo-F is then compared to the empirical distribution of permuted Pseudo-F values • Traditional ANOVA output (Pseudo-F) • One-way, two-ways and higher level ANOVA designs

  43. NPMANOVA • It can be applied any time ANOSIM or MRPP can be used, and it seems more robust • It allows higher level ANOVA designs, including designs with interactions • Any distance coefficient can be used • Like ANOVA in multiple regression, it can be used to evaluate model functioning: e.g. community structure=f(disturbance,time) • Software available from the Author’s web site for complex analysis designs

  44. NPMANOVA Non-parametric Multivariate Analysis of Variance Source df SS MS F P---------------------------------------------------Treatment 1 288.0 288.0 14.4 0.0981Residuals 4 80.0 20.0Total 5 368.0---------------------------------------------------

  45. NPMANOVA vs. MRPP SSamong SSwithin Species 2 Species 1

  46. NPMANOVA vs. MRPP expected D Species 2 observed D Species 1

  47. Indicator Species Analysis • Indicator Species Analysis allows identifiying species that are significantly more frequent and/or abundant in a group of samples • Each species is associated to a vector of Indicator Values (IVs), i.e. to an IV for each group of samples • Significance of IVs is tested by permutation of the raw data matrix

  48. Indicator Species Analysis Counts, biomass, etc. Relative abundance of species j in group k Average frequency of occurence ofspecies j in group k Presence or absence (0,1) The Indicator value (IV) is obtained by combining relative abundances (RA) and average frequencies of occurrence (RF)

More Related