
What we Measure vs. What we Want to Know






Presentation Transcript


  1. What we Measure vs. What we Want to Know "Not everything that counts can be counted, and not everything that can be counted counts." - Albert Einstein

  2. Scales, Transformations, Vectors and Multi-Dimensional Hyperspace
  • All measurement is a proxy for what is really of interest - the relationship between them
  • The scale of measurement and the scale of analysis and reporting are not always the same - transformations
  • We often make measurements that are highly correlated - multi-component vectors

  3. Multivariate Description

  4. Gulls Variables

  5. Scree Plot

  6. Output
     > summary(gulls.pca2)
     Importance of components:
                                Comp.1     Comp.2     Comp.3
     Standard deviation      1.8133342 0.52544623 0.47501980
     Proportion of Variance  0.8243224 0.06921464 0.05656722
     Cumulative Proportion   0.8243224 0.89353703 0.95010425
     > gulls.pca2$loadings
     Loadings:
              Comp.1 Comp.2 Comp.3 Comp.4
     Weight   -0.505 -0.343  0.285  0.739
     Wing     -0.490  0.852 -0.143  0.116
     Bill     -0.500 -0.381 -0.742 -0.232
     H.and.B  -0.505 -0.107  0.589 -0.622
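The numbers in that summary hang together in a simple way: each "Proportion of Variance" is a squared standard deviation divided by the total variance, and the cumulative row is a running sum. A quick consistency check in Python, using only the printed values (the Comp.4 standard deviation is not shown, so the total variance is inferred from the Comp.1 proportion):

```python
# Squared standard deviations from the summary give the component variances.
sd = [1.8133342, 0.52544623, 0.47501980]
var = [s * s for s in sd]

# Comp.1 explains 0.8243224 of the total, so infer the total variance from it.
total = var[0] / 0.8243224

props = [v / total for v in var]
cum = [sum(props[:i + 1]) for i in range(len(props))]

print(props)  # close to [0.8243224, 0.06921464, 0.05656722]
print(cum)    # close to [0.8243224, 0.89353703, 0.95010425]
```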

  7. Bi-Plot

  8. Environmental Gradients

  9. Inferring Gradients from Attribute Data (e.g. species)

  10. Indirect Gradient Analysis
  • Environmental gradients are inferred from species data alone
  • Three methods:
    • Principal Component Analysis - linear model
    • Correspondence Analysis - unimodal model
    • Detrended CA - modified unimodal model
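For the linear model (PCA), the inferred gradient is simply the direction of greatest variance in the species data. A minimal sketch for just two variables, where the 2x2 covariance eigendecomposition has a closed form (the abundance numbers are hypothetical):

```python
import math

# Toy abundances of two species at five sites (hypothetical numbers).
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [1.0, 3.0, 5.0, 9.0, 12.0]

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

# 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
sxx, syy, sxy = cov(x, x), cov(y, y), cov(x, y)

# Closed-form eigenvalues of a symmetric 2x2 matrix.
mean_diag = (sxx + syy) / 2
half_gap = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
lam1, lam2 = mean_diag + half_gap, mean_diag - half_gap

# The first PCA axis (the inferred gradient) explains this share of variance:
print(lam1 / (lam1 + lam2))
```

With these strongly correlated toy variables, one axis carries nearly all the variability, which is exactly the situation in which a single inferred gradient summarises the data well.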

  11. Terschelling Dune Data

  12. PCA gradient - site plot

  13. PCA gradient - site/species biplot (management types: standard, biodynamic & hobby, nature)

  14. Making Effective Use of Environmental Variables

  15. Approaches
  • Use single responses in linear models of environmental variables
  • Use axes of a multivariate dimension reduction technique as responses in linear models of environmental variables
  • Constrain the multivariate dimension reduction into the factor space defined by the environmental variables

  16. Dimension Reduction (Ordination) ‘Constrained’ by the Environmental Variables

  17. Constrained?

  18. Working with the Variability that we Can Explain
  • Start with all the variability in the response variables.
  • Replace the original observations with their fitted values from a model employing the environmental variables as explanatory variables (discarding the residual variability).
  • Carry out gradient analysis on the fitted values.
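The middle step above is just regression: each response column is replaced by its fitted values from a model on the environmental variables, and only that explained part goes forward to the ordination. A sketch for one response and one environmental variable (all numbers hypothetical):

```python
# Replace a response with its fitted values from a simple linear regression
# on one environmental variable; the residual part is discarded.

env = [1.0, 2.0, 3.0, 4.0, 5.0]        # e.g. a moisture score at five sites
species = [2.1, 3.9, 6.2, 7.8, 10.1]   # one response variable

def ols_fitted(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return [a + b * xi for xi in x]

fitted = ols_fitted(env, species)
residual = [yi - fi for yi, fi in zip(species, fitted)]

# The gradient analysis is then carried out on `fitted`, not on `species`.
print(fitted)
```

Doing this for every response column and then ordinating the fitted values is the idea behind constrained ordination: the axes can only pick up variability that the environmental variables can explain.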

  19. Unconstrained/Constrained
  • Unconstrained ordination axes correspond to the directions of greatest variability within the data set.
  • Constrained ordination axes correspond to the directions of greatest variability of the data set that can be explained by the environmental variables.

  20. Direct Gradient Analysis
  • Environmental gradients are constructed from the relationship between species and environmental variables
  • Three methods:
    • Redundancy Analysis - linear model
    • Canonical (or Constrained) Correspondence Analysis - unimodal model
    • Detrended CCA - modified unimodal model

  21. Dune Data Unconstrained

  22. Dune Data Constrained

  23. How Similar are Objects/Samples/Individuals/Sites?

  24. Similarity approaches, or: what do we mean by similar?

  25. Different types of data (examples)
  • Continuous data: height
  • Categorical data
    • ordered (ordinal): growth rate - very slow, slow, medium, fast, very fast
    • not ordered (nominal): fruit colour - yellow, green, purple, red, orange
  • Binary data: fruit / no fruit

  26. Different scales of measurement (examples)
  • Large range: soil ion concentrations
  • Restricted range: air pressure
  • Constrained: proportions
  • Large numbers: altitude
  • Small numbers: attribute counts
  Do we standardise measurement scales to make them equivalent? If so, what do we lose?
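One common answer to that question is range standardisation, which maps every variable onto 0..1. A small sketch with hypothetical values, showing both what is gained (comparable scales) and what is lost (the original magnitudes):

```python
# Range-standardise two variables measured on very different scales
# (the values are hypothetical).
altitude = [120.0, 450.0, 800.0, 1500.0]   # large numbers
proportion = [0.10, 0.25, 0.40, 0.90]      # already constrained to [0, 1]

def range_scale(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(range_scale(altitude))    # both variables now span exactly 0..1 ...
print(range_scale(proportion))  # ... but their absolute magnitudes are gone
```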

  27. Similarity matrix We define a similarity between units – like the correlation between continuous variables (it can also be a dissimilarity or distance matrix). A similarity can be constructed as an average of the similarities between the units on each variable (a weighted average can be used). This provides a way of combining different types of variables.
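The averaging construction can be sketched concretely. This is a Gower-style example, assuming one continuous, one nominal and one binary variable with equal weights; the variable names, values and the range used are all hypothetical:

```python
# Similarity between two units as the average of per-variable similarities.

def sim_continuous(a, b, value_range):
    # Continuous variables: 1 minus the range-scaled absolute difference.
    return 1.0 - abs(a - b) / value_range

def sim_categorical(a, b):
    # Categorical and binary variables: 1 if they match, else 0.
    return 1.0 if a == b else 0.0

unit1 = {"height": 12.0, "colour": "red", "fruit": 1}
unit2 = {"height": 8.0,  "colour": "red", "fruit": 0}

sims = [
    sim_continuous(unit1["height"], unit2["height"], value_range=20.0),
    sim_categorical(unit1["colour"], unit2["colour"]),
    sim_categorical(unit1["fruit"], unit2["fruit"]),
]
similarity = sum(sims) / len(sims)   # equal weights; a weighted mean also works
print(similarity)                    # 0.6 for these values
```

Repeating this for every pair of units fills in the similarity matrix.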

  28. Distance metrics relevant for continuous variables (illustrated for two points A and B):
  • Euclidean
  • city block or Manhattan
  (also many other variations)
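Both metrics are one-liners for points given as coordinate lists; the sample points are just an illustration:

```python
import math

def euclidean(a, b):
    # Straight-line distance between two points.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # City-block distance: sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

A, B = [0.0, 0.0], [3.0, 4.0]
print(euclidean(A, B))  # 5.0 (straight line)
print(manhattan(A, B))  # 7.0 (along the grid)
```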

  29. Similarity coefficients for binary data (based on counts of the 0/0, 1/0, 0/1 and 1/1 cells):
  • simple matching - count if both units 0 or both units 1
  • Jaccard - count only if both units 1
  (also many other variants, e.g. Bray-Curtis)
  Simple matching can be extended to categorical data.
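The two coefficients differ only in whether joint absences (0/0 cells) count as agreement, which matters a lot for sparse species data. A sketch with made-up attribute vectors:

```python
# Similarity coefficients for binary (0/1) attribute vectors of equal length.

def simple_matching(a, b):
    # Agreement in both the 0/0 and 1/1 cells, over all attributes.
    agree = sum(1 for x, y in zip(a, b) if x == y)
    return agree / len(a)

def jaccard(a, b):
    # Only 1/1 cells count as agreement; joint absences are ignored.
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return both / either

u = [1, 1, 0, 0, 1]
v = [1, 0, 0, 0, 1]
print(simple_matching(u, v))  # 4/5 = 0.8
print(jaccard(u, v))          # 2/3
```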

  30. A Distance Matrix

  31. Uses of Distances Distance/dissimilarity can be used:
  • to explore dimensionality in data, using principal coordinate analysis (PCO or PCoA)
  • as a basis for clustering/classification

  32. UK Wet Deposition Network

  33. Grouping methods

  34. Cluster Analysis

  35. Clustering methods
  • hierarchical
    • divisive (monothetic / polythetic) - put everything together and split
    • agglomerative - keep everything separate and join the most similar points (classical cluster analysis)
  • non-hierarchical
    • k-means clustering
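The non-hierarchical route can be sketched with a minimal k-means loop. This is a bare-bones illustration on one-dimensional data with fixed starting centres so the run is deterministic (the data and centres are hypothetical):

```python
# Minimal k-means sketch: alternate assignment and update steps.
def kmeans_1d(data, centres, iterations=10):
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre.
        groups = [[] for _ in centres]
        for x in data:
            nearest = min(range(len(centres)), key=lambda i: abs(x - centres[i]))
            groups[nearest].append(x)
        # Update step: each centre moves to the mean of its group.
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return centres, groups

data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
centres, groups = kmeans_1d(data, centres=[0.0, 6.0])
print(centres)   # the two centres settle near 1.0 and 5.07
```

Real implementations add random restarts and a convergence test; the two alternating steps are the whole idea.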

  36. Agglomerative hierarchical Single linkage or nearest neighbour finds the minimum spanning tree: the shortest tree that connects all points
  • chaining can be a problem

  37. Agglomerative hierarchical Complete linkage or furthest neighbour
  • compact clusters of approximately equal size
  • (makes compact groups even when none exist)

  38. Agglomerative hierarchical Average linkage methods • between single and complete linkage
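The three linkage rules differ only in how a distance between two *clusters* is built from the distances between their *points*. A sketch using one-dimensional points for simplicity (the clusters are hypothetical):

```python
# Cluster-to-cluster distances under the three linkage rules.
def d(p, q):
    return abs(p - q)   # point distance; 1-D points for simplicity

def single_linkage(c1, c2):
    # Nearest neighbour: closest pair across the two clusters.
    return min(d(p, q) for p in c1 for q in c2)

def complete_linkage(c1, c2):
    # Furthest neighbour: most distant pair across the two clusters.
    return max(d(p, q) for p in c1 for q in c2)

def average_linkage(c1, c2):
    # Mean over all cross-cluster pairs.
    return sum(d(p, q) for p in c1 for q in c2) / (len(c1) * len(c2))

a, b = [0.0, 1.0], [3.0, 6.0]
print(single_linkage(a, b))    # 2.0
print(complete_linkage(a, b))  # 6.0
print(average_linkage(a, b))   # 4.0
```

As the slide says, the average-linkage value always sits between the single- and complete-linkage values.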

  39. From Alexandria to Suez

  40. Hierarchical Clustering

  41. Hierarchical Clustering

  42. Hierarchical Clustering

  43. Building and testing models Basically you approach this in the same way as multiple regression, so the same issues arise: variable selection, interactions between variables, and so on. However, statistical tests that rely on distributional assumptions are more problematic here, so randomisation tests and permutation procedures are used much more often to evaluate the statistical significance of results.
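The permutation idea is simple: recompute the test statistic many times after randomly shuffling the group labels, and see how often a shuffle produces a value as extreme as the one observed. A sketch for a difference in group means (the data are hypothetical and the run is seeded for reproducibility):

```python
import random

random.seed(1)

group_a = [5.1, 4.8, 5.6, 5.3, 4.9]
group_b = [4.2, 4.0, 4.5, 3.9, 4.3]

def mean_diff(a, b):
    return sum(a) / len(a) - sum(b) / len(b)

observed = mean_diff(group_a, group_b)

pooled = group_a + group_b
n_perm = 10000
count = 0
for _ in range(n_perm):
    random.shuffle(pooled)                      # break any real group structure
    shuffled = mean_diff(pooled[:5], pooled[5:])
    if abs(shuffled) >= abs(observed):
        count += 1

p_value = (count + 1) / (n_perm + 1)            # permutation p-value
print(p_value)   # small here: the observed difference is rare under shuffling
```

The same recipe applies to ordination statistics, where no convenient sampling distribution exists; only the test statistic changes.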

  44. Some Examples

  45. Part of Fig 4.
