What We Measure vs. What We Want to Know
Scales, Transformations, Vectors and Multi-Dimensional Hyperspace
"Not everything that counts can be counted, and not everything that can be counted counts." - commonly attributed to Albert Einstein
Importance of components:
                          Comp.1     Comp.2     Comp.3
Standard deviation     1.8133342 0.52544623 0.47501980
Proportion of Variance 0.8243224 0.06921464 0.05656722
Cumulative Proportion  0.8243224 0.89353703 0.95010425

Loadings:
         Comp.1  Comp.2  Comp.3  Comp.4
Weight   -0.505  -0.343   0.285   0.739
Wing     -0.490   0.852  -0.143   0.116
Bill     -0.500  -0.381  -0.742  -0.232
H.and.B  -0.505  -0.107   0.589  -0.622
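The output above looks like R's princomp summary and loadings. The sketch below shows where such numbers come from, under a few assumptions. It uses numpy instead of R and the data are made up. The analysis runs on the correlation matrix, which amounts to PCA on standardised variables. Each component's variance is an eigenvalue. "Proportion of Variance" is that eigenvalue divided by the sum of all eigenvalues, and "Cumulative Proportion" is the running total. R's princomp uses an n divisor rather than n - 1, so its reported standard deviations can differ slightly. The signs of the loadings are arbitrary.

```python
import numpy as np

def pca_summary(data):
    """PCA on the correlation matrix (i.e. on standardised variables).

    data: 2-D array, rows = units, columns = variables.
    Returns (sdev, proportion, cumulative, loadings), with components
    ordered by decreasing variance and loadings stored as columns.
    """
    x = np.asarray(data, dtype=float)
    corr = np.corrcoef(x, rowvar=False)          # variables are columns
    eigvals, eigvecs = np.linalg.eigh(corr)      # symmetric matrix, ascending order
    order = np.argsort(eigvals)[::-1]            # largest variance first
    eigvals = eigvals[order]
    loadings = eigvecs[:, order]                 # one column per component
    sdev = np.sqrt(np.clip(eigvals, 0.0, None))  # guard against tiny negative rounding
    proportion = eigvals / eigvals.sum()
    cumulative = np.cumsum(proportion)
    return sdev, proportion, cumulative, loadings

# Hypothetical data: four measurements all driven by one shared "size" factor.
rng = np.random.default_rng(1)
size = rng.normal(size=200)
data = np.column_stack([size + 0.3 * rng.normal(size=200) for _ in range(4)])

sdev, prop, cum, load = pca_summary(data)
print(np.round(prop, 3))        # proportion of variance per component
print(np.round(load[:, 0], 3))  # loadings on the first component
```

When the measurements are strongly correlated like this, the first component's loadings come out roughly equal for every variable. The Comp.1 column in the table above shows the same pattern.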
Continuous data : height
Ordered categorical (ordinal) : growth rate
very slow, slow, medium, fast, very fast
Unordered categorical (nominal) : fruit colour
yellow, green, purple, red, orange
Binary data : fruit / no fruit
Large range : soil ion concentrations
Restricted range : air pressure
Constrained : proportions (bounded between 0 and 1)
Large numbers : altitude
Small numbers : attribute counts
Do we standardise measurement scales to make them equivalent? If so, what do we lose?
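A minimal sketch of the two usual options follows; all values are hypothetical. The first is to standardise each variable to z-scores (subtract the mean, divide by the standard deviation). The second is to log-transform a variable with a large range. After standardising, every variable has mean 0 and standard deviation 1. That is exactly what is lost: the original units, and the information that one variable varied far more than another.

```python
from math import log10
from statistics import mean, stdev

def standardise(values):
    """Convert values to z-scores: subtract the mean, divide by the sample SD."""
    m = mean(values)
    s = stdev(values)          # sample standard deviation (n - 1 divisor)
    return [(v - m) / s for v in values]

altitude_m = [120.0, 450.0, 900.0, 1500.0]   # hypothetical altitudes (metres)
ph = [5.1, 5.8, 6.4, 7.0]                    # hypothetical soil pH values

z_alt = standardise(altitude_m)   # now unitless, SD 1
z_ph = standardise(ph)            # also SD 1: the much larger spread of altitude is gone

# Large-range variable: a log transform compresses several orders of magnitude.
ion_conc = [0.5, 4.0, 35.0, 310.0]           # hypothetical ion concentrations
log_ion = [log10(c) for c in ion_conc]
```

Both transformations keep the rank order of the units. Only the spacing between them changes.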
We define a similarity between units, analogous to the correlation between continuous variables.
(this can also be expressed as a dissimilarity or distance matrix)
A similarity can be constructed as an average of the similarities between the units on each variable.
(a weighted average can also be used)
This provides a way of combining different types of variables.
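This averaging idea is essentially Gower's general similarity coefficient. The sketch below is one simple version of it, with some assumptions. Continuous variables score 1 - |difference| / range. Categorical and binary variables score 1 for a match and 0 otherwise. Gower's original treatment of binary variables, which can ignore joint absences, is not reproduced here. The variable names, ranges and units are hypothetical.

```python
def gower_similarity(u, v, kinds, ranges, weights=None):
    """Weighted average of per-variable similarities between units u and v.

    kinds:   'continuous' or 'categorical' for each variable
    ranges:  range (max - min) of each continuous variable over all units;
             ignored for categorical variables
    weights: optional per-variable weights (default: all 1)
    """
    if weights is None:
        weights = [1.0] * len(u)
    total, weight_sum = 0.0, 0.0
    for a, b, kind, r, w in zip(u, v, kinds, ranges, weights):
        if kind == "continuous":
            # Range-scaled difference: 1 = identical, 0 = maximally different.
            s = 1.0 - abs(a - b) / r if r > 0 else 1.0
        else:
            # Categorical or binary: simple matching.
            s = 1.0 if a == b else 0.0
        total += w * s
        weight_sum += w
    return total / weight_sum

# Hypothetical units: (height in cm, fruit colour, fruit present 1/0).
u = (12.0, "red", 1)
v = (20.0, "red", 0)
kinds = ("continuous", "categorical", "categorical")
ranges = (40.0, None, None)   # height range over all units (assumed)

sim = gower_similarity(u, v, kinds, ranges)                        # approximately 0.6
sim_weighted = gower_similarity(u, v, kinds, ranges, [2, 1, 1])    # approximately 0.65
```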
[Figure: Environmental Variables]
Distance measures relevant for continuous variables:
city block or Manhattan
(also many other variations)
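The city block (Manhattan) distance sums the absolute differences along each variable. The familiar Euclidean distance takes the square root of the summed squared differences. Both are sketched below for two hypothetical units measured on two environmental variables.

```python
from math import sqrt

def euclidean(a, b):
    """Straight-line distance: sqrt of the sum over variables of (a_k - b_k)^2."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """City-block distance: sum over variables of |a_k - b_k|."""
    return sum(abs(x - y) for x, y in zip(a, b))

unit_a = (0.0, 0.0)
unit_b = (3.0, 4.0)
print(euclidean(unit_a, unit_b))   # 5.0
print(manhattan(unit_a, unit_b))   # 7.0
```

The Manhattan distance is always at least as large as the Euclidean one. The name comes from travelling along a street grid instead of cutting diagonally.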
Similarity coefficients for binary data
simple matching: count a match if both units are 0 or both are 1
count a match only if both units are 1 (joint absences are ignored)
(also many other variants, e.g. Bray-Curtis)
simple matching can be extended to categorical data
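Below is a small sketch of the two counting rules, using hypothetical presence/absence vectors. The Jaccard coefficient is used as one well-known example of the "count only if both are 1" rule. The same simple-matching function works unchanged on categorical data, because any pair of equal categories counts as a match.

```python
def simple_matching(a, b):
    """Proportion of variables on which two units agree (0-0 and 1-1 both count).
    Also works for categorical data: equal categories count as a match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def jaccard(a, b):
    """Joint presences (1-1) divided by variables where at least one unit has a 1.
    Joint absences (0-0) are ignored; undefined (nan here) if both are all zeros."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return both / either if either else float("nan")

a = [1, 1, 0, 0, 1]   # hypothetical presence/absence of five species
b = [1, 0, 0, 0, 1]
print(simple_matching(a, b))   # 0.8  (4 of 5 variables agree)
print(jaccard(a, b))           # 0.666...  (2 joint presences out of 3)

colours_1 = ["red", "green", "red"]      # hypothetical categorical data
colours_2 = ["red", "yellow", "red"]
print(simple_matching(colours_1, colours_2))   # 0.666...
```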
Distance/Dissimilarity can be used to:-
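Single linkage or nearest neighbour: the distance between two clusters is the distance between their closest members
equivalent to finding the minimum spanning tree, the shortest tree that connects all points
Complete linkage or furthest neighbour: the distance between two clusters is the distance between their most distant members
Average linkage methods: the distance between two clusters is based on the average distance between their members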
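The three linkage rules can be sketched as one naive agglomerative procedure that differs only in how a set of between-cluster distances is combined. This is a small illustration, not an efficient implementation. "Average" is taken here to mean the plain mean of all pairwise distances (UPGMA), which is one of several average linkage variants. The example also computes a minimum spanning tree with Prim's algorithm, to show that the single-linkage merge heights match its edge lengths. The points are hypothetical.

```python
from itertools import combinations

def agglomerate(dist, method):
    """Naive agglomerative hierarchical clustering on a full distance matrix.

    dist:   symmetric list-of-lists of pairwise distances between n units
    method: 'single' (nearest neighbour), 'complete' (furthest neighbour)
            or 'average' (mean of all pairwise distances, i.e. UPGMA)
    Returns a list of (cluster_a, cluster_b, height) merges.
    """
    combine = {"single": min, "complete": max,
               "average": lambda ds: sum(ds) / len(ds)}[method]
    clusters = [frozenset([i]) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the closest pair of clusters under the chosen linkage rule.
        for c1, c2 in combinations(clusters, 2):
            d = combine([dist[i][j] for i in c1 for j in c2])
            if best is None or d < best[2]:
                best = (c1, c2, d)
        c1, c2, d = best
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        merges.append((c1, c2, d))
    return merges

def mst_edge_weights(dist):
    """Prim's algorithm: lengths of the edges in a minimum spanning tree."""
    n = len(dist)
    in_tree = {0}
    weights = []
    while len(in_tree) < n:
        w, j = min((dist[i][k], k) for i in in_tree
                   for k in range(n) if k not in in_tree)
        in_tree.add(j)
        weights.append(w)
    return weights

points = [0.0, 1.0, 3.0, 7.0]   # hypothetical units on one variable
dist = [[abs(p - q) for q in points] for p in points]

single = [h for _, _, h in agglomerate(dist, "single")]
complete = [h for _, _, h in agglomerate(dist, "complete")]
average = [h for _, _, h in agglomerate(dist, "average")]
print(single, mst_edge_weights(dist))   # [1.0, 2.0, 4.0] [1.0, 2.0, 4.0]
```

For the same data, complete linkage merges at larger heights than single linkage, because it uses the most distant members of each cluster.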
Basically, you approach this in the same way as multiple regression, so the same issues arise: variable selection, interactions between variables, etc.
However, statistical tests that rely on distributional assumptions are more problematic here, so randomisation tests and permutation procedures are used much more widely to evaluate the statistical significance of results.
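A minimal permutation test sketch follows. The assumed statistic is the correlation between a species response and one environmental variable, and the data are hypothetical. The response is shuffled many times and the statistic recomputed each time. The p-value is the proportion of shuffled statistics at least as extreme as the observed one, counting the observed arrangement itself. No normality assumption is needed. The same recipe applies to other statistics, for example the fit of a constrained ordination.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def permutation_test(x, y, n_perm=999, seed=0):
    """Two-sided permutation test of the correlation between x and y.

    Shuffling y breaks any real association but keeps both sets of values,
    so the shuffled statistics form a reference distribution under the null.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_perm = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        # Small tolerance so rounding does not hide ties with the observed value.
        if abs(pearson_r(x, y_perm)) >= observed - 1e-12:
            extreme += 1
    # Count the observed arrangement as one of the permutations.
    return (extreme + 1) / (n_perm + 1)

gradient = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]        # hypothetical environmental values
abundance = [2, 3, 3, 5, 6, 6, 8, 9, 9, 11]       # hypothetical species abundances
p_value = permutation_test(gradient, abundance)
```

With 999 permutations the smallest attainable p-value is 0.001. More permutations give a finer resolution.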
[Part of Fig 4: Environmental Variables]
There are (at least) two models of species response:-
Linear - species abundance increases or decreases steadily along the environmental gradient
Unimodal - species abundance rises to a peak somewhere along the environmental gradient and then falls again
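The two response models can be sketched as simple curves along a gradient. The unimodal curve is assumed to be the commonly used Gaussian response model. In it, the optimum is where abundance peaks and the tolerance sets how wide the curve is. All parameter values below are hypothetical.

```python
from math import exp

def linear_response(x, intercept, slope):
    """Linear model: abundance changes at a constant rate along the gradient."""
    return intercept + slope * x

def gaussian_response(x, height, optimum, tolerance):
    """Unimodal (Gaussian) model: abundance peaks at `optimum` and falls off
    symmetrically on either side; `tolerance` controls the width."""
    return height * exp(-((x - optimum) ** 2) / (2 * tolerance ** 2))

gradient = [i * 0.5 for i in range(21)]   # hypothetical gradient from 0 to 10
linear = [linear_response(x, 1.0, 0.8) for x in gradient]
unimodal = [gaussian_response(x, 10.0, 6.0, 1.5) for x in gradient]

peak_at = gradient[unimodal.index(max(unimodal))]   # 6.0, the assumed optimum
```

Over a short stretch of gradient on one side of the optimum, a unimodal curve looks roughly linear. Over a long gradient, a linear model can even predict negative abundances.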
NMDS (non-metric multidimensional scaling) maps the observed dissimilarities onto an ordination space, trying to preserve their rank order in a small number of dimensions (often 2), but the solution depends on the number of dimensions chosen
it is like a non-linear version of PCO (principal coordinates analysis)
define a stress function and look for the mapping with minimum stress
(e.g. the sum of squared residuals from a monotonic regression of the distances in NMDS space on the original dissimilarities)
the mapping must be found iteratively and convergence to the best solution is not guaranteed, so try many different starting points
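The stress calculation for one candidate configuration can be sketched as follows, assuming Kruskal's stress-1 formula. First, fit a monotonic (isotonic) regression of the configuration distances on the rank order of the original dissimilarities, using the pool-adjacent-violators algorithm. The stress is then sqrt(sum of squared residuals / sum of squared distances). A full NMDS would repeatedly move the points to reduce this value, from many random starts. That iterative step is not shown, and ties in the dissimilarities are simply broken by sort order here.

```python
from math import sqrt

def isotonic_fit(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to `values`."""
    blocks = []  # each block holds [sum, count] of pooled values
    for v in values:
        blocks.append([v, 1])
        # Pool backwards while a block's mean exceeds the mean of the next one.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

def kruskal_stress(dissimilarities, config_distances):
    """Kruskal's stress-1 for one NMDS configuration.

    dissimilarities:  original dissimilarity for each pair of units
    config_distances: distance between the same pairs in the ordination space
    """
    order = sorted(range(len(dissimilarities)), key=lambda k: dissimilarities[k])
    d_sorted = [config_distances[k] for k in order]
    d_hat = isotonic_fit(d_sorted)   # monotonic regression on dissimilarity rank
    residual = sum((d - h) ** 2 for d, h in zip(d_sorted, d_hat))
    total = sum(d ** 2 for d in d_sorted)
    return sqrt(residual / total)

# Hypothetical pairs: configuration distances that preserve the rank order perfectly.
print(kruskal_stress([1, 2, 3, 4], [0.5, 0.9, 1.4, 3.0]))   # 0.0
# One rank-order violation gives positive stress.
print(kruskal_stress([1, 2, 3], [2.0, 1.0, 3.0]))
```

Only the rank order of the original dissimilarities enters the calculation. That is what makes the method non-metric.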
used to compare graphically two separate ordinations