INVESTIGATION OF MAIN CONTAMINATION SOURCES OF HEAVY METAL IONS IN FISH, SEDIMENTS, AND WATERS FROM CATALONIA RIVERS USING DIFFERENT MULTIWAY DATA ANALYSIS METHODS. Emma Peré-Trepat 1 and Romà Tauler 2 *
1 Dept. of Analytical Chemistry, Universitat de Barcelona, Diagonal 647, 08028 Barcelona, Spain
2 IIQAB-CSIC, Jordi Girona 18-26, 08034 Barcelona, Spain
* e-mail: [email protected]
Environmental data tables (two-way data)
[Figure: a data table (matrix) X with I samples as rows and J variables as columns. Variables: concentrations of chemicals, physical properties, biological properties, other. Some entries are marked '<LOD' or 'm'. Panels: plot of variables (columns); plot of samples (rows).]
Environmental three-way data sets
Measured data usually consist of concentrations of different chemical compounds (variables) measured in different samples at different times, situations, conditions or compartments.
Data are ordered in a two-way or a three-way data table according to their structure.
[Figure: 3-way data set with modes samples, variables (conc. of chemical compounds) and time/compartment.]
Chemometric models to describe environmental measurements
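Bilinear models for two-way data:
$$d_{ij} = \sum_{n=1}^{N} x_{in}\, y_{nj} + e_{ij}, \qquad i = 1,\dots,I; \; j = 1,\dots,J$$
d_ij (element of the I x J data matrix D) is the concentration of chemical contaminant j in sample i
n = 1,...,N are a reduced number of independent environmental sources
x_in is the amount of source n in sample i
y_nj is the amount of contaminant j in source n
e_ij is the part of d_ij not explained by the N sources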
Chemometric models to describe environmental measurements
Bilinear models for two-way data (matrix form):
$$\mathbf{D} = \mathbf{X}\,\mathbf{Y}^{T} + \mathbf{E}$$
D (I x J), X (I x N), Y^T (N x J), E (I x J)
N << I or J
PCA
X orthogonal, Y^T orthonormal
Y^T in the direction of maximum variance
Unique solutions, but without physical meaning
Identification and interpretation!
MCR-ALS
X and Y^T non-negative
X or Y^T normalization
other constraints (unimodality, local rank, ...)
Non-unique solutions, but with physical meaning
Resolution and apportionment!
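The contrast between the two sets of constraints can be sketched in a few lines of NumPy. This is only an illustration on synthetic data, not the authors' implementation: `pca_svd` and `mcr_als_nn` are hypothetical helper names, the data matrix is random, and non-negativity is imposed here by simple clipping of least-squares solutions, which is a cruder device than the non-negative least squares normally used in MCR-ALS.

```python
import numpy as np

def pca_svd(D, n):
    """PCA-type bilinear model D ~ X @ Yt from a truncated SVD."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    X = U[:, :n] * s[:n]   # scores: mutually orthogonal columns
    Yt = Vt[:n, :]         # loadings: orthonormal rows
    return X, Yt

def mcr_als_nn(D, n, n_iter=100, seed=0):
    """Toy ALS with non-negativity imposed by clipping (assumption: not true NNLS)."""
    rng = np.random.default_rng(seed)
    Yt = rng.random((n, D.shape[1]))                 # random non-negative start
    for _ in range(n_iter):
        X = np.clip(D @ np.linalg.pinv(Yt), 0.0, None)   # update scores
        Yt = np.clip(np.linalg.pinv(X) @ D, 0.0, None)   # update loadings
        norms = np.linalg.norm(Yt, axis=1, keepdims=True)
        Yt = Yt / np.where(norms > 0, norms, 1.0)        # Yt normalization
    X = np.clip(D @ np.linalg.pinv(Yt), 0.0, None)       # scores for final Yt
    return X, Yt

# Synthetic example: 17 samples, 11 variables, 2 non-negative sources
rng = np.random.default_rng(1)
D = rng.random((17, 2)) @ rng.random((2, 11))

X_pca, Yt_pca = pca_svd(D, 2)
X_mcr, Yt_mcr = mcr_als_nn(D, 2)
```

For non-negative data the second PCA loading necessarily mixes positive and negative values, because it must be orthogonal to the first; the MCR-ALS profiles stay non-negative and can be read as amounts.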
Chemometric models to describe environmental measurements
Extension of Bilinear models for simultaneous analysis of multiple two way data sets
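$$\mathbf{D}_{aug} = \begin{bmatrix}\mathbf{D}_1 \\ \vdots \\ \mathbf{D}_K\end{bmatrix} = \begin{bmatrix}\mathbf{X}_1 \\ \vdots \\ \mathbf{X}_K\end{bmatrix}\mathbf{Y}^{T} + \mathbf{E}_{aug} = \mathbf{X}_{aug}\,\mathbf{Y}^{T} + \mathbf{E}_{aug}$$
Matrix augmentation strategy: each environmental data set D_k (I x J) has its own scores X_k (I,n) and shares the loadings Y^T (n,J)
PCA: orthogonality; max. variance
MCR: non-negativity, natural constraints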
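A minimal sketch of the augmentation step, assuming noise-free synthetic data: the individual matrices D_k are stacked on top of each other so that they share one loadings matrix Y^T while keeping their own score blocks X_k. The sizes (17 samples, 11 variables, 3 data sets) and the number of components (2) are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, N = 17, 11, 3, 2   # samples, variables, data sets, components (assumed)

Yt = rng.random((N, J))                            # common loadings Y^T (N x J)
X_blocks = [rng.random((I, N)) for _ in range(K)]  # one score block X_k per data set
D_blocks = [Xk @ Yt for Xk in X_blocks]            # D_k = X_k Y^T (no noise)

# Column-wise augmentation: the variables mode is kept in common
D_aug = np.vstack(D_blocks)   # shape (K*I, J)
X_aug = np.vstack(X_blocks)   # shape (K*I, N)
```

Rows `k*I` to `(k+1)*I - 1` of `X_aug` hold the scores of data set k, which is the layout assumed whenever augmented scores are refolded.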
Chemometric models to describe environmental measurements
Trilinear models for three-way data:
$$d_{ijk} = \sum_{n=1}^{N} x_{in}\, y_{nj}\, z_{nk} + e_{ijk}, \qquad i = 1,\dots,I; \; j = 1,\dots,J; \; k = 1,\dots,K$$
d_ijk is the concentration of chemical contaminant j in sample i at time (condition) k
n = 1,...,N are a reduced number of independent environmental sources
x_in is the amount of source n in sample i
y_nj is the amount of contaminant j in source n
z_nk is the contribution of source n to compartment k
[Figure: three-way array D (I, J, K) with slices D_k. X-mode: samples (N_i components); Y-mode: variables (N_j components); Z-mode: conditions (N_k components).]
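The trilinear model can be written directly with `numpy.einsum`. The sketch below uses synthetic, noise-free profiles and illustrative sizes; it also checks that every slice D_k is the same bilinear product with the scores rescaled by the contributions z_nk.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, N = 17, 11, 3, 2   # illustrative sizes; N is an assumption

X = rng.random((I, N))    # x_in: amount of source n in sample i
Yt = rng.random((N, J))   # y_nj: amount of contaminant j in source n
Z = rng.random((N, K))    # z_nk: contribution of source n to compartment k

# d_ijk = sum_n x_in * y_nj * z_nk
D = np.einsum('in,nj,nk->ijk', X, Yt, Z)

# Slice k is bilinear: D_k = X diag(z_.k) Y^T
k = 1
D_k = X @ np.diag(Z[:, k]) @ Yt
```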
Three Way data models
[Diagram: D decomposed into X, Y^T and Z.]
PARAFAC (trilinear model)
The same number of components in the three modes: N_i = N_j = N_k = N
No interactions between components
Different slices X_k are decomposed into bilinear profiles having the same shape!
[Diagram: D decomposed into X, Y^T and Z through a core array G.]
Tucker3 models
In PARAFAC, N_i = N_j = N_k = N and the core array G is a superdiagonal identity cube
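The relation between the two models can be checked numerically. In this sketch (synthetic factors, hypothetical helper names `tucker3` and `parafac`, factor matrices stored with one column per component) a Tucker3 model with a superdiagonal identity core reproduces PARAFAC, while a general core allows a different number of components in each mode.

```python
import numpy as np

def tucker3(X, Y, Z, G):
    """d_ijk = sum_pqr x_ip * y_jq * z_kr * g_pqr"""
    return np.einsum('ip,jq,kr,pqr->ijk', X, Y, Z, G)

def parafac(X, Y, Z):
    """d_ijk = sum_n x_in * y_jn * z_kn"""
    return np.einsum('in,jn,kn->ijk', X, Y, Z)

rng = np.random.default_rng(0)
I, J, K, N = 17, 11, 3, 2
X, Y, Z = rng.random((I, N)), rng.random((J, N)), rng.random((K, N))

# Superdiagonal identity core: g_nnn = 1, all other elements 0
G = np.zeros((N, N, N))
G[np.arange(N), np.arange(N), np.arange(N)] = 1.0
D_tucker = tucker3(X, Y, Z, G)
D_parafac = parafac(X, Y, Z)

# A Tucker3 model with unequal numbers of components, e.g. [1 2 2]
G122 = rng.random((1, 2, 2))
D_122 = tucker3(X[:, :1], Y, Z, G122)
```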
Guidelines for method selection (resolution purposes)
Deviations from trilinearity: Mild, Medium, Strong
Array size: Small, Medium, Large
Methods: PARAFAC; PARAFAC2; TUCKER; MCR, PCA, SVD, ...
Journal of Chemometrics, 2001, 15, 749-771
INTEGRATION OF CHEMOMETRICS-GEOSTATISTICS (Geographical Information Systems, GIS)
[Map: sampling sites numbered 1 to 17.]
METAL CONTAMINATION SOURCES IN SEDIMENTS, FISH AND WATERS FROM CATALONIA RIVERS USING MULTIWAY DATA ANALYSIS METHODS
Emma Peré-Trepat (UB), Mónica Flo, Montserrat Muñoz, Antoni Ginebreda (ACA), Marta Terrado, Romà Tauler (CSIC)
France
Pyrenees
1. RIU MUGA Castelló d´Empúries J052
2. RIU FLUVIÀ Besalú J022
3. RIU FLUVIÀ L´Armentera J011
4. RIU TER Manlleu J034
5. RIU TERRI Sant Julià de Ramis J028
6. RIU TER Colomers J112
7. RIU TORDERA Fogars de Tordera J062
8. RIU CONGOST La Garriga J037
9. RIU LLOBREGAT El Pont de Vilomara J031
10. RIU CARDENER Castellgalí J002
11. RIU LLOBREGAT Abrera J084
12. RIU LLOBREGAT Martorell J005
13. RIU LLOBREGAT Sant Joan Despí J049
14. RIU FOIX Castellet J008
15. RIU FRANCOLÍ La Masó J059
16. RIU EBRE Flix J056
17. RIU SEGRE Térmens J207
Aragón
Barcelona
Mediterranean Sea
17 river sites, 11 metals (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn),
3 environmental compartments: fish (barb, 'bagra comuna', bleak, carp and trout), sediment and water samples
[Figure: boxplot of values (0 to 300) against column number (1 to 50), with outliers, upper whisker, upper quartile, median, lower quartile and lower whisker labelled.]
Effect of different data pre-treatments: Sediment samples
[Figure: sediment data for the metals As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn under four pre-treatments: raw, mean-centred, auto-scaled, scaled. Mo is eliminated.]
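The pre-treatments compared here can be sketched as column-wise operations on a samples x metals matrix. The function names are hypothetical, the data are random, and it is an assumption that "scaled" means division by the column standard deviation without centring, which keeps non-negative data non-negative.

```python
import numpy as np

def mean_centre(D):
    """Subtract the column mean from every variable."""
    return D - D.mean(axis=0)

def autoscale(D):
    """Mean-centre, then divide by the column standard deviation."""
    return (D - D.mean(axis=0)) / D.std(axis=0, ddof=1)

def scale_only(D):
    """Divide by the column standard deviation without centring (assumed meaning of 'scaled')."""
    return D / D.std(axis=0, ddof=1)

rng = np.random.default_rng(0)
D = rng.random((17, 11)) * np.arange(1, 12)   # synthetic data, very different column scales

D_mc, D_as, D_sc = mean_centre(D), autoscale(D), scale_only(D)
```

Scaling without centring gives every metal unit variance but leaves all values non-negative, which matters if non-negativity constraints are applied afterwards.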
Description of scaled data
Metal distribution in the three compartments
Cd, Co and Ld in water were not scaled; only downweighted
[Axis: metals (variables)]
Description of scaled data:
different sites in the three compartments
[Figure: scaled data by sample site. Site labels shown: Llobregat, Tordera, Segre, Ter, Llobregat, Foix, Congost, Cardener, Fluvià, Muga, Llobregat, Terri, Ebre, Francolí, Ter, Fluvià, Llobregat.]
[Figure: unit variance scaled concentrations boxplot. Three panels (Fish, Sediment, Water), values against the 11 metals: As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn.]
SVD of augmented data matrices in the three directions
THREE-WAY DATA ARRAY MATRICIZING or MATRIX AUGMENTATION
[Diagram: the three-way array (sites x contaminants x compartments: Fish, Sediment, Water) augmented in each of the three directions.]

Singular values by AUGMENTATION direction:

      column     row        tube
s1    40.2619    43.2553    41.3302
s2    16.7504     9.2823    19.4850
s3     9.4963     8.5312    14.3739

[Figure: singular values against component number (0 to 10) for svd column-wise (variables), svd row-wise (samples) and svd tube-wise (type); the 2nd component is marked.]
How many components are needed to explain each mode?
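One way to ask how many components each mode needs is to matricize the array in each direction and inspect the singular values. The sketch below does this for a synthetic two-component trilinear array; `augment` is a hypothetical helper and the array only imitates the sites x metals x compartments layout.

```python
import numpy as np

def augment(D3, direction):
    """Matricize an (I, J, K) array in one of the three directions."""
    I, J, K = D3.shape
    if direction == 'column':   # stack slices vertically: (K*I, J), variables in common
        return D3.transpose(2, 0, 1).reshape(K * I, J)
    if direction == 'row':      # put slices side by side: (I, K*J), samples in common
        return D3.transpose(0, 2, 1).reshape(I, K * J)
    if direction == 'tube':     # one row per slice: (K, I*J), third mode in common
        return D3.transpose(2, 0, 1).reshape(K, I * J)
    raise ValueError(f"unknown direction: {direction}")

rng = np.random.default_rng(0)
I, J, K, N = 17, 11, 3, 2
X, Y, Z = rng.random((I, N)), rng.random((J, N)), rng.random((K, N))
D3 = np.einsum('in,jn,kn->ijk', X, Y, Z)   # synthetic trilinear array

# Singular values of the three augmented matrices
sv = {d: np.linalg.svd(augment(D3, d), compute_uv=False)
      for d in ('column', 'row', 'tube')}
```

For real data the singular values decay gradually rather than dropping to zero, so the number of components per mode is a judgment made from their relative sizes.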
Bilinear modelling of three-way data
(Matrix Augmentation or matricizing, stretching, unfolding )
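MA-PCA
MA-MCR-ALS
[Diagram: the augmented data matrix D_aug, built from the Fish (F), Sediment (S) and Water (W) blocks (sites x contaminants), is decomposed into the augmented scores matrix X_aug (site score blocks numbered 1 to 6 for F, S and W) and the loadings Y (contaminants).]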
Explained variances using bilinear models
(profiles in two modes)
MA-PCA of scaled data without scores refolding
[Figure: MA-PCA loadings against metals (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn) and augmented scores against sample number (0 to 50). Annotations: water samples; sediment and fish samples; Ba, As, Cu, Zn; water soluble metal ions.]
MA-MCR-ALS of scaled data with non-negativity (nn) and without scores refolding
[Figure: MA-MCR-ALS loadings against metals (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn) and augmented scores against sample number (0 to 50), compared with MA-PCA. Annotations: sediment and fish samples (Ba, Zn, Cu, As); water samples.]
More easily interpretable!
Calculation of the boundaries of feasible band solutions
(Journal of Chemometrics, 2001, 15, 627-646)
[Figure: max and min boundaries of the feasible bands.]
Nearly no rotation ambiguities are present in non-negative environmental profiles calculated by MCR-ALS
(very different from spectroscopy!)
Bilinear modelling of three-way data
(Matrix Augmentation or matricizing, stretching, unfolding )
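[Diagram: D (sites x contaminants x compartments F, S, W) is analysed as an augmented matrix by PCA or MCR-ALS, giving the augmented scores X_aug (score blocks 1 to 6 for F, S and W) and the loadings Y (contaminants). The augmented scores of each component are refolded and decomposed by SVD: blocks 1, 2, 3 give x_i (sites) and z_i (compartments); blocks 4, 5, 6 give x_ii and z_ii. The result is X (sites), Y (contaminants) and Z (compartments).]
Scores refolding strategy (applied only to final augmented scores)
Loadings recalculation in two modes from augmented scores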
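The refolding step can be sketched as follows, assuming the augmented scores are stacked compartment by compartment (K blocks of I sites) and that the first singular vectors of each refolded column are taken as the site and compartment profiles. The function name and the sign convention are illustrative choices, not taken from the slides.

```python
import numpy as np

def refold_scores(X_aug, I, K):
    """Split augmented scores (K*I, N) into site profiles X (I, N)
    and compartment profiles Z (K, N) by a rank-one SVD per component."""
    N = X_aug.shape[1]
    X = np.zeros((I, N))
    Z = np.zeros((K, N))
    for n in range(N):
        M = X_aug[:, n].reshape(K, I).T          # (I, K): one column per compartment
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        x, z = U[:, 0] * s[0], Vt[0]
        if x.sum() < 0:                          # fix the SVD sign ambiguity
            x, z = -x, -z
        X[:, n], Z[:, n] = x, z
    return X, Z

# Synthetic check with scores that are exactly trilinear
rng = np.random.default_rng(0)
I, K, N = 17, 3, 2
X_true, Z_true = rng.random((I, N)), rng.random((K, N))
X_aug = np.vstack([X_true * Z_true[k] for k in range(K)])   # block k = X diag(z_k)

X_est, Z_est = refold_scores(X_aug, I, K)
```

The individual profiles are recovered only up to a scale factor shared between the two modes, so the comparison is best made on the products x_n z_n^T.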
Explained variances using trilinear models
(profiles in three modes)
MA-PCA of scaled data with nn and scores refolding
[Figure: two panels of loadings against metals (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn), with vertical ranges 0 to 0.5 and -0.5 to 0.5; MA-PCA + refolding compared with MA-PCA.]
Little difference in samples mode!
MA-MCR-ALS of scaled data with scores refolding
[Figure: MA-MCR-ALS + refolding compared with MA-MCR-ALS.]
Trilinear modelling of three-way data
[Diagram: PARAFAC decomposition of D (sites x contaminants x compartments) into X (sites), Y (metals) and Z (compartments F, S, W).]
PARAFAC of scaled data
[Figure: PARAFAC profiles compared with MA-PCA (bilinear) and MA-MCR-ALS.]
Trilinear constraint
TRILINEARITY CONSTRAINT (ALS iteration step)
[Diagram: D (sites x contaminants x compartments F, S, W) is analysed by MCR-ALS as D_aug = X_aug Y. Steps: selection of species profile (blocks 1, 2, 3 of X_aug); folding; SVD; rebuilding augmented scores; substitution of species profile (blocks 1', 2', 3'). Outputs: X (sites), Y (contaminants), Z (compartments).]
This constraint is applied at each step of the ALS optimization, independently for each component
Every augmented score profile that is wanted to follow the trilinear model is refolded
Loadings recalculation in two modes from augmented scores
MA-MCR-ALS of scaled data with nn, trilinearity (without scores refolding)
[Figure: augmented scores against sample number (0 to 50) and loadings against metals (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn); MA-MCR-ALS nn + trilinear compared with MA-MCR-ALS nn.]
Calculation of the boundaries of feasible band solutions
(Journal of Chemometrics, 2001, 15, 627-646)
No rotation ambiguities are present in trilinear non-negative environmental profiles calculated by MCR-ALS
(very different from spectroscopy!)
MA-MCR-ALS of scaled data with nn, trilinearity and with scores refolding
[Figure: comparison of PARAFAC vs MCR-ALS (trilinearity): MA-MCR-ALS nn + trilinear against PARAFAC nn.]
Tucker3 modelling of three-way data
[Diagram: TUCKER3 decomposition of D (sites x metals x compartments) into X (sites, 1 component), Y (metals, 2 components), Z (compartments F, S, W, 2 components) and core array G. Model (1,2,2).]
Tucker models with non-negativity constraints
[Figure: candidate models [1 2 2], [1 2 3], [1 3 3], [2 2 2], [2 2 3], [2 3 3], [3 2 3], [3 3 3].]
Parsimonious model: [1 2 2]
Tucker3 of scaled data
[Figure: profiles in the three modes, with axes running 0 to 15, 1 to 11 and 1 to 3; TUCKER3 model [1 2 2] compared with PARAFAC model [2 2 2].]
MA-MCR-ALS
Tucker3 constraint
Tucker3 CONSTRAINT (ALS iteration step)
[Diagram: D (sites x metals x compartments F, S, W) is analysed by MCR-ALS as D_aug = X_aug Y. The augmented score blocks 1 to 6 are folded, decomposed by SVD and replaced by blocks 1' to 6'. Outputs: X (sites), Y (metals), Z (compartments).]
This constraint is applied at each step of the ALS optimization, independently and individually for each component i
Interacting augmented scores are folded together
Loadings recalculation in two modes from augmented scores
MA-MCR-ALS of scaled data with nn, Tucker3 (without scores refolding)
[Figure: augmented scores against sample number (0 to 50); MA-MCR-ALS nn + Tucker3, model [1 2 2], compared with MA-MCR-ALS nn + PARAFAC, model [2 2 2].]
MA-MCR-ALS of scaled data with nn, Tucker3 and with scores refolding
[Figure: MA-MCR-ALS nn + Tucker3, model [1 2 2], compared with Tucker3, model [1 2 2].]
Summary of Results
INTEGRATION OF CHEMOMETRICS-GEOSTATISTICS (Geographical Information
Systems, GIS)
(67.3%)
(13.2%)
Conclusions
Chemometric methods allow resolution of environmental sources of chemical contaminants.
However, we should be aware of how each method displays the information, because the mathematical properties of the methods differ (e.g. orthogonality vs non-negativity, bilinearity vs trilinearity, number of components, ...).
The interpretation and resolution of environmental sources is not easy, because contamination sources in the real world are correlated and because of experimental data limitations (environmental sources should show variation in the investigated data set).
Bilinear PCA and MCR-ALS can be used to study multiway data sets and compared with multiway methods (like PARAFAC and Tucker) if appropriate scores refolding is performed.
Bilinear non-negative MCR-ALS solutions may provide a good approximation of the real sources because non-negative environmental profiles have little rotation ambiguity.
Conclusions
PARAFAC and Tucker3 may provide simpler models, and they are especially useful for trilinear data or when the number of components is not the same in the different modes.
Intermediate situations between pure bilinear and pure trilinear models can be easily implemented in MCR-ALS.
Bilinear-based models are more flexible than trilinear-based models to resolve 'true' sources of data variation.
Different numbers of components and interactions between components in different modes (constraint under development) can be considered in mixed bilinear-trilinear-Tucker MA-MCR models.
For an optimal RESOLUTION, the model should be in accordance with the 'true' data structure.
Integration of Chemometrics-GIS results may facilitate geographical and temporal interpretation of contamination sources and their correlation with land uses, population and industrial activities.