Emma Peré-Trepat 1 and Romà Tauler 2 *

INVESTIGATION OF MAIN CONTAMINATION SOURCES OF HEAVY METAL IONS IN FISH, SEDIMENTS, AND WATERS FROM CATALONIA RIVERS USING DIFFERENT MULTIWAY DATA ANALYSIS METHODS

Emma Peré-Trepat 1 and Romà Tauler 2 *

1 Dept. of Analytical Chemistry, Universitat de Barcelona, Diagonal 647, 08028 Barcelona, Spain

2 IIQAB-CSIC, Jordi Girona 18-26, 08034 Barcelona, Spain

* e-mail: rtaqam@iiqab.csic.es

- Outline:
- Introduction and motivations of this work
- Environmental data tables and chemometrics models and methods
- Example of application: metal contamination sources in fish, sediment and surface water river samples.
- Conclusions

- Introduction and motivations of this work
- Pollution and toxic chemical compounds are a threat to the environment and to health, and call for urgent measures and actions
- Environmental monitoring studies produce huge amounts of multivariate data ordered in large data tables (data matrices)
- The bottleneck in the study of these environmental data tables is their analysis and interpretation
- These data tables need chemometric analysis (statistical and numerical analysis of multivariate chemical data)!

- What kind of information can be obtained from chemometric analysis of environmental multivariate data tables?
- Detection, identification, interpretation and resolution of the main sources of contamination
- Distribution of these contamination sources in the environment: geographically, temporally, by environmental compartment (air, water, sediments, biota,...),…
- Distinction between point and diffuse contamination sources
- Quantitative apportionment of these sources

- Introduction and motivations of this work
- In this work, different chemometric multiway data analysis methods are compared for the resolution of the environmental sources of 11 metal ions in 17 river samples of fish, sediment and water taken at the same site locations in Catalonia (NE Spain).
- Two-way bilinear model based methods
- MA-PCA: Matrix Augmentation Principal Component Analysis
- MA-MCR-ALS: Matrix Augmentation Multivariate Curve Resolution Alternating Least Squares
- Three-way model based methods
- PARAFAC
- TUCKER3
- MCR-ALS trilinear
- MCR-ALS TUCKER3

- Introduction and motivations of this work
- Special attention will be paid to:
- Finding ways to compare results obtained using bilinear and trilinear models for three-way data: getting profiles in three modes from bilinear models of three-way data
- Adaptation of MCR-ALS to the fulfillment of PARAFAC and TUCKER3 trilinear models
- Reliability of solutions: calculation of boundaries of bands of feasible solutions
- Integration of Geostatistics and Chemometrics in the investigation of environmental data

Environmental data tables (two-way data)

[Figure: a data table (matrix) X with I samples as rows and J variables as columns. The variables may be concentrations of chemicals, physical properties, biological properties or other measurements. Entries may be below the limit of detection (<LOD) or missing ('m'). Accompanying plots show the variables (columns) and the samples (rows).]

Environmental three-way data sets

Measured data usually consist of concentrations of different chemical compounds (variables) measured in different samples at different times/situations/conditions/compartments.

Data are ordered in a two-way or in a three-way data table according to their structure.

- Three measurement modes in three-way data sets:
- variables mode (concentrations of chemical compounds)
- samples mode
- times/situations/conditions/compartments mode

Chemometric models to describe environmental measurements

- Models for what?
- Models for:
- identification of contamination sources?
- exploration of contamination sources?
- interpretation of contamination sources?
- resolution of environmental sources?
- apportionment/quantitation of environmental sources?

Chemometric models to describe environmental measurements

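Bilinear models for two-way data:

$$d_{ij} = \sum_{n=1}^{N} x_{in}\, y_{nj} + e_{ij}$$

D is the data matrix with I rows (samples) and J columns (variables)

$d_{ij}$ is the concentration of chemical contaminant j in sample i

n = 1,...,N are a reduced number of independent environmental sources

$x_{in}$ is the amount of source n in sample i

$y_{nj}$ is the amount of contaminant j in source n

$e_{ij}$ is the part of $d_{ij}$ not explained by the model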

Chemometric models to describe environmental measurements

Bilinear models for two-way data, in matrix form:

$$\mathbf{D} = \mathbf{X}\mathbf{Y}^{T} + \mathbf{E}$$

D (I x J) = X (I x N) YT (N x J) + E (I x J), with N << I or J

PCA
- X orthogonal, YT orthonormal
- YT in the direction of maximum variance
- Unique solutions but without physical meaning
- Identification and interpretation!

MCR-ALS
- X and YT non-negative
- X or YT normalization
- other constraints (unimodality, local rank, ...)
- Non-unique solutions but with physical meaning
- Resolution and apportionment!
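The contrast between the two bilinear decompositions can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: `pca_bilinear` and `nn_als` are hypothetical helper names, the data are simulated, and the non-negativity step is a crude clipped least-squares update rather than the constrained solver used in actual MCR-ALS software.

```python
import numpy as np

def pca_bilinear(D, n):
    """PCA-type model D = X @ Yt from a truncated SVD (no mean-centering)."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    X = U[:, :n] * s[:n]   # orthogonal scores
    Yt = Vt[:n, :]         # orthonormal loadings
    return X, Yt

def nn_als(D, n, n_iter=200, seed=0):
    """Simplified non-negative ALS (assumption: clipping replaces a proper
    non-negative least-squares step)."""
    rng = np.random.default_rng(seed)
    Yt = rng.random((n, D.shape[1]))
    for _ in range(n_iter):
        X = np.clip(D @ np.linalg.pinv(Yt), 0.0, None)
        Yt = np.clip(np.linalg.pinv(X) @ D, 0.0, None)
        norms = np.linalg.norm(Yt, axis=1, keepdims=True)
        Yt = Yt / np.where(norms > 0, norms, 1.0)   # Yt normalization
    X = np.clip(D @ np.linalg.pinv(Yt), 0.0, None)  # scores for the final Yt
    return X, Yt

# Simulated data: 17 samples, 11 variables, 2 non-negative sources.
rng = np.random.default_rng(1)
D = rng.random((17, 2)) @ rng.random((2, 11))
X_pca, Yt_pca = pca_bilinear(D, 2)
X_mcr, Yt_mcr = nn_als(D, 2)
```

The PCA profiles are orthogonal and generally contain negative values, while the ALS profiles are non-negative by construction, which is what makes them readable as source compositions and contributions.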

Chemometric models to describe environmental measurements

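Extension of bilinear models for the simultaneous analysis of multiple two-way data sets

Each environmental data set Dk (I x J) follows a bilinear model with its own scores Xk (I, n) and a common loadings matrix YT (n, J):

$$\mathbf{D}_k = \mathbf{X}_k \mathbf{Y}^{T} + \mathbf{E}_k, \qquad k = 1,\dots,K$$

Matrix augmentation strategy: the Dk are stacked into the augmented data matrix Daug, which is decomposed into the augmented scores Xaug and the same YT:

$$\mathbf{D}_{aug} = \begin{bmatrix}\mathbf{D}_1\\ \vdots\\ \mathbf{D}_K\end{bmatrix} = \begin{bmatrix}\mathbf{X}_1\\ \vdots\\ \mathbf{X}_K\end{bmatrix}\mathbf{Y}^{T} + \mathbf{E}_{aug} = \mathbf{X}_{aug}\mathbf{Y}^{T} + \mathbf{E}_{aug}$$

- PCA: orthogonality; max. variance
- MCR: non-negativity, natural constraints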
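A minimal sketch of the matrix augmentation strategy, assuming simulated noise-free data and illustrative sizes (17 sites, 11 variables, 3 data sets, 2 sources); all variable names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, n = 17, 11, 3, 2

Yt = rng.random((n, J))                       # loadings shared by all data sets
X_k = [rng.random((I, n)) for _ in range(K)]  # one scores matrix per data set
D_k = [X @ Yt for X in X_k]                   # noise-free bilinear data sets

# Column-wise augmentation: stack the data sets on top of each other.
D_aug = np.vstack(D_k)   # (K*I, J)
X_aug = np.vstack(X_k)   # (K*I, n)
```

Because the variables are the common mode, a single YT describes all K data sets, while each block of rows of Xaug keeps the scores of one data set.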

Chemometric models to describe environmental measurements

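Trilinear models for three-way data:

$$d_{ijk} = \sum_{n=1}^{N} x_{in}\, y_{nj}\, z_{nk} + e_{ijk}, \qquad i=1,\dots,I;\; j=1,\dots,J;\; k=1,\dots,K$$

$d_{ijk}$ is the concentration of chemical contaminant j in sample i at time (condition) k

n = 1,...,N are a reduced number of independent environmental sources

$x_{in}$ is the amount of source n in sample i

$y_{nj}$ is the amount of contaminant j in source n

$z_{nk}$ is the contribution of source n to compartment k

[Figure: the three-way array D (I, J, K), with samples (I), variables (J) and conditions (K), decomposed into the loadings X (X-mode, samples, Ni components), Y (Y-mode, variables, Nj components) and Z (Z-mode, conditions, Nk components); Dk is one slice of D.]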
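The trilinear model can be written directly with `numpy.einsum`. The sketch below uses simulated noise-free profiles and illustrative sizes; storing Y as (J, N) and Z as (K, N) is a layout choice of this example, not something fixed by the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, N = 17, 11, 3, 2

X = rng.random((I, N))   # x_in: amount of source n in sample i
Y = rng.random((J, N))   # y_nj stored as Y[j, n]
Z = rng.random((K, N))   # z_nk stored as Z[k, n]

# d_ijk = sum_n x_in * y_nj * z_nk  (no residual term in this simulation)
D = np.einsum('in,jn,kn->ijk', X, Y, Z)

# Every slice D_k is bilinear, with the same X and Y, only rescaled by Z[k].
k = 1
slice_k = X @ np.diag(Z[k]) @ Y.T
```

This makes the key restriction of the trilinear model visible: the profiles in the sample and variable modes have the same shape in every slice and differ only by the scaling factors in Z.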

Three-way data models

PARAFAC (trilinear model)

[Figure: D decomposed into X, YT and Z.]

- The same number of components in the three modes: Ni = Nj = Nk = N
- No interactions between components
- Different slices Dk are decomposed into bilinear profiles having the same shape!

Tucker3 models

[Figure: D decomposed into X, YT and Z, linked by the core array G.]

- Different number of components in the different modes (Ni, Nj and Nk may differ)
- Interaction between components in different modes is possible

In PARAFAC, Ni = Nj = Nk = N and the core array G is a superdiagonal identity cube.
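The relation between the two models can be checked numerically. In this sketch `tucker3` is a hypothetical helper implementing the usual Tucker3 sum over the core array, the data are simulated, and the sizes are only illustrative.

```python
import numpy as np

def tucker3(G, X, Y, Z):
    """d_ijk = sum_pqr g_pqr * x_ip * y_jq * z_kr."""
    return np.einsum('pqr,ip,jq,kr->ijk', G, X, Y, Z)

rng = np.random.default_rng(0)
I, J, K, N = 17, 11, 3, 2
X, Y, Z = rng.random((I, N)), rng.random((J, N)), rng.random((K, N))

# Superdiagonal identity core: ones at (n, n, n), zeros elsewhere.
G_id = np.zeros((N, N, N))
G_id[np.arange(N), np.arange(N), np.arange(N)] = 1.0

D_tucker = tucker3(G_id, X, Y, Z)
D_parafac = np.einsum('in,jn,kn->ijk', X, Y, Z)

# A Tucker3 model with a different number of components per mode, e.g. [1 2 2].
G_122 = rng.random((1, 2, 2))
D_122 = tucker3(G_122, rng.random((I, 1)), rng.random((J, 2)), rng.random((K, 2)))
```

With the identity core the off-diagonal interaction terms vanish and the Tucker3 sum collapses to the PARAFAC sum; a full core such as `G_122` lets one component in the first mode interact with all components of the other two modes.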

Guidelines for method selection (resolution purposes)

[Figure: recommended methods as a function of the deviations from trilinearity (mild, medium, strong) and of the array size (small, medium, large). Methods shown: PARAFAC, PARAFAC2, TUCKER, and MCR, PCA, SVD, ...]

Journal of Chemometrics, 2001, 15, 749-771

INTEGRATION OF CHEMOMETRICS-GEOSTATISTICS (Geographical Information Systems, GIS)

METAL CONTAMINATION SOURCES IN SEDIMENTS, FISH AND WATERS FROM CATALONIA RIVERS USING MULTIWAY DATA ANALYSIS METHODS

Emma Peré-Trepat (UB), Mónica Flo, Montserrat Muñoz, Antoni Ginebreda (ACA), Marta Terrado, Romà Tauler (CSIC)

[Map of the 17 sampling sites. Map labels: France, Pyrenees, Aragón, Barcelona, Mediterranean Sea.]

1. RIU MUGA Castelló d'Empúries J052
2. RIU FLUVIÀ Besalú J022
3. RIU FLUVIÀ L'Armentera J011
4. RIU TER Manlleu J034
5. RIU TERRI Sant Julià de Ramis J028
6. RIU TER Clomers J112
7. RIU TORDERA Fogars de Tordera J062
8. RIU CONGOST La Garriga J037
9. RIU LLOBREGAT El Pont de Vilomara J031
10. RIU CARDENER Castellgali J002
11. RIU LLOBREGAT Abrera J084
12. RIU LLOBREGAT Martorell J005
13. RIU LLOBREGAT Sant Joan Despí J049
14. RIU FOIX Castellet J008
15. RIU FRANCOLÍ La Masó J059
16. RIU EBRE Flix J056
17. RIU SEGRE Térmens J207

17 river sites, 11 metals (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn), 3 environmental compartments: fish ('barb', 'bagra comuna', bleak, carp and trout), sediment and water samples

- Missing data ('m')
- Unknown values leave empty holes in the data matrices
- When they are few and evenly distributed, they may be estimated by PCA imputation (or another method)
- Values below the LOD (<LOD)
- This is a common problem in environmental data tables
- If most of the values are below the LOD, data matrices are sparse
- For calculations, it is better either to use the experimental values or to set them to LOD/2, rather than to zero or to the LOD
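Both treatments can be sketched as follows. The function names are hypothetical, and the imputation shown is one simple iterative low-rank (PCA-type, uncentred) scheme chosen for illustration; it is not necessarily the procedure used in this work.

```python
import numpy as np

def replace_below_lod(values, below_lod, lod):
    """Set entries flagged as <LOD to LOD/2 (lod may be a scalar or per-column)."""
    out = np.array(values, dtype=float)
    lod_full = np.broadcast_to(np.asarray(lod, dtype=float), out.shape)
    out[below_lod] = lod_full[below_lod] / 2.0
    return out

def impute_low_rank(D, n, n_iter=100):
    """Fill NaN entries by iterating a rank-n SVD approximation."""
    D = np.array(D, dtype=float)
    missing = np.isnan(D)
    filled = np.where(missing, np.nanmean(D, axis=0), D)  # start from column means
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :n] * s[:n]) @ Vt[:n, :]
        filled = np.where(missing, approx, D)  # measured values are never changed
    return filled

# Small example: a rank-1 table with one missing value (true value 1.0).
table = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
table[0, 0] = np.nan
completed = impute_low_rank(table, n=1)

lod_fixed = replace_below_lod([0.0, 5.0], np.array([True, False]), lod=0.3)
```

The imputation only behaves well when, as the slide says, the missing values are few and evenly distributed; with many holes in one row or column the low-rank estimate is poorly determined.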

- Preliminary data description: Use of descriptive statistics
- Individual sample plots
- Individual variable plots
- Descriptive statistics (Excel Statistics)
- Histograms/Box plots
- Binary correlation between variables

[Figure: box plot of the values in each column (column numbers 1 to 50), annotated with the outliers, upper whisker, upper quartile, median, lower quartile and lower whisker.]

Effect of different data pre-treatments: Sediment samples

[Figure: sediment data for the metals As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb and Zn after four pre-treatments: raw, mean-centred, auto-scaled and scaled. Mo is eliminated.]

- No mean-centering was applied, to allow an improved physical interpretation of the factors (application of non-negativity constraints instead of orthogonality constraints) and the comparison of results with MCR-ALS methods
- Two scaling possibilities:
- First data matrix augmentation, then column scaling to equal variance (each column element divided by the column standard deviation)
- First column scaling of each data matrix separately, then data matrix augmentation

- Variables with nearly no changes and values equal or close to their limit of detection were excluded from scaling and divided by 20 instead (to avoid overweighting them)
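The two scaling orders can be compared with a short sketch. `scale_columns` and its `downweight_cols` argument are hypothetical names, the data are simulated, and the use of the sample standard deviation (`ddof=1`) is an assumption of this example.

```python
import numpy as np

def scale_columns(D, downweight_cols=(), factor=20.0):
    """Divide each column by its standard deviation, without mean-centering.
    Columns listed in downweight_cols are divided by `factor` instead."""
    D = np.asarray(D, dtype=float)
    sd = D.std(axis=0, ddof=1)
    divisors = np.where(sd > 0, sd, 1.0)   # leave constant columns unchanged
    for c in downweight_cols:
        divisors[c] = factor
    return D / divisors

rng = np.random.default_rng(0)
D_k = [rng.random((17, 11)) * scale for scale in (1.0, 10.0, 100.0)]

# Option 1: augment first, then scale the columns of the augmented matrix.
option1 = scale_columns(np.vstack(D_k))

# Option 2: scale each data matrix separately, then augment.
option2 = np.vstack([scale_columns(D) for D in D_k])

# Downweighting instead of scaling for a near-constant column (here column 2).
downweighted = scale_columns(D_k[0], downweight_cols=[2])
```

Option 1 keeps the differences in magnitude between the stacked matrices, whereas option 2 gives every matrix the same weight in each column; in both cases non-negative data stay non-negative because nothing is subtracted.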

Description of scaled data: metal distribution in the three compartments

[Figure: scaled concentrations of the metals (variables) in the three compartments. Cd, Co and Ld in water were not scaled; only downweighted.]

Description of scaled data: different sites in the three compartments

[Figure: scaled data plotted against the sample sites, labelled by river: Llobregat, Tordera, Segre, Ter, Foix, Congost, Cardener, Fluvià, Muga, Terri, Ebre, Francolí.]

Unit variance scaled concentrations boxplot

[Figure: three box plots (Fish, Sediment, Water) of the unit-variance scaled values for the 11 metals (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn).]

THREE-WAY DATA ARRAY MATRICIZING or MATRIX AUGMENTATION

[Figure: the three-way array (sites x contaminants x compartments: Fish, Sediment, Water) unfolded into augmented data matrices in the three possible directions.]

SVD of augmented data matrices in the three directions

How many components are needed to explain each mode?

Singular values by augmentation direction:

          column     row        tube
    s1    40.2619    43.2553    41.3302
    s2    16.7504     9.2823    19.4850
    s3     9.4963     8.5312    14.3739

[Figure: singular values (components 0 to 10) from SVD column-wise (variables), SVD row-wise (samples) and SVD tube-wise (type); the 2nd component is marked.]
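The three augmentations and their singular values can be computed as in the sketch below. The array is simulated (an exactly trilinear two-source example), `unfoldings` is a hypothetical helper, and the ordering of rows and columns inside each unfolded matrix is a convention of this example.

```python
import numpy as np

def unfoldings(D):
    """Matricize an (I, J, K) array in the three directions."""
    I, J, K = D.shape
    col = D.transpose(2, 0, 1).reshape(K * I, J)   # column-wise: slices stacked, variables common
    row = D.reshape(I, J * K)                      # row-wise: samples common
    tube = D.transpose(2, 0, 1).reshape(K, I * J)  # tube-wise: compartments common
    return col, row, tube

rng = np.random.default_rng(0)
I, J, K, N = 17, 11, 3, 2
X, Y, Z = rng.random((I, N)), rng.random((J, N)), rng.random((K, N))
D = np.einsum('in,jn,kn->ijk', X, Y, Z)

col, row, tube = unfoldings(D)
sv = {name: np.linalg.svd(M, compute_uv=False)
      for name, M in (("column", col), ("row", row), ("tube", tube))}
```

For an exactly trilinear array with two sources, each unfolding has only two non-negligible singular values; with real data, the number of significant singular values in each direction suggests how many components each mode needs.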

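Bilinear modelling of three-way data

(Matrix augmentation or matricizing, stretching, unfolding)

[Figure: the fish (F), sediment (S) and water (W) data matrices (sites x contaminants) are stacked into the augmented data matrix Daug. MA-PCA or MA-MCR-ALS decomposes Daug into the augmented scores matrix Xaug (one block of site scores per compartment) and the loadings Y (contaminants).]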

Explained variances using bilinear models

(profiles in two modes)

MA-PCA of scaled data without scores refolding

[Figure: MA-PCA loadings for the 11 metals (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn) and augmented scores for samples 0 to 50. Annotations: water samples; sediment and fish samples; Ba, As, Cu, Zn; water soluble metal ions.]

MA-MCR-ALS of scaled data with non-negativity (nn) and without scores refolding

[Figure: MA-MCR-ALS loadings for the 11 metals (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn) and augmented scores for samples 0 to 50. Annotations: sediment and fish samples; water samples; Ba, Zn, Cu, As.]

MA-MCR-ALS compared with MA-PCA: more easily interpretable!!!

Calculation of the boundaries of feasible band solutions

(Journal of Chemometrics, 2001, 15, 627-646)

[Figure: maximum (max) and minimum (min) boundaries of the feasible bands.]

Nearly no rotation ambiguities are present in non-negative environmental profiles calculated by MCR-ALS

(very different to spectroscopy!!!!!)

Bilinear modelling of three-way data

(Matrix augmentation or matricizing, stretching, unfolding)

[Figure: PCA or MCR-ALS decomposes the augmented data matrix D (fish F, sediment S and water W blocks; sites x contaminants) into the augmented scores Xaug and the loadings Y. Each column of Xaug is then folded into a sites x compartments (F, S, W) matrix, and SVD of this matrix gives a sites profile (xi, xii, ...) in X and a compartments profile (zi, zii, ...) in Z.]

Scores refolding strategy!!! (applied only to the final augmented scores)

Loadings recalculation in two modes from augmented scores
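The refolding of augmented scores into profiles for the sites and compartments modes can be sketched as follows. `refold_scores` is a hypothetical helper; it assumes that the rows of Xaug are ordered compartment by compartment (fish block, then sediment, then water) with the sites in the same order inside each block, and it keeps only the first singular vectors of each folded column.

```python
import numpy as np

def refold_scores(X_aug, n_sites, n_comp):
    """Fold each augmented score column into a (sites x compartments) matrix
    and take its first singular vectors as the profiles in the two modes."""
    n_components = X_aug.shape[1]
    X = np.empty((n_sites, n_components))
    Z = np.empty((n_comp, n_components))
    for n in range(n_components):
        folded = X_aug[:, n].reshape(n_comp, n_sites).T   # sites x compartments
        U, s, Vt = np.linalg.svd(folded, full_matrices=False)
        x, z = U[:, 0] * s[0], Vt[0]
        if x.sum() < 0:            # fix the sign ambiguity of the SVD
            x, z = -x, -z
        X[:, n], Z[:, n] = x, z
    return X, Z

# Simulated trilinear scores: 17 sites, 3 compartments, 2 sources.
rng = np.random.default_rng(0)
X_true, Z_true = rng.random((17, 2)), rng.random((3, 2))
X_aug = np.vstack([X_true * Z_true[k] for k in range(3)])   # block k = X diag(z_k)

X_ref, Z_ref = refold_scores(X_aug, n_sites=17, n_comp=3)
```

When the scores really are trilinear, the first singular vectors reproduce the folded column exactly; otherwise they give its best rank-one summary, which is what allows bilinear results to be compared with three-mode profiles.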

Explained variances using trilinear models

(profiles in three modes)

MA-PCA of scaled data with nn and scores refolding

[Figure: loadings for the 11 metals (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn), comparing MA-PCA + refolding with MA-PCA.]

Little differences in samples mode!!!

MA-MCR-ALS of scaled data with scores refolding

[Figure: comparison of MA-MCR-ALS + refolding with MA-MCR-ALS.]

Trilinear modelling of three-way data

[Figure: PARAFAC decomposition of the three-way array D (sites x contaminants x compartments) into X (sites), Y (metals) and Z (compartments: F, S, W).]

PARAFAC of scaled data

[Figure: comparison of PARAFAC, MA-PCA (bilinear) and MA-MCR-ALS profiles.]

MA-MCR-ALS with trilinearity constraint

[Figure: MCR-ALS decomposes the augmented data matrix D (F, S and W blocks; sites x contaminants) into Xaug and Y; the trilinearity constraint links the sites profiles in X and the compartments profiles in Z.]

TRILINEARITY CONSTRAINT (ALS iteration step)

- Selection of species profile (1, 2, 3)
- Folding
- SVD
- Rebuilding augmented scores (1', 2', 3')
- Substitution of species profile

This constraint is applied at each step of the ALS optimization, independently for each component.

Every augmented score that should follow the trilinear model is refolded.

Loadings recalculation in two modes from augmented scores
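One way to sketch the constraint step is shown below. `trilinearity_constraint` is a hypothetical name, the row ordering of Xaug (compartment blocks stacked, same site order in each) is an assumption, and the optional `components` argument illustrates applying the constraint to only some components.

```python
import numpy as np

def trilinearity_constraint(X_aug, n_sites, n_comp, components=None):
    """Replace selected columns of X_aug by their trilinear (rank-one) version:
    select profile -> fold -> SVD -> rebuild -> substitute."""
    out = np.array(X_aug, dtype=float)
    selected = range(out.shape[1]) if components is None else components
    for n in selected:
        folded = out[:, n].reshape(n_comp, n_sites).T      # sites x compartments
        U, s, Vt = np.linalg.svd(folded, full_matrices=False)
        rank1 = s[0] * np.outer(U[:, 0], Vt[0])            # keep first component only
        out[:, n] = rank1.T.reshape(-1)                    # rebuild augmented column
    return out

rng = np.random.default_rng(0)
X_aug = rng.random((51, 2))   # 3 compartments x 17 sites, 2 components

X_tri = trilinearity_constraint(X_aug, n_sites=17, n_comp=3)
X_mixed = trilinearity_constraint(X_aug, n_sites=17, n_comp=3, components=[0])
```

In an ALS loop this function would be called on the current Xaug estimate at every iteration, alongside the other constraints; constraining only some components, as in `X_mixed`, gives a model that is partly trilinear and partly bilinear.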

MA-MCR-ALS of scaled data with nn, trilinearity (without scores refolding)

[Figure: loadings for the 11 metals (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn) and augmented scores for samples 0 to 50, comparing MA-MCR-ALS nn + trilinear with MA-MCR-ALS nn.]

Calculation of the boundaries of feasible band solutions

(Journal of Chemometrics, 2001, 15, 627-646)

No rotation ambiguities are present in trilinear non-negative environmental profiles calculated by MCR-ALS

(very different to spectroscopy!!!!!)

MA-MCR-ALS of scaled data with nn, trilinearity and with scores refolding

Comparison PARAFAC vs MCR-ALS (trilinearity)

[Figure: profiles from MA-MCR-ALS nn + trilinear compared with PARAFAC nn.]

Tucker3 modelling of three-way data

[Figure: TUCKER3 decomposition of D (sites x metals x compartments) into X (sites), Y (metals), Z (compartments: F, S, W) and the core array G. Model (1,2,2).]

Tucker models with non-negativity constraints

[Figure: comparison of Tucker models of different complexity: [1 2 2], [1 2 3], [1 3 3], [2 2 2], [2 2 3], [2 3 3], [3 2 3], [3 3 3].]

Parsimonious model: [1 2 2]

Tucker3 of scaled data

[Figure: profiles in the three modes for TUCKER3, model [1 2 2], and PARAFAC, model [2 2 2].]

MA-MCR-ALS with Tucker3 constraint

[Figure: MCR-ALS decomposes the augmented data matrix D (F, S and W blocks; sites x metals) into Xaug and Y; the Tucker3 constraint links the sites profiles in X and the compartments profiles in Z.]

Tucker3 CONSTRAINT (ALS iteration step)

[Figure: the augmented score profiles (1 to 6) are folded, decomposed by SVD and rebuilt (1' to 6').]

This constraint is applied at each step of the ALS optimization, independently and individually for each component.

Interacting augmented scores are folded together.

Loadings recalculation in two modes from augmented scores

MA-MCR-ALS of scaled data with nn, tucker3 (without scores refolding)

[Figure: augmented scores for samples 0 to 50, comparing MA-MCR-ALS nn + Tucker3, model [1 2 2], with MA-MCR-ALS nn + PARAFAC, model [2 2 2].]

MA-MCR-ALS of scaled data with nn, tucker3 and with scores refolding

[Figure: comparison of MA-MCR-ALS nn + Tucker3, model [1 2 2], with Tucker3, model [1 2 2].]

Summary of Results

INTEGRATION OF CHEMOMETRICS-GEOSTATISTICS (Geographical Information Systems, GIS)

[Maps labelled (67.3%) and (13.2%).]

Conclusions

Chemometric methods allow the resolution of environmental sources of chemical contaminants.

However, we should be aware of how each method displays the information, because the mathematical properties of the methods are different (e.g. orthogonality vs non-negativity, bilinearity vs trilinearity, number of components...).

The interpretation and resolution of environmental sources is not easy, because contamination sources in the real world are correlated and because of experimental data limitations (environmental sources should show variation in the investigated data set).

Bilinear PCA and MCR-ALS can be used to study multiway data sets and can be compared with multiway methods (like PARAFAC and Tucker) if appropriate scores refolding is performed.

Bilinear non-negative MCR-ALS solutions may provide a good approximation of the real sources, because non-negative environmental profiles have little rotation ambiguity.

PARAFAC and Tucker3 may provide simpler models, and they are especially useful for trilinear data or when the number of components is not the same in the different modes.

Intermediate situations between pure bilinear and pure trilinear models can be easily implemented in MCR-ALS.

Bilinear based models are more flexible than trilinear based models to resolve 'true' sources of data variation.

Different numbers of components and interactions between components in different modes (constraint under development) can be considered in mixed bilinear-trilinear-Tucker MA-MCR models.

For an optimal RESOLUTION, the model should be in accordance with the 'true' data structure.

Integration of Chemometrics-GIS results may facilitate the geographical and temporal interpretation of contamination sources and their correlation with land uses, population and industrial activities.

- The Catalan Water Agency is acknowledged for its financial support and for providing the experimental data sets
- Research grant: Project MCYT, Nr. BQU2003-00191, Spain