Multivariate Resolution in Chemistry. Lecture 2. Roma Tauler IIQAB-CSIC, Spain e-mail: rtaqam@iiqab.csic.es. Lecture 2. Resolution of two-way data. Resolution conditions. Selective and pure variables. Local rank Natural constraints.
Lecture 2 • Resolution of two-way data. • Resolution conditions. • Selective and pure variables. • Local rank • Natural constraints. • Non-iterative and iterative resolution methods and algorithms. • Multivariate Curve Resolution using Alternating Least Squares, MCR-ALS. • Examples of application.
Multivariate (Soft) Self-Modeling Curve Resolution (definition) • A group of techniques that aim to recover the response profiles (spectra, pH profiles, time profiles, elution profiles, ...) of more than one component in an unresolved and unknown mixture obtained from chemical processes and systems, when little or no prior information is available about the nature and/or composition of these mixtures.
Chemical reaction systems monitored using spectroscopic measurements. [Figure: data matrix D (I × J) decomposed as D = C ST + E into concentration profiles C and spectra ST. Bilinearity!]
Analytical characterization of complex environmental, industrial and food mixtures using hyphenated methods (chromatography or continuous flow methods with spectroscopic detection). [Figure: LC-DAD coelution; data matrix D (NR × NC) decomposed as D = C ST + E. Bilinearity!]
Protein folding and dynamic protein-nucleic acid interaction processes. [Figure: D = C ST + E; panels: absorbance (a.u.) vs. wavenumber (cm-1), 1900 to 1400, for D2O and protein spectra; concentration (a.u.) vs. temperature (ºC), 20 to 80, for D1, D2, P1 and P2, with 43.8 ºC and 63.9 ºC marked. Bilinearity!]
Environmental source resolution and apportionment. [Figure: D = C ST + E; D: concentrations of 96 organic compounds in 22 samples; C: source distribution; ST: source composition. Bilinearity!]
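Soft-modelling MCR bilinear model for two-way data: $d_{ij} = \sum_{n=1}^{N} c_{in} s_{nj} + e_{ij}$, or in matrix form $\mathbf{D} = \mathbf{C}\mathbf{S}^T + \mathbf{E}$, with D of size I × J. • $d_{ij}$ is the data measurement (response) of variable j in sample i • n = 1,...,N indexes the components (species, sources, ...) • $c_{in}$ is the concentration of component n in sample i • $s_{nj}$ is the response of component n at variable j • $e_{ij}$ is the residual not explained by the N components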
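The bilinear model can be illustrated with a small simulation. This is only a sketch under stated assumptions: the Gaussian profile shapes, the matrix sizes, the noise level and the helper name `gaussian` are all hypothetical choices made for illustration, not taken from the lecture.

```python
import numpy as np

def gaussian(x, center, width):
    """Hypothetical helper: unit-height Gaussian profile."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)

rng = np.random.default_rng(0)
t = np.arange(50)    # I = 50 samples (e.g. elution times)
wl = np.arange(80)   # J = 80 variables (e.g. wavelengths)

# C (I x N): concentration profiles, S (J x N): pure responses, N = 2
C = np.column_stack([gaussian(t, 18, 4), gaussian(t, 28, 5)])
S = np.column_stack([gaussian(wl, 25, 8), gaussian(wl, 50, 10)])

# E (I x J): assumed small random noise
E = 1e-3 * rng.standard_normal((t.size, wl.size))

# Bilinear model: d_ij = sum_n c_in * s_nj + e_ij
D = C @ S.T + E

# Singular values of the noise-free part: only N = 2 are significant
sv = np.linalg.svd(C @ S.T, compute_uv=False)
```

In the noise-free case the rank of D equals the number of components, which is why rank analysis is the starting point of the resolution methods discussed in this lecture.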
Resolution conditions to reduce MCR rotation ambiguities (unique solutions?) • Selective variables for every component • Local rank conditions (Resolution Theorems) • Natural constraints: non-negativity; unimodality; closure (mass balance) • Multiway data (e.g. trilinear data, ...) • Hard-modelling constraints: mass-action law; rate law; ... • Shape constraints (Gaussian, Lorentzian, asymmetric peak shape, log peak shape, ...) • ...
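The rotation ambiguity that these conditions are meant to reduce can be written out explicitly. The symbol T (any invertible N × N matrix) is introduced here only for illustration:

```latex
\mathbf{D} = \mathbf{C}\mathbf{S}^{T} + \mathbf{E}
           = (\mathbf{C}\mathbf{T})(\mathbf{T}^{-1}\mathbf{S}^{T}) + \mathbf{E}
           = \mathbf{C}_{\mathrm{new}}\mathbf{S}_{\mathrm{new}}^{T} + \mathbf{E}
```

Both pairs (C, ST) and (C_new, S_new^T) reproduce D equally well, so the data fit alone cannot distinguish them. Each condition in the list rules out some of the matrices T.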
Unique resolution conditions. First possibility: using selective/pure variables. • (1) Elution-time selective ranges, where only one component is present: spectra can be estimated without ambiguities. • (2) Wavelength-selective ranges, where only one component absorbs: elution profiles can be estimated without ambiguities.
Detection of ‘purest’ (more selective) variables. Methods focused on finding the most representative (purest) rows (or columns) in a data matrix. Based on PCA: • Key Set Factor Analysis (KSFA). Based on the use of real variables: • SIMPLe-to-use Interactive Self-modelling Mixture Analysis (SIMPLISMA) • Orthogonal Projection Approach (OPA)
How to detect purest/selective variables? • Selective variables are the most pure/representative/dissimilar/orthogonal (linearly independent) variables. • Examples of proposed methods for detection of selective variables: • Key set variables, KSFA: E.R.Malinowski, Anal. Chim. Acta, 134 (1982) 129; IKSFA: Chemolab, 6 (1989) 21 • SIMPLISMA: W.Windig & J.Guilment, Anal. Chem., 63 (1991) 1425-1432 • Orthogonal Projection Approach, OPA: F.Cuesta-Sanchez et al., Anal. Chem. 68 (1996) 79 • ...
SIMPLISMA • Finds the purest process or signal variables in a data set. • The most dissimilar signal variables give approximate concentration profiles. • The most dissimilar process variables give approximate signal profiles.
SIMPLISMA. HPLC-DAD: purest retention times. • Variable purity: $p_i = s_i / m_i$, where $s_i$ is the standard deviation and $m_i$ the mean of variable i. • Noisy variables have low $s_i$ and low $m_i$, which can give a misleadingly high $p_i$.
SIMPLISMA. HPLC-DAD: purest retention times. • Variable purity with offset: $p_i = s_i / (m_i + f)$, where f is the % noise (offset). • With the offset, noisy variables get a low $p_i$.
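The effect of the offset on the purity can be sketched numerically. The function name `variable_purity` and the toy data are hypothetical. Expressing f as a percentage of the largest variable mean is an assumption made here, since the slide only describes f as a "% noise (offset)".

```python
import numpy as np

def variable_purity(D, f_percent=0.0):
    """Purity p_i = s_i / (m_i + f) for every column (variable) of D.

    Assumption: the offset f is f_percent % of the largest column mean.
    Columns are assumed non-negative with a non-zero denominator.
    """
    D = np.asarray(D, dtype=float)
    m = D.mean(axis=0)              # mean of each variable
    s = D.std(axis=0)               # standard deviation of each variable
    f = (f_percent / 100.0) * m.max()
    return s / (m + f)

# Toy data: column 0 carries real signal, column 1 is low-level noise
D = np.array([[1.0, 0.001],
              [3.0, 0.003],
              [2.0, 0.000],
              [2.0, 0.004]])

p_raw = variable_purity(D)               # no offset
p_off = variable_purity(D, f_percent=5)  # 5 % offset
```

Without the offset the noisy column has the highest purity. With the offset the signal column is ranked first, which is the behaviour the offset is meant to produce.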
SIMPLISMA working procedure • Selection of the first pure variable: max($p_i$). • Normalisation of spectra. • Selection of the second pure variable: calculation of weights ($w_i$); recalculation of purity, $p'_i = w_i p_i$; next purest variable: max($p'_i$). • Selection of the third pure variable: calculation of weights ($w_i$); recalculation of purity, $p''_i = w_i p_i$; next purest variable: max($p''_i$). • ...
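The working procedure can be sketched as a short loop. The slides do not give the formula for the weights, so this sketch assumes a simplified determinant-based weight: w_i is the determinant of the Gram matrix of unit-length columns for variable i together with the variables already selected. It is 0 for a variable collinear with a selected one and approaches 1 for an orthogonal one. The function name, the offset convention (f as a fraction of the largest mean) and the test data are hypothetical.

```python
import numpy as np

def simplisma_pure_variables(D, n_pure, f=0.01):
    """Simplified SIMPLISMA-style selection of pure variables (columns).

    Assumptions: offset = f * max(mean); weights are Gram determinants
    of unit-length columns; all columns have a non-zero norm or are
    guarded against division by zero.
    """
    D = np.asarray(D, dtype=float)
    m, s = D.mean(axis=0), D.std(axis=0)
    p = s / (m + f * m.max())             # purity with offset

    norms = np.linalg.norm(D, axis=0)
    norms[norms == 0] = 1.0               # guard against empty columns
    Dn = D / norms
    G = Dn.T @ Dn                         # Gram matrix of unit columns

    selected = []
    for _ in range(n_pure):
        w = np.ones(D.shape[1])           # first variable: no weighting
        if selected:
            for i in range(D.shape[1]):
                idx = [i] + selected
                w[i] = np.linalg.det(G[np.ix_(idx, idx)])
        selected.append(int(np.argmax(w * p)))   # max of p'_i = w_i p_i
    return selected

# Two components; variables 0-1 are selective for the first,
# variables 3-4 for the second, variable 2 is mixed.
t = np.arange(30)
C = np.column_stack([np.exp(-0.5 * ((t - 10) / 3.0) ** 2),
                     np.exp(-0.5 * ((t - 20) / 3.0) ** 2)])
S = np.array([[1.0, 0.0], [0.8, 0.0], [0.5, 0.5], [0.0, 0.8], [0.0, 1.0]])
D = C @ S.T

selected = simplisma_pure_variables(D, n_pure=2)
```

In this example the two selected variables are the most intense selective variable of each component. The columns of D at those variables are then approximate concentration profiles.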
SIMPLISMA Graphical information • Purity spectrum. Plot of pi vs. variables. • Std. deviation spectrum. Plot of ‘purity corrected’ std. dev. (csi) vs. variables csi = wi si
SIMPLISMA graphical information. [Figure: mean spectrum, std. deviation spectrum and 1st pure spectrum vs. retention times; concentration profiles (absorbance) with the first pure variable marked at 31.] If the 1st variable is too noisy, f is too low and should be increased.
SIMPLISMA graphical information. [Figure: 2nd pure spectrum and 2nd std. dev. spectrum vs. retention times; concentration profiles with pure variables marked at 31 and 40.]
SIMPLISMA graphical information. [Figure: 3rd pure spectrum and 3rd std. dev. spectrum vs. retention times; concentration profiles with pure variables marked at 31, 40 and 23.]
SIMPLISMA graphical information. [Figure: 4th pure spectrum and 4th std. dev. spectrum vs. retention times; concentration profiles with pure variables marked at 31, 40, 23 and 13.]
SIMPLISMA graphical information. [Figure: 5th pure spectrum (scale × 10^-18) and 5th std. dev. spectrum (scale × 10^-14) vs. retention times.] Noisy pattern in both spectra: no more significant contributions.
SIMPLISMA Information • Purest variables in the two modes. • Purest signal and concentration profiles. • Number of compounds.
Unique resolution conditions • Many chemical mixture systems (evolving or not) do not have selective variables for all of their components. • Even when the selected variables are not (totally) selective, detecting them is still very useful: it gives an initial description of the system that reduces its complexity, and it provides good initial estimates of the species profiles for most resolution methods.
Unique resolution conditions. Second possibility: using local rank information. What is local rank? Local rank is the rank of reduced data regions in either of the two orders of the original data matrix. It can be obtained by methods derived from Evolving Factor Analysis (EFA, FSMW-EFA, ...). Conditions for unique solutions (unique resolution, uniqueness) based on local rank information have been described as Resolution Theorems: Rolf Manne, On the resolution problem in hyphenated chromatography, Chemometrics and Intelligent Laboratory Systems, 1995, 27, 89-94.
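A minimal sketch of how a local rank map can be estimated with a fixed-size moving window (in the spirit of FSMW-EFA) is given below. The function names, the window size, the threshold and the toy profiles are hypothetical. With real, noisy data the threshold has to be set from the noise level rather than chosen close to zero.

```python
import numpy as np

def fsmw_efa(D, window):
    """Singular values of every block of `window` consecutive rows of D."""
    D = np.asarray(D, dtype=float)
    n_windows = D.shape[0] - window + 1
    return np.array([np.linalg.svd(D[r:r + window], compute_uv=False)
                     for r in range(n_windows)])

def local_rank(singular_values, threshold):
    """Local rank = number of singular values above the threshold."""
    return (singular_values > threshold).sum(axis=1)

# Noise-free toy system: two overlapping elution profiles with
# exact zeros outside their concentration windows.
c1 = np.array([0, 0, 1, 2, 3, 2, 1, 0, 0, 0, 0, 0], dtype=float)
c2 = np.array([0, 0, 0, 0, 1, 2, 3, 2, 1, 0, 0, 0], dtype=float)
s1 = np.array([1.0, 0.5, 0.0, 0.0])
s2 = np.array([0.0, 0.5, 1.0, 0.2])
D = np.outer(c1, s1) + np.outer(c2, s2)

sv = fsmw_efa(D, window=3)
ranks = local_rank(sv, threshold=1e-8)
```

Each entry of `ranks` refers to one window position along the elution direction: rank 1 where a single component elutes, rank 2 in the overlap region and rank 0 in the empty baseline.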
Resolution Theorems. Theorem 1: If all interfering compounds that appear inside the concentration window of a given analyte also appear outside this window, it is possible to calculate the concentration profile of the analyte without ambiguities. The matrix V defines the vector subspace where the analyte is not present and all the interferents are present. V can be found by PCA (loadings) of the submatrix where the analyte is not present!
Resolution Theorems. [Figure: elution profiles of an analyte and an interference, with the local rank map along the elution direction: 1111111222222222111222222211111111.] This local rank information can be obtained from submatrix analysis (EFA, EFF). The matrix VT may be obtained from PCA of the regions where the analyte is not present. The part of D that is orthogonal to VT is a rank-one matrix, so the concentration profile of the analyte, ca, may be resolved from D and VT.
Resolution Theorems. Theorem 2: If, for every interference, the concentration window of the analyte has a subwindow where the interference is absent, then it is possible to calculate the spectrum of the analyte. [Figure: elution profiles of an analyte and interferences 1 and 2, marking the region where interference 1 is not present and the region where interference 2 is not present (local rank information).]
Resolution Theorems. Theorem 3: For a resolution based only upon rank information in the chromatographic direction, the conditions of Theorems 1 and 2 are not only sufficient but also necessary. [Figure: resolution based on local rank conditions; two example systems of elution profiles: one can be totally resolved using local rank information, the other can only be partially resolved based only on local rank information.]
Unique resolution conditions? In the case of embedded peaks, resolution conditions based on local rank are not fulfilled! Resolution without ambiguities will be difficult when a single matrix is analyzed. [Figure: elution profiles with an embedded peak.]
Conclusions about unique resolution conditions based on local rank analysis • To resolve the system correctly and to apply the resolution theorems, it is very important to detect the local rank information accurately (EFA-based methods). • This local rank information can be introduced in the resolution process using either: • non-iterative direct resolution methods • iterative optimization methods
Resolution Theorems • Resolution theorems can be used in the two matrix directions (modes/orders): the chromatographic and the spectral direction. • Resolution theorems can be easily extended to multiway data and augmented data matrices (unfolded, matricized three-way data; see Lecture 3). • Many resolution methods are implicitly based on these resolution theorems.
Unique resolution conditions. Third possibility: using natural constraints. • Natural constraints are previously known conditions that the profile solutions should fulfil. We know that certain solutions are not correct! • Natural constraints can be applied even when neither selective variables nor local rank resolution conditions are present. They significantly reduce the number of possible solutions (rotation ambiguity). • However, natural constraints alone do not produce unique solutions in general.
Natural constraints • Non-negativity: species profiles in one or both orders are not negative (concentration and spectral profiles). • Unimodality: some species profiles have only one maximum (e.g. concentration profiles). • Closure: the sum of the species concentrations is a known constant value (e.g. in reaction-based systems, the mass-balance equation).
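The three natural constraints can be sketched as simple operations that update unconstrained least-squares profiles. These are deliberately naive versions chosen for illustration (clipping, flattening of secondary maxima, row rescaling). Practical implementations often use constrained least squares instead, and all function names here are hypothetical.

```python
import numpy as np

def non_negativity(profiles):
    """Set negative values to zero (naive version of the constraint)."""
    return np.clip(profiles, 0.0, None)

def unimodality(profile):
    """Force a single maximum: values may not rise again when moving
    away from the main maximum on either side."""
    c = np.array(profile, dtype=float)
    k = int(np.argmax(c))
    for i in range(k - 1, -1, -1):        # left of the maximum
        c[i] = min(c[i], c[i + 1])
    for i in range(k + 1, c.size):        # right of the maximum
        c[i] = min(c[i], c[i - 1])
    return c

def closure(C, total=1.0):
    """Rescale each row of C (one sample) so its concentrations sum to
    the known total (mass balance). Rows must have a non-zero sum."""
    C = np.asarray(C, dtype=float)
    return C * (total / C.sum(axis=1, keepdims=True))

nn = non_negativity(np.array([0.3, -0.05, 0.1]))
um = unimodality([0.1, 0.3, 0.2, 0.5, 1.0, 0.6, 0.7, 0.2])
cl = closure(np.array([[1.0, 3.0], [2.0, 2.0]]), total=1.0)
```

Each function takes a plain least-squares profile and returns a constrained profile.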
Non-negativity. Constrained profile(s) update plain LS profile(s). [Figure: plain LS profiles C* (with negative values) and constrained profiles Cc vs. retention times.]
Unimodality. [Figure: plain LS profiles C* and constrained profiles Cc vs. retention times.]
Closure (mass balance): the sum of the species concentrations equals ctotal. [Figure: plain LS profiles C* and constrained profiles Cc vs. pH, with ctotal marked.]
Hard-modelling (physicochemical model). [Figure: plain LS profiles C* and constrained profiles Cc vs. pH.]
Unique resolution conditions. Fourth possibility: multiway and multiset data analysis and matrix augmentation strategies (Lecture 3). • A set of correlated data matrices of the same system, obtained under different conditions, is analyzed simultaneously (matrix augmentation). • Factor analysis ambiguities can be solved more easily for three-way data, especially for trilinear three-way data.
Multivariate Curve Resolution (MCR) methods • Non-iterative resolution methods • Rank Annihilation Evolving Factor Analysis (RAEFA) • Window Factor Analysis (WFA) • Heuristic Evolving Latent Projections (HELP) • Subwindow Factor Analysis (SFA) • Gentle • ... • Iterative resolution methods • Iterative Target Transformation Factor Analysis (ITTFA) • Positive Matrix Factorization (PMF) • Alternating Least Squares (ALS) • ...
Non-iterative resolution methods are mostly based on detection and use of local rank information • Rank Annihilation by Evolving Factor Analysis (RAEFA, H.Gampp et al. Anal.Chim.Acta 193 (1987) 287) • Non-iterative EFA (M.Maeder, Anal.Chem. 59 (1987) 527) • Window Factor Analysis (WFA, E.R.Malinowski, J.Chemomet., 6 (1992) 29) • Heuristic Evolving Latent Projections (HELP, O.M.Kvalheim et al., Anal.Chem. 64 (1992) 936)
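WFA method description (E.R.Malinowski, J.Chemomet., 6 (1992) 29)
$\mathbf{D} = \mathbf{C}\mathbf{S}^T = \sum_i \mathbf{c}_i \mathbf{s}_i^T$, i = 1,...,n
1. Evaluate the window where the analyte n is present (EFA, EFF, ...).
2. Create the submatrix $\mathbf{D}_o$ by deleting the window of the analyte n.
3. Apply PCA to $\mathbf{D}_o = \mathbf{U}_o \mathbf{V}_o^T = \sum_j \mathbf{u}_{oj} \mathbf{v}_{oj}^T$, j = 1,...,m, with m = n-1.
4. The spectra of the interferents are linear combinations of the loadings: $\mathbf{s}_i = \sum_j \beta_{ij} \mathbf{v}_{oj}$, j = 1,...,m.
5. The spectrum of the analyte has a component in the subspace orthogonal to $\mathbf{V}_o^T$.
6. The concentration profile of the analyte, $\mathbf{c}_n$, can be calculated from $\mathbf{D}_n = \mathbf{D}(\mathbf{I} - \mathbf{V}_o \mathbf{V}_o^T) = \mathbf{c}_n \mathbf{s}_n^{oT}$.
$\mathbf{D}_n$ is a rank-one matrix. $\mathbf{s}_n^o$ is the part of the analyte spectrum $\mathbf{s}_n$ that is orthogonal to the interference spectra. $\mathbf{c}_n$ and $\mathbf{s}_n^o$ can be obtained directly!! Like the 1st Resolution Theorem!!!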
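The WFA steps can be sketched on a noise-free toy system that fulfils the condition of Theorem 1. This is an illustrative sketch, not Malinowski's implementation: the function name `wfa_profile`, the use of SVD for the PCA step and the toy data are assumptions. The recovered profile is only defined up to a scale factor.

```python
import numpy as np

def wfa_profile(D, analyte_rows, n_interferents):
    """Recover the analyte concentration profile (up to scale) by
    projecting D onto the subspace orthogonal to the interferents."""
    D = np.asarray(D, dtype=float)
    mask = np.zeros(D.shape[0], dtype=bool)
    mask[analyte_rows] = True
    Do = D[~mask]                                  # step 2: analyte window deleted
    _, _, Vt = np.linalg.svd(Do, full_matrices=False)
    Vo = Vt[:n_interferents].T                     # step 3: loadings (J x m)
    Dn = D - (D @ Vo) @ Vo.T                       # step 6: D (I - Vo Vo^T)
    U, sv, Wt = np.linalg.svd(Dn, full_matrices=False)
    c, s_o = U[:, 0] * sv[0], Wt[0]                # rank-one factors of Dn
    if c.sum() < 0:                                # fix the arbitrary SVD sign
        c, s_o = -c, -s_o
    return c, s_o, sv

# Analyte present only in rows 4-8; the interferent appears inside
# and outside that window, as Theorem 1 requires.
c_a = np.array([0, 0, 0, 0, 1, 2, 3, 2, 1, 0, 0, 0], dtype=float)
c_int = np.array([0, 1, 2, 3, 3, 2, 2, 1, 1, 1, 0.5, 0])
s_a = np.array([1.0, 0.5, 0.2, 0.0])
s_int = np.array([0.1, 0.6, 1.0, 0.4])
D = np.outer(c_a, s_a) + np.outer(c_int, s_int)

c_est, s_o, sv = wfa_profile(D, slice(4, 9), n_interferents=1)
```

Here `sv` holds the singular values of D_n. For noise-free data only the first one is significant, which is the rank-one property stated in step 6.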