
1. Introduction to multiway analysis

Quimiometria Teórica e Aplicada, Instituto de Química - UNICAMP.

ramona

Presentation Transcript


  1. 1. Introduction to multiway analysis
     Quimiometria Teórica e Aplicada (Theoretical and Applied Chemometrics), Instituto de Química - UNICAMP

  2. Why build models of chemical data?
     • Data exploration
        • e.g. find important sources of variation in complex environmental samples
     • Compound identification and calibration in mixtures
        • e.g. identification and quantification of pollutants in river water
     • Statistical process control
        • e.g. detect disturbances in product quality
     • Models are useful approximations of reality
        • first-principles models are based on chemical/physical knowledge: do they fit well with the measured data?
        • empirical models (e.g. PCA, PLS) are purely mathematical: do they have a chemical meaning?

  3. Multiway data
     • Multiway data are becoming more common in chemistry. Examples are:
        • Chromatography: sample number × elution time × wavelength
        • On-line analysis: experiment number × time × wavelength/temperature/pressure
        • Tandem mass spectrometry (MS/MS): sample number × parent ion mass × daughter ion mass
        • Image analysis: experiment number × time × x-position × y-position

  4. Multiway data – an example
     • Batch process data:
       [Figure: one batch gives a matrix X (J × K), time × process variable; a series of batches gives a three-way array X (I × J × K), batch × time × process variable.]
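
A minimal NumPy sketch of how the batch array on this slide could be assembled, assuming every batch has been recorded (or aligned) to the same number of time points J. The sizes and the simulated data are hypothetical.

```python
import numpy as np

I, J, K = 5, 100, 8            # hypothetical: batches, time points, process variables
rng = np.random.default_rng(0)

# One J x K matrix (time x process variable) per batch; simulated data here.
batches = [rng.normal(size=(J, K)) for _ in range(I)]

# Stacking along a new first axis gives the I x J x K array
# (batch x time x process variable).
X = np.stack(batches, axis=0)
print(X.shape)  # (5, 100, 8)
```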

  5. Multiway modelling
     • The PARAFAC (or ‘CANDECOMP’) and Tucker models were developed by psychometricians 30 years ago, but they are especially useful in chemistry because chemical data often have a multilinear structure.
     • PARAFAC and Tucker are different generalizations of PCA to higher-order data.
     • Generalizations of PLS to higher-order data also exist, e.g. N-PLS.

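  6. Two-way modelling
     • Two-way data can be modelled using bilinear models:
        (1) $X = TP^{T} + E$ (PCA)
        (2) $X = USV^{T} + E$ (SVD)
        (3) $X = AGB^{T} + E$ (TMCA)
     • For the same number of components, these models give the same residuals, $E$.
       [Figure: X (time × process variable) decomposed into scores T, loadings Pᵀ and residuals E, with the analogous U S Vᵀ and A G Bᵀ decompositions.]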
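
The equivalence of the three bilinear models can be checked numerically. This NumPy sketch assumes model (3) means a generic A G Bᵀ with an unconstrained R × R core (an assumption about what TMCA denotes here); it rotates the truncated SVD bases with arbitrary invertible matrices and lets G absorb the inverse rotations. The data and R are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 10))  # hypothetical time x process variable matrix
R = 3                          # number of components kept in every model

# (2) SVD, truncated to the R largest singular values.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U_r, S_r, V_r = U[:, :R], np.diag(s[:R]), Vt[:R, :].T
E_svd = X - U_r @ S_r @ V_r.T

# (1) PCA: scores T = U S and loadings P = V.
T, P = U_r @ S_r, V_r
E_pca = X - T @ P.T

# (3) A G B^T: rotate both bases by arbitrary invertible matrices M and N;
# the core G absorbs the inverse rotations, so A G B^T equals U S V^T.
M = rng.normal(size=(R, R))
N = rng.normal(size=(R, R))
A = U_r @ M
B = V_r @ N
G = np.linalg.inv(M) @ S_r @ np.linalg.inv(N).T
E_tmca = X - A @ G @ B.T

# Residual sums of squares: identical up to rounding error.
print([float(np.sum(E ** 2)) for E in (E_pca, E_svd, E_tmca)])
```

Only the parameterization differs between the three models; the fitted subspace is the same, which is also why bilinear loadings are not unique without extra constraints.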

  7. Multiway models - PARAFAC
     • Multiway data can be modelled using multilinear models, such as the PARAFAC model...
       [Figure: X (batch × time × process variable) decomposed into loadings A, B and C plus residuals E.]
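
In element form, the PARAFAC model is $x_{ijk} = \sum_{f=1}^{F} a_{if} b_{jf} c_{kf} + e_{ijk}$, i.e. a sum of F rank-one arrays. The NumPy sketch below builds the model part both ways; the sizes, F and the random loadings are hypothetical, and assigning A, B, C to the batch, time and process-variable modes follows the usual convention of A, B, C for modes 1, 2, 3.

```python
import numpy as np

rng = np.random.default_rng(2)
I, J, K, F = 4, 6, 5, 2        # hypothetical sizes and number of components

A = rng.normal(size=(I, F))    # batch-mode loadings
B = rng.normal(size=(J, F))    # time-mode loadings
C = rng.normal(size=(K, F))    # process-variable-mode loadings

# Element form: x_ijk = sum over f of a_if * b_jf * c_kf
X_hat = np.einsum('if,jf,kf->ijk', A, B, C)

# Same array as a sum of F rank-one arrays (outer products a_f o b_f o c_f).
X_sum = sum(np.multiply.outer(np.multiply.outer(A[:, f], B[:, f]), C[:, f])
            for f in range(F))

print(X_hat.shape, np.allclose(X_hat, X_sum))  # (4, 6, 5) True
```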

  8. Multiway models - Tucker
     • ...or the Tucker model:
       [Figure: X (batch × time × process variable) decomposed into loadings A, B and C, a core array G, and residuals E.]
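
The Tucker model replaces PARAFAC's one-to-one pairing of components with a core array, $x_{ijk} = \sum_{p}\sum_{q}\sum_{r} a_{ip} b_{jq} c_{kr} g_{pqr} + e_{ijk}$, so each mode may have its own number of components. A NumPy sketch with hypothetical sizes; the second part shows that a superdiagonal core of ones (with equal component numbers in every mode) turns the Tucker model back into PARAFAC.

```python
import numpy as np

rng = np.random.default_rng(3)
I, J, K = 4, 6, 5
P, Q, R = 2, 3, 2               # hypothetical component numbers, one per mode

A = rng.normal(size=(I, P))
B = rng.normal(size=(J, Q))
C = rng.normal(size=(K, R))
G = rng.normal(size=(P, Q, R))  # core array: weights for every component combination

# Element form: x_ijk = sum over p, q, r of a_ip * b_jq * c_kr * g_pqr
X_tucker = np.einsum('ip,jq,kr,pqr->ijk', A, B, C, G)

# With F components in every mode and a superdiagonal core of ones,
# the Tucker model collapses to PARAFAC.
F = 2
A2, B2, C2 = A[:, :F], B[:, :F], C[:, :F]
G_super = np.zeros((F, F, F))
for f in range(F):
    G_super[f, f, f] = 1.0
X_from_tucker = np.einsum('ip,jq,kr,pqr->ijk', A2, B2, C2, G_super)
X_parafac = np.einsum('if,jf,kf->ijk', A2, B2, C2)

print(X_tucker.shape, np.allclose(X_from_tucker, X_parafac))  # (4, 6, 5) True
```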

  9. Unfolding
     • Another option is to matricize (or ‘unfold’) the data and use standard two-way methods:
       [Figure: the three-way array X (I × J × K), with slices X1 ... XI, unfolded into the two-way matrix X (I × JK).]
     • The data can also be unfolded along the other modes: X (J × KI) and X (K × IJ).
     • But if a multiway structure exists in the data, multiway methods have some important advantages!
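
Unfolding is just a reshape once the chosen mode is the first axis. This NumPy sketch uses a toy array; the cyclic column orderings for X (J × KI) and X (K × IJ) are one common convention, and other texts order the columns differently.

```python
import numpy as np

I, J, K = 3, 4, 2
X = np.arange(I * J * K).reshape(I, J, K)  # toy batch x time x variable array

# X (I x JK): row i is batch i's J x K slice flattened, so unfold-PCA
# treats every (time, variable) pair as a separate variable.
X_I = X.reshape(I, J * K)

# X (J x KI) and X (K x IJ): move the chosen mode to the front
# (cyclically), then flatten the remaining two modes.
X_J = X.transpose(1, 2, 0).reshape(J, K * I)
X_K = X.transpose(2, 0, 1).reshape(K, I * J)

print(X_I.shape, X_J.shape, X_K.shape)  # (3, 8) (4, 6) (2, 12)
```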

  10. Advantages of multiway
     • Multiway models need fewer parameters to describe the data; e.g. a three-component model of X (30 × 800 × 200) uses
        • 480090 parameters for unfold-PCA (3 × 30 scores + 3 × 800 · 200 loadings)
        • 3090 parameters for PARAFAC (3 × (30 + 800 + 200) loadings)
     • PARAFAC is therefore far more parsimonious than unfold-PCA.
     • Multiway models use one set of loadings for each mode, so the results are much easier to plot and interpret.
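
Parameter counts like these follow from counting the entries of the score and loading matrices. This plain-Python sketch assumes the unfolding X (I × JK) for unfold-PCA and, purely for comparison, a Tucker3 model with three components in every mode; the counts are raw totals and do not subtract scaling or rotational indeterminacies. The function names are hypothetical.

```python
def unfold_pca_params(I, J, K, R):
    """R-component PCA of the I x JK unfolding: I*R scores plus J*K*R loadings."""
    return R * I + R * J * K


def parafac_params(I, J, K, R):
    """R-component PARAFAC: one I x R, J x R and K x R loading matrix."""
    return R * (I + J + K)


def tucker3_params(I, J, K, P, Q, R):
    """Tucker3 with P, Q, R components: three loading matrices plus a P x Q x R core."""
    return I * P + J * Q + K * R + P * Q * R


print(unfold_pca_params(30, 800, 200, 3))      # 480090
print(parafac_params(30, 800, 200, 3))         # 3090
print(tucker3_params(30, 800, 200, 3, 3, 3))   # 3117
```

The unfold-PCA count is dominated by the J·K loadings, one per (time, variable) pair, which is exactly what the multiway models avoid.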

  11. Disadvantages of multiway
     • PARAFAC and Tucker models are usually calculated using a technique called ‘alternating least squares’ (ALS).
     • This is sometimes slow...
     • ...and sometimes gives convergence problems if an inappropriate model is used.
     • However, ALS algorithms are easy to understand, and some high-quality, free MATLAB code is now available on the internet:
        • The N-way Toolbox (Andersson & Bro, http://www.models.kvl.dk)
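
The ALS idea fits in a few lines: with two loading matrices fixed, the third is an ordinary least-squares solution, and the three updates are repeated until the fit stops improving. This is a bare-bones NumPy sketch of unconstrained PARAFAC-ALS, not the N-way Toolbox implementation; the random initialization, the relative-change stopping rule and the names khatri_rao and parafac_als are hypothetical choices.

```python
import numpy as np


def khatri_rao(B, C):
    """Column-wise Kronecker product: column f equals np.kron(B[:, f], C[:, f])."""
    J, F = B.shape
    K = C.shape[0]
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, F)


def parafac_als(X, F, n_iter=1000, tol=1e-10, seed=0):
    """Fit an unconstrained F-component PARAFAC model to a 3-way array by ALS."""
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    # Random starting values for two modes; A is estimated in the first update.
    B = rng.normal(size=(J, F))
    C = rng.normal(size=(K, F))

    # Unfoldings chosen so that X1 = A (B kr C)^T, X2 = B (A kr C)^T, X3 = C (A kr B)^T.
    X1 = X.reshape(I, J * K)
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)

    ss_prev = None
    for _ in range(n_iter):
        # Each step solves a least-squares problem with the other two modes fixed.
        A = np.linalg.lstsq(khatri_rao(B, C), X1.T, rcond=None)[0].T
        B = np.linalg.lstsq(khatri_rao(A, C), X2.T, rcond=None)[0].T
        C = np.linalg.lstsq(khatri_rao(A, B), X3.T, rcond=None)[0].T
        ss = float(np.sum((X1 - A @ khatri_rao(B, C).T) ** 2))
        # Stop when the residual sum of squares no longer decreases appreciably.
        if ss_prev is not None and ss_prev - ss < tol * ss_prev:
            break
        ss_prev = ss
    return A, B, C, ss


# Usage: refit a noise-free rank-2 array built from hypothetical random loadings.
rng = np.random.default_rng(42)
A0 = rng.normal(size=(10, 2))
B0 = rng.normal(size=(12, 2))
C0 = rng.normal(size=(8, 2))
X = np.einsum('if,jf,kf->ijk', A0, B0, C0)
A, B, C, ss = parafac_als(X, F=2)
print(ss / np.sum(X ** 2))  # relative residual; near zero when ALS has converged
```

Because each update is an exact least-squares step, the residual sum of squares cannot increase from one update to the next; the fit improves monotonically, but possibly slowly, which matches the drawbacks listed on the slide.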

  12. Conclusions
     • PARAFAC and Tucker are both generalizations of the PCA model to multiway data.
     • PARAFAC and Tucker models use fewer parameters and are easier to interpret than unfold-PCA.
     • The models can be calculated in MATLAB using the ‘N-way Toolbox’ (or ‘PLS_Toolbox’).
