A Method for Comparing Self-Organizing Maps: Case Studies of Banking and Linguistic Data

A Method for Comparing Self-Organizing Maps: Case Studies of Banking and Linguistic Data Toomas Kirt, Ene Vainik, Leo Võhandu ADBIS 2007

Outline • Self-Organizing Map • Dimensionality Reduction • Matrix Reordering • Methodology of Similarity Measurement • Case Studiesof Banking and Linguistic Data ADBIS 2007

Self-Organizing Map (SOM) • The SOM is afeedforward artificial neural network that uses an unsupervised learning algorithm. • It projects nonlinear relationships between high-dimensional input data into a two-dimensional output grid (map). ADBIS 2007

Dimensionality Reduction • To reduce dimensionality of the data we use the principal component analysis (PCA). • The PCA transforms the data linearly and projects original data on a new set of variables that are called the principal components, while retaining as much as possible of the variation present in the data set. • Those are uncorrelated and ordered so that the first few components represent most of the variation of the original variables. ADBIS 2007

Matrix Reordering • The matrix reordering is a structuring method for graphs (and general data tables). • The method reorganizes the neighborhood graph data vertices according to specific property – systems monotonicity. • The main goal of such matrix analysis is to permutate rows and columns to maximize the similarity of the neighbouring elements. ADBIS 2007

Methodology of Similarity Measurement • When we use different sources of data that describe the same phenomenon but arecollected somehow differently or the number of variables is varying then we have toassess whether the results of the two analyses are similar. • As the SOM projects closeunits of the input space into nearby map units the local neighborhood should remainquite similar. • In this paper we propose a simple method to compare the results ofdifferent self-organizing mapsthat isbased on the measurement ofsimilarities of the local neighborhood. ADBIS 2007

Methodology of Similarity Measurement I • Firstly, to analyze general organization the resulting map is visually examined and clusters and their borders are identified, also the general orientation and locations of data items are identified. • Thereafter the matrix of neighbourhood relations is formed. ADBIS 2007

Methodology of Similarity Measurement II • The SMC rates positive and negative similarity equally and can be used if positive and negative values have equal weight. • Jaccard Coefficient (J) is used if the negative and positive matches have different weights and therefore ignores negative matches. • If the value of the Jaccard coefficient and SMC is below 0.5 then the number of positive matches is less than half of the total matches. ADBIS 2007

Methodology of Similarity Measurement III • Fourth part of the similarity measurement consists of finding how much the two neighbouring matrixes are identical what is a maximum isomorphic subset. • Here we can perform a new meta-level analysis and reorder and visualize the isomorphic sub-graph to see commonly shared information between two maps. ADBIS 2007

Case Studies • We use two sets of data to illustrate the method of similarity measurement. • The first is a research into the concepts of emotion in Estonian language. The survey consisted of two parts and as a result two different data matrixes describe the same set of emotion concepts. We analyze whether and to what extent the results of two tasks are comparable. • The second data set is banking data. In this case the purpose of our meta-analysis is to detect whether and to what degree the dimensionality reduction method (PCA) applied to the data has preserved its structure. ADBIS 2007

Study of Estonian Concepts of Emotion • The purpose of the study was to discover the hidden structure of the Estonian emotion concepts. • Two lexical tasks were carried out providing information about emotion concepts either through their relation to the episodes of emotional experience or through semantic interrelations of emotion terms (synonymy and antonymy). • There were 24 emotion concepts selected for the study. • In the first task they had to evaluate the meaning of every single word against a set of seven bipolar scales. • In the second task the same participants had to elicit emotion terms similar and opposite by meaning to the same 24 stimulus words. ADBIS 2007

Study of Estonian Concepts of Emotion • a) Results of the Task 1 (24 concepts evaluated on the seven bipolar scales), • b) Results of the Task 2 (24 concepts arranged according to relations of synonymy and antonymy) ADBIS 2007

Study of Estonian Concepts of Emotion • Neighbourhood similarity of the SOM of conceptual data ADBIS 2007

Study of Estonian Concepts of Emotion • Reordered and visualized isomorphic subgraph of lexical data (Task 1 vs. Task 2). Neighborhood range 2. ADBIS 2007

Study of Banking Data • The second data set is used to illustrate the impact of dimensionality reduction by PCA on the SOM maps. • The aim of the study is to measure the similarity between the results of SOM mapping of original data and reduced data. • The data set consists of banking data (1997—2000; http://www.bankofestonia.info). We have used 133 public quarterly reports by individual banks as a balance sheet and profit / loss statement (income statement). The 50 most important variables have been selected to form a short financial statement of a bank. ADBIS 2007

Study of Banking Data • a) SOM of 50 original variables, • b) SOM of 26 principal components describing 95% of variation, ADBIS 2007

Study of Banking Data • Neighbourhood similarity of the SOM of banking data ADBIS 2007

Study of Banking Data • Reordered and visualized isomorphic subgraph of banking data (Original vs. PCA 95% variation). Neighbourhood range 1. ADBIS 2007

Conclusions • The case studies give an overview of possibilities to measure similarity between self-organizing maps that is based on the topology and neighbourhood relations. • We find local neighbourhood relations between the data items and measure the similarity of relations by coefficients and by finding an isomorphic subgraph. • The maximum isomorphic subgraph as a measure to identify the similarity between the SOMs, but it became a new meta-level tool. ADBIS 2007

Thank you! Toomas.Kirt@mail.ee ADBIS 2007

A Method for Comparing Self-Organizing Maps: Case Studies of Banking and Linguistic Data