Matrix Recovery Procedure for Generating Distances and Similarities

Introduction • Given a matrix of distances D (square and symmetric, with zeros on the main diagonal), find variables that can approximately generate these distances. • The matrix can also be a similarity matrix: square and symmetric, but with ones on the main diagonal and values between zero and one elsewhere. • Broadly: distance (0 ≤ d ≤ 1) = 1 - similarity
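The conversion in the last bullet can be sketched in a few lines. This is only an illustration: the function name `similarity_to_distance` and the example values are hypothetical, and the sketch assumes the similarities already lie in [0, 1].

```python
def similarity_to_distance(S):
    """Convert a similarity matrix (list of lists) into distances d = 1 - s.

    Assumes S is square and symmetric with entries in [0, 1].
    """
    n = len(S)
    for i in range(n):
        if len(S[i]) != n:
            raise ValueError("similarity matrix must be square")
        for j in range(n):
            if not 0.0 <= S[i][j] <= 1.0:
                raise ValueError("similarities must lie in [0, 1]")
            if abs(S[i][j] - S[j][i]) > 1e-12:
                raise ValueError("similarity matrix must be symmetric")
    return [[1.0 - S[i][j] for j in range(n)] for i in range(n)]


# Hypothetical similarities among three items (ones on the diagonal).
S_example = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
D_example = similarity_to_distance(S_example)
```

The ones on the diagonal of the similarity matrix become the zeros on the diagonal of the distance matrix.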

Principal Coordinates (Metric Multidimensional Scaling) • Given the matrix of distances D, can we find a set of variables able to generate it? • Can we find a data matrix X able to generate D?

Main idea of the procedure: (1) understand how to obtain D when X is known, (2) then work backwards to build the matrix X given D

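Procedure Remember that, given an n × p data matrix X, we obtain a zero-mean data matrix by the transformation: $\tilde{X} = (I - \frac{1}{n}\mathbf{1}\mathbf{1}')X = PX$. With this matrix we can compute two square, symmetric matrices. The first is the p × p covariance matrix $S = \tilde{X}'\tilde{X}/n$. The second is the n × n matrix $Q = \tilde{X}\tilde{X}'$ of scalar products among observations.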

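The matrix of products Q is closely related to the distance matrix D we are interested in. The relation between D and Q is as follows. Elements of Q: $q_{ij} = x_i'x_j$, where $x_i'$ is row i of the zero-mean data matrix. Elements of D (squared Euclidean distances): $d_{ij}^2 = (x_i - x_j)'(x_i - x_j) = q_{ii} + q_{jj} - 2q_{ij}$. Main result: given the matrix Q we can obtain the matrix D.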
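The forward direction (from X to Q to D) can be sketched with NumPy as follows. The function name `squared_distances_from_data` and the three example points are hypothetical, and the sketch assumes that D holds squared Euclidean distances.

```python
import numpy as np


def squared_distances_from_data(X):
    """Return (Q, D2): scalar products and squared distances among the rows of X."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)          # zero-mean data matrix
    Q = Xc @ Xc.T                    # n x n matrix of scalar products
    q = np.diag(Q)
    # d_ij^2 = q_ii + q_jj - 2 q_ij, written with broadcasting
    D2 = q[:, None] + q[None, :] - 2.0 * Q
    return Q, D2


# Hypothetical example: three points forming a 3-4-5 right triangle.
X_demo = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
Q_demo, D2_demo = squared_distances_from_data(X_demo)
```

Centering changes Q but not the distances, which is why D can be computed from the zero-mean matrix without loss.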

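How to recover Q given D? Note that, as we have zero-mean variables, the sum of any row in Q must be zero. Writing t = trace(Q) and using $d_{ij}^2 = q_{ii} + q_{jj} - 2q_{ij}$, the sums of the squared distances are $\sum_i d_{ij}^2 = t + n q_{jj}$, $\sum_j d_{ij}^2 = t + n q_{ii}$ and $\sum_i \sum_j d_{ij}^2 = 2nt$.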

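1. Method to recover Q given D Solving for the elements of Q: $q_{ij} = -\frac{1}{2}\left(d_{ij}^2 - d_{i.}^2 - d_{.j}^2 + d_{..}^2\right)$, where $d_{i.}^2$, $d_{.j}^2$ and $d_{..}^2$ are the row, column and overall means of the squared distances. In matrix form, $Q = -\frac{1}{2}PDP$, with $P = I - \frac{1}{n}\mathbf{1}\mathbf{1}'$ the centering matrix.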
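A minimal sketch of this recovery step, assuming D contains squared Euclidean distances and using the matrix form Q = -(1/2)PDP that appears in the Conclusion. The function name `recover_Q` and the random test data are hypothetical.

```python
import numpy as np


def recover_Q(D2):
    """Double-center a matrix of squared distances: Q = -1/2 * P @ D2 @ P."""
    D2 = np.asarray(D2, dtype=float)
    n = D2.shape[0]
    P = np.eye(n) - np.ones((n, n)) / n   # centering matrix I - (1/n) 1 1'
    return -0.5 * P @ D2 @ P


# Round trip on hypothetical data: build D2 from a known X, then recover Q.
rng = np.random.default_rng(0)
X_true = rng.normal(size=(5, 2))
Xc = X_true - X_true.mean(axis=0)
diff = Xc[:, None, :] - Xc[None, :, :]
D2_check = (diff ** 2).sum(axis=2)        # squared Euclidean distances
Q_recovered = recover_Q(D2_check)
```

The recovered matrix should match the scalar products of the zero-mean data, and its rows should sum to zero.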

2. Obtain X given Q Note that we cannot recover X exactly, because this problem has many solutions: if $Q = XX'$, then also $Q = XAA^{-1}X' = (XA)(XA)'$ for any orthogonal matrix A (since $A^{-1} = A'$). Thus B = XA is also a solution. The standard solution: compute the spectral decomposition $Q = ABA'$, where A and B now contain the eigenvectors and eigenvalues associated with the nonzero eigenvalues of Q, and take as solution $X = AB^{1/2}$.
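The spectral step and the rotation ambiguity can be sketched as follows. The function name `coordinates_from_Q`, the tolerance used to decide which eigenvalues count as nonzero, and the example matrix are assumptions made for illustration.

```python
import numpy as np


def coordinates_from_Q(Q, tol=1e-9):
    """Return Y = A B^{1/2} built from the positive eigenvalues of Q."""
    eigvals, eigvecs = np.linalg.eigh(Q)          # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]             # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > tol                          # drops zero (and negative) eigenvalues
    return eigvecs[:, keep] * np.sqrt(eigvals[keep])


# Hypothetical zero-mean data (each column sums to zero).
X_c = np.array([[1.0, 2.0], [-1.0, 0.0], [0.0, -3.0], [0.0, 1.0]])
Q_demo = X_c @ X_c.T
Y = coordinates_from_Q(Q_demo)

# Any orthogonal rotation of Y reproduces the same Q.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Y_rotated = Y @ R
```

Here `Y` generally differs from `X_c` by a rotation or reflection, yet both give the same scalar products and therefore the same distances.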

Conclusion • We say that D is compatible with a Euclidean metric if Q, obtained as Q = -(1/2)PDP, is positive semidefinite (all eigenvalues nonnegative)
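This compatibility check can be sketched numerically. The function name `is_euclidean`, the tolerance, and both example matrices are hypothetical; D is taken to hold squared distances.

```python
import numpy as np


def is_euclidean(D2, tol=1e-9):
    """Check whether Q = -1/2 P D2 P has no (numerically) negative eigenvalues."""
    D2 = np.asarray(D2, dtype=float)
    n = D2.shape[0]
    P = np.eye(n) - np.ones((n, n)) / n
    Q = -0.5 * P @ D2 @ P
    eigvals = np.linalg.eigvalsh(Q)
    scale = max(1.0, np.abs(eigvals).max())
    return bool(eigvals.min() >= -tol * scale), eigvals


# Three points on a line at positions 0, 1, 2: distances 1, 1, 2 (squared below).
D2_line = np.array([[0.0, 1.0, 4.0],
                    [1.0, 0.0, 1.0],
                    [4.0, 1.0, 0.0]])

# Distances 1, 1, 3 violate the triangle inequality (squared below).
D2_bad = np.array([[0.0, 1.0, 9.0],
                   [1.0, 0.0, 1.0],
                   [9.0, 1.0, 0.0]])

ok_line, eig_line = is_euclidean(D2_line)
ok_bad, eig_bad = is_euclidean(D2_bad)
```

The second matrix yields a negative eigenvalue, so no configuration of points in Euclidean space reproduces those distances exactly.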

Summary of the procedure
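The whole procedure (double centering, spectral decomposition, keeping the k largest eigenvalues) can be put together in one sketch. The function name `principal_coordinates` and the rectangle example are hypothetical; the function is assumed to receive plain distances and squares them internally.

```python
import numpy as np


def principal_coordinates(dist, k=2):
    """Principal coordinates from a matrix of plain (unsquared) distances.

    Returns the n x k coordinates and all eigenvalues of Q, largest first.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    P = np.eye(n) - np.ones((n, n)) / n
    Q = -0.5 * P @ (dist ** 2) @ P                 # step 1: recover Q
    eigvals, eigvecs = np.linalg.eigh(Q)           # step 2: spectral decomposition
    order = np.argsort(eigvals)[::-1][:k]          # indices of the k largest eigenvalues
    lam = np.clip(eigvals[order], 0.0, None)       # guard against tiny negative values
    coords = eigvecs[:, order] * np.sqrt(lam)      # step 3: X = A B^{1/2}
    return coords, eigvals[::-1]


# Hypothetical example: corners of a 3 x 4 rectangle (the diagonal is 5).
dist_rect = np.array([[0.0, 3.0, 5.0, 4.0],
                      [3.0, 0.0, 4.0, 5.0],
                      [5.0, 4.0, 0.0, 3.0],
                      [4.0, 5.0, 3.0, 0.0]])
coords, eigs = principal_coordinates(dist_rect, k=2)

# Distances regenerated from the recovered coordinates.
gap = coords[:, None, :] - coords[None, :, :]
dist_back = np.sqrt((gap ** 2).sum(axis=2))
```

The ratio of the retained eigenvalues to the sum of the positive ones gives a rough measure of how well k dimensions reproduce the distances.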

Example 1. Cities

(Note that they add up to zero by rows and columns. The matrix has been divided by 10000)

Example 1. Eigenstructure of Q:

Final coordinates for the cities taking two dimensions:

Example 1. Plot

Similarities matrix

Example 2: similarity between products

Example 2

Relationship with PC • PC: eigenvalues and eigenvectors of S • Principal coordinates: eigenvalues and eigenvectors of Q. If the data are metric, both are identical. Principal coordinates generalize PC to data that are not exactly metric.
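The equivalence for metric data can be checked numerically. In this sketch the random data are hypothetical and the covariance matrix is assumed to divide by n; under that assumption the nonzero eigenvalues of Q are n times those of S, and the principal coordinates match the principal component scores up to the sign of each column.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))                 # hypothetical data: 6 observations, 3 variables
Xc = X - X.mean(axis=0)
n = Xc.shape[0]

S = Xc.T @ Xc / n                           # covariance matrix (division by n is an assumption)
Q = Xc @ Xc.T                               # scalar products among observations

s_vals, s_vecs = np.linalg.eigh(S)
q_vals, q_vecs = np.linalg.eigh(Q)
s_vals, s_vecs = s_vals[::-1], s_vecs[:, ::-1]   # largest eigenvalue first
q_vals, q_vecs = q_vals[::-1], q_vecs[:, ::-1]

pc_scores = Xc @ s_vecs                          # principal component scores
pcoords = q_vecs[:, :3] * np.sqrt(q_vals[:3])    # principal coordinates
```

Comparing absolute values sidesteps the arbitrary sign of each eigenvector.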

Biplots Jointly represent the observations by the rows of $V_2$ and the variables by the coordinates $D_2^{1/2} A_2'$. They are called biplots because a two-dimensional approximation to the data matrix is made.
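The two-dimensional approximation behind a biplot can be sketched with the singular value decomposition. The data are hypothetical, and the way the singular values are split between row and column markers is an assumption of this sketch: all of them are placed on the variable markers so that the product of the markers reproduces the rank-2 approximation exactly.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(8, 4))                 # hypothetical data: 8 observations, 4 variables
Xc = X - X.mean(axis=0)

# Thin SVD: Xc = V @ diag(d) @ At
V, d, At = np.linalg.svd(Xc, full_matrices=False)

row_markers = V[:, :2]                      # one 2-D point per observation
col_markers = At[:2, :].T * d[:2]           # one 2-D point per variable (A_2 D_2)

X_rank2 = row_markers @ col_markers.T       # rank-2 approximation of Xc
approx_error = np.linalg.norm(Xc - X_rank2) ** 2
```

The squared approximation error equals the sum of the squared singular values that were discarded, so the plot is faithful when the first two singular values dominate.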

Biplot

Non-metric MDS

A common method • Idea: if there is a monotone relation between x and y, there must be an exact linear relationship between the ranks of both variables • Use ordered (monotone) regression, or assign ranks and regress the ranks on each other, iterating
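The rank idea can be illustrated with a small sketch. The helper `ranks` and the example values are hypothetical, and the sketch assumes there are no ties; it shows only the rank property, not the full iterative non-metric procedure.

```python
import numpy as np


def ranks(values):
    """Ranks 1..n of the values (assumes no ties)."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    r = np.empty(len(values), dtype=float)
    r[order] = np.arange(1, len(values) + 1)
    return r


# Hypothetical monotone but nonlinear relation: y = exp(x).
x = np.array([0.5, 2.0, 1.0, 3.5, 2.5])
y = np.exp(x)

pearson_raw = np.corrcoef(x, y)[0, 1]                   # below 1: relation is not linear
pearson_ranks = np.corrcoef(ranks(x), ranks(y))[0, 1]   # 1: the ranks coincide
```

Because only the ordering matters, any increasing transformation of the dissimilarities leaves the ranks, and hence the fit between ranks, unchanged.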
