
Formal Computational Skills


Presentation Transcript


  1. Formal Computational Skills Matrices 1

  2. Overview • Motivation: many mathematical uses eg • Writing network operations • Solving linear equations • Calculating transformations • Changing coordinate systems • By the end you should: • Be able to add/subtract/multiply 2 matrices • Be able to add/subtract/multiply 2 vectors • Use matrix inverses to solve linear equations • Write network operations with matrices • Advanced topics - will also discuss: • Matrices as transformations • Eigenvectors and eigenvalues of a matrix

  3. Today’s Topics • Matrix/Vector Basics • Matrix definitions (square matrix, identity etc) • Matrix addition/subtraction/multiplication • Matrix inverse • Vector definitions • Vectors as geometric objects • Tomorrow’s Topics • Uses of Matrices • Matrices as sets of linear equations • Networks as matrices • Solving sets of linear equations • (Briefly) Matrix operations as transformations • (Briefly) Eigenvectors and eigenvalues of a matrix

  4. Matrices A matrix W is a 2d array of m x n numbers, often denoted by a capital (and sometimes bold) letter. W has m rows and n columns and we say W has dimensions m x n, eg $W = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}$ is a 2 x 3 matrix. Each number or element in W is usually represented by a small $w_{ij}$ and indexed with the row and column it is in, ie $w_{ij}$ is in the i'th row and j'th column, eg $p_{12} = 2$ means the element in the 1st row and 2nd column of P is 2
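As a quick check of these definitions, here is a minimal sketch in Python/numpy (an assumption: the slides don't fix a language, mentioning only matlab later). Note numpy counts rows and columns from 0, while the slides count from 1:

```python
import numpy as np

# A 2 x 3 matrix: m = 2 rows, n = 3 columns (values are just for illustration)
W = np.array([[1, 2, 3],
              [4, 5, 6]])

print(W.shape)   # (2, 3), ie the dimensions m x n
# Indexing is row then column; numpy is 0-based, so the mathematical
# w12 (1st row, 2nd column) is W[0, 1]
print(W[0, 1])   # 2
```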

  5. eg if $w_{21} = 2$ and $w_{32} = 3$, what is $w_{32} + w_{21}$? Is it a) 4 b) 5 c) 6? Answer: b) 5. What are the dimensions of a matrix L with 4 rows and 3 columns: 3 x 4 or 4 x 3? Answer: 4 x 3. Note in both (all) cases elements are indexed as row then column

  6. Square and Diagonal Matrices If m = n, say W is a square matrix. A diagonal matrix D is a square matrix where all off-diagonal elements (ie elements where i is not equal to j) equal zero, eg $D = \begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix}$. Mathematically we say: $d_{ij} = 0 \;\forall\; i \neq j$ (where the upside-down A means "for all")

  7. Identity Matrix The identity matrix I is a diagonal matrix where all the diagonal elements are 1. It is ALWAYS represented by a capital I and it plays the role of the number 1 in multiplication and 'division'. If you multiply a matrix by I you get that matrix: IA = AI = A, just as if you multiply a number by 1: 1 x 1.6 = 1.6 x 1 = 1.6. MATHS FACT: many classes of maths 'objects' obey the same rules. One is that they must have a 1 (known as the identity) which a) leaves things unchanged when you multiply by it and b) is what you get when you divide something by itself

  8. Matrix Transpose The transpose of a matrix P is written as $P^T$ and is P with the rows and columns swapped round, eg if $P = \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix}$ then $P^T = \begin{pmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{pmatrix}$, so $[P^T]_{ij} = [P]_{ji} = p_{ji}$. Qu: if P has dimensions m x n, what dimensions does $P^T$ have? Answer: n x m. One useful equation: $(AB)^T = B^T A^T$

  9. Addition/Subtraction Adding/subtracting a scalar k (ie a constant): A = W + k = k + W means W with k added to each element, ie $a_{ij} = w_{ij} + k$, eg $\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} + 1 = \begin{pmatrix} 2 & 3 \\ 4 & 5 \end{pmatrix}$. Matrix plus/minus a matrix: both must be of the same size and we add each pair of elements together (a point-by-point or point-wise operation), ie if C = A + B then $c_{ij} = a_{ij} + b_{ij}$, eg $\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} + \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 3 \\ 4 & 4 \end{pmatrix}$
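A small numpy sketch of these point-wise operations (the values are illustrative):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

print(A + 10)   # scalar: 10 is added to each element
print(A + B)    # matrix + matrix: point-wise sum, so both must be the same size
print(A - B)    # point-wise difference
```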

  10. Matrix-Constant Multiplication If A = kB = Bk, where k is a constant: multiply all elements in B by k, ie $a_{ij} = k b_{ij}$

  11. Formula: Matrix-Matrix Multiplication If A = BC, to get the value in the i'th row and j'th column of A, take the i'th row of B, multiply each element by the corresponding element in the j'th column of C, and add them together: $a_{ij} = \sum_k b_{ik} c_{kj}$, where B has m rows and n columns and C has n rows and p columns. So the inner dimensions must match, ie (m x n)(n x p) → (m x p)
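To make the formula concrete, here is a sketch that computes the product with explicit loops and checks it against numpy's built-in matrix product (the example matrices are made up):

```python
import numpy as np

B = np.array([[1, 2, 3],
              [4, 5, 6]])    # 2 x 3
C = np.array([[1, 0],
              [0, 1],
              [2, 2]])       # 3 x 2, so (2 x 3)(3 x 2) -> 2 x 2

m, n = B.shape
p = C.shape[1]
A = np.zeros((m, p))
for i in range(m):
    for j in range(p):
        # a_ij: i'th row of B times j'th column of C, summed
        for k in range(n):
            A[i, j] += B[i, k] * C[k, j]

print(A)
print(B @ C)   # numpy's matrix product gives the same answer
```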

  12. Matrix Inverses Given the complications of matrix multiplication, what about division?!?! Use a matrix inverse: the inverse of A is written $A^{-1}$. Analogy to numbers: the inverse of 2 is $2^{-1} = 1/2$. Just as $2 \times 2^{-1} = 2^{-1} \times 2 = 1$, so $A A^{-1} = A^{-1} A = I$ (the identity matrix). This defines the inverse, ie: if AB = I and BA = I then $B = A^{-1}$

  13. It is used for division since dividing by 2 is like multiplying by $2^{-1}$, ie if $y = 2z$, then $2^{-1}y = 2^{-1}2z = 1z = z$. Similarly if $Y = WX$ then $W^{-1}Y = W^{-1}WX = IX = X$ (more later ...). Calculating $A^{-1}$ is a pain: use a computer. Note however, that to do it, one needs the determinant of A, written $|A|$: analogous to the 'size' of a matrix. Now try questions 1-4 off the work sheet.
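For example, with a (made-up) invertible matrix, numpy computes both the determinant and the inverse, and we can check the defining property $A A^{-1} = I$:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

print(np.linalg.det(A))   # the determinant |A|; non-zero here, so A^-1 exists
A_inv = np.linalg.inv(A)
print(A_inv @ A)          # the identity I (up to rounding error)
print(A @ A_inv)          # same the other way round
```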

  14. Powers and Things $A^2 = AA$, but how about $A^{1/2}$?? There are ways but we shouldn't worry, apart from the special case of diagonal matrices, which are easy. Finally, it is often said that matrix multiplication is associative but not commutative. What does this mean? $A(BC) = (AB)C$ (associative) but, in general, $AB \neq BA$ (not commutative)
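A quick numerical check of both claims (the matrices are arbitrary examples):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])
C = np.array([[2, 0],
              [0, 3]])

print(A @ (B @ C))   # associative: grouping doesn't matter...
print((A @ B) @ C)   # ...same result
print(A @ B)         # but order does matter:
print(B @ A)         # AB != BA in general
```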

  15. Vectors A vector is a special matrix where m or n is 1. Usually denoted by a small letter, sometimes bold v, sometimes underlined v, sometimes bold and underlined v, sometimes with an arrow over it, and sometimes just with a square-ish font. I'm going to (pretty much) stick with an underlined letter. Row vector if m is 1: v = (1, 2, 3). Column vector if n is 1: $v = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}$. Strictly (and I'm pretty strict) all vectors are column vectors, so unless told otherwise assume a column vector; thus if you see $v^T$ it is a row vector

  16. A vector with n elements $v = (v_1, v_2, \ldots, v_n)^T$ is said to be n-dimensional and is a point or position, ie an object in nD space, eg $v_1 = (1, 6)^T$ and $v_2 = (3, 2)^T$ are 2D vectors whose elements specify the coordinates. By convention, in 2D the 1st element refers to the horizontal (x-) axis and the 2nd to the vertical (y-) axis. A vector is often visualised as a line/arrow joining the origin and that point. Vectors have both direction and length and so are often used for eg speed (think of arrows representing wind on weather maps)

  17. Vector Addition/Subtraction As they are matrices, vectors follow all the rules of matrices, so are eg added and subtracted in a point-wise way: $v_3 = v_1 + v_2 = (1, 6)^T + (3, 2)^T = (1+3, 6+2)^T = (4, 8)^T$. However, it can also be viewed geometrically: if you were at $v_1$, then added $v_2$ on to the end, you would be at $v_3$

  18. Geometrically, subtracting a vector is the equivalent of going backwards along the arrow, so: $v_3 - v_1 = v_2$. Can also use this to get the vector between 2 vectors, eg what is the vector u from $v_1$ to $v_3$? Get to $v_3$ from $v_1$ by going backwards along $v_1$ (ie $-v_1$), then forwards along $v_3$ (ie $+v_3$), so $u = -v_1 + v_3 = v_3 - v_1$

  19. Organisms (ants, us etc) use this for path integration to get back home: imagine a path of (random) vectors out to some goal. At any point in the path, the sum of the vectors points to the current position, so minus the sum points back to the start. Rather than having to 'remember' all the vectors of the path, just keep adding the last bit walked to a 'home' vector. Then return home directly by following minus the home vector
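A minimal sketch of this home-vector idea, with a made-up path of step vectors:

```python
import numpy as np

# Each step is the vector just walked (hypothetical 2D steps)
steps = [np.array([1.0, 2.0]),
         np.array([3.0, 0.5]),
         np.array([-1.0, 1.5])]

home = np.zeros(2)
for step in steps:
    home += step      # only the running sum is 'remembered', not the path

print(home)           # current position relative to the start
print(-home)          # minus the home vector: the direct route back
```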

  20. Length of Vectors Length of v is written |v| (or ||v||). From Pythagoras: if $v = (3, 4)^T$ then $|v| = \sqrt{3^2 + 4^2} = 5$. Similarly if $v = (v_1, v_2, v_3)^T$ then $|v| = \sqrt{v_1^2 + v_2^2 + v_3^2}$, and in general: $|v| = \sqrt{\sum_i v_i^2}$. As the vector between u and v is v - u, the distance between 2 points is: $|v - u|$. Note if |v| = 1, v is known as a unit vector
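In numpy the length is a one-liner (np.linalg.norm), which we can check against Pythagoras:

```python
import numpy as np

v = np.array([3.0, 4.0])
u = np.array([1.0, 6.0])

print(np.sqrt(np.sum(v**2)))   # Pythagoras by hand: sqrt(3^2 + 4^2) = 5
print(np.linalg.norm(v))       # the same via numpy's norm
print(np.linalg.norm(v - u))   # distance between the 2 points u and v
```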

  21. Vector Multiplication Like matrices, but can be viewed geometrically, eg multiplying by a constant k makes v k times longer: $3v = 3(1, 2)^T = (3, 6)^T$. Standard vector-vector multiplication: the vectors must be the same length, the 1st one a row vector, the 2nd a column vector, eg $(1, 2) \begin{pmatrix} 3 \\ 4 \end{pmatrix} = 1 \times 3 + 2 \times 4 = 11$, and in general: $u^T v = \sum_i u_i v_i$. Vector-vector multiplication is also known as the dot product (u.v) and the inner product <u,v>. The result is a single number. Note $v^T v = |v|^2$ since $v^T v = \sum_i v_i^2$

  22. What does the inner product mean?? $u^T v = |u| |v| \cos(t)$ where t is the angle between the 2 vectors. So $u^T v = 0$ if the vectors are orthogonal (perpendicular: t = 90, cos(t) = 0) and maximised if the vectors are parallel (t = 0 so cos(t) = 1). It can also be interpreted as the projection of one vector onto another (usually unit) vector: if |u| = 1 then $u^T v = |v| \cos(t)$ (cf principal component analysis). Question: what is an angle in 10 dimensions? In n dimensions we use the dot product to define the angle between 2 vectors
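So in any number of dimensions the angle can be computed by rearranging the formula, as in this sketch (the vectors are arbitrary 3D examples):

```python
import numpy as np

u = np.array([1.0, 0.0, 2.0])
v = np.array([0.5, 1.0, 1.0])

# uTv = |u||v|cos(t), so cos(t) = uTv / (|u||v|) in any dimension
cos_t = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
print(np.degrees(np.arccos(cos_t)))   # the angle t in degrees

# Projection of v onto the unit vector in the direction of u: |v|cos(t)
u_hat = u / np.linalg.norm(u)
print(u_hat @ v)
```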

  23. Summary of Main Points • How to multiply 2 matrices • What a matrix inverse is • How adding vectors can be interpreted geometrically • How to calculate vector length |v| and the distance between 2 vectors • $v^T v = |v|^2$ • $u^T v = |u||v|\cos(\text{angle between vectors})$ • To project u onto v if |v| = 1, do $u^T v$

  24. Formal Computational Skills Matrices 2

  25. Yesterday’s Topics • Matrix/Vector Basics • Matrix definitions (square matrix, identity etc) • Matrix addition/subtraction/multiplication • Matrix inverse • Vector definitions • Vectors as geometric objects • Today’s Topics • Uses of Matrices • Matrices as sets of equations • Networks written as matrices • Solving sets of linear equations • (Briefly) Matrix operations as transformations • (Briefly) Eigenvectors and eigenvalues of a matrix

  26. Equations as Matrices Suppose we have the following: $2x_1 + 3x_2 + 4x_3 = y_1$. Can write this as a matrix operation: $(2 \; 3 \; 4) \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = y_1$. Similarly, can write $w_1 x_1 + w_2 x_2 + w_3 x_3 = y_1$ as $(w_1 \; w_2 \; w_3) \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = y_1$. Or, using vectors, with $w^T = (w_1, w_2, w_3)$ and $x = (x_1, x_2, x_3)^T$ we get: $w^T x = y_1$. Bit more concise but not great

  27. Sets of Equations as Matrices However, suppose we have several equations involving x, eg $2x_1 + 3x_2 + 4x_3 = y_1$ and $4x_1 + 3x_2 + 8x_3 = y_2$. This becomes: $\begin{pmatrix} 2 & 3 & 4 \\ 4 & 3 & 8 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}$. Similarly, $w_{11}x_1 + w_{12}x_2 + w_{13}x_3 = y_1$ and $w_{21}x_1 + w_{22}x_2 + w_{23}x_3 = y_2$ become: $Wx = y$ where $W = \begin{pmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \end{pmatrix}$ and $y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}$

  28. Matrices as Neural Networks Will encounter this notation when dealing with Artificial Neural Networks (ANNs). Can think of networks as functions which transform input vectors (x) into output vectors (y) via a set (matrix) of numbers known as weights (W) associated with the connections from inputs to outputs: inputs $x_1, x_2, x_3$ pass along connections with weights $w_1, w_2, w_3$ and are summed to give $y = w_1 x_1 + w_2 x_2 + w_3 x_3$. This comes from the connectionist picture of the brain as electrical impulses travelling along axons, modulated by synaptic strengths and being summed at synapses

  29. • The above ANN takes a 3D input vector $x^T = (x_1, x_2, x_3)$. • It therefore has 3 connections from input to output, each with an associated weight. Thus the weights form a 3d vector $w^T = (w_1, w_2, w_3)$. • Inputs travel along the connections, are multiplied by the weights and summed to give the (1D) output y (we say y is a weighted sum of the inputs). The sum is the same as if we multiplied w and x, so we can write the output as $y = w^T x$

  30. If we have more than one output, we need a matrix of weights W: effectively one weight vector for each output. Represent all the weights by a matrix W where each weight vector is a row of W and the ij'th element of W is the weight $w_{ij}$

  31. Each output is a weighted sum of the input and the corresponding weight vector, ie a row of W multiplied by x: $y_1 = w_{11}x_1 + w_{12}x_2 + w_{13}x_3$ and $y_2 = w_{21}x_1 + w_{22}x_2 + w_{23}x_3$. So for n-dimensional input data the i'th output is $y_i = \sum_j w_{ij} x_j$. Thus, writing the output as a vector y, the network operation is: $Wx = y$. Note the weights are indexed (oddly) as $w_{to,\,from}$ so that the matrix multiplication works

  32. Finally, suppose we have many input vectors $x_i = (x_{1i}, x_{2i}, x_{3i})^T$. Make a matrix X where the i'th column of X is the i'th input vector. Each input generates a different output vector, so make a matrix Y where the i'th column is the output due to the application of the i'th input vector to the network. Since we know that a single output is y = Wx, we have: $WX = Y$. Might not seem like a great leap but it is very convenient for mathematical manipulation (and matlab programming), eg the sketch below ...
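In numpy the whole batch of outputs is one line (the weights and inputs below are made up):

```python
import numpy as np

W = np.array([[0.1, 0.2, 0.3],    # row i is the weight vector of output i
              [0.4, 0.5, 0.6]])   # 2 outputs, 3 inputs

x = np.array([1.0, 2.0, 3.0])
print(W @ x)                      # single input: y = Wx

X = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [3.0, 1.0]])        # each column is one input vector
print(W @ X)                      # Y = WX: column i is the output for input i
```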

  33. Solving Linear Equations Suppose we have to solve: $x_1 + 2x_2 = 0$ and $3x_1 + 7x_2 = 1$. After some (tedious) calculations we solve to get: $x_1 = -2$, $x_2 = 1$. Instead, write the system as Wx = y where: $W = \begin{pmatrix} 1 & 2 \\ 3 & 7 \end{pmatrix}$, $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ and $y = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$, then solve the equations by multiplying both sides by $W^{-1}$, since: $W^{-1}Wx = Ix = x = W^{-1}y$, giving: $x = (-2, 1)^T$. So if Wx = y, solve via $x = W^{-1}y$ using a computer .... But not always so simple …
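Using a computer here means something like the sketch below; np.linalg.solve is the standard (and numerically safer) alternative to forming the inverse explicitly:

```python
import numpy as np

W = np.array([[1.0, 2.0],
              [3.0, 7.0]])
y = np.array([0.0, 1.0])

print(np.linalg.inv(W) @ y)    # x = W^-1 y = (-2, 1)
print(np.linalg.solve(W, y))   # solves Wx = y directly, same answer
```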

  34. Sometimes $W^{-1}$ does not exist, ie W is singular (or near-singular). Why? The problem could be underdetermined, eg a single equation in two unknowns such as $2x_1 + x_2 = 8$: (usually) no unique solution [eg $x_1 = 1, x_2 = 6$ or $x_1 = 2, x_2 = 4$ etc]. The problem is that we need more data (the number of unknowns = the number of equations/bits of info needed). The same problem occurs if one equation is effectively written twice (Row 2 = 2 x Row 1), or if one row is a linear combination of (ie made up out of) the others (Row 3 = Row 1 + Row 2)

  35. Or the problem could be overdetermined, eg more outputs than inputs (more equations than unknowns), giving contradictory solutions (eg x = 2 and x = 1). For $W^{-1}$ to exist, W must be square and its rows must not duplicate info, ie they must be linearly independent. This is often not the case, so to avoid problems we use the pseudoinverse of W. If the problem is underdetermined it finds the solution with the smallest sum of squares of the elements of x, and if overdetermined, it finds an approximate solution (what sort? could be investigated…). In networks, it is used to find weights for Radial Basis Function networks and single layer networks (eg Bishop p.92-95)
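A sketch of the overdetermined case with numpy's pseudoinverse (the data is made up and deliberately inconsistent):

```python
import numpy as np

# 3 equations, 2 unknowns: no exact solution exists
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.5])

x = np.linalg.pinv(W) @ y   # least-squares approximate solution
print(x)
print(W @ x)                # close to y, but not exact
```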

  36. Matrix Vector Multiplication: Matrices as Transformations If the dimensionality is right, can view a matrix-vector multiplication as a transformation of one vector into another. If U is diagonal, get an expansion/contraction along the axes, eg $U = \begin{pmatrix} 5 & 0 \\ 0 & 2 \end{pmatrix}$ sends $v_1 = (1, 0)^T$ to $5v_1$ and $v_2 = (0, 1)^T$ to $2v_2$ etc

  37. Get a rotation anticlockwise through t by: $R = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix}$. So we see that $(1, 0)^T$ goes to $(\cos t, \sin t)^T$ and $(0, 1)^T$ goes to $(-\sin t, \cos t)^T$. In general, transformations produced by matrix multiplication can be broken down into a rotation followed by an expansion-contraction followed by another rotation
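A sketch of the rotation matrix in code, checking where the two axis vectors go:

```python
import numpy as np

def rotation(t):
    # Matrix for an anticlockwise rotation through angle t (radians)
    return np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t),  np.cos(t)]])

R = rotation(np.pi / 2)          # t = 90 degrees
print(R @ np.array([1.0, 0.0]))  # (1,0) -> (cos t, sin t) = (0, 1)
print(R @ np.array([0.0, 1.0]))  # (0,1) -> (-sin t, cos t) = (-1, 0)
```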

  38. Eigenvectors and Eigenvalues If $Ax = \lambda x$ for some scalar $\lambda$ and some non-zero vector x, then we say that x is an eigenvector of A with eigenvalue $\lambda$. Clearly, x is not unique since if $Ax = \lambda x$, then $A(2x) = 2Ax = 2\lambda x = \lambda(2x)$, so it is usual to scale x so that it has length 1. Turns out eigenvectors are VERY useful

  39. Intuition: the direction of x is unchanged by being transformed by A, so it in some sense reflects the principal direction (or axis) of the transformation, ie repeatedly transform v by A: start at v, then Av, then $AAv = A^2v$ etc … Most starting points result in curved trajectories. Starting from an eigenvector x, however, we get: $Ax = \lambda x$, $A^2x = \lambda^2 x$, $A^3x = \lambda^3 x$, $A^4x = \lambda^4 x$, … so the trajectory is a straight line. Note if $|\lambda| > 1$, x expands. If not, it will contract
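This repeated-transformation picture is exactly power iteration; a sketch with a made-up symmetric matrix (eigenvalues 3 and 1):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

v = np.array([1.0, 0.3])      # an arbitrary starting vector
for _ in range(50):
    v = A @ v                 # repeatedly transform v by A
    v /= np.linalg.norm(v)    # rescale so the vector doesn't blow up

print(v)                      # converges to the dominant eigenvector (1,1)/sqrt(2)
print((A @ v) / v)            # each component ~ the dominant eigenvalue, here 3
```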

  40. Eigenvector Facts If the data is d-dimensional there will be d eigenvectors. If A is symmetric, ie $a_{ij} = a_{ji}$ (true if A is the covariance matrix and many other important matrices), the (unit length) eigenvectors will be orthogonal (we say they are orthonormal): $x_i^T x_j = 1$ if i = j, $x_i^T x_j = 0$ else. This means that the eigenvectors form a set of basis vectors and any vector can be expressed as a linear sum of the eigenvectors, ie they form a new rotated co-ordinate system
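These facts can be checked numerically; np.linalg.eigh is numpy's eigendecomposition for symmetric matrices (the example matrix is made up):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])       # symmetric: a_ij = a_ji

vals, vecs = np.linalg.eigh(A)   # eigenvalues and unit-length eigenvectors (columns)
print(vals)
print(vecs.T @ vecs)             # the identity: the eigenvectors are orthonormal

# Any vector is a linear sum of the eigenvectors (a rotated co-ordinate system)
v = np.array([1.0, 5.0])
coeffs = vecs.T @ v              # coordinates of v in the eigenvector basis
print(vecs @ coeffs)             # reconstructs v
```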

  41. Summary of Main Points • When dealing with networks, Wx = y means “outputs y are a weighted sum of input x. That is: y is the network output given input x” • Similarly, WX = Y means “each column of Y is the network output for the corresponding column of X” • The matrix inverse can be used to solve linear equations, but the pseudoinverse is more robust • In networks, use the pseudoinverse to calculate the ‘best’ weights that will transform training input vectors into known target vectors • Matrix-vector multiplication can be seen as a transformation (eg rotations and expansions) • If $Ax = \lambda x$, x is an eigenvector with eigenvalue $\lambda$ • Eigenvectors and eigenvalues tell us about the main axes and behaviour of matrix transformations
