Thinking in n-spaces

  1. Thinking in n-spaces Section 4 of Linguistics in the context of the Cognitive and Computational Sciences

  2. When you need to manipulate n different numbers, you're living in n-space.

  3. Vectors • A vector can be thought of in 3 ways: • 1. It is a set of numbers ordered in a particular way, which by convention are written inside parentheses and separated by commas: • (1,4,0) • (5) • (-.5, 0, .1) etc.

  4. The space a vector lives in ... • A vector with n components lives in a particular space with lots of other vectors. A space has a size which we call its dimensionality. There is 1-space, which is a line, 2-space (a plane), 3-space (familiar space), 4-space, … n-space. • If you can imagine 3-space, then that’s good enough for n-space.

  5. 1-space: the number line … -3 -2 -1 0 1 2 3 …

  6. 2-space: the vectors (2,1) and (4,2)

  7. (1,2) + (3,1) = (4,3)

  8. 3-space: (30, 10, 60)

  9. When you need to manipulate n different numbers, you're living in n-space. • Things are more complex, and our usual intuitions must be modified for this new world. (Later we'll talk about reducing the dimensionality of a space while keeping its points: changing the basis of the space.)

  10. Vectors: 2nd view • A vector with n components (an n-vector) can be thought of as a particular point in a space of n dimensions. To make it particularly visible, you can think of a line connecting the origin (the point whose coordinates are all zero) to that point. • In this sense, the point exists independently of the coordinate system. That’s hardly the case for the first way of thinking of a vector.

  11. Vectors: 3rd view • Thus we can think of a vector as a length and a direction. In that way we don’t have to think of the vector as actually starting from the origin and ending up at some point. The vector can be translated (=moved) anywhere we like. Writing a vector in coordinates doesn’t emphasize this, but it’s true anyway.

  12. Length of a vector • How long is a vector? This is linked to (and usually discussed after) the question of comparing two vectors; but let’s just point out for now that the most common and usual way of defining the length of a vector is by using the Pythagorean theorem...

  13. Length of a hypotenuse... • For a right triangle with legs x and y, length = sqrt( x^2 + y^2 )

  14. Length…. • You square each of the coordinates, and add up all those squares; • then take the square root of that sum.

  15. Negative coordinates • Remember, a coordinate can be negative, but a negative coordinate contributes to length just as much as a positive one: (2, -5) is exactly as long as (2, 5). Squaring takes care of that.

  16. N-dimensions... • The length of a vector is the square root of the sum of all the squares of the coordinates: sqrt( Σ xi^2 ), that is, ( Σ xi^2 )^(1/2)
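
A minimal Python sketch of this length computation (the function name vector_length is our own, just for illustration):

```python
import math

def vector_length(v):
    # Square each coordinate, add up the squares, then take the square root of the sum.
    return math.sqrt(sum(x * x for x in v))

print(vector_length((2, -5)))    # 5.385... -- exactly as long as (2, 5)
print(vector_length((1, 4, 0)))  # 4.123...
```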

  17. Sometimes we care a lot about length; sometimes we don’t care at all about length. • If we don’t care about length, and only want to compare the directions that a set of vectors point in, then we normalize the vectors. This means: we divide each of them by its own length. Which means, we make them all land on a hypersphere of radius 1.0. Got that?

  18. Normalizing vectors. • Remember that dividing a vector by a number (that number is its length) means dividing each coordinate by that number. • So half of (12,6,4) is (6,3,2): it’s a vector in the same direction, just half as long.
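
A quick sketch of normalization in Python (the function name normalize is ours):

```python
import math

def normalize(v):
    # Divide every coordinate by the vector's length; the result has length 1.0
    # and points in the same direction as v.
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)

print(normalize((12, 6, 4)))  # (0.857..., 0.428..., 0.285...) -- lands on the unit hypersphere
```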

  19. How close are two vectors? • You can always tell how similar (= close) two vectors are. The most common way is by taking the inner product (a.k.a. dot product): V · W = sum of the products of corresponding coordinates = Σ v[i] * w[i]. This is also equal to: the length of V * the length of W * the cosine of the angle between them.

  20. Distance between 2 vectors • Or you can find the cosine of the angle between the two vectors: the inner product of A and B = the sum of products over each dimension = length of A * length of B * cosine of the angle between them. So cos(a) = A · B / ( |A| |B| )
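
A sketch of both measures in plain Python (the function names are ours):

```python
import math

def dot(v, w):
    # Inner product: sum of the products of corresponding coordinates.
    return sum(vi * wi for vi, wi in zip(v, w))

def cosine(v, w):
    # cos(a) = V . W / (|V| |W|)
    return dot(v, w) / (math.sqrt(dot(v, v)) * math.sqrt(dot(w, w)))

print(cosine((1, 0), (0, 1)))  # 0.0 -- at right angles (orthogonal)
print(cosine((2, 1), (4, 2)))  # 1.0 -- pointing in exactly the same direction
```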

  21. Remember: the cosine of an angle goes like this: cos (0) is 1.0, but then it gets smaller, to the point where if the angle is 90 degrees -- a right angle -- the cosine is 0.0. • Two vectors being at right angles is a very important condition: we say that they are orthogonal. They have nothing to do with each other.

  22. The length of a vector (bis) • The length of a vector V is the square root of the inner product of V with itself: |V| = sqrt(V · V)

  23. Projection of 1 vector onto another • The projection of A onto B is (just) (A · B)/|B| -- which is just A · B if B is normalized. [figure: A, B, and the projection of A onto B]
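
A small sketch of this (scalar) projection in Python, under the same definitions (the function name is ours):

```python
import math

def projection(a, b):
    # Projection of A onto B: (A . B) / |B|; this is just A . B when B is normalized.
    dot_ab = sum(ai * bi for ai, bi in zip(a, b))
    return dot_ab / math.sqrt(sum(bi * bi for bi in b))

print(projection((3, 4), (1, 0)))  # 3.0 -- the part of (3, 4) that lies along the x-axis
```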

  24. Distance between two vectors • Another way to measure how similar or different two vectors A, B are is to measure the length of the line that connects them: that line can be thought of as A - B, so its coordinates are (a1-b1, a2-b2, …), and its length must be sqrt( Σ (ai - bi)^2 )
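
The same distance as a Python sketch (the function name is ours):

```python
import math

def distance(a, b):
    # Length of the vector A - B: square root of the sum of squared coordinate differences.
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

print(distance((4, 3), (1, 2)))  # 3.162... = sqrt(9 + 1)
```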

  25. Addition of 2 vectors... • Just add the corresponding components.

  26. 2 vectors in the same space... • …have the same number of coordinates, and can be added or dot-product-ed. You can’t do either operation on a pair of vectors that don’t live in the same space.

  27. Cross product of 2 vectors (tensor product) • You can take the tensor product (also known as the outer product) of two vectors not living in the same space. You can take the tensor product of a vector in m-space and one in n-space. This makes a matrix (we’ll get to matrices next time) of size m x n. That’s what we did in setting up the weight space of a network.
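
A sketch in Python with numpy (numpy calls this the outer product); the two vectors here are made up for illustration:

```python
import numpy as np

v = np.array([1, 2, 3])  # a vector in 3-space
w = np.array([4, 5])     # a vector in 2-space

# The tensor (outer) product is a 3 x 2 matrix whose (i, j) entry is v[i] * w[j] --
# just like a weight matrix connecting 3 input units to 2 output units.
M = np.outer(v, w)
print(M.shape)  # (3, 2)
print(M)        # [[ 4  5]
                #  [ 8 10]
                #  [12 15]]
```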

  28. Changing the origin of the space • Sometimes we want to change what counts as the zero-point of the vector space, but in some sense keep the vectors fixed. • The most common reason to want to do this is to make the origin be in the middle of the data -- • That is, if we have a bunch of data points, we want to make the zero value on each dimension be the average of all of the values on that dimension -- so that the new average value really is zero….

  29. A set {xi}, i = 1…N, where each xi = (xi1, xi2, xi3, …) • Take the average over i: (1/N) Σi xi1 -- that’s the first coordinate of the new origin. • In most linguistic cases (all?), the raw scores are semi-positive (non-negative), so the average values are positive. But when we shift the origin to the mean value, roughly half of the new coordinates are negative: that’s what we want.
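
A sketch of shifting the origin to the mean, with invented data points (numpy; the variable names are our own):

```python
import numpy as np

# Invented data: one row per data point, one column per dimension; all raw scores non-negative.
data = np.array([[3.0, 1.0],
                 [5.0, 3.0],
                 [4.0, 2.0]])

origin = data.mean(axis=0)  # the average value on each dimension: [4.0, 2.0]
centered = data - origin    # the new coordinates; roughly half of them are negative
print(centered)
```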

  30. 1st link between neural nets and vectors…. Consider a linear associator -- N input nodes, and just one output unit O, let’s say for simplicity’s sake. The input to the N input nodes can be viewed as an N-vector (once we’ve numbered those nodes); call it Input. The activation coming into unit O is the inner (dot) product of two vectors...

  31. The input vector Input… and • a vector whose coordinates are the weights connecting each input unit to the output unit. • Yes! Typically we want that weight vector to have length 1.0 (to be normalized): so the set of weights connecting to each output unit is a vector C, living in a space of N dimensions, the same space in which the input vectors live. • Each output unit can, and should, be visualized as a vector in the input space -- typically a vector of length 1 (normalized)….

  32. And the input to that output unit is the inner product of its vector, and the input vector. In short, if the two are close (similar), the output unit gets lots of activation; if they’re far apart, the output unit gets little.

  33. Again... • So think of each output unit as a reporting station telling us how much of the input vector is projected onto its vector in the input space.
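
A sketch of the picture in slides 30-33, in Python with numpy; the weights and inputs are invented numbers, chosen so the weight vector has length 1.0:

```python
import numpy as np

# N = 3 input units feeding a single output unit O.
weights = np.array([0.6, 0.8, 0.0])          # the output unit's vector (normalized: length 1.0)
similar_input = np.array([3.0, 4.0, 0.0])    # points the same way as the weight vector
different_input = np.array([0.0, 0.0, 5.0])  # orthogonal to the weight vector

# The activation coming into O is the inner product of its weight vector and the input vector.
print(np.dot(weights, similar_input))    # 5.0 -- close to the weights, lots of activation
print(np.dot(weights, different_input))  # 0.0 -- orthogonal, no activation
```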

  34. Some linguistic examples • Words as vectors in 26-dimensional space: cab maps to (1,1,1,0,0,….). madam maps to (2,0,0,1,…,2,…,0), with that middle 2 at dimension 13 (the count of m’s). What do normal distance metrics tell us about this representation?
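
A sketch of this 26-dimensional representation in Python (the function name is ours):

```python
def letter_vector(word):
    # One dimension per letter a-z; each coordinate counts how often that letter occurs.
    counts = [0] * 26
    for ch in word.lower():
        counts[ord(ch) - ord('a')] += 1
    return tuple(counts)

print(letter_vector('cab')[:5])    # (1, 1, 1, 0, 0)
print(letter_vector('madam')[12])  # 2 -- the count for 'm', dimension 13
```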

  35. They tell us something about similarity, abstracting away from linear order and from the distance between letters. We can identify a sequence abc regardless of where it appears in a word (though we’ll confuse it with the same letters scattered throughout the word). Question: what’s the relative price of these two things: excluding all words that fail to have an a, a b, and a c; and… including extra words that have an a, a b, and a c, but not in that order? One answer: if we use only 1’s and 0’s for our counters, we can do this very fast on traditional bit-based computers.

  36. Computers can do bit-based arithmetic very fast.

  37. Morphology • Suppose we consider a simplified language, where there are a large number of stems, and to each stem we assign (observationally) a signature, which summarizes the information about what suffixes appear on that stem in the corpus. Using (non-vector) notation, we say that there are stems with the signature ed.ing, and others with s.’s. Using vectors, we describe those as (1,1,0,0) and (0,0,1,1) respectively, for a stem that appears one time with the relevant suffix….

  38. Suppose we have various stems with such signature vectors as: • jump (3,3,0,0) • kick (5,5,0,0) • boy (0,0,2,2) • girl (0,0,2,2) • Obviously these are very artificial; even given the artifice of the example, having the same number of each suffix isn’t necessary. We’ll change that later.

  39. (Of) How many dimensions is this space? • The data demand that they be analyzed in a 4-dimensional space; but the analysis will show us that only a 1-dimensional space is necessary. • Analysis can be identified with a reduction of the dimensionality of the data space. • That’s the most important point of this course. The most important idea!

  40. From 2 dimensions to 1 dimension • [figure: stems plotted against an ing axis and an ‘s axis. Lots of stems with ‘s but no ing, such as (0,3) and (0,5); lots of stems with ing but no ‘s, such as (2,0) and (6,0); a new axis is drawn through the data.]

  41. Analysis…. • What we would like to have is a single dimension along which a positive integer would tell us how many ings were observed; a negative value would tell us how many ‘s were observed; and there would be no way to express any kind of arrangement of the sort we don’t find.

  42. New axis • The new axis could be a line that runs through (0,0) but has a slope of -1: so the vectors (1,-1) and (-1,1) lie on it. [figure: the stems plotted against the ing and ‘s axes, with the new axis drawn through them.]

  43. New axis • What is this new axis? It is the “noun-verb” (better, Category 1/Category 2) distinction. • Is it clear that we could do exactly the same thing with our original 4-dimensional data: • jump (3,3,0,0) becomes (3) • kick (5,5,0,0) becomes (5) • boy (0,0,2,2) becomes (-2) • girl (0,0,2,2) becomes (-2)
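
A quick check of this collapse: projecting the 4-dimensional signature vectors onto the normalized axis (1, 1, -1, -1)/2 (our choice of axis, picked to reproduce the values above; dimension order ed, ing, s, ’s):

```python
import numpy as np

stems = {'jump': (3, 3, 0, 0), 'kick': (5, 5, 0, 0),
         'boy':  (0, 0, 2, 2), 'girl': (0, 0, 2, 2)}

axis = np.array([1, 1, -1, -1]) / 2.0  # length 1.0: a normalized direction in 4-space

for stem, signature in stems.items():
    # Positive values count Category 1 suffixes, negative values Category 2 suffixes.
    print(stem, np.dot(signature, axis))  # jump 3.0, kick 5.0, boy -2.0, girl -2.0
```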

  44. Rotation of coordinates (i.e., of basis) • We just rotated the basis, and found that we could then dispense with the second dimension. It’s always easy to rotate a coordinate system. Consider the 2-dimensional case, in a plane.

  45. Rotation • [figure: the point (x, y), with the axes rotated through angle a; two similar right triangles built from x and y show how the new coordinate (x’, y’) is obtained.]

  46. Rotation So the new x’ value is: • x’ = x cos a + y sin a. Doing much the same thing on the y-axis: • y’ = - x sin a + y cos a.

  47. Next time we’ll rewrite this: x’ = x cos a + y sin a, y’ = -x sin a + y cos a, in matrix form:
      [x’]   [ cos a  sin a] [x]
      [y’] = [-sin a  cos a] [y]
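
The same rotation as a sketch in Python with numpy (the function name is ours); rotating (1, 1) by 45 degrees drops the second coordinate to zero, just as in the morphology example:

```python
import numpy as np

def rotate(x, y, a):
    # x' = x cos a + y sin a ;  y' = -x sin a + y cos a
    R = np.array([[ np.cos(a), np.sin(a)],
                  [-np.sin(a), np.cos(a)]])
    return R @ np.array([x, y])

print(rotate(1.0, 1.0, np.pi / 4))  # [1.414...  ~0.0]
```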

  48. Represent words as bigrams • How big is the space of bigrams? 27^2 = 729, since we virtually always care about marking ends of words (26 letters plus a word-boundary symbol). • Thus each word can be represented in the space of bigrams: this is a very sparse representation: the average number of non-zeros per word is very low.
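
A sketch of the bigram representation in Python (the '#' word-boundary mark and the function name are our own choices):

```python
def bigram_vector(word):
    # 27 symbols (a-z plus the boundary mark '#'), so 27 * 27 = 729 dimensions.
    alphabet = '#abcdefghijklmnopqrstuvwxyz'
    counts = [0] * (27 * 27)
    padded = '#' + word.lower() + '#'
    for first, second in zip(padded, padded[1:]):
        counts[alphabet.index(first) * 27 + alphabet.index(second)] += 1
    return counts

v = bigram_vector('cab')            # bigrams: #c, ca, ab, b#
print(sum(1 for x in v if x != 0))  # 4 non-zero dimensions out of 729 -- very sparse
```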

  49. But if we want to measure similarity between two words, and if we want to do lots of measuring of pairs of words (remember, if there are 1,000 words, then there are a million pairs!), then it may be best to set up a bigram representation for each word and do a vector comparison.

  50. In practical computational terms, when the vectors are very sparse, we don’t keep track of all the zeros -- just keep track of which dimensions have non-zero values, and what those values are.
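
A sketch of that sparse bookkeeping in Python: store only the non-zero dimensions, as a dictionary from bigram to count (same made-up '#' boundary mark as above):

```python
def sparse_bigram_vector(word):
    # Keep only the dimensions with non-zero values, and what those values are.
    padded = '#' + word.lower() + '#'
    counts = {}
    for first, second in zip(padded, padded[1:]):
        counts[first + second] = counts.get(first + second, 0) + 1
    return counts

print(sparse_bigram_vector('madam'))
# {'#m': 1, 'ma': 1, 'ad': 1, 'da': 1, 'am': 1, 'm#': 1}
```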
