LING / C SC 439/539 Statistical Natural Language Processing

Presentation Transcript


  1. LING / C SC 439/539 Statistical Natural Language Processing Numerical Python

  2. Outline • Overview of NumPy • Creating arrays • Resizing arrays • Indexing and selection • Operations on one array • Operations on two arrays • Linear algebra

  3. Numerical Python • Module that can be imported in Python • Allows for: • Datatypes for vectors and matrices (called Arrays) • Vectorized computations, similar to MATLAB • Highly efficient; calls numerical libraries coded in C • Code looks much more like math • Fewer explicitly coded loops • Results in concise code

  4. Vectorized computing • Standard Python: L = [1,2,3,4,5] L2 = [] for i in range(len(L)): L2.append(L[i] * 3) • NumPy: L = array([1,2,3,4,5]) L2 = L * 3
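
The contrast on this slide can be reproduced with a small, self-contained sketch. It assumes NumPy is imported under the conventional alias `np` (the slides use `from numpy import *` instead), and the variable names are illustrative only.

```python
import numpy as np

# Plain Python: build the tripled list with an explicit loop.
values = [1, 2, 3, 4, 5]
tripled_loop = []
for x in values:
    tripled_loop.append(x * 3)

# NumPy: the multiplication is applied to every element at once.
arr = np.array(values)
tripled_vec = arr * 3

print(tripled_loop)          # [3, 6, 9, 12, 15]
print(tripled_vec.tolist())  # [3, 6, 9, 12, 15]
```

Note that `values * 3` on the plain list would not triple the numbers; it repeats the list three times, which is one reason the array type is needed for this style of code.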

  5. NumPy documentation • NumPy User Guide • http://docs.scipy.org/doc/ • Guide to NumPy by Travis Oliphant (creator of NumPy) • http://www.tramy.us/guidetoscipy.html

  6. First, import NumPy >>> from numpy import *

  7. help(functionname) >>> help(eye) Help on function eye in module numpy.lib.twodim_base: eye(N, M=None, k=0, dtype=<type 'float'>) Return a 2-D array with ones on the diagonal and zeros elsewhere. Parameters ---------- N : int Number of rows in the output. M : int, optional Number of columns in the output. If None, defaults to `N`. k : int, optional Index of the diagonal: 0 refers to the main diagonal, a positive value refers to an upper diagonal, and a negative value to a lower diagonal. dtype : dtype, optional Data-type of the returned array.

  8. help(functionname) Returns ------- I : ndarray (N,M) An array where all elements are equal to zero, except for the `k`-th diagonal, whose values are equal to one. See Also -------- diag : Return a diagonal 2-D array using a 1-D array specified by the user. Examples -------- >>> np.eye(2, dtype=int) array([[1, 0], [0, 1]]) >>> np.eye(3, k=1) array([[ 0., 1., 0.], [ 0., 0., 1.], [ 0., 0., 0.]])

  9. Outline • Overview of NumPy • Creating arrays • Resizing arrays • Indexing and selection • Operations on one array • Operations on two arrays • Linear algebra

  10. Arrays in NumPy • All these are arrays in NumPy: • One-dimensional vector • Two-dimensional matrix • Higher-dimensional matrix

  11. Creating a vector (one-dimensional array) >>> v = array([1,2,3,4,5]) >>> v array([1, 2, 3, 4, 5]) >>> ndim(v) # number of dimensions 1 >>> shape(v) # 5 elements in first dim. (5,) >>> size(v) # total number of elements 5

  12. Creating a matrix (two-dimensional array) >>> a = array([[1,2,3],[4,5,6]]) >>> a array([[1, 2, 3], [4, 5, 6]]) >>> ndim(a) # number of dimensions 2 >>> a.shape # 2 rows, 3 columns (2, 3) >>> size(a) # total number of elements 6

  13. When writing comma-separated literals in Python (e.g. arrays or lists), you can press Enter after a comma and continue on the next line >>> # these all produce the same result: >>> a = array([[1,2,3],[4,5,6]]) >>> a = array([[1, 2, 3], [4, 5, 6]]) >>> a = array([[1, 2, 3], [4, 5, 6]])

  14. Calling functions vs. object attributes >>> a.shape (2, 3) >>> shape(a) (2, 3) • You get the same result whether you pass the object to the function or access the object’s attribute • The function accesses the object’s attribute • The two can be used interchangeably • But when a function of the same name is also defined in another module, you’ll want to call it through the object • You’ll see this later with max • Also: a.ndim and a.size give the same results as ndim(a) and size(a)
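
As a quick check of the equivalence described on this slide, the following sketch compares each module-level function with the matching attribute. It assumes the `np` alias rather than the star import used in the slides.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

# Each module-level function returns the same value as the attribute.
same_shape = np.shape(a) == a.shape   # (2, 3) either way
same_ndim = np.ndim(a) == a.ndim      # 2 either way
same_size = np.size(a) == a.size      # 6 either way

print(same_shape, same_ndim, same_size)  # True True True
```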

  15. Special functions to create matrices(2-d arrays) >>> ones((2,3)) array([[ 1., 1., 1.], [ 1., 1., 1.]]) >>> zeros((2,3)) array([[ 0., 0., 0.], [ 0., 0., 0.]]) >>> eye(3) array([[ 1., 0., 0.], [ 0., 1., 0.], [ 0., 0., 1.]])

  16. Type of an array >>> a array([[1, 2, 3], [4, 5, 6]]) >>> a.dtype dtype('int32') >>> b = ones([2,3]) >>> b array([[ 1., 1., 1.], [ 1., 1., 1.]]) >>> b.dtype dtype('float64')

  17. linspace >>> # vector with linearly spaced values >>> # linspace(start, stop, num values) >>> # function determines spacing of vals for you >>> linspace(3, 16, 5) array([ 3. , 6.25, 9.5 , 12.75, 16. ]) >>> linspace(15, 19, 4) array([ 15., 16.33333333, 17.66666667, 19.])

  18. Arrays of random numbers >>> random.rand(2,3) # uniformly dist. between 0 and 1 array([[ 0.49386404, 0.12125634, 0.58045141], [ 0.80695113, 0.32188799, 0.63249074]]) >>> random.randn(2,3) # normal dist., mean=0, var=1 array([[-0.37422103, 1.03866716, -0.53547127], [ 0.30022273, 0.23015563, 0.80873554]])

  19. Arrays of random numbers >>> # 2 x 3 matrix, uniformly dist. between 5 and 7 >>> random.uniform(5, 7, (2, 3)) array([[ 6.50654571, 5.77650203, 6.68806597], [ 6.29241871, 6.45282975, 6.4707847 ]]) >>> # 4 x 3 matrix, rand. ints from 3 up to (but not including) 6 >>> random.randint(3, 6, (4, 3)) array([[3, 3, 3], [5, 3, 5], [4, 3, 3], [5, 3, 3]])
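
The ranges of these two functions are easy to get wrong, so here is a minimal sketch that checks them. The fixed seed and the variable names are assumptions added for repeatability; they are not part of the slides.

```python
import numpy as np

np.random.seed(0)  # fixed seed so repeated runs give the same draws

uniform_vals = np.random.uniform(5, 7, (2, 3))  # floats in [5, 7)
int_vals = np.random.randint(3, 6, (4, 3))      # ints from {3, 4, 5}; 6 is excluded

print(uniform_vals.shape, int_vals.shape)        # (2, 3) (4, 3)
print(int_vals.min() >= 3, int_vals.max() <= 5)  # True True
```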

  20. Outline • Overview of NumPy • Creating arrays • Resizing arrays • Indexing and selection • Operations on one array • Operations on two arrays • Linear algebra

  21. Shape of an array >>> a.shape # or shape(a) (2, 3) >>> nrow = a.shape[0] >>> ncol = a.shape[1] >>> nrow 2 >>> ncol 3 >>> zeros(a.shape) # new array w/ same shape array([[ 0., 0., 0.], [ 0., 0., 0.]])

  22. Transpose >>> a array([[1, 2, 3], [4, 5, 6]]) >>> transpose(a) # or a.transpose() array([[1, 4], [2, 5], [3, 6]]) >>> a # didn’t change it array([[1, 2, 3], [4, 5, 6]]) >>> a = transpose(a) # need to assign to variable >>> a array([[1, 4], [2, 5], [3, 6]])

  23. Reshaping an array >>> a # 2 x 3 matrix array([[1, 2, 3], [4, 5, 6]]) >>> reshape(a, (3, 2)) # 3 x 2 matrix array([[1, 2], [3, 4], [5, 6]]) >>> reshape(a, (1,6)) # 1 x 6 matrix array([[1, 2, 3, 4, 5, 6]])

  24. Concatenation >>> a = array([[1,2,3],[4,5,6]]) >>> a array([[1, 2, 3], [4, 5, 6]]) >>> b = zeros(a.shape) >>> b array([[ 0., 0., 0.], [ 0., 0., 0.]])

  25. Concatenation >>> # note that it’s converted to float >>> concatenate((a,b), axis=0) array([[ 1., 2., 3.], [ 4., 5., 6.], [ 0., 0., 0.], [ 0., 0., 0.]]) >>> concatenate((a,b), axis=1) array([[ 1., 2., 3., 0., 0., 0.], [ 4., 5., 6., 0., 0., 0.]])

  26. Try to concatenate a matrix with a vector >>> a array([[1, 2, 3], [4, 5, 6]]) >>> c = arange(3) >>> c array([0, 1, 2]) >>> concatenate((a,c), axis=0) Traceback (most recent call last): File "<pyshell#270>", line 1, in <module> concatenate((a,c), axis=0) ValueError: arrays must have same number of dimensions

  27. Convert vector to matrix before concatenating >>> c.shape # one-dimensional (3,) >>> a.shape # two-dimensional (2, 3) >>> array([c]) array([[0, 1, 2]]) >>> concatenate((a, array([c])), axis=0) array([[1, 2, 3], [4, 5, 6], [0, 1, 2]])
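
Wrapping the vector in `array([c])` is one way to get a 1 x 3 matrix. The sketch below shows two other spellings that should give the same shape, plus `vstack`, which promotes a 1-d vector on its own. It uses the `np` alias, and the variable names are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
c = np.arange(3)                   # shape (3,)

row_a = np.array([c])              # approach from the slide, shape (1, 3)
row_b = c.reshape(1, 3)            # explicit reshape
row_c = c[np.newaxis, :]           # add a leading axis

stacked = np.concatenate((a, row_b), axis=0)
stacked_v = np.vstack((a, c))      # vstack treats the 1-d vector as a single row

print(stacked.tolist())  # [[1, 2, 3], [4, 5, 6], [0, 1, 2]]
```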

  28. Turn matrix into 1-d vector >>> a array([[1, 2, 3], [4, 5, 6]]) >>> ravel(a) array([1, 2, 3, 4, 5, 6])

  29. append: for vectors >>> a = array([1,2,3]) >>> a = append(a, 4) >>> a array([1, 2, 3, 4]) >>> append(a, array([7,8,9])) array([1, 2, 3, 4, 7, 8, 9])

  30. Outline • Overview of NumPy • Creating arrays • Resizing arrays • Indexing and selection • Operations on one array • Operations on two arrays • Linear algebra

  31. Indexing vectors >>> b = array([3,5,7,9,11,13,15]) >>> b array([ 3, 5, 7, 9, 11, 13, 15]) >>> b[0] 3 >>> b[5:] array([13, 15]) >>> b[[0,5,2]] # indices can be in any order array([ 3, 13, 7])

  32. Indexing arrays • Let M be a matrix of size m x n • m rows • n columns • m * n total elements • M_{i,j} is the entry of M at row i and column j >>> a = array([[1, 2, 3], [4, 5, 6]]) >>> a[0,1] # first row, second column (indices start at 0) 2

  33. Indexing arrays >>> b = reshape(arange(12), (3,4)) >>> b array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]]) >>> b[0,:] # first row, all cols array([0, 1, 2, 3]) >>> b[1:,:] # second row to end, all cols array([[ 4, 5, 6, 7], [ 8, 9, 10, 11]]) >>> b[[0,2],:] # first & third rows, all cols array([[ 0, 1, 2, 3], [ 8, 9, 10, 11]])

  34. Indexing arrays >>> b array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]]) >>> # all rows, second and third cols (indices 1 and 2; stop index 3 is excluded) >>> b[:,1:3] array([[ 1, 2], [ 5, 6], [ 9, 10]]) >>> b[2,0] # third row, first column 8
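
The indexing forms from the last few slides can be collected into one runnable sketch. It assumes the `np` alias, and the variable names are illustrative.

```python
import numpy as np

b = np.reshape(np.arange(12), (3, 4))

first_row = b[0, :]          # row index 0, all columns
middle_cols = b[:, 1:3]      # column indices 1 and 2; index 3 is excluded
picked_rows = b[[0, 2], :]   # rows 0 and 2, selected with a list of indices
corner = b[2, 0]             # third row, first column

print(first_row.tolist())    # [0, 1, 2, 3]
print(middle_cols.tolist())  # [[1, 2], [5, 6], [9, 10]]
print(picked_rows.tolist())  # [[0, 1, 2, 3], [8, 9, 10, 11]]
print(int(corner))           # 8
```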

  35. Logical selection >>> a array([[1, 2, 3], [4, 5, 6]]) >>> a%2==0 array([[False, True, False], [ True, False, True]], dtype=bool) >>> # get even values of a >>> a[a%2==0] # returns a vector array([2, 4, 6])

  36. Logical selection >>> # where(condition, x, y): >>> # when True, return x, else return y >>> where(a%2==0, 1, -1) # returns a matrix array([[-1, 1, -1], [ 1, -1, 1]]) >>> where(a%2==0, a, 0) # returns a matrix array([[0, 2, 0], [4, 0, 6]])
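
Boolean selection and `where` both start from the same mask, which the following sketch makes explicit. It uses the `np` alias, and the names `is_even`, `evens`, `signs`, and `kept` are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
is_even = a % 2 == 0               # boolean array with the same shape as a

evens = a[is_even]                 # selected values, flattened to 1-d
signs = np.where(is_even, 1, -1)   # same shape as a: 1 where True, -1 elsewhere
kept = np.where(is_even, a, 0)     # keep even entries, zero out the rest
count_even = int(is_even.sum())    # True counts as 1 when summed

print(evens.tolist())  # [2, 4, 6]
print(signs.tolist())  # [[-1, 1, -1], [1, -1, 1]]
print(kept.tolist())   # [[0, 2, 0], [4, 0, 6]]
print(count_even)      # 3
```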

  37. Unique >>> r = random.randint(0,5, (9,)) >>> r array([0, 3, 2, 2, 2, 1, 1, 4, 3]) >>> unique(r) array([0, 1, 2, 3, 4])

  38. Outline • Overview of NumPy • Creating arrays • Resizing arrays • Indexing and selection • Operations on one array • Operations on two arrays • Linear algebra

  39. Modifying entries in an array >>> a array([[1, 2, 3], [4, 5, 6]]) >>> a[1,2] = 0 >>> a array([[1, 2, 3], [4, 5, 0]])

  40. Modifying entries in an array >>> a[1,:] = array([7,8,9]) >>> a array([[1, 2, 3], [7, 8, 9]]) >>> a[:,0:2] = array([[-1,-2],[-3,-4]]) >>> a array([[-1, -2, 3], [-3, -4, 9]])

  41. Append an array >>> a = array([1,2,3]) >>> a = append(a, 4) >>> a array([1, 2, 3, 4]) >>> append(a, array([7,8,9])) array([1, 2, 3, 4, 7, 8, 9])

  42. Sum >>> a array([[1, 2, 3], [4, 5, 6]]) >>> sum(a, axis=0) # sum down the rows: one total per column array([5, 7, 9]) >>> sum(a, 1) # sum across the columns: one total per row array([ 6, 15]) >>> sum(a) 21 >>> sum(sum(a)) # often in Marsland’s code 21
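
One way to remember the `axis` argument is that the named axis is the one that disappears from the result. The sketch below illustrates this with the same matrix, using the `np` alias and illustrative variable names.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)

col_totals = a.sum(axis=0)   # axis 0 (rows) is collapsed, leaving shape (3,)
row_totals = a.sum(axis=1)   # axis 1 (columns) is collapsed, leaving shape (2,)
total = a.sum()              # no axis given: sum of every element

print(col_totals.tolist())   # [5, 7, 9]
print(row_totals.tolist())   # [6, 15]
print(int(total))            # 21
```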

  43. Elementwise numerical operations >>> a + 1 array([[2, 3, 4], [5, 6, 7]]) >>> a**2 array([[ 1, 4, 9], [16, 25, 36]]) >>> sqrt(a) array([[ 1. , 1.41421356, 1.73205081], [ 2. , 2.23606798, 2.44948974]])

  44. Division >>> a array([[1, 2, 3], [4, 5, 6]]) >>> a / 3 # integer division under Python 2 array([[0, 0, 1], [1, 1, 2]]) >>> a / 3.0 array([[ 0.33333333, 0.66666667, 1. ], [ 1.33333333, 1.66666667, 2. ]])
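
The integer result of `a / 3` on this slide reflects Python 2. Under Python 3, `/` is true division even for integer arrays, and `//` gives the floor division shown above. The following sketch, written for Python 3 with the `np` alias, shows both operators.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

true_div = a / 3     # Python 3: always true division, float result
floor_div = a // 3   # floor division, keeps the integer dtype

print(true_div.dtype)      # float64
print(floor_div.tolist())  # [[0, 0, 1], [1, 1, 2]]
```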

  45. Try to call max >>> max(a) # calling built-in function! Traceback (most recent call last): File "<pyshell#323>", line 1, in <module> max(a) ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

  46. max and min >>> a array([[1, 2, 3], [4, 5, 6]]) >>> # below we call max, min in NumPy >>> a.max(axis=0) # max for each column array([4, 5, 6]) >>> a.max(axis=1) # max for each row array([3, 6]) >>> a.min(1) # min for each row array([1, 4])
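
To see the difference between the built-in `max` and NumPy's version side by side, here is a minimal sketch. It uses the `np` alias, so `max` always refers to the Python built-in and `np.max` to NumPy's function; the flag `builtin_failed` is an illustrative name.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

# The built-in max compares whole rows, which has no single truth value.
try:
    max(a)
    builtin_failed = False
except ValueError:
    builtin_failed = True

col_max = a.max(axis=0)      # max of each column
row_max = np.max(a, axis=1)  # same method, called as a NumPy function
overall = a.max()            # max over all elements

print(builtin_failed)        # True
print(col_max.tolist())      # [4, 5, 6]
print(row_max.tolist())      # [3, 6]
print(int(overall))          # 6
```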

  47. argmax and argmin >>> r = random.randint(0,20, (3,4)) >>> r array([[18, 3, 12, 7], [ 2, 12, 5, 4], [ 5, 8, 19, 15]]) >>> # find the index with the max value >>> argmax(r) # returns an index into the flattened array 10 >>> ravel(r) array([18, 3, 12, 7, 2, 12, 5, 4, 5, 8, 19, 15]) >>> ravel(r)[argmax(r)] 19
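
When the row and column of the maximum are needed rather than the flat index, `unravel_index` can convert between the two. This sketch reuses the matrix printed on the slide as a fixed array (instead of drawing new random numbers) and assumes the `np` alias.

```python
import numpy as np

# Same values as the random matrix shown on the slide, hard-coded for repeatability.
r = np.array([[18, 3, 12, 7],
              [2, 12, 5, 4],
              [5, 8, 19, 15]])

flat_index = np.argmax(r)                      # index into the flattened array
row, col = np.unravel_index(flat_index, r.shape)
per_column = np.argmax(r, axis=0)              # row index of the max in each column

print(int(flat_index))        # 10
print(int(row), int(col))     # 2 2
print(int(r[row, col]))       # 19
print(per_column.tolist())    # [0, 1, 2, 2]
```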

  48. Sorting >>> r = random.randint(0, 10, (3, 4)) >>> r array([[3, 8, 8, 1], [4, 5, 7, 7], [1, 1, 2, 8]]) >>> sort(r, axis=0) # sort each column array([[1, 1, 2, 1], [3, 5, 7, 7], [4, 8, 8, 8]]) >>> sort(r, 1) # sort each row array([[1, 3, 8, 8], [4, 5, 7, 7], [1, 1, 2, 8]])

  49. argsort >>> q = array(['C','B','E','A','D']) >>> argsort(q) array([3, 1, 0, 4, 2]) >>> q[argsort(q)] array(['A', 'B', 'C', 'D', 'E'], dtype='|S1')

  50. argsort • Original array (indices 0 1 2 3 4): C B E A D • Sorted array (indices 0 1 2 3 4): A B C D E • Index of each sorted item in the original array: 3 1 0 4 2
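
A common use of `argsort` is to reorder one array according to the values in another. The sketch below sorts a hypothetical word list by hypothetical counts; the arrays `words` and `counts` are made up for illustration and do not come from the slides.

```python
import numpy as np

# Hypothetical data: words and how often each one occurred.
words = np.array(['the', 'cat', 'sat', 'on', 'mat'])
counts = np.array([10, 3, 2, 7, 1])

order = np.argsort(counts)       # indices that sort counts in ascending order
descending = order[::-1]         # reverse to get most frequent first

print(order.tolist())              # [4, 2, 1, 3, 0]
print(words[descending].tolist())  # ['the', 'on', 'cat', 'sat', 'mat']
print(counts[descending].tolist()) # [10, 7, 3, 2, 1]
```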
