
LING 408/508: Computational Techniques for Linguists

LING 408/508: Computational Techniques for Linguists. Lecture 9 9/10/2012. Outline. Long HW # 1 answers. #1c.

najwa


Presentation Transcript


  1. LING 408/508: Computational Techniques for Linguists Lecture 9 9/10/2012

  2. Outline • Long HW #1 answers

  3. #1c • If I leave my house at 6:52 am and run 1 mile at an easy pace (8:15 per mile), then 3 miles at tempo (7:12 per mile) and 1 mile at easy pace again, what time do I get home for breakfast?

  4. #1c
     • Need to convert to a common unit
       • 6:52 a.m. = 6 hours, 52 minutes, 0 seconds
       • 8:15 = 0 hours, 8 minutes, 15 seconds
       • 7:12 = 0 hours, 7 minutes, 12 seconds
     • Choose seconds (the smallest of the three units, so every time becomes a whole number)
     • Since there are three times to convert, write a function

         def convert_to_seconds(h, m, s):
             return 60*60*h + 60*m + s

  5. #1c
     • If I leave my house at 6:52 am and run 1 mile at an easy pace (8:15 per mile), then 3 miles at tempo (7:12 per mile) and 1 mile at easy pace again, what time do I get home for breakfast?

         starttime = convert_to_seconds(6, 52, 0)
         easy = convert_to_seconds(0, 8, 15)
         tempo = convert_to_seconds(0, 7, 12)
         endtime = starttime + 1*easy + 3*tempo + 1*easy

  6. #1c

         # now convert seconds back into h, m, s
         # (endtime % 3600) is the number of seconds
         # remaining after removing seconds for hours
         h = endtime // 3600
         m = (endtime % 3600) // 60
         s = (endtime % 3600) % 60
         print(h, 'hours')      # 7 hours
         print(m, 'minutes')    # 30 minutes
         print(s, 'seconds')    # 6 seconds
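
The conversion back to hours, minutes, and seconds can also be sketched with the built-in divmod, which returns a quotient and a remainder in one call. The helper name seconds_to_hms is an assumption for illustration and is not part of the homework solution; 27006 is the value of endtime that the program above computes.

```python
def seconds_to_hms(total_seconds):
    """Split a whole number of seconds into (hours, minutes, seconds)."""
    # hypothetical helper, not from the lecture
    hours, remainder = divmod(total_seconds, 3600)   # 3600 seconds per hour
    minutes, seconds = divmod(remainder, 60)         # 60 seconds per minute
    return hours, minutes, seconds

# 27006 seconds is the end time of the run in #1c
print(seconds_to_hms(27006))  # (7, 30, 6)
```

This gives the same 7 hours, 30 minutes, 6 seconds as the // and % version.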

  7. #1c: entire program

         def convert_to_seconds(h, m, s):
             return 60*60*h + 60*m + s

         starttime = convert_to_seconds(6, 52, 0)
         easy = convert_to_seconds(0, 8, 15)
         tempo = convert_to_seconds(0, 7, 12)
         endtime = starttime + 1*easy + 3*tempo + 1*easy

         # now convert seconds back into h, m, s
         # (endtime % 3600) is the number of seconds
         # remaining after removing seconds for hours
         h = endtime // 3600
         m = (endtime % 3600) // 60
         s = (endtime % 3600) % 60
         print(h, 'hours')
         print(m, 'minutes')
         print(s, 'seconds')

  8. #2 palindrome, while loop
     • If a list is not a palindrome, there will be a pair of elements that is not identical
     • Example lists (even and odd length): [1, 2, 3, 3, 2, 1] and [1, 2, 3, 4, 3, 2, 1]

  9. Use positive and negative list indices
     • Let the current iteration be i.
     • Index from beginning: L[i]
     • Index from end: L[-1-i]

         values:             1   2   3   3   2   1
         positive indices:   0   1   2
         negative indices:              -3  -2  -1

  10. What is the range for i?
      • Starts at 0
      • Ends at len(L)//2, exclusive
      • Example: for [1, 2, 3, 4, 3, 2, 1], len(L)//2 is 3, so the pairs compared are

          L[0] and L[-1-0]
          L[1] and L[-1-1]
          L[2] and L[-1-2]

  11. #2

          # iteratively compare elements on the left and right,
          # starting from the outside and going in
          #
          # len(L)//2 tells us how many elements on either side
          # of the list should be compared, and works
          # regardless of whether length of list is even or odd
          def is_palindrome(L):
              i = 0
              while i < len(L)//2:
                  if L[i] != L[-1-i]:    # positive index for left side,
                      return False       # negative index for right side
                  i += 1
              return True    # if reach here, never returned False

  12. Testing code

          # test code
          # palindrome, odd length
          print(is_palindrome([1,2,3,2,1]))      # True
          # palindrome, even length
          print(is_palindrome([1,2,3,3,2,1]))    # True
          # empty list
          print(is_palindrome([]))               # True
          # not a palindrome, odd length
          print(is_palindrome([1,2,3]))          # False
          # not a palindrome, even length
          print(is_palindrome([1,2,3,4]))        # False
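
The same five test cases can also be checked automatically instead of by reading printed output. The following is a minimal sketch using assert; the function run_tests and the table of cases are assumptions for illustration, and is_palindrome is repeated so the block runs on its own.

```python
def is_palindrome(L):
    # compare elements from the outside in, as in the while-loop solution
    i = 0
    while i < len(L) // 2:
        if L[i] != L[-1 - i]:
            return False
        i += 1
    return True

def run_tests():
    # hypothetical test table: (input list, expected result)
    cases = [
        ([1, 2, 3, 2, 1], True),      # palindrome, odd length
        ([1, 2, 3, 3, 2, 1], True),   # palindrome, even length
        ([], True),                   # empty list
        ([1, 2, 3], False),           # not a palindrome, odd length
        ([1, 2, 3, 4], False),        # not a palindrome, even length
    ]
    for data, expected in cases:
        # an AssertionError names the failing input
        assert is_palindrome(data) == expected, data
    return len(cases)

print(run_tests(), 'tests passed')
```

If any case fails, the program stops at that assert rather than printing a wrong value silently.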

  13. #2 palindrome, without while loop
      • Take the input list, reverse it, and test for equality
      • Example: [1, 2, 3, 4, 5, 6] reversed is [6, 5, 4, 3, 2, 1]

  14. #2 palindrome, without while loop

          # doesn't use a while loop
          #
          # compare list to itself in reverse
          def is_palindrome2(L):
              return L == L[::-1]

          def is_palindrome3(L):
              L2 = L[:]       # make a copy (don't want
              L2.reverse()    # to modify input list)
              return L == L2

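  15. This doesn’t work
      • L2 = L does not copy the list; both names refer to the same list object
      • So L2.reverse() also reverses the input list, and L == L2 is always True

          def is_palindrome3(L):
              L2 = L            # no [:] here, so no copy is made
              L2.reverse()      # reverses the input list too
              return L == L2    # always True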
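
The difference between assigning a list and copying it can be seen directly with the is operator. This is a small illustrative sketch; the variable names are assumptions and do not come from the slides.

```python
original = [1, 2, 3]
alias = original        # a second name for the same list object
copied = original[:]    # a new list with the same elements

print(alias is original)    # True: same object
print(copied is original)   # False: different object

alias.reverse()             # reverses the one shared list
print(original)             # [3, 2, 1]
print(copied)               # [1, 2, 3]
print(original == alias)    # True, whatever the contents
```

Because original and alias are the same object, comparing them can never detect a non-palindrome.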

  16. #3 Birthday problem
      • Write a function that generates a list of 28 random numbers between 1 and 365, inclusive. To generate a random number, use the randint function in module random.

          >>> import random
          >>> help(random.randint)
          Help on method randint in module random:

          randint(self, a, b) method of random.Random instance
              Return random integer in range [a, b], including both end points.

  17. 

          import random

          def make_class(num_students):
              students = []
              i = 0
              while i < num_students:
                  students.append(random.randint(1,365))
                  i += 1
              # same, with a for loop
              # for i in range(num_students):
              #     students.append(random.randint(1,365))
              return students

  18. Write a function that determines whether or not a list contains at least one repeated value, returning either True or False. There are multiple ways to do this; I’ll show you 3 solutions that involve loops

  19. 

          # loop over all pairs of elements,
          # see if they are the same
          # if loops terminate, there are no repeats
          def has_repeat1(L):
              i = 0
              while i < len(L)-1:
                  j = i + 1
                  while j < len(L):
                      if L[i] == L[j]:
                          return True
                      j += 1
                  i += 1
              return False

  20. 

          # loop over all pairs of elements,
          # see if they are the same
          # if loops terminate, there are no repeats
          def has_repeat2(L):
              for i in range(0, len(L)-1):
                  for j in range(i+1, len(L)):
                      if L[i] == L[j]:
                          return True
              return False

  21. Some people sorted the list first
      • Sort a list, then compare adjacent positions to find a repeat
      • Solution also requires the extra operations of first sorting a list

          def has_repeat3(L):
              # don't do L.sort() because don't want to modify
              # list being passed in to the function
              L2 = sorted(L)
              i = 0
              while i < len(L2)-1:
                  if L2[i] == L2[i+1]:
                      return True
                  i += 1    # advance, otherwise the loop never ends
              return False
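
For comparison with the three loop-based solutions, a repeat can also be detected without an explicit loop by building a set, which keeps only one copy of each value. This is a sketch of an alternative that is not one of the lecture's solutions; the name has_repeat_set is an assumption, and it requires the list elements to be hashable (integers are).

```python
def has_repeat_set(L):
    # hypothetical alternative, not from the lecture:
    # a set drops duplicates, so it is shorter than L
    # exactly when L contains a repeated value
    return len(set(L)) < len(L)

print(has_repeat_set([12, 200, 45, 200]))   # True
print(has_repeat_set([12, 200, 45]))        # False
print(has_repeat_set([]))                   # False
```

Like has_repeat3, this does not modify the list passed in.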

  22. Create 1,000 random classes of students. Calculate the probability that at least two students in the class have the same birthday. What is your result?

          num_classes = 1000
          num_students = 28
          num_repeats = 0
          for i in range(num_classes):
              students = make_class(num_students)
              if has_repeat1(students):
                  num_repeats += 1
          print(num_repeats / num_classes)
          # answer: about 0.656, i.e. 65.6%
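
The simulated estimate can be sanity-checked against the exact probability, computed as one minus the probability that all birthdays differ. This is a sketch under the same assumptions as the simulation (365 equally likely days, no leap years); the function name prob_shared_birthday is an assumption and is not part of the homework.

```python
def prob_shared_birthday(num_students, num_days=365):
    # hypothetical helper: exact probability of at least one shared birthday
    prob_all_different = 1.0
    for k in range(num_students):
        # student k must avoid the k birthdays already taken
        prob_all_different *= (num_days - k) / num_days
    return 1.0 - prob_all_different

# should land close to the simulated result for 28 students (roughly 0.65)
print(prob_shared_birthday(28))
```

With only 1,000 simulated classes, the estimate will typically differ from the exact value by a percentage point or so from run to run.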

  23. Entire program

          import random

          def make_class(num_students):
              students = []
              for i in range(num_students):
                  students.append(random.randint(1,365))
              return students

          # loop over all pairs of elements,
          # see if they are the same
          #
          # if loops terminate, there are no repeats
          def has_repeat1(L):
              i = 0
              while i < len(L)-1:
                  j = i + 1
                  while j < len(L):
                      if L[i] == L[j]:
                          return True
                      j += 1
                  i += 1
              return False

          num_classes = 100000
          num_students = 28
          num_repeats = 0
          for i in range(num_classes):
              students = make_class(num_students)
              if has_repeat1(students):
                  num_repeats += 1
          print(num_repeats / num_classes)

  24. #4 prime numbers
      1. Create a list of integers from 2 to N: [2, 3, 4, ..., N].
      2. Let p equal 2, the first prime number.
      3. Mark all multiples of p less than or equal to N as not prime (2*p, 3*p, etc.).
      4. The first number greater than p that has not been marked as not prime is a prime number. Replace p with this number.
      5. Repeat steps 3 and 4 until p^2 is greater than N.
      6. All the remaining unmarked numbers in the list are prime.

  25. For example, for N = 15:
      • Initial list for N = 15; the first prime number is p = 2
        [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
      • Mark 4, 6, 8, 10, 12, and 14, which are multiples of p = 2
        Unmarked: [2, 3, 5, 7, 9, 11, 13, 15]
      • Mark 6, 9, 12, and 15, which are multiples of p = 3
        Unmarked: [2, 3, 5, 7, 11, 13]
      • The next prime is p = 5. Its multiples 10 and 15 are already marked, and p^2 = 25 is greater than N = 15, so stop.
      • Primes less than or equal to 15 are 2, 3, 5, 7, 11, and 13.
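
The worked example can be reproduced with a short trace that records which numbers each p newly marks. This is an illustrative sketch, not one of the homework solutions: the name sieve_trace and the use of a list of True/False flags are assumptions made to keep the example small.

```python
def sieve_trace(N):
    """Return (primes, trace); trace lists (p, numbers newly marked by p)."""
    # hypothetical helper; marked[n] is True once n is known to be non-prime
    marked = [False] * (N + 1)
    trace = []
    p = 2
    while p * p <= N:                      # stop once p^2 is greater than N
        newly_marked = []
        for m in range(2 * p, N + 1, p):   # 2*p, 3*p, ... up to N
            if not marked[m]:
                marked[m] = True
                newly_marked.append(m)
        trace.append((p, newly_marked))
        p += 1
        while marked[p]:                   # skip ahead to the next unmarked number
            p += 1
    primes = [n for n in range(2, N + 1) if not marked[n]]
    return primes, trace

primes, trace = sieve_trace(15)
for p, newly_marked in trace:
    print('p =', p, 'newly marks', newly_marked)
print(primes)   # [2, 3, 5, 7, 11, 13]
```

For N = 15 the trace shows p = 3 newly marking only 9 and 15, since 6 and 12 were already marked as multiples of 2.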

  26. I’ll show you 3 different solutions, in order of increasing efficiency
      • How do we represent this in Python?
        [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] (with the non-primes marked)
      • First attempt: as a shortened list of numbers
      • Above is represented as [2, 3, 5, 7, 11, 13]
      • Remove a number from list if it is not a prime

  27. Solution #1

          def primes1(N):
              # all candidate primes
              candidates = list(range(2, N+1))
              p_idx = 0
              p = candidates[p_idx]
              while p**2 <= N:    # <= so that p*p itself is removed (e.g. N = 9)
                  i = 2
                  while i*p <= N:
                      if i*p in candidates:         # remove if
                          candidates.remove(i*p)    # not prime
                      i += 1
                  # find next prime
                  p_idx += 1    # next number in list is a prime
                  p = candidates[p_idx]
              return candidates    # remaining nums are prime numbers

  28. Running time of solution #1
      • Three nested loops: O(N^3)

          while p**2 <= N:
              while i*p <= N:
                  if i*p in candidates:

      • The in operator is an implicit third nested loop, since it performs linear search to find a number
      • Could be faster if we use binary search, since the list of candidate primes is in increasing order
      • Recall that binary search returns the index of a value in a list, or -1 if the value is not in the list
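
Binary search on a sorted list is also available through the standard library's bisect module. The following sketch wraps bisect_left so that it follows the convention described above (index of the value, or -1 if absent); the name index_in_sorted is an assumption and is not the course's binary_search function.

```python
from bisect import bisect_left

def index_in_sorted(L, val):
    """Index of val in the sorted list L, or -1 if val is not in L."""
    # hypothetical wrapper around the standard-library bisect_left
    idx = bisect_left(L, val)     # leftmost position where val could be inserted
    if idx < len(L) and L[idx] == val:
        return idx
    return -1

candidates = [2, 3, 5, 7, 9, 11, 13, 15]
print(index_in_sorted(candidates, 9))    # 4
print(index_in_sorted(candidates, 10))   # -1
```

The list must already be in increasing order, which holds for the list of candidate primes.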

  29. Solution #2

          def primes2(N):
              candidates = list(range(2, N+1))
              p_idx = 0
              p = candidates[p_idx]
              while p**2 <= N:    # <= so that p*p itself is removed (e.g. N = 9)
                  i = 2
                  while i*p <= N:
                      remove_idx = binary_search(candidates, i*p)
                      if remove_idx != -1:
                          del candidates[remove_idx]
                      i += 1
                  # find next prime
                  p_idx += 1
                  p = candidates[p_idx]
              return candidates    # remaining nums are prime numbers

  30. Running time of solution #2

          while p**2 <= N:
              while i*p <= N:
                  remove_idx = binary_search(candidates, i*p)
                  if remove_idx != -1:
                      del candidates[remove_idx]

      • Uses binary search instead of the in operator
      • Binary search is O(log N)
      • Binary search is nested within 2 loops, so running time of algorithm is O(N^2 log N)

  31. Idea behind algorithm #3
      • Try alternative representation of data
        [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
      • Represent as:
        [None, None, 2, 3, None, 5, None, 7, None, None, None, 11, None, 13, None, None]
        (indices 0 through 15)
      • Everything that isn’t None is a prime
      • Benefit: directly access the number to be marked as non-prime
        • Each number i is stored at index i, so L[i] == i. Example: the index of 12 is 12
        • Don’t need to search the list to find the index of a value!
      • To make this work, add positions for 0 and 1 at beginning of list
      • The following algorithm is O(N^2)

  32. Solution #3

          def primes3(N):
              candidates = list(range(N+1))
              candidates[0:2] = [None, None]    # 0 and 1 are not primes
              p_idx = 2
              p = candidates[p_idx]
              while p**2 <= N:    # <= so that p*p itself is marked (e.g. N = 9)
                  i = 2
                  while i*p <= N:
                      candidates[i*p] = None
                      i += 1
                  while True:    # find next prime, skip over Nones
                      p_idx += 1
                      if candidates[p_idx] is not None:
                          p = candidates[p_idx]
                          break
              prime_numbers = []    # everything that isn't None is a prime
              for c in candidates:
                  if c is not None:
                      prime_numbers.append(c)
              return prime_numbers

  33. Solution #3
      • To find the next prime, skip over Nones
      • Example: suppose p_idx is 3, advance to 5
        [None, None, 2, 3, None, 5, ...]

          while True:
              p_idx += 1
              if candidates[p_idx] is not None:
                  p = candidates[p_idx]    # next prime
                  break

  34. Solution #3
      • When the outer loop has terminated, everything that isn't None is a prime
      • Recover list of prime numbers

          prime_numbers = []
          for c in candidates:
              if c is not None:
                  prime_numbers.append(c)
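
The recovery step can also be written as a list comprehension, which builds the new list in a single expression. This is a sketch of an equivalent form, using the N = 15 representation from the earlier slide as sample data.

```python
# sample data: the N = 15 list after all non-primes are set to None
candidates = [None, None, 2, 3, None, 5, None, 7,
              None, None, None, 11, None, 13, None, None]

# keep every entry that is not None, in order
prime_numbers = [c for c in candidates if c is not None]
print(prime_numbers)   # [2, 3, 5, 7, 11, 13]
```

The test uses is not None rather than truthiness so that the intent (skip marked entries) stays explicit.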

  35. Empirical comparison of running times
      (I’ve used semicolons to separate short statements)

          import time
          # time.perf_counter() is used because time.clock() was removed in Python 3.8
          start = time.perf_counter(); primes1(10000); end = time.perf_counter()
          print('{0:.5f} seconds'.format(end-start))
          start = time.perf_counter(); primes2(10000); end = time.perf_counter()
          print('{0:.5f} seconds'.format(end-start))
          start = time.perf_counter(); primes3(10000); end = time.perf_counter()
          print('{0:.5f} seconds'.format(end-start))

          # output:
          # 4.32617 seconds    algorithm 1: O(N^3)
          # 0.28724 seconds    algorithm 2: O(N^2 * log N)
          # 0.01832 seconds    algorithm 3: O(N^2)
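
The repeated start/end bookkeeping can be wrapped in a small helper. This is a sketch only: the name time_call is an assumption, it uses the standard-library timer time.perf_counter, and the built-in sum stands in for the primes functions so that the block runs on its own.

```python
import time

def time_call(func, *args):
    """Call func(*args) and return (result, elapsed seconds)."""
    # hypothetical helper, not from the lecture
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed

# sum is a stand-in here; with the lecture code one would pass primes1, 10000 etc.
result, elapsed = time_call(sum, range(100000))
print('{0:.5f} seconds'.format(elapsed))
```

Measured times depend on the machine and vary between runs, so only the relative ordering of the three algorithms is meaningful.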

  36. All on one slide

          def primes1(N):
              # all candidate primes
              candidates = list(range(2, N+1))
              p_idx = 0
              p = candidates[p_idx]
              while p**2 <= N:    # <= so that p*p itself is removed (e.g. N = 9)
                  i = 2
                  while i*p <= N:
                      if i*p in candidates:         # remove if
                          candidates.remove(i*p)    # not prime
                      i += 1
                  # find next prime
                  p_idx += 1    # next number in list is a prime
                  p = candidates[p_idx]
              return candidates    # remaining nums are prime numbers

          def primes2(N):
              candidates = list(range(2, N+1))
              p_idx = 0
              p = candidates[p_idx]
              while p**2 <= N:
                  i = 2
                  while i*p <= N:
                      remove_idx = binary_search(candidates, i*p)
                      if remove_idx != -1:
                          del candidates[remove_idx]
                      i += 1
                  # find next prime
                  p_idx += 1
                  p = candidates[p_idx]
              return candidates    # remaining nums are prime numbers

          def binary_search(L, val):
              lo = 0                       # initialize lo and hi
              hi = len(L) - 1
              while lo <= hi:              # stopping condition
                  mid = (lo + hi) // 2     # middle index
                  guess = L[mid]
                  if guess == val:         # compare guess to
                      return mid           # value searched for
                  elif guess < val:
                      lo = mid + 1
                  elif guess > val:
                      hi = mid - 1
              return -1                    # value not in list

          def primes3(N):
              candidates = list(range(N+1))
              candidates[0:2] = [None, None]    # 0 and 1 are not primes
              p_idx = 2
              p = candidates[p_idx]
              while p**2 <= N:
                  i = 2
                  while i*p <= N:
                      candidates[i*p] = None
                      i += 1
                  while True:    # find next prime, skip over Nones
                      p_idx += 1
                      if candidates[p_idx] is not None:
                          p = candidates[p_idx]
                          break
              prime_numbers = []    # everything that isn't None is a prime
              for c in candidates:
                  if c is not None:
                      prime_numbers.append(c)
              return prime_numbers

          import time
          # time.perf_counter() is used because time.clock() was removed in Python 3.8
          start = time.perf_counter(); primes1(10000); end = time.perf_counter()
          print('{0:.5f} seconds'.format(end-start))
          start = time.perf_counter(); primes2(10000); end = time.perf_counter()
          print('{0:.5f} seconds'.format(end-start))
          start = time.perf_counter(); primes3(10000); end = time.perf_counter()
          print('{0:.5f} seconds'.format(end-start))

          # output:
          # 4.32617 seconds    algorithm 1: O(N^3)
          # 0.28724 seconds    algorithm 2: O(N^2 * log N)
          # 0.01832 seconds    algorithm 3: O(N^2)
