## UTPA Computer Science


**UTPA Computer Science**
Dr. Zhixiang Chen, Professor and Chair, Department of Computer Science

**What Is Computer Science?**
• The systematic study of the feasibility, structure, expression, and mechanization of the methodical processes (or algorithms) that underlie the acquisition, representation, processing, storage, communication of, and access to information, whether such information is encoded in bits and bytes in a computer memory or transcribed in genes and protein structures in a human cell.
• The fundamental question underlying all of computing is: what computational processes can be efficiently automated and implemented?
• The core is about problem solving, efficient problem solving, and automated problem solving.

**Two undergraduate programs**
• BSCS (broad-field major in CS): ABET accredited since Fall 2001
• Computer Engineering (CMPE) program: ABET accredited since 2011, jointly with the EE department
• Students: over 440

**Two Masters Graduate Programs**
• MS in Computer Science
• MS in Information Technology
• Graduate students: close to 100

**Career Placements**
• Graduates working with major companies: Microsoft, IBM, Xerox, Intel, WalMart, Exxon-Mobil, local businesses, independent school districts, UTPA Computer Center, IAG Houston
• All recent CS/CMPE graduates have job placements or have moved on to graduate school.
• IBM, Microsoft, WalMart, Xerox, and other companies aggressively recruit CS students.

**Scholarships**
• O’Dell Memorial Scholarship
• Computer Science Alumni Scholarship
• Wal-Mart Scholarship
• Xerox Scholarship
• Engineering and Computer Science: Boeing and others
• University Scholarships: UTPA Excellence, University Scholar
• Student Financial Services Scholarships

**Graduate Studies**
• Many graduates move on to pursue MS or doctoral studies.
• Laura Grabowski completed her Ph.D. dissertation at Michigan State and is now a professor in the department.
• Student research:
  • 15+ senior projects every semester
  • All graduate students need to finish a research project or a thesis.
  • A number of students have publications in journals and conference proceedings.
  • The annual UTPA Computer Science Student Research Day Conference (CSSRD)

**Best Jobs in America**
Source: www.CareerExplorer.net
• Network Systems and Data Communications Analysts. Ten hottest careers rank: 1. Salary range: $18,610 to $96,860
• Computer Application Software Engineers. Ten hottest careers rank: 5. Salary range: $18,610 to $96,860
• Database Administrators. Ten hottest careers rank: 8. Salary range: $16,460 to $74,390

**Top Five Fastest Growing IT Jobs Now - 2014**
Source: U.S. Bureau of Labor Statistics
1) Network systems and data communications analyst
  • Expected growth rate: 54.6%
  • Middle 50 percent earned between $46,480 and $78,060.
2) Computer applications software engineer
  • Expected growth rate: 48.4%
  • Middle 50 percent earned between $59,130 and $92,130.
3) Computer systems software engineer
  • Expected growth rate: 43%
  • Middle 50 percent earned between $63,150 and $98,220.
4) Network and computer systems administrator
  • Expected growth rate: 38.4%
  • Middle 50 percent earned between $63,150 and $98,220.
5) Database administrator
  • Expected growth rate: 38.2%
  • Middle 50 percent earned between $46,260 and $73,620.

**A Large Picture of CS**
• Theory
• Systems and Architectures
• Languages
• Applications
• Software = Program = Algorithms + Data Structures

**Sorting: Some Taste of Algorithm and Data Structures**
• Sorting is to arrange a list of data in either increasing or decreasing order.
• Sorting is fundamental in computer science.
• We shall examine quicksort.

**Quicksort**
• Quicksort is the fastest known sorting algorithm in practice.
• Its average running time is O(N log N).
• Quicksort is a divide-and-conquer recursive algorithm.
• The algorithm:
  • If the number of elements in input array S is 0 or 1, then return.
  • Pick any element v in S. This is called the pivot.
  • Partition S - {v} (the remaining elements in S) into two disjoint groups: S1 = {x ∈ S - {v} | x <= v} and S2 = {x ∈ S - {v} | x > v}.
  • Return {quicksort(S1) followed by v followed by quicksort(S2)}.

**Issues in Quicksort**
• How to choose the pivots?
• The choice of pivots affects the performance of quicksort a lot!
• If the pivots being chosen are in sorted order, quicksort degenerates to selection sort. For instance, sort 13, 81, 43, 92, 31, 65, 57, 26, 75, 0 when the pivots we choose happen to be 0, 13, 26, 31, 43, 57, 65, 75, 81, 92:
  13 81 43 92 31 65 57 26 75 0
  partition using 0 → 0 13 81 43 92 31 65 57 26 75
  partition using 13 → 0 13 81 43 92 31 65 57 26 75
  partition using 26 → 0 13 26 81 43 92 31 65 57 75
  partition using 31 → 0 13 26 31 81 43 92 65 57 75
• We hope that by using the pivot, we can partition the array into two subarrays of nearly equal size.

**Quicksort Example**
• Sort 13, 81, 43, 92, 31, 65, 57, 26, 75, 0.
• The pivot is chosen by chance to be 65.
  select pivot: 13 81 43 92 31 65 57 26 75 0
  partition: 13 43 31 57 26 0 65 81 92 75
  quicksort small, quicksort large: 0 13 26 31 43 57 65 75 81 92
  result: 0 13 26 31 43 57 65 75 81 92

**How to partition the array?**
• Quicksort recursively solves two subproblems and requires linear additional work (the partitioning step), but the subproblems are not guaranteed to be of equal size, which is potentially bad.
• The reason that quicksort is faster is that the partitioning step can be performed in place and very efficiently. This efficiency more than makes up for the lack of equal-sized recursive calls.

**Picking the Pivot**
• A wrong way
• Using the first element as the pivot
  • The choice is acceptable if the input is random.
  • If the input is presorted or in reverse order, then the pivot provides a poor partition, because either all the elements go into S1 or they go into S2.
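The four-step algorithm above, and the cost of a poor pivot, can be sketched in Python. This is only a minimal illustration, not the in-place version used in practice: it builds new lists for S1 and S2, and the names `quicksort`, `first`, and `middle` are hypothetical helpers introduced here. Besides the sorted list, it returns the recursion depth so that the degenerate case becomes visible.

```python
def first(s):
    """Pivot rule (hypothetical helper): always pick index 0."""
    return 0


def middle(s):
    """Pivot rule (hypothetical helper): pick the middle position."""
    return len(s) // 2


def quicksort(s, pick_pivot=first):
    """Return (sorted copy of list s, depth of the recursion).

    pick_pivot is an assumed hook that returns the index of the pivot.
    """
    # Step 1: a list with 0 or 1 elements is already sorted.
    if len(s) <= 1:
        return list(s), 0
    # Step 2: pick a pivot v.
    k = pick_pivot(s)
    v = s[k]
    # Step 3: partition S - {v} into S1 (x <= v) and S2 (x > v).
    rest = s[:k] + s[k + 1:]
    s1 = [x for x in rest if x <= v]
    s2 = [x for x in rest if x > v]
    # Step 4: quicksort(S1), followed by v, followed by quicksort(S2).
    left, depth1 = quicksort(s1, pick_pivot)
    right, depth2 = quicksort(s2, pick_pivot)
    return left + [v] + right, 1 + max(depth1, depth2)
```

On a presorted list of 15 elements, `quicksort(list(range(15)), first)` recurses to depth 14 because S1 is empty at every level, while `quicksort(list(range(15)), middle)` reaches only depth 3.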
  • The poor partition happens consistently throughout the recursive calls if the input is presorted or in reverse order.
  • The practical effect is that if the first element is used as the pivot and the input is presorted, then quicksort will take quadratic time to do essentially nothing at all.
  92 81 75 65 57 43 31 26 13 0
  partition using 92 → 81 75 65 57 43 31 26 13 0 92
  81 75 65 57 43 31 26 13 0
  partition using 81 → 75 65 57 43 31 26 13 0 81

**Picking the Pivot**
• A wrong way
• Using the first element as the pivot (cont.)
  • Moreover, presorted input (or input with a large presorted section) is quite frequent, so using the first element as pivot is an absolutely horrible idea and should be discarded immediately.
• Choosing the larger of the first two distinct elements as pivot
  • This has the same bad properties as merely choosing the first element.
  • Do NOT use that pivoting strategy either.

**Picking the Pivot**
• A safe maneuver
  • A safe course is merely to choose the pivot randomly.
  • This strategy is generally perfectly safe, since it is very unlikely that a random pivot would consistently provide a poor partition.
  • However, random number generation is generally an expensive commodity and does not reduce the average running time of the rest of the algorithm at all.
• Using median
  • The median of a group of N numbers is the (N/2)th largest number.
  • The best choice of pivot would be the median of the array, since it gives two subarrays of nearly equal size (their sizes can differ by at most 1).
  • Theoretically, the median can be found in linear time and thus would not affect the running time of quicksort; but in practice, it would slow down quicksort considerably.
  13 81 43 92 31 65 57 26 75 0
  partition using 57 → 13 43 31 26 0 57 81 92 65 75

**Picking the Pivot**
• Median-of-Three partitioning
  • Pick three elements randomly and use the median of these three as pivot. (Random numbers are expensive to generate.)
  • The common course is to use as pivot the median of the left, right, and center (in position (left + right)/2) elements.
  • Using median-of-three partitioning clearly eliminates the bad case for sorted input (the partitions become equal in this case) and actually reduces the running time of quicksort by about 5%.
  92 81 75 65 57 43 31 26 13 0
  partition using 57 → 43 31 26 13 0 57 92 81 75 65

**Partitioning Strategy**
• Objective: to move all the small elements to the left part of the array and all the large elements to the right part. “Small” and “large” are, of course, relative to the pivot.
• The basic idea
  • Swap the pivot with the last element; thus the pivot is placed at the last position of the array.
  • i starts at the first element and j starts at the next-to-last element.
  • While i is to the left of j, we move i right, skipping over elements that are smaller than the pivot; we move j left, skipping over elements that are larger than the pivot.
  • When i and j have stopped, i is pointing at a large element and j is pointing at a small element.
  • If i is to the left of j, those elements are swapped. (The effect is to push a large element to the right and a small element to the left.)
[Figure: example array 8 1 4 9 0 3 5 2 7 6 with pointers i and j]

**Partitioning Strategy Example**
• Input: 81 13 43 92 31 65 57 26 75 0
• Pivot: 57 (swapped with the last element before the scan starts)

| Array | Pointer status | Next action |
| --- | --- | --- |
| 81 13 43 92 31 65 0 26 75 57 | i stopped (at 81) | move j left |
| 81 13 43 92 31 65 0 26 75 57 | j stopped (at 26) | swap |
| 26 13 43 92 31 65 0 81 75 57 | | move i right |
| 26 13 43 92 31 65 0 81 75 57 | i stopped (at 92) | move j left |
| 26 13 43 92 31 65 0 81 75 57 | j stopped (at 0) | swap |
| 26 13 43 0 31 65 92 81 75 57 | | move i right |
| 26 13 43 0 31 65 92 81 75 57 | i stopped (at 65) | move j left |
| 26 13 43 0 31 65 92 81 75 57 | j stopped (at 31) | |

i and j have crossed. Swap the element at position i with the pivot.
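The partitioning strategy and the median-of-three rule described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the C++ code from the slides: the function names `partition`, `median_of_three`, and `quicksort` are hypothetical, the pivot is simply swapped to the last position as in the "basic idea" bullets, and the small-array cutoff is omitted.

```python
def partition(a, left, right, pivot_index):
    """Partition a[left..right] in place around a[pivot_index].

    Returns the final position of the pivot.
    """
    # Park the pivot at the last position of the range.
    a[pivot_index], a[right] = a[right], a[pivot_index]
    pivot = a[right]
    i, j = left, right - 1  # i: first element, j: next-to-last element
    while True:
        # Move i right, skipping elements smaller than the pivot.
        while i <= j and a[i] < pivot:
            i += 1
        # Move j left, skipping elements larger than the pivot.
        while j >= i and a[j] > pivot:
            j -= 1
        if i >= j:  # i and j have met or crossed
            break
        # i points at a large element, j at a small one: swap them.
        a[i], a[j] = a[j], a[i]
        i += 1
        j -= 1
    # Swap the element at position i with the pivot.
    a[i], a[right] = a[right], a[i]
    return i


def median_of_three(a, left, right):
    """Index of the median of a[left], a[center], a[right]."""
    center = (left + right) // 2
    trio = sorted([(a[left], left), (a[center], center), (a[right], right)])
    return trio[1][1]


def quicksort(a, left=0, right=None):
    """Sort list a in place using median-of-three pivots."""
    if right is None:
        right = len(a) - 1
    if left >= right:  # 0 or 1 elements: nothing to do
        return
    p = partition(a, left, right, median_of_three(a, left, right))
    quicksort(a, left, p - 1)
    quicksort(a, p + 1, right)
```

Calling `partition` on the example input 81 13 43 92 31 65 57 26 75 0 with `pivot_index=6` (the position of 57) follows the same swaps as the trace above. Both scans stop on elements equal to the pivot, which is one common way to keep the two sides balanced when the input contains many duplicates.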
Result after the final swap: 26 13 43 0 31 57 92 81 75 65

**Small Arrays**
• For very small arrays (N <= 20), quicksort does not perform as well as insertion sort.
• Because quicksort is recursive, these cases will occur frequently.
• A common solution is not to use quicksort recursively for small arrays, but instead to use a sorting algorithm that is efficient for small arrays, such as insertion sort.
• Using this strategy can save about 15% in the running time. A good cutoff range is N = 10.
• This also avoids nasty degenerate cases, such as taking the median of three elements when there are only one or two.

**Quicksort C++ code**
• Sort elements at positions left, right, and center.
[Figure: array a with positions left, center, and right marked; labels small, median, large, and a[right-1]]

**Quicksort C++ code**
• Partition the input array into two subarrays: one contains small elements; the other contains large elements.
[Figure: array a with positions left, center, and right, pointers i and j, and the pivot at a[right-1]]

**Analysis of Quicksort**
• We will do the analysis for a quicksort, assuming a random pivot (no median-of-three partitioning) and no cutoff for small arrays.
• We take T(0) = T(1) = 1.
• The running time of quicksort is equal to the running time of the two recursive calls plus the linear time spent in the partition (the pivot selection takes only constant time). This gives the basic quicksort relation

$$T(N) = T(i) + T(N-i-1) + cN$$

where i = |S1| is the number of elements in S1.

**Analysis of Quicksort**
• Worst-Case Analysis
• The pivot is the smallest element, all the time. Then i = 0, and if we ignore T(0) = 1, which is insignificant, the recurrence is

$$T(N) = T(N-1) + cN, \quad N > 1$$

We telescope, using the equation above repeatedly. Thus

$$\begin{aligned}
T(N-1) &= T(N-2) + c(N-1) \\
T(N-2) &= T(N-3) + c(N-2) \\
&\;\;\vdots \\
T(3) &= T(2) + c(3) \\
T(2) &= T(1) + c(2)
\end{aligned}$$

Adding up all these equations yields

$$T(N) = T(1) + c\sum_{i=2}^{N} i = O(N^2)$$

**Analysis of Quicksort**
• Best-Case Analysis
• In the best case, the pivot is in the middle.
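• To simplify the analysis, we assume that the two subarrays are each exactly half the size of the original. Although this gives a slight overestimate, this is acceptable because we are only interested in a Big-Oh answer.

$$T(N) = 2T(N/2) + cN$$

Divide both sides of the equation by N:

$$\frac{T(N)}{N} = \frac{T(N/2)}{N/2} + c$$

We will telescope using this equation:

$$\begin{aligned}
\frac{T(N/2)}{N/2} &= \frac{T(N/4)}{N/4} + c \\
\frac{T(N/4)}{N/4} &= \frac{T(N/8)}{N/8} + c \\
&\;\;\vdots \\
\frac{T(2)}{2} &= \frac{T(1)}{1} + c
\end{aligned}$$

Adding up all equations (there are log N of them), we have

$$\frac{T(N)}{N} = \frac{T(1)}{1} + c\log N$$

which yields

$$T(N) = cN\log N + N = O(N\log N)$$

**Analysis of Quicksort**
• Average-Case Analysis
• Recall that

$$T(N) = T(i) + T(N-i-1) + cN \qquad (1)$$

• The average value of T(i), and hence of T(N-i-1), is

$$\frac{1}{N}\sum_{j=0}^{N-1} T(j)$$

• Equation (1) then becomes

$$T(N) = \frac{2}{N}\sum_{j=0}^{N-1} T(j) + cN \qquad (2)$$

If Equation (2) is multiplied by N, it becomes

$$N\,T(N) = 2\sum_{j=0}^{N-1} T(j) + cN^2 \qquad (3)$$

We telescope with one more equation:

$$(N-1)\,T(N-1) = 2\sum_{j=0}^{N-2} T(j) + c(N-1)^2 \qquad (4)$$

Subtracting (4) from (3), we obtain

$$N\,T(N) - (N-1)\,T(N-1) = 2T(N-1) + 2cN - c$$

We rearrange terms and drop the insignificant -c on the right, obtaining

$$N\,T(N) = (N+1)\,T(N-1) + 2cN \qquad (5)$$

Divide Equation (5) by N(N+1):

$$\frac{T(N)}{N+1} = \frac{T(N-1)}{N} + \frac{2c}{N+1} \qquad (6)$$

**Analysis of Quicksort**
• Average-Case Analysis (cont.)
Now we can telescope Equation (6):

$$\begin{aligned}
\frac{T(N-1)}{N} &= \frac{T(N-2)}{N-1} + \frac{2c}{N} \\
\frac{T(N-2)}{N-1} &= \frac{T(N-3)}{N-2} + \frac{2c}{N-1} \\
&\;\;\vdots \\
\frac{T(2)}{3} &= \frac{T(1)}{2} + \frac{2c}{3}
\end{aligned}$$

Adding them up yields

$$\frac{T(N)}{N+1} = \frac{T(1)}{2} + 2c\sum_{i=3}^{N+1}\frac{1}{i}$$

The sum is a partial sum of the harmonic series, which is O(log N). Thus, we have

$$\frac{T(N)}{N+1} = O(\log N)$$

and so T(N) = O(N log N).

**Huffman Codes: Some Taste of Data Compression and Decompression**
• Data compression and decompression are critical for fast data transmission, storage, and retrieval.
• Huffman coding is a classical technique for textual data compression and decompression.

**Huffman Codes**
• A data file with 100K characters, which we want to store or transmit compactly.
• Only 6 different characters appear in the file, with the frequencies shown below.
• Design binary codes for the characters to achieve maximum compression.
• Using a fixed length code, we need 3 bits to represent six characters.

| | a | b | c | d | e | f |
| --- | --- | --- | --- | --- | --- | --- |
| freq (K) | 45 | 13 | 12 | 16 | 9 | 5 |
| code 1 | 000 | 001 | 010 | 011 | 100 | 101 |

• Storing the 100K characters requires 300K bits using this code.
• Can we do better?

**Huffman Codes**
• We can improve on this using variable length codes.
• Motivation: use shorter codes for more frequent letters, and longer codes for infrequent letters.
• An example is the code 2 below.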

| | a | b | c | d | e | f |
| --- | --- | --- | --- | --- | --- | --- |
| freq (K) | 45 | 13 | 12 | 16 | 9 | 5 |
| code 1 | 000 | 001 | 010 | 011 | 100 | 101 |
| code 2 | 0 | 101 | 100 | 111 | 1101 | 1100 |

• Using code 2, the file requires (1*45 + 3*13 + 3*12 + 3*16 + 4*9 + 4*5)K = 224K bits.
• The improvement is about 25% over the fixed length code. In general, variable length codes can give 20-90% savings.

**Variable Length Codes**
• In fixed length coding, decoding is trivial. Not so with variable length codes.
• Suppose 0 and 000 are the codes for x and y. What should the decoder do upon receiving 00000?
• We could insert special marker codes, but that reduces efficiency.
• Instead we consider prefix codes: no codeword is a prefix of another codeword.
• So {0, 000} is not a prefix code, but {0, 101, 100, 111, 1101, 1100} is a prefix code.
• To encode, just concatenate the codes for each letter of the file; to decode, extract the first valid codeword, and repeat.
• Example: the code for “abc” is 0101100. “001011101” uniquely decodes to “aabe”.

| | a | b | c | d | e | f |
| --- | --- | --- | --- | --- | --- | --- |
| code 2 | 0 | 101 | 100 | 111 | 1101 | 1100 |

**Tree Representation**
• Decoding is best represented by a binary tree, with letters as leaves.
• The code for a letter is the sequence of bits on the path between the root and that leaf.
• An optimal tree must be full: each internal node has two children. Otherwise we can improve the code.

**Measuring Optimality**
• Let C be the alphabet. Let f(x) be the frequency of a letter x in C.
• Let T be the tree for a prefix code; let dT(x) be the depth of x in T.
• The number of bits needed to encode our file using this code is

$$B(T) = \sum_{x \in C} f(x)\, d_T(x)$$

• We want a T that minimizes B(T).

**Huffman’s Algorithm**
• Initially, each letter is represented by a singleton tree. The weight of the tree is the letter’s frequency.
• Huffman repeatedly chooses the two smallest trees (by weight) and merges them.
• The new tree’s weight is the sum of the two children’s weights.
• If there are n letters in the alphabet, there are n-1 merges.
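The merging procedure and the prefix-code decoding rule described above can be sketched in Python. This is a minimal sketch under stated assumptions: it uses the standard `heapq` module as the priority queue, represents an internal tree node as a plain tuple `(left, right)` (so letters themselves must not be tuples), and the names `huffman_code`, `encode`, `decode`, `cost`, and `CODE_2` are hypothetical. The exact codewords it produces depend on tie-breaking and on which child is labeled 0, so they need not match code 2 bit for bit.

```python
import heapq
from itertools import count

# Code 2 from the table above, used to check encoding and decoding.
CODE_2 = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}


def huffman_code(freq):
    """Build a prefix code {letter: bit string} from {letter: frequency}."""
    order = count()  # unique tie-breaker so trees are never compared
    heap = [(f, next(order), letter) for letter, f in freq.items()]
    heapq.heapify(heap)
    # n - 1 merges: remove the two lightest trees, insert their union.
    while len(heap) > 1:
        fx, _, x = heapq.heappop(heap)
        fy, _, y = heapq.heappop(heap)
        heapq.heappush(heap, (fx + fy, next(order), (x, y)))
    root = heap[0][2]
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):  # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:  # leaf: a letter
            codes[node] = prefix or "0"

    walk(root, "")
    return codes


def encode(text, codes):
    """Concatenate the codewords of the letters in text."""
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    """Repeatedly extract the first valid codeword (prefix codes only)."""
    inverse = {code: letter for letter, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)


def cost(freq, codes):
    """B(T): sum over letters of frequency times codeword length."""
    return sum(freq[letter] * len(codes[letter]) for letter in freq)
```

For the frequencies in the table, `cost(freq, huffman_code(freq))` evaluates to 224 (in K bits), the same as code 2, and `decode("001011101", CODE_2)` gives back "aabe".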
• Pseudo-code:

    build a heap Q on C;
    for i = 1 to n-1 do
        z = a new tree node;
        x = left[z] = DeleteMin(Q);
        y = right[z] = DeleteMin(Q);
        f[z] = f[x] + f[y];
        Insert(Q, z);

**Illustration**
• Show the steps of Huffman’s algorithm on our example.

**Analysis of Huffman**
• Running time is O(n log n): initial heap building plus O(n) heap operations.
• We now prove that the prefix code generated is optimal.
• It is a greedy algorithm, and we use the standard swapping argument.
• Lemma: Suppose x, y are the two letters of lowest frequency. Then there is an optimal prefix code in which the codewords for x and y have the same (maximum) length and differ only in the last bit.

**Correctness of Huffman**
• Let T be an optimal tree and b, c be two sibling letters at maximum depth.
• Assume f(b) <= f(c) and f(x) <= f(y).
• Then f(x) <= f(b) and f(y) <= f(c).
• Transform T into T’ by swapping x and b.
• Since dT(b) >= dT(x) and f(b) >= f(x), the swap does not increase the frequency * depth cost. That is, B(T’) <= B(T).
• Similarly, we next swap y and c to obtain T”. If T was optimal, so must be T”.
• Thus, the greedy merge done by Huffman is correct.

**Correctness of Huffman**
• The rest of the argument follows by induction.
• When x and y are merged, we pretend a new character z arises, with f(z) = f(x) + f(y).
• Compute the optimal code/tree for these n-1 letters: C ∪ {z} - {x, y}.
• Attach two new leaves to the node z, corresponding to x and y.

**RSA-Public Key Cryptosystem: Some Taste of Computer Security**
• The importance of computer security is obvious.
• No one wants to use an insecure computer system.
• A public key cryptosystem is a miracle: even though the key is given, nobody (besides the sender and the receiver) can decipher the encrypted message.

**Hard Problems**
• Some problems are hard to solve.
• No polynomial time algorithm is known.
• E.g., NP-hard problems such as machine scheduling, bin packing, and 0/1 knapsack, as well as finding the prime factors of an n-digit number (which is not known to be NP-hard, but for which no polynomial time algorithm is known either).
• Is this necessarily bad? No!
• Data encryption relies on difficult-to-solve problems.

**Cryptography**
[Figure: message → encryption algorithm (encryption key) → transmission channel → decryption algorithm (decryption key) → message]

**Public Key Cryptosystem (RSA)**
• A public encryption method that relies on a public encryption algorithm, a public decryption algorithm, and a public encryption key.
• Using the public key and encryption algorithm, everyone can encrypt a message.
• The decryption key is known only to authorized parties.

**Public Key Cryptosystem (RSA)**
• p and q are two prime numbers.
• n = pq
• m = (p-1)(q-1)
• a is such that 1 < a < m and gcd(m, a) = 1.
• b is such that (ab) mod m = 1.
• a is computed by generating random positive integers and testing gcd(m, a) = 1 using the extended Euclid’s gcd algorithm.
• The extended Euclid’s gcd algorithm also computes b when gcd(m, a) = 1.

**RSA Encryption And Decryption**
• Message M < n.
• Encryption key = (a, n).
• Decryption key = (b, n).
• Encrypt => E = M^a mod n.
• Decrypt => M = E^b mod n.

**Breaking RSA**
• Factor n and determine p and q, n = pq.
• Now determine m = (p-1)(q-1).
• Now use Euclid’s extended gcd algorithm to compute gcd(m, a). b is obtained as a byproduct.
• The decryption key (b, n) has been determined!

**Security Of RSA**
• Relies on the fact that prime factorization is computationally very hard.
• Let k be the number of bits in the binary representation of n.
• No algorithm, polynomial in k, is known to find the prime factors of n.
• Try to find the factors of a 100-bit number.

**PageRank: A Taste of Web Technology**
• The Web has become part of our daily life experience.
• The Web proves we are in the age of computing civilization.
• The success of the Web relies on an army of computer science technologies.
• Ranking web pages for any given query is one of the fundamental ones.
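Returning to the RSA slides above, the key setup, the encryption and decryption rules, and the "Breaking RSA" steps can be traced with a toy example in Python. This is a sketch under loud assumptions: the primes p = 61 and q = 53 and the exponent a = 17 are chosen here purely for illustration (they are not from the slides and are far too small to be secure), the function names are hypothetical, and the attack uses plain trial division, which only works because n is tiny.

```python
from math import isqrt  # available in Python 3.8+


def extended_gcd(x, y):
    """Return (g, s, t) with s*x + t*y == g == gcd(x, y)."""
    old_r, r = x, y
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t


def make_keys(p, q, a):
    """Return (encryption key (a, n), decryption key (b, n))."""
    n = p * q
    m = (p - 1) * (q - 1)
    g, _, t = extended_gcd(m, a)
    if g != 1:
        raise ValueError("a must satisfy gcd(m, a) = 1")
    b = t % m  # (a * b) mod m == 1
    return (a, n), (b, n)


def encrypt(message, key):
    a, n = key
    return pow(message, a, n)  # E = M^a mod n


def decrypt(cipher, key):
    b, n = key
    return pow(cipher, b, n)  # M = E^b mod n


def break_key(public_key):
    """Recover (b, n) from (a, n) by factoring n with trial division."""
    a, n = public_key
    for p in range(2, isqrt(n) + 1):
        if n % p == 0:
            return make_keys(p, n // p, a)[1]
    raise ValueError("n has no nontrivial factor")


# Toy parameters (illustrative assumption, not from the slides).
public, private = make_keys(61, 53, 17)  # n = 3233, m = 3120
cipher = encrypt(65, public)
```

With these toy values, `private` is `(2753, 3233)` since 17 * 2753 = 46801 = 15 * 3120 + 1, and `decrypt(cipher, private)` returns the original message 65. `break_key(public)` recovers the same decryption key almost instantly, which is exactly why real keys use an n whose factors cannot be found by any known polynomial-time method.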