## Data Structures

**DATA STRUCTURES**
The logical or mathematical model of a particular organization of data is called a data structure.

**DATA STRUCTURES**
• A primitive data type holds a single piece of data
  • e.g. in Java: int, long, char, boolean etc.
  • Legal operations on integers: + - * / ...
• A data structure structures data!
  • Usually more than one piece of data
  • Should provide legal operations on the data
  • The data might be joined together (e.g. in an array): a collection

**Static vs. Dynamic Structures**
• A static data structure has a fixed size
  • This meaning is different from the one associated with the static modifier
  • Arrays are static; once you define the number of elements an array can hold, it doesn’t change
• A dynamic data structure grows and shrinks as required by the information it contains

**Abstract Data Type**
An Abstract Data Type (ADT) is a data type together with its operations, whose properties are specified independently of any particular implementation.

**Abstract Data Type**
In computing, we view data from three perspectives:
• Application level: a view of the data within a particular problem
• Logical level: an abstract view of the data values (the domain) and the set of operations to manipulate them
• Implementation level: a specific representation of the structure to hold the data items, and the coding of the operations in a programming language

**Problem Solving: Main Steps**
• Problem definition
• Algorithm design / Algorithm specification
• Algorithm analysis
• Implementation
• Testing
• [Maintenance]

**Problem Definition**
• What is the task to be accomplished?
  • e.g. calculate the average of the grades for a given student
• What are the time / space / speed / performance requirements?

**Algorithm Design / Specifications**
• Algorithm: a finite set of instructions that, if followed, accomplishes a particular task.
• Describe: in natural language / pseudo-code / diagrams / etc.
• Criteria to follow:
  • Input: Zero or more quantities (externally produced)
  • Output: One or more quantities
  • Definiteness: Clarity, precision of each instruction
  • Finiteness: The algorithm has to stop after a finite (possibly very large) number of steps
  • Effectiveness: Each instruction has to be basic enough and feasible

**Implementation, Testing, Maintenance**
• Implementation
  • Decide on the programming language to use: C, C++, Lisp, Java, Perl, Prolog, assembly, etc.
  • Write clean, well-documented code
• Test, test, test
• Maintenance: integrate feedback from users, fix bugs, ensure compatibility across different versions

**Algorithm Analysis**
• Space complexity
  • How much space is required
• Time complexity
  • How much time does it take to run the algorithm
• Often, we deal with estimates!

**Space Complexity**
• Space complexity = the amount of memory required by an algorithm to run to completion
  • [Core dumps: a commonly encountered cause is “memory leaks”, where the amount of memory required grows larger than the memory available on a given system]
• Some algorithms may be more efficient if the data is completely loaded into memory
  • Need to look also at system limitations
  • E.g. classify 2GB of text in various categories [politics, tourism, sport, natural disasters, etc.]: can I afford to load the entire collection?

**Space Complexity (cont’d)**
• Fixed part: the space required to store certain data/variables, independent of the size of the problem
  • e.g. the name of the data collection
  • same size for classifying 2GB or 1MB of texts
• Variable part: the space needed by variables whose size depends on the size of the problem
  • e.g. the actual text
  • loading 2GB of text vs. loading 1MB of text

**Space Complexity (cont’d)**
• S(P) = c + S(instance characteristics)
  • c = constant
• Example:

    float sum(float* a, int n)
    {
        float s = 0;
        for (int i = 0; i < n; i++) {
            s += a[i];
        }
        return s;
    }

• Space?
One word for n, one for a [passed by reference!], one for i, and one for s: constant space!

**Time Complexity**
• Often more important than space complexity
  • space available (for computer programs!) tends to be larger and larger
  • time is still a problem for all of us
• 3-4GHz processors on the market
  • researchers estimate that the computation of various transformations for a single DNA chain for a single protein on a 1 THz computer would take about 1 year to run to completion
• The running time of algorithms is an important issue

**Running Time**
• Problem: prefix averages
  • Given an array X
  • Compute the array A such that A[i] is the average of elements X[0] … X[i], for i = 0..n-1
• Sol 1
  • At each step i, compute the element A[i] by traversing the array X from X[0] to X[i], determining the sum of these elements and then their average
• Sol 2
  • At each step i, update a running sum of the elements of the array X
  • Compute the element A[i] as sum/(i+1)

**Running time**
• Suppose the program includes an if-then statement that may or may not execute: the running time is variable
• Typically, algorithms are measured by their worst case

**Experimental Approach**
• Write a program that implements the algorithm
• Run the program with data sets of varying size
• Determine the actual running time using a system call to measure time (e.g. system (date) )
• Problems?

**Experimental Approach**
• It is necessary to implement and test the algorithm in order to determine its running time.
• Experiments can be done only on a limited set of inputs, and may not be indicative of the running time for other inputs.
• The same hardware and software should be used in order to compare two algorithms.
(a condition that is very hard to achieve!)

**Use a Theoretical Approach**
• Based on a high-level description of the algorithms, rather than on language-dependent implementations
• Makes possible an evaluation of the algorithms that is independent of the hardware and software environments

**Algorithm Description**
• How to describe algorithms independently of a programming language
• Pseudo-code = a description of an algorithm that is
  • more structured than usual prose but
  • less formal than a programming language
• (Or diagrams)
• Example: find the maximum element of an array.

    Algorithm arrayMax(A, n):
        Input: An array A storing n integers.
        Output: The maximum element in A.
        currentMax ← A[0]
        for i ← 1 to n - 1 do
            if currentMax < A[i] then currentMax ← A[i]
        return currentMax

**Pseudo-Code**
• Expressions: use standard mathematical symbols
  • use ← for assignment (= in C/C++)
  • use = for the equality relationship (== in C/C++)
• Method declarations: Algorithm name(param1, param2)
• Programming constructs:
  • decision structures: if ... then ... [else ...]
  • while-loops: while ... do
  • repeat-loops: repeat ... until ...
  • for-loops: for ... do
  • array indexing: A[i]
• Methods
  • calls: object method(args)
  • returns: return value
• Use comments
• Instructions have to be basic enough and feasible!

**Asymptotic analysis - terminology**
• Special classes of algorithms:
  • logarithmic: $O(\log n)$
  • linear: $O(n)$
  • quadratic: $O(n^2)$
  • polynomial: $O(n^k)$, $k \ge 1$
  • exponential: $O(a^n)$, $a > 1$
• Polynomial vs. exponential?
• Logarithmic vs.
polynomial?

**Relatives of Big-Oh**
• “Relatives” of the Big-Oh
  • $\Omega(f(n))$: Big Omega, an asymptotic lower bound
  • $\Theta(f(n))$: Big Theta, an asymptotic tight bound
• Big-Omega: think of it as the inverse of Big-Oh
  • $g(n)$ is $\Omega(f(n))$ if $f(n)$ is $O(g(n))$
• Big-Theta: combines both Big-Oh and Big-Omega
  • $f(n)$ is $\Theta(g(n))$ if $f(n)$ is $O(g(n))$ and $f(n)$ is $\Omega(g(n))$
• Make the difference:
  • $3n+3$ is $O(n)$ and is $\Theta(n)$
  • $3n+3$ is $O(n^2)$ but is not $\Theta(n^2)$

**More “relatives”**
• Little-oh: $f(n)$ is $o(g(n))$ if for any $c > 0$ there is $n_0$ such that $f(n) < c \cdot g(n)$ for $n > n_0$
• Little-omega
• Little-theta
• $2n+3$ is $o(n^2)$
• $2n+3$ is $o(n)$?

**Example**
• Remember the algorithm for computing prefix averages
  • compute an array A starting with an array X
  • every element A[i] is the average of all elements X[j] with j ≤ i
• Remember some pseudo-code … Solution 1

    Algorithm prefixAverages1(X):
        Input: An n-element array X of numbers.
        Output: An n-element array A of numbers such that A[i] is the average of elements X[0], ... , X[i].
        Let A be an array of n numbers.
        for i ← 0 to n - 1 do
            a ← 0
            for j ← 0 to i do
                a ← a + X[j]
            A[i] ← a/(i + 1)
        return array A

**Example (cont’d)**

    Algorithm prefixAverages2(X):
        Input: An n-element array X of numbers.
        Output: An n-element array A of numbers such that A[i] is the average of elements X[0], ... , X[i].
        Let A be an array of n numbers.
        s ← 0
        for i ← 0 to n - 1 do
            s ← s + X[i]
            A[i] ← s/(i + 1)
        return array A

**Back to the original question**
• Which solution would you choose?
  • $O(n^2)$ vs.
$O(n)$
• Some math …
  • properties of logarithms:
    • $\log_b(xy) = \log_b x + \log_b y$
    • $\log_b(x/y) = \log_b x - \log_b y$
    • $\log_b x^a = a \log_b x$
    • $\log_b a = \log_x a / \log_x b$
  • properties of exponentials:
    • $a^{(b+c)} = a^b a^c$
    • $a^{bc} = (a^b)^c$
    • $a^b / a^c = a^{(b-c)}$
    • $b = a^{\log_a b}$
    • $b^c = a^{c \log_a b}$

**Important Series**
• Sum of squares: $\sum_{i=1}^{N} i^2 = \frac{N(N+1)(2N+1)}{6}$
• Sum of exponents: $\sum_{i=1}^{N} i^k \approx \frac{N^{k+1}}{k+1}$ for large $N$ and $k \ge 0$
• Geometric series: $\sum_{i=0}^{N} A^i = \frac{A^{N+1} - 1}{A - 1}$ for $A \ne 1$
  • Special case when $A = 2$: $2^0 + 2^1 + 2^2 + \dots + 2^N = 2^{N+1} - 1$

**Analyzing recursive algorithms**

    function foo (param A, param B) {
        statement 1;
        statement 2;
        if (termination condition) {
            return;
        }
        foo(A’, B’);
    }

**Solving recursive equations by repeated substitution**

$$
\begin{aligned}
T(n) &= T(n/2) + c && \text{substitute for } T(n/2) \\
     &= T(n/4) + c + c && \text{substitute for } T(n/4) \\
     &= T(n/8) + c + c + c \\
     &= T(n/2^3) + 3c && \text{in more compact form} \\
     &= \dots \\
     &= T(n/2^k) + kc && \text{“inductive leap”} \\
T(n) &= T(n/2^{\log n}) + c \log n && \text{choose } k = \log n \\
     &= T(n/n) + c \log n \\
     &= T(1) + c \log n = b + c \log n = \Theta(\log n) && \text{where } b = T(1)
\end{aligned}
$$

**Solving recursive equations by telescoping**

$$
\begin{aligned}
T(n)   &= T(n/2) + c  && \text{initial equation} \\
T(n/2) &= T(n/4) + c  && \text{so this holds} \\
T(n/4) &= T(n/8) + c  && \text{and this …} \\
T(n/8) &= T(n/16) + c && \text{and this …} \\
       &\;\;\vdots \\
T(4)   &= T(2) + c    && \text{eventually …} \\
T(2)   &= T(1) + c    && \text{and this …} \\
T(n)   &= T(1) + c \log n && \text{sum the equations, canceling the terms appearing on both sides} \\
T(n)   &= \Theta(\log n)
\end{aligned}
$$

**RECURSION**
Suppose P is a procedure containing either a CALL statement to itself or a CALL statement to a second procedure that may eventually result in a CALL statement back to the original procedure P. Then P is called a recursive procedure.
Properties:
1. There must be certain criteria, called base criteria, for which the procedure does not call itself.
2. Each time the procedure does call itself (directly or indirectly), it must be closer to the base criteria.

**FACTORIAL WITHOUT RECURSION**
FACTORIAL(FACT, N)
This procedure calculates N! and returns the value in the variable FACT.
1. If N == 0, then: Set FACT := 1, and Return.
2. Set FACT := 1. [Initialize FACT for loop]
3. Repeat for K := 1 to N:
       Set FACT := K*FACT.
   [End of loop]
4. Return.

**FACTORIAL WITH RECURSION**
FACTORIAL(FACT, N)
This procedure calculates N! and returns the value in the variable FACT.
1. If N == 0, then: Set FACT := 1, and Return.
2. Call FACTORIAL(FACT, N-1).
3. Set FACT := N*FACT.
4. Return.

**Stack**
A stack is a list in which items are added and deleted at one end only. It is like a stack of plates:
• Plates can be added to the top of the stack.
• Plates can be removed from the top of the stack.
This is an example of “Last in, First out” (LIFO). Adding an item is called “pushing” onto the stack. Deleting an item is called “popping” off the stack.

**STACK OPERATION (PUSH)**
PUSH(STACK, TOP, MAXSTK, ITEM)
This procedure pushes an ITEM onto a stack.
1. [Stack already filled?] If TOP == MAXSTK, then: Print: OVERFLOW, and Return.
2. Set TOP := TOP+1. [Increases TOP by 1]
3. Set STACK[TOP] := ITEM. [Inserts ITEM in new TOP position]
4. Return.

**STACK OPERATION (POP)**
POP(STACK, TOP, ITEM)
This procedure deletes the top element of STACK and assigns it to the variable ITEM.
1. [Stack has an item to be removed?] If TOP == 0, then: Print: UNDERFLOW, and Return.
2. Set ITEM := STACK[TOP]. [Assigns TOP element to ITEM]
3. Set TOP := TOP-1. [Decreases TOP by 1]
4. Return.
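
The PUSH and POP procedures above can be sketched in Python as follows. This is a minimal illustration, not the slides' own code: the class name `ArrayStack` is hypothetical, and raising exceptions instead of printing OVERFLOW / UNDERFLOW is an assumption made so that callers can detect the error.

```python
class ArrayStack:
    """Fixed-capacity stack mirroring the PUSH and POP procedures (hypothetical name)."""

    def __init__(self, maxstk):
        self.maxstk = maxstk
        # Slot 0 is left unused so that TOP == 0 means "empty", as in the slides.
        self.stack = [None] * (maxstk + 1)
        self.top = 0

    def push(self, item):
        # Step 1: stack already filled?
        if self.top == self.maxstk:
            raise OverflowError("OVERFLOW")  # assumption: raise instead of print
        # Step 2: increase TOP by 1.
        self.top += 1
        # Step 3: insert ITEM in the new TOP position.
        self.stack[self.top] = item

    def pop(self):
        # Step 1: does the stack have an item to be removed?
        if self.top == 0:
            raise IndexError("UNDERFLOW")  # assumption: raise instead of print
        # Step 2: assign the TOP element to ITEM.
        item = self.stack[self.top]
        # Step 3: decrease TOP by 1.
        self.top -= 1
        return item
```

Both operations touch only the top position, so each runs in constant time regardless of how many items the stack holds.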
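
Returning to the two FACTORIAL procedures earlier in this section, a side-by-side Python sketch may help compare them. The function names are hypothetical, the result is returned directly rather than through a FACT variable, and N is assumed to be a non-negative integer.

```python
def factorial_iterative(n):
    """Loop version: multiply FACT by K for K = 1 .. N."""
    fact = 1  # also covers the N == 0 case
    for k in range(1, n + 1):
        fact = k * fact
    return fact


def factorial_recursive(n):
    """Recursive version: base criterion N == 0, otherwise N * (N-1)!."""
    if n == 0:
        return 1  # base criterion: no further recursive call
    # Each call uses N-1, so it moves closer to the base criterion.
    return n * factorial_recursive(n - 1)
```

The recursive version satisfies both properties listed on the RECURSION slide: it has a base criterion (N == 0), and every call moves one step closer to it.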