CS 361 – Chapter 2



1. CS 361 – Chapter 2
• 2.1 – 2.2 Linear data structures
• Desired operations
• Implement as an array or linked list
• Complexity of operations may depend on underlying representation
• Later we’ll look at nonlinear d.s. (e.g. trees)

2. Linear
• There are several linear data structures
• Each has desired ADT operations
• Can be implemented in terms of (simpler) linear d.s.
• The 2 most common implementations are array & linked list
• Common linear d.s. we can create:
  • Stack
  • Queue
  • Vector
  • List
  • Sequence
  • …
  (Note: terminology is not universal.)

3. Implementation
• Array implementation
  • Already defined in programming language
  • Fast operations, easy to code
  • Drawbacks?
• Linked list implementation
  • We define a head and a tail node
  • Each node has prev and next pointers, so there are no orphans
  • Space efficient, but trickier to implement
  • Need to allocate/deallocate memory often, which may have unpredictable execution time in practice
• Other implementations possible, but unusual
  • Array for LL, queue for stack, etc.
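
A minimal sketch of the linked-list approach, in Java (the class and method names here are illustrative, not from the text): a doubly linked list with sentinel head/tail nodes, where insertion or removal at a known node takes O(1) time.

  // Minimal doubly linked list with sentinel head/tail nodes.
  // Names (Node, addBetween, remove) are illustrative.
  public class DoublyLinkedList<E> {
      private static class Node<E> {
          E item;
          Node<E> prev, next;
          Node(E item, Node<E> prev, Node<E> next) {
              this.item = item; this.prev = prev; this.next = next;
          }
      }
      private final Node<E> head = new Node<>(null, null, null); // sentinel
      private final Node<E> tail = new Node<>(null, head, null); // sentinel
      { head.next = tail; }

      // O(1): splice a new node between two known neighbors
      private Node<E> addBetween(E item, Node<E> before, Node<E> after) {
          Node<E> n = new Node<>(item, before, after);
          before.next = n;
          after.prev = n;
          return n;
      }
      public void addFirst(E item) { addBetween(item, head, head.next); }
      public void addLast(E item)  { addBetween(item, tail.prev, tail); }

      // O(1): unlink a known node; the sentinels mean no null checks are needed
      private E remove(Node<E> n) {
          n.prev.next = n.next;
          n.next.prev = n.prev;
          return n.item;
      }
  }

The sentinel nodes are why "there are no orphans": every real node always has a non-null prev and next.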

4. Vector
• Each item in the collection has a rank: how many items come “before” me
• Rank is essentially an index, but an array implementation is free to put items anywhere (e.g. starting at 1 instead of 0)
• Some useful operations we’d like (names may vary):
  • get(rank)
  • set(rank, item)
  • insert(rank, item)
  • remove(rank)
  (See p. 66 for the meaning of insert/remove.)
• Which of these operations require(s) a loop? (See the sketch below.)
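
A hedged sketch of an array-backed vector (ArrayVector is an illustrative name) answering the loop question: get/set are O(1), while insert/remove at a rank need a shifting loop and are O(n).

  // Illustrative array-backed vector; insert(rank, item) must shift elements.
  public class ArrayVector<E> {
      private Object[] data = new Object[16];
      private int size = 0;

      @SuppressWarnings("unchecked")
      public E get(int rank) { return (E) data[rank]; }          // O(1)

      public void set(int rank, E item) { data[rank] = item; }   // O(1)

      public void insert(int rank, E item) {                     // O(n)
          if (size == data.length)
              data = java.util.Arrays.copyOf(data, 2 * size);
          for (int i = size; i > rank; i--)   // the loop: shift right
              data[i] = data[i - 1];
          data[rank] = item;
          size++;
      }

      public void remove(int rank) {                             // O(n)
          for (int i = rank; i < size - 1; i++)  // shift left
              data[i] = data[i + 1];
          data[--size] = null;
      }
  }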

5. List ADT
• Not concerned with index/rank
• The position of an item is at the beginning or end of the list, or before/after some other item
  • The ketchup is next to the peanut butter… but you first have to know where the peanut butter is
• Some useful operations:
  • getFirst() and getLast()
  • prev(p) and next(p)
  • replace(p, newItem)
  • swap(p, q)
  • Inserting an item at either end of the list, or before/after an existing one
  • remove(p)
• Which operations inherently require a loop?

6. Compare implementations
• We can compare array vs. LL implementations based on an analysis of how they perform d.s. operations.
  • Primarily determined by their representation
• “Sequence”
  • Combines the functionality of vector and list
  • Again: terminology (vector vs. sequence) is not universal.
  • Sometimes we want to exploit an array feature or a LL feature
• The table on p. 73 compares operation complexity:
  • Always O(1): size, prev, next, replace, swap
  • O(1) for array only: retrieve/replace at a specific rank
  • O(1) for list only: insert/remove at a given node
  • Always O(n): insert/remove at a rank
• What about searching for an element?

7. Trees
• Read section 2.3
• Terminology
• Desired operations
• How to traverse, find depth & height
• Binary trees
  • Binary tree properties
  • Traversals
  • Implementation

8. Definitions
• Tree = connected acyclic graph
• Rooted tree, as opposed to a free tree
  • More useful for us
  • Nodes arranged in a hierarchy, by level, starting with the root node
• Other terms related to rooted trees:
  • Relationships between nodes are much richer than in a LL: parent, child, sibling, subtree, ancestor, descendant
  • 2 types of nodes: internal, and external (a.k.a. leaf)

9. Definitions (2)
Continuing with rooted trees from now on…
• Ordered tree = children of a node are ranked 1st, 2nd, 3rd, etc.
• Binary tree = each node has at most 2 children, called the left and right child
  • Not the same as an ordered tree with 2 children: if a node has only 1 child, we still need to tell whether it’s the left or right child.
  • (More on binary trees later)
• Different kinds of trees → difficult to implement a silver-bullet tree d.s. for all occasions

10. Why trees?
• Many applications require information stored hierarchically:
  • Many classification systems
  • Document structure
  • File system
  • Computer program
  • Mathematical expression
  • Others?
• We mean the data is hierarchical in a logical sense. The low-level representation of the data may still be linear. That will be the programmer’s secret.

11. Desired tree ops
• getRoot()
• findParent(v)
• findChildren(v) – returns a list or iterator
  • An iterator is an object of a special class having methods next() and hasNext()
• isLeaf(v)
• isRoot(v)
And then some operations not so tree-specific:
• swapValuesAt(v1, v2)
• getValueAt(v)
• setValueAt(v)

12. Desiderata (2)
• findDepth(v) – distance to the root
• findHeight() – max depth over all nodes
• preorderTraversal(v)
  • Initially called with the root
  • Recursive function
  • Can be done as an iterator (see the sketch below)
• postorderTraversal(v) – analogous
• Pseudocode:

  preorder(v):
    process v
    for each child c of v:
      preorder(c)

  postorder(v):
    for each child c of v:
      postorder(c)
    process v

• See why they are called pre and post? Try an example tree.
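
Since the slide notes that a traversal “can be done as an iterator,” here is a hedged sketch of a stack-based preorder iterator for a general tree. TreeNode and its children list are illustrative names; an explicit stack replaces the recursion.

  import java.util.ArrayDeque;
  import java.util.Deque;
  import java.util.Iterator;
  import java.util.List;

  // Illustrative general-tree node: a value plus a list of children
  // (leaves get an empty list).
  class TreeNode<E> {
      E value;
      List<TreeNode<E>> children;
      TreeNode(E value, List<TreeNode<E>> children) {
          this.value = value; this.children = children;
      }
  }

  // Preorder as an iterator: pop a node, then push its children.
  class PreorderIterator<E> implements Iterator<TreeNode<E>> {
      private final Deque<TreeNode<E>> stack = new ArrayDeque<>();

      PreorderIterator(TreeNode<E> root) {
          if (root != null) stack.push(root);
      }
      public boolean hasNext() { return !stack.isEmpty(); }
      public TreeNode<E> next() {
          TreeNode<E> v = stack.pop();
          // Push children in reverse so the leftmost child is visited first.
          for (int i = v.children.size() - 1; i >= 0; i--)
              stack.push(v.children.get(i));
          return v;
      }
  }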

13. Binary trees
• Each node has ≤ 2 children.
• Very useful for CS applications
• Special cases:
  • Full binary tree = each node has 0 or 2 children. Suitable for arithmetic expressions. Also called a “proper” binary tree.
  • Complete binary tree = taking “full” one step further: all leaves have the same depth from the root. As a consequence, all other nodes have 2 children.
• Generalizations:
  • Positional tree (as opposed to ordered tree) = children have a positional number. E.g. a node may have three children at positions 1, 3 and 6.
  • k-ary tree = positional tree where no child has a position higher than k

14. Binary tree ops
• findLeftChild(v)
• findRightChild(v)
• findSibling(v) – how would this work?
• Preorder & postorder traversals can be simplified a little, since we know we have ≤ 2 children
• A 3rd traversal: inorder

  inorder(v):
    inorder(v.left)
    process v
    inorder(v.right)

• For modeling a mathematical expression, these traversals give rise to prefix, infix and postfix notation! (See the sketch below.)
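
A minimal sketch in Java (BinNode is an illustrative name) showing how the three traversals of a binary node yield prefix, infix, and postfix order; the null checks play the role of the recursion’s base case.

  // Illustrative binary node; for an expression tree, internal nodes hold
  // operators and leaves hold numbers.
  class BinNode {
      String value;
      BinNode left, right;
      BinNode(String value, BinNode left, BinNode right) {
          this.value = value; this.left = left; this.right = right;
      }

      static void preorder(BinNode v) {        // prefix notation
          if (v == null) return;
          System.out.print(v.value + " ");
          preorder(v.left);
          preorder(v.right);
      }
      static void inorder(BinNode v) {         // infix notation (no parens)
          if (v == null) return;
          inorder(v.left);
          System.out.print(v.value + " ");
          inorder(v.right);
      }
      static void postorder(BinNode v) {       // postfix notation
          if (v == null) return;
          postorder(v.left);
          postorder(v.right);
          System.out.print(v.value + " ");
      }
  }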

15. Binary tree properties
• Suppose we have a full binary tree
• n = total number of nodes, h = height of the tree
• Think about why these are true…
  • h + 1 ≤ (# leaves) ≤ 2^h
  • h ≤ (# internal nodes) ≤ 2^h – 1
  • log2(n + 1) – 1 ≤ h ≤ (n – 1) / 2

16. Expression as tree
• An arithmetic expression is inherently hierarchical
• We also have linear/text representations:
  • Infix, prefix, postfix
  • Note: prefix and postfix do not need grouping symbols
  • A postfix expression can be easily evaluated using a stack
• Example: turn (25 – 5) * (6 + 7) + 9 into a tree
  • Which is the last operator performed? → This is the root. And we can deduce where the left and right subtrees are.
  • Next, for the subtree (25 – 5) * (6 + 7), the last op is the *, so this is the “root” of this subtree.
• Notes:
  • The resulting binary tree is “full.”
  • Numbers are leaves; operators are internal. This is why the tree drawing is straightforward.

17. Postfix eval
• Our postfix expression is: 25 5 – 6 7 + * 9 +
• When you see a number… push.
• When you see an operator… pop 2, evaluate, push.
• When there is no more input, pop the answer.
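
A hedged sketch of this stack algorithm in Java, using java.util.ArrayDeque as the stack. Token handling is simplified (assumes space-separated ASCII tokens).

  import java.util.ArrayDeque;
  import java.util.Deque;

  public class PostfixEval {
      // Evaluate a space-separated postfix expression.
      static int eval(String expr) {
          Deque<Integer> stack = new ArrayDeque<>();
          for (String tok : expr.split("\\s+")) {
              switch (tok) {
                  case "+": case "-": case "*": case "/": {
                      int b = stack.pop(), a = stack.pop(); // note the pop order
                      switch (tok) {
                          case "+": stack.push(a + b); break;
                          case "-": stack.push(a - b); break;
                          case "*": stack.push(a * b); break;
                          default:  stack.push(a / b); break;
                      }
                      break;
                  }
                  default:
                      stack.push(Integer.parseInt(tok)); // a number: push
              }
          }
          return stack.pop(); // no more input: pop the answer
      }

      public static void main(String[] args) {
          System.out.println(eval("25 5 - 6 7 + * 9 +")); // prints 269
      }
  }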

18. Tree & traversal
• Given a (binary) tree, we can find its traversals. √
• How about the other way?
• A mathematical expression had enough context information that 1 traversal would be enough.
• But in general, we need 2 traversals, one of them being inorder.
• Example: Draw the binary tree having these traversals.
  Postorder: S C X H R J Q T
  Inorder: S R C H X T J Q
• Hint: The end of the postorder is the root of the tree. Find where the root lies in the inorder; this will show you the 2 subtrees. Continue with each subtree, finding its root and subtrees, etc. (A sketch of this recursion follows below.)
• Exercise: Find 2 distinct binary trees t1 and t2 where preorder(t1) = preorder(t2) and postorder(t1) = postorder(t2).
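
The hint turned into a hedged Java sketch (class and field names are illustrative; assumes all values are distinct). The postorder is consumed right to left, and a map of inorder positions locates each root’s split point.

  import java.util.HashMap;
  import java.util.Map;

  // Rebuild a binary tree from its postorder + inorder traversals.
  public class RebuildTree {
      static class Node {
          char value;
          Node left, right;
          Node(char v) { value = v; }
      }

      private final char[] post;
      private final Map<Character, Integer> inPos = new HashMap<>();
      private int postIdx; // consumed right to left: the root comes last in postorder

      RebuildTree(char[] post, char[] in) {
          this.post = post;
          for (int i = 0; i < in.length; i++) inPos.put(in[i], i);
          this.postIdx = post.length - 1;
      }

      // Build the subtree whose inorder occupies in[lo..hi].
      Node build(int lo, int hi) {
          if (lo > hi) return null;
          Node root = new Node(post[postIdx--]);
          int mid = inPos.get(root.value);   // split the inorder at the root
          root.right = build(mid + 1, hi);   // right first: postIdx moves right-to-left
          root.left  = build(lo, mid - 1);
          return root;
      }

      public static void main(String[] args) {
          char[] post = "SCXHRJQT".toCharArray();
          char[] in   = "SRCHXTJQ".toCharArray();
          Node root = new RebuildTree(post, in).build(0, in.length - 1);
          System.out.println(root.value); // T, as in the slide's example
      }
  }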

19. Euler tour traversal
• A general way to encompass all 3 traversals.
• Text p. 88 shows a “shrink wrap” image of a tree.
• We visit each node on its left, underneath, and on its right.
• Pseudocode:

  eulerTour(v):
    do v’s left side action      // west
    eulerTour(v.left)            // southwest
    do v’s under action          // south
    eulerTour(v.right)           // southeast
    do v’s right side action     // east

20. Applications
• Can adapt eulerTour():
  • Preorder traversal: “under” and “right” actions are null
  • Inorder traversal: “left” and “right” actions are null
  • Postorder traversal: “left” and “under” actions are null
• An elegant way to print a fully parenthesized expression:
  • Left action: print “(”
  • Under action: print node contents
  • Right action: print “)”
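
A hedged sketch of that application in Java, reusing an illustrative binary node. One small refinement over the plain recipe: the parenthesis actions are skipped at leaves, so single numbers aren’t wrapped in “( )”.

  // Euler tour of a binary expression tree, printing a parenthesized form.
  public class EulerTourPrint {
      static class Node {
          String value;
          Node left, right;
          Node(String value, Node left, Node right) {
              this.value = value; this.left = left; this.right = right;
          }
      }

      static void eulerTour(Node v) {
          if (v == null) return;
          boolean internal = (v.left != null || v.right != null);
          if (internal) System.out.print("(");   // left-side (west) action
          eulerTour(v.left);
          System.out.print(v.value);             // under (south) action
          eulerTour(v.right);
          if (internal) System.out.print(")");   // right-side (east) action
      }

      public static void main(String[] args) {
          // (25 - 5) * (6 + 7) + 9, built by hand
          Node expr = new Node("+",
              new Node("*",
                  new Node("-", new Node("25", null, null), new Node("5", null, null)),
                  new Node("+", new Node("6", null, null), new Node("7", null, null))),
              new Node("9", null, null));
          eulerTour(expr); // prints (((25-5)*(6+7))+9)
          System.out.println();
      }
  }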

21. Tree implementation
• Binary trees: the internal representation may be an array or links
• General trees: an array is too unwieldy, so just use links
• Array-based representation:
  • Assign each node in the tree an index
  • Root = 1
  • If a node’s index is p, then left child = 2p and right child = 2p + 1
  • Array operations are quick
  • Space inefficient. In the worst case, n nodes would require index values up to 2^n – 1. (How would this happen?) Exponential space complexity is bad.
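
The index arithmetic as a small hedged sketch (1-based indexing as on the slide; slot 0 of the array is simply left unused).

  // Index arithmetic for an array-based binary tree, 1-based as on the slide.
  public class ArrayTree {
      static int leftChild(int p)  { return 2 * p; }
      static int rightChild(int p) { return 2 * p + 1; }
      static int parent(int p)     { return p / 2; }   // integer division

      public static void main(String[] args) {
          // A complete tree packs into A[1..n] with no gaps; slot 0 unused.
          String[] a = { null, "T", "R", "Q", "S", "H", "J" };
          int p = 2;                                       // node "R"
          System.out.println(a[leftChild(p)]);             // S
          System.out.println(a[rightChild(p)]);            // H
          System.out.println(a[parent(p)]);                // T
      }
  }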

22. Implement as links
• For a binary tree, each node needs:
  • Contents
  • Pointers to left child, right child, parent
  • The tree overall needs a root node to start with.
• For a general rooted tree, each node needs:
  • Contents
  • A list of pointers to children; a pointer to the parent
  • The tree overall needs a root node to start with.

23. PQ & heap
• Section 2.4
• Priority Queue ADT
• Heap data structure
• Commitment:
  • Please read about: heap sort; hash tables.

24. Priority Queue
• An ADT where each item has a special value called its “key” or “priority” (in addition to its contents)
• Not really a queue
• It’s important to be able to find/extract the smallest element
  • Could just as easily be defined with “largest”
• Application:
  • Scheduling a set of tasks. We could use “Earliest Deadline First” or “Shortest Job Next”. At all times, we need to know the winner.
• Desired operations:
  • insert(element, keyValue)
  • removeNext()
  • findNext()
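
For reference, Java’s built-in java.util.PriorityQueue offers the same three operations. In this small demo the element serves as its own key; a Comparator could separate the two.

  import java.util.PriorityQueue;

  public class PQDemo {
      public static void main(String[] args) {
          PriorityQueue<Integer> pq = new PriorityQueue<>();
          pq.offer(7);                   // insert
          pq.offer(3);
          pq.offer(5);
          System.out.println(pq.peek()); // findNext: 3 (smallest)
          System.out.println(pq.poll()); // removeNext: 3
          System.out.println(pq.poll()); // 5
      }
  }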

25. Implementation
• One approach is a sorted list (array).
  • Insert: O(n), to find the place in the list to insert, and possibly shift over other elements
  • Remove and find smallest: O(1), since it’s at the front
• Can we do better than O(n) insertion? Heap implementation!
  • This d.s. is a special type of binary tree
  • Complete or almost complete: the lowest level may have a gap along its right side, but nowhere else
  • Heap property: for all nodes i in the heap, value(parent(i)) ≤ value(i), with the exception of the root, which has no parent.

26. Why a heap?
• Designed so that the PQ insert function can run in time proportional to the height of the tree: O(log n), rather than O(n).
• Because of the heap property, finding the min element is O(1).
• On the other hand, searching the heap for an arbitrary value is not a priority. That would still take O(n). In Chapter 3, we’ll look at solving that problem.

27. Insert & heapify up
• To insert a node, make it the last child.
• But now we have probably violated the heap property, which we can restore by doing a “heapify up”.
• Heapify up: starting with c, the child we just inserted:
  • If value(c) ≥ value(parent(c)), we’re done.
  • Else: swap c with its parent, and continue up the heap at the new location of c.
• Example p. 103
• What is the complexity? (See the sketch below.)
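
A hedged sketch of heapify up on the array representation, 0-based here, so parent(i) = (i – 1) / 2. MinHeap and its fields are illustrative names; growth of the backing array is omitted.

  // Insert + heapify up on a 0-based array min-heap.
  public class MinHeap {
      int[] heap = new int[64];   // fixed capacity, for the sketch only
      int size = 0;

      void insert(int key) {
          heap[size] = key;       // make it the last child
          heapifyUp(size++);
      }

      void heapifyUp(int c) {
          while (c > 0) {
              int p = (c - 1) / 2;
              if (heap[c] >= heap[p]) return;  // heap property holds: done
              int t = heap[c]; heap[c] = heap[p]; heap[p] = t; // swap
              c = p;                           // continue up the heap
          }
      }
  }

Each iteration moves c one level up, so the loop runs at most h = O(log n) times.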

28. Delete & heapify down
How to remove the smallest element, which is at the root:
• Remove the root, and immediately replace it with the last child.
• But we may have just violated the heap property, so…
• Heapify down: starting with the root node r:
  • If value(r) ≤ the values of both children, we’re done.
  • Else: swap r with its smaller child, and continue down the heap at the new location of r.
• (Why swap with the smaller child; does it matter?)
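
Continuing the same illustrative MinHeap sketch, removal with heapify down (these methods belong inside the MinHeap class above):

  int removeMin() {
      int min = heap[0];
      heap[0] = heap[--size];   // replace the root with the last child
      heapifyDown(0);
      return min;
  }

  void heapifyDown(int r) {
      while (true) {
          int left = 2 * r + 1, right = 2 * r + 2, smallest = r;
          if (left  < size && heap[left]  < heap[smallest]) smallest = left;
          if (right < size && heap[right] < heap[smallest]) smallest = right;
          if (smallest == r) return;            // ≤ both children: done
          int t = heap[r]; heap[r] = heap[smallest]; heap[smallest] = t;
          r = smallest;                         // continue down the heap
      }
  }

Swapping with the smaller child does matter: that child becomes the new parent of the other child, so it must be the smallest of the three.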

29. Array, take 2
• Earlier we saw that we didn’t want a sorted array representation.
• There is no need to keep all elements sorted to maintain the heap property.
• Why is an array attractive?
  • Insert/remove operations need to quickly find the last child, which would be at the end of the array: O(1)
  • “Almost complete” binary tree: all elements in the array are contiguous, in A[1..n].
• Although internally represented as a 1-D array, we still conceive of the heap logically as a tree structure.

30. More on heaps
• Finish the heap d.s.
• Heap sort
• How to build a heap
• Commitment:
  • Please read through p. 124. Take a look at the questions on pp. 131–132.

31. Heap sort
• The desired ops for a PQ are enough to allow us to sort a list of items:
  • insert(item)
  • removeMin()
• How to sort:
  • Insert the items one at a time
  • Remove the items one by one
• Analysis: (n inserts) + (n removeMin’s)
  • And we know that insert & remove both take O(log n) time.
  • (Recall that just finding the min is O(1), but removing it needs a heapify.)
  • Total time is O(n log n).
• More on sorting in Chapter 4.
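
The two-phase idea as a hedged sketch, using java.util.PriorityQueue as the PQ:

  import java.util.PriorityQueue;

  public class HeapSortViaPQ {
      // Sort by n inserts followed by n removeMin's: O(n log n) total.
      static void sort(int[] a) {
          PriorityQueue<Integer> pq = new PriorityQueue<>();
          for (int x : a) pq.offer(x);           // n inserts, O(log n) each
          for (int i = 0; i < a.length; i++)
              a[i] = pq.poll();                  // n removeMin's, O(log n) each
      }

      public static void main(String[] args) {
          int[] a = { 5, 4, 7, 3, 2, 6, 1 };
          sort(a);
          System.out.println(java.util.Arrays.toString(a)); // [1, 2, 3, 4, 5, 6, 7]
      }
  }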

32. Optimizing heap sort
• “Optimizing” doesn’t mean we’re perfecting it.
  • Actually, it will still be O(n log n).
• The major improvement is in the insertion phase: we can bring this part down to O(n).
• Algorithm p. 109: bottomUpHeap(S)
  • Given a “sequence” of values, create a heap.

  bottomUpHeap(S):
    if size(S) < 2, return a trivial heap
    remove S[0]
    H1 = bottomUpHeap(firstHalf(S))
    H2 = bottomUpHeap(secondHalf(S))
    H = tree with S[0] at the root, and subtrees H1 and H2
    heapify down H starting at S[0], and return H
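
The book’s recursive bottomUpHeap has a well-known iterative equivalent on the array representation: heapify down every internal node, from the last one back to the root. A hedged sketch, as another method for the earlier MinHeap class (0-based indexing, heapifyDown as above):

  // Iterative bottom-up heap construction on a 0-based array:
  // heapify down each internal node, last to first.
  void buildHeap(int[] a) {
      heap = a;
      size = a.length;
      for (int i = size / 2 - 1; i >= 0; i--)  // internal nodes only
          heapifyDown(i);
  }

Both versions do the same heapify-down work at each node, which is why both achieve the O(n) bound analyzed on the next slides.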

33. Build heap example
• Use the algorithm to build a heap out of: 5, 4, 7, 3, 2, 6, 1
  • Not a base case
  • Remove S[0] = 5
  • H1 = bottomUpHeap(4, 7, 3) – eventually creates a heap with 3 nodes
  • H2 = bottomUpHeap(2, 6, 1) – eventually creates a heap with 3 nodes
  • H = a tree with 7 nodes, with 5 at the root
  • Need to heapify down.
• Try another example. Sometimes the 2 “halves” used in the recursive calls are not exactly the same size. No big deal.

34. Analysis of bUH
• Background: the classic way of creating a heap is to insert n elements one by one: O(n log n). We hope to show bottomUpHeap can do it in O(n).
• In the best case, no heapify down is needed. We create n heaps, and creating 1 takes constant time (just an assignment statement and a function call). n × constant ⇒ Ω(n)
• What is different about the worst case?
  • Need to heapify down. Count the number of swaps.
• Consider the case of 15 nodes (height = 3):
  • 1 node needs 3 swaps
  • 2 nodes each need 2 swaps
  • 4 nodes each need 1 swap
• Consider the case of 31 nodes (height = 4).

35. Analysis (2)
• If we have n nodes, h = floor(log2 n).
• The total maximum # of swap operations depends on h.
• The formula is the sum from i = 1 to h of (2^(h–i) nodes × i swaps per node).
• Let’s work out the sum from i = 1 to h:
  Σ i · 2^(h–i) = 2^h Σ i · 2^(–i) = 2^h Σ (i / 2^i)
• What is this summation? Consider the following. Let
  S = 1/2^1 + 2/2^2 + 3/2^3 + 4/2^4 + …
  Then, S/2 = 1/2^2 + 2/2^3 + 3/2^4 + …
  Subtracting term by term, we obtain S – S/2 = S/2 = 1/2 + 1/4 + 1/8 + 1/16 + … = 1. Therefore, S = 2.
• Then, the # of swaps is ≤ 2^h × 2 = O(2^h) = O(n).

36. Finish chapter 2
• Dictionary ADT and hash tables
• Commitment:
  • Please read section 3.1

37. Dictionary ADT
• Each item in some aggregation is assigned a key value. Look up the item by means of the key.
• Sounds like an array, except the key value can be anything convenient for us, rather than restricting us to indices 0, 1, 2, …
• Desired operations:
  • findElement(key)
  • insertItem(item, key)
  • remove(key)
• Finding and removing could fail if the key value is not found in the dictionary.
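
Java’s java.util.HashMap is a ready-made dictionary; the three desired operations map onto put, get, and remove. (Rather than “failing,” get and remove return null when the key is absent.)

  import java.util.HashMap;
  import java.util.Map;

  public class DictDemo {
      public static void main(String[] args) {
          Map<String, Integer> dict = new HashMap<>();
          dict.put("ketchup", 3);                   // insertItem(item, key)
          dict.put("peanut butter", 5);
          System.out.println(dict.get("ketchup"));  // findElement(key): 3
          dict.remove("ketchup");                   // remove(key)
          System.out.println(dict.get("ketchup"));  // missing key: null
      }
  }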

38. Implementation
• Simple approach: an ArrayList of (element, key) pairs. Called a “log file” d.s.
• How would we implement the operations?
  • Inserting: O(1)
  • Finding / removing: O(n)
  • We would hope there’s a lot of inserting, to make this d.s. worthwhile!
• More efficient approach: hash table
  • An array of “buckets”
  • A hash function to assign each element to a bucket

39. Hash table
• Hash code: in a collection of objects, it’s desirable to assign each object a unique number.
  • Mathematically determined from its key.
  • There are good and bad ways to compute hash codes. We’d like these codes to be unique.
• Compression: since the hash code may be a big number, scale it down by performing a “mod” operation.
  • The result is the array index at which to insert / find / remove.
• Collision: sometimes a 3rd step is needed, in case 2 items map to the same bucket.

40. Hashing example
• Many objects have composite values, as in a string, a list, or several attributes per object.
• Give them numerical values (e.g. ASCII codes) and combine (a0, a1, a2, …, an–1) into a hash code.
• We could add them all up:

  hashCode = 0
  for i = 0 to n–1:
    hashCode += a[i]

• When would this be a good / bad hash function?

41. Example 2
• To get more distinct hash codes, we can use a polynomial approach:
  hash code = a0·c^0 + a1·c^1 + a2·c^2 + … + an–1·c^(n–1)
  where c is some constant, e.g. 7
• To avoid computing powers of c, we can rewrite the formula (Horner’s rule):
  a0 + c(a1 + c(a2 + c(a3 + … c(an–1)))…)

  hashCode = a[n–1]
  for i = n–2 down to 0:
    hashCode = c * hashCode + a[i]
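
The slide’s polynomial hash as a hedged Java sketch. For comparison, java.lang.String.hashCode uses the same idea with c = 31, iterating from the front instead (so the roles of the two ends are swapped). Integer overflow simply wraps, which is normal for hash codes.

  // Polynomial hash via Horner's rule, exactly as on the slide.
  public class PolyHash {
      static int hash(char[] a, int c) {
          int h = a[a.length - 1];
          for (int i = a.length - 2; i >= 0; i--)
              h = c * h + a[i];        // a[0] ends up multiplied by c^0
          return h;
      }

      public static void main(String[] args) {
          System.out.println(hash("peanut".toCharArray(), 7));
      }
  }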

42. Collisions
• As we insert objects into a hash table, collisions are possible. There are various ways to handle a collision, such as:
  • Chaining: maintain a list at each bucket. HashSet does this.
  • Open addressing: look for another “open” cell.
• Practice with Q 19–22 on page 132.
• A hash table must be larger than the # of elements anticipated.
  • We can set up a specific “load factor” of 0.75. If the ratio of elements to max size exceeds this factor, allocate a bigger hash table.
• Design issues can be resolved with experimentation on your collection of data.
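
Java exposes exactly this knob: HashMap (and HashSet, which is built on it) take an initial capacity and a load factor, defaulting to 0.75, and rehash into a bigger table once size/capacity exceeds it.

  import java.util.HashMap;
  import java.util.Map;

  public class LoadFactorDemo {
      public static void main(String[] args) {
          // Initial capacity 16, load factor 0.75 (these are also the defaults):
          // the table is rehashed to a larger array once it holds > 12 entries.
          Map<String, Integer> table = new HashMap<>(16, 0.75f);
          for (int i = 0; i < 100; i++)
              table.put("key" + i, i);      // triggers several internal resizes
          System.out.println(table.size()); // 100
      }
  }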
