Data Structure – Final Review

Data Structure – Final Review SUNY Buffalo

About this review
• I’ve been asked to review several data structures covered in class.
• This review may not be complete, as it is unrealistic to cover all the material in ~40 minutes!
• The exam may ask questions that weren’t covered in this review but were covered in class.
• If you have questions, ask your instructor ASAP.
• I’ve used a different book than the one in this class.
• Materials are mostly from “Data structure with C”.
• I have years of hands-on experience with data structures and algorithms.
• If you wonder how data structures are used in the “real world”, feel free to ask.

Review Topics
• Tree ADT:
  • Heap, AVL Tree, Red-Black Tree, and 2-3 Tree (B-tree).
• Dictionary (map) ADT:
  • Hash tables and hash functions.
• Graph ADT:
  • Breadth-First Search (BFS) and Depth-First Search (DFS).
• For each topic, you should be prepared to answer:
  • What is it?
  • How is it represented?
  • What operations does it support?
  • How does each operation work? Practice your drawing; do as many examples as you can!
  • How long does each operation take? Best case, average case, and worst case.

Review: Trees
• Terminology:
  • Size, height, depth (level), link (edge), path.
  • Root, parent, children, sibling, leaves, ancestor, descendant, etc.
• Representation:
  • Node structure.
  • Storage: array, linked list.
• Types:
  • Binary tree: binary heap.
  • Binary search tree (BST): AVL and Red-Black.
  • B-tree: 2-3 tree.
• Operations:
  • insert(), delete(), search(), sort(), etc.
  • Binary tree walks: pre-order (Root, L, R), in-order (L, Root, R), post-order (L, R, Root), level-order.
• Time complexity:
  • Insertion: O(log n), searching: O(log n), deletion: O(log n), sorting: O(n log n).
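The four tree walks listed on this slide can be sketched in a few lines of Python. This is only an illustrative sketch: the `Node` class and the function names are assumptions made for the example, not something defined in the slides.

```python
from collections import deque

class Node:
    """Hypothetical binary tree node, used only for this sketch."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def pre_order(node):
    # Root, then left sub-tree, then right sub-tree.
    if node is None:
        return []
    return [node.key] + pre_order(node.left) + pre_order(node.right)

def in_order(node):
    # Left sub-tree, then root, then right sub-tree.
    if node is None:
        return []
    return in_order(node.left) + [node.key] + in_order(node.right)

def post_order(node):
    # Left sub-tree, then right sub-tree, then root.
    if node is None:
        return []
    return post_order(node.left) + post_order(node.right) + [node.key]

def level_order(root):
    # Visit nodes level by level using a FIFO queue.
    out = []
    queue = deque([root] if root is not None else [])
    while queue:
        node = queue.popleft()
        out.append(node.key)
        for child in (node.left, node.right):
            if child is not None:
                queue.append(child)
    return out

# Example tree: root 2 with left child 1 and right child 3.
root = Node(2, Node(1), Node(3))
```

On a BST, the in-order walk visits the keys in sorted order, which is why it is the walk to remember for sorting questions.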

Binary Tree: Importance of Balance
• A binary tree, in general, is useful for implementing many operations:
  • For example, search(), successor(), predecessor(), minimum(), maximum(), insert(), and delete() can be done in O(h) time, where h is the height of the tree.
  • On a balanced tree, h = O(lg n), so each of these operations runs in O(lg n) time.
• But insert() and delete() alter the shape of the tree and can leave it unbalanced.
  • In the worst case, h = O(n), which is no better than a linked list!
• So we want to correct the imbalance in at most O(lg n) time, which adds no complexity overhead.

Review: Balanced Trees
• To keep a binary tree balanced, require it to be a complete binary tree that also satisfies the heap property; the result is a binary heap.
  • A binary heap is commonly used for implementing the Priority Queue ADT.
  • As an aside, “heap” can also mean the memory space used for dynamic allocation.
• To keep a BST balanced, add a constraint on the heights of its sub-trees.
  • The most popular balanced BSTs are AVL and Red-Black trees.

Review: Binary Heap
• A binary heap extends the binary tree data structure and has the following properties:
  • Each node has a key <greater|less> than or equal to the keys of its children.
    • Greater: max heap; less: min heap.
  • The tree is a complete binary tree.
    • A complete binary tree is a binary tree in which every level, except possibly the last, is completely filled, and all nodes are as far left as possible.
    • The longest root-to-leaf path has floor(lg n) edges for n nodes.

Heap: Maintaining the Heap Property
• heapifyUp() and heapifyDown() are the key operations for maintaining the heap property in O(lg n) time.
• How does heapifyDown() work?
  • Given a node i in the heap.
  • For a max heap: if A[i] < A[left(i)] or A[i] < A[right(i)], swap A[i] with the larger of A[left(i)] and A[right(i)].
  • Recurse on that sub-tree.
• How does heapifyUp() work?
  • Given a node i in the heap.
  • For a max heap: if A[i] > A[parent(i)], swap A[i] with A[parent(i)].
  • Recurse on parent(i).
• What about other operations and their running times?
  • delete(), insert(), buildHeap(), heapSort().
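The two heapify operations can be sketched on an array-backed max heap as follows. This is a minimal sketch under stated assumptions: it uses 0-based indexing (so left(i) = 2i+1, right(i) = 2i+2, parent(i) = (i-1)//2), it uses loops instead of recursion, and the helper names `heap_insert`, `extract_max`, and `build_heap` are illustrative choices rather than names from the slides.

```python
def left(i):
    return 2 * i + 1

def right(i):
    return 2 * i + 2

def parent(i):
    return (i - 1) // 2

def heapify_down(a, i):
    """Sink a[i] until neither child is larger (max heap)."""
    n = len(a)
    while True:
        largest = i
        l, r = left(i), right(i)
        if l < n and a[l] > a[largest]:
            largest = l
        if r < n and a[r] > a[largest]:
            largest = r
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest

def heapify_up(a, i):
    """Float a[i] up while it is larger than its parent (max heap)."""
    while i > 0 and a[i] > a[parent(i)]:
        p = parent(i)
        a[i], a[p] = a[p], a[i]
        i = p

def heap_insert(a, key):
    # Append at the last position, then restore the heap property upwards.
    a.append(key)
    heapify_up(a, len(a) - 1)

def extract_max(a):
    # Move the last element to the root, then restore the property downwards.
    a[0], a[-1] = a[-1], a[0]
    top = a.pop()
    heapify_down(a, 0)
    return top

def build_heap(a):
    # Sink every internal node, starting from the last one.
    for i in range(len(a) // 2 - 1, -1, -1):
        heapify_down(a, i)

data = [3, 1, 4, 1, 5, 9, 2, 6]
build_heap(data)
heap_insert(data, 10)
```

Repeatedly calling `extract_max` until the heap is empty yields the keys in descending order, which is the idea behind heapSort().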

Review: AVL Tree
[Figure: root x of a tree of depth h, with sub-trees T_{h-1} of depth h-1 and T_{h-2} of depth h-2.]
• An AVL tree extends the BST data structure with the following property:
  • For any node in the tree, the heights of its left and right sub-trees differ by at most one.
• Observe that:
  • The smallest AVL tree of depth 1 has 1 node.
  • The smallest AVL tree of depth 2 has 2 nodes.
  • Size of the smallest tree of depth h: T_h = T_{h-1} + T_{h-2} + 1.
• AVL: Adelson-Velsky and Landis, 1962.
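The size recurrence on this slide can be evaluated directly to see how quickly the smallest AVL tree grows with depth. The sketch below assumes T_0 = 0 (an empty tree) as the extra base case; the slide itself only gives the values for depth 1 and depth 2.

```python
def min_avl_nodes(h):
    """Fewest nodes in an AVL tree of depth h: T_h = T_{h-1} + T_{h-2} + 1."""
    if h <= 0:
        return 0  # assumed base case: the empty tree
    prev2, prev1 = 0, 1  # T_0 and T_1
    for _ in range(2, h + 1):
        prev2, prev1 = prev1, prev1 + prev2 + 1
    return prev1

sizes = [min_avl_nodes(h) for h in range(1, 6)]
```

The minimum size grows exponentially in the depth (Fibonacci-like), which is why an AVL tree with n nodes has depth O(lg n).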

AVL: Maintaining the AVL Property
• Tree rotation is the key operation for maintaining the AVL property in O(lg n) time.
• If a node is not balanced, the difference between its children’s heights is 2.
• There are 4 possible cases with a height difference of 2.
[Figure: cases (1) to (4), each showing nodes x and y with sub-trees A, B, and C.]

AVL: Maintaining the AVL Property (2)
[Figure: rightRotate(y) turns y (left child x with sub-trees A and B; right sub-tree C) into x (left sub-tree A; right child y with sub-trees B and C); leftRotate(x) is the inverse.]
• Case 1: rightRotate(y);
    x = y.getLeftChild();
    y.setLeftChild(x.getRightChild());
    x.setRightChild(y);
    y = x;   // x is now the root of this sub-tree
• Case 2: leftRotate(x);
    y = x.getRightChild();
    x.setRightChild(y.getLeftChild());
    y.setLeftChild(x);
    x = y;   // y is now the root of this sub-tree
• Case 3: leftRotate(y); rightRotate(x);
• Case 4: rightRotate(x); leftRotate(y);
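The two single rotations can also be written as small functions that return the new root of the rotated sub-tree. This is a sketch only: the `Node` class and the snake_case function names are assumptions for the example, and height bookkeeping (which a real AVL tree needs) is left out.

```python
class Node:
    """Hypothetical BST node, used only for this sketch."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def right_rotate(y):
    # y's left child x becomes the root; x's right sub-tree moves under y.
    x = y.left
    y.left = x.right
    x.right = y
    return x

def left_rotate(x):
    # x's right child y becomes the root; y's left sub-tree moves under x.
    y = x.right
    x.right = y.left
    y.left = x
    return y

def in_order(node):
    if node is None:
        return []
    return in_order(node.left) + [node.key] + in_order(node.right)

# Tree from the rotation figure: y has left child x (sub-trees A, B) and right sub-tree C.
tree = Node("y", Node("x", Node("A"), Node("B")), Node("C"))
before = in_order(tree)
rotated = right_rotate(tree)
after = in_order(rotated)
```

Returning the new root lets the caller re-attach the rotated sub-tree to its parent, and the unchanged in-order sequence shows that a rotation preserves the BST ordering.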

AVL: Insert/Delete
• Insertion is similar to a regular BST insert:
  • Search for the position: keep going left (or right) in the tree until a null child is reached.
  • Insert a new node in this position. An inserted node is always a leaf.
  • Rebalance the tree:
    • Search from the inserted node to the root, looking for any node that violates the AVL property.
    • Use rotation to fix it.
    • Only the first unbalanced node needs to be fixed.
• Deletion is similar to a regular BST delete:
  • Search for the node.
  • Remove it:
    • 0 children: replace it with null.
    • 1 child: replace it with its only child.
    • 2 children: replace it with the right-most node in its left sub-tree.
  • Rebalance the tree:
    • Search from the removed node’s position to the root for all nodes that violate the AVL property.
    • Use rotation to fix them.
    • The search must continue all the way back to the root.

Review: Red-Black Trees
• A red-black tree extends the BST data structure with the following properties:
  • The root is always black.
  • Every node is either red or black.
  • Every leaf (NULL pointer) is black (every “real” node has 2 children).
  • Both children of every red node are black (can’t have 2 consecutive reds on a path).
  • Every simple path from a node to a descendant leaf contains the same number of black nodes.
• An RB tree with n nodes has height h ≤ 2 lg(n+1).
• So every operation is guaranteed to take O(h) = O(lg n) time.

RB Trees: Maintaining RB Tree Property
[Figure: rightRotate(y) and leftRotate(x) on nodes x and y with sub-trees A, B, and C.]
• Tree rotation is the key operation for maintaining the RB tree properties in O(lg n) time:
  • Rotation preserves in-order key ordering.
  • Rotation takes O(1) time (just swaps pointers).

RB Trees: Insert/Delete
• Insertion is similar to BST’s insert:
  • BST insert.
  • Color the new node red.
  • Rebalance the tree:
    • If the parent is black, done.
    • Otherwise, one of three cases applies:
      • Parent’s sibling is red.
      • Parent’s sibling is black and the new node is a right child.
      • Parent’s sibling is black and the new node is a left child.
    • Repeat, moving up the tree until there are no violations.
• Deletion is similar to BST’s delete:
  • BST delete.
  • Rebalance the tree:
    • If the node is red, color it black, done.
    • Otherwise, one of four cases applies:
      • Sibling is red (so it has two black children).
      • Sibling is black and its children are both black.
      • Sibling is black, its left child is red, and its right child is black.
      • Sibling is black and its right child is red.
    • Repeat, moving up the tree until there are no violations.

Review: 2-3 B-Trees
[Figure: a 3-node holding keys x and y, with three sub-trees: keys ≤ x, keys > x and ≤ y, keys > y.]
• A B-tree of order m extends the tree data structure and has the following properties:
  • The root is either a leaf or has between 2 and m children.
  • Each internal node has between ceiling(m/2) and m children.
  • Each internal node has between ceiling(m/2)-1 and m-1 keys.
  • A leaf node has between 1 and m-1 keys.
  • The tree is perfectly balanced: all leaves are at the same depth.
• So, a 2-3 B-tree is a B-tree of order 3.
  • A node can have 2 or 3 children, which means that a node can have 1 or 2 keys.
• A red-black tree corresponds to a B-tree of minimum degree 2 (a 2-3-4 tree).

2-3: Insert/Delete
• Insertion is similar to insert in a BST:
  • Search for the item. If found, done.
  • Otherwise, the search stops at a leaf:
    • Stopped at a 2-node? Upgrade the 2-node to a 3-node.
    • Stopped at a 3-node? Replace the 3-node by two 2-nodes and push the middle value up to the parent node.
    • Repeat recursively until you upgrade a 2-node or create a new root.
  • When is a new root created?
• Deletion is similar to delete in a BST:
  • Start deletion at a leaf: if the value is not in a leaf, swap it with its immediate successor in the tree.
  • Delete the value from the leaf node.
  • If the node still has a value, done: we’ve changed a 3-node into a 2-node.
  • Otherwise, fill the empty node with a value from a sibling or the parent.

Review: Hash Tables
• Given n elements, each with a key and satellite data, we need to support:
  • insert(T, x), delete(T, x), and search(T, x),
  • but we don’t care about sorting the elements.
• Suppose no two elements have the same key and the range of keys is 0…m-1, where m is not too large.
• Set up an array T[0…m-1] in which:
  • T[i] = x if x ∈ T and i = h(key(x));
  • T[i] = NULL otherwise.
• h() is called the hash function and T is called the hash table (a direct-address table when h(k) = k).
• Hash tables support insert, delete, and search in O(1) expected time.

Hash: Resolving Collisions
• A collision happens when two keys hash to the same slot.
• Two ways to resolve collisions:
  • Open addressing:
    • To insert: if the slot is full, try another slot, and another, until an open slot is found (probing).
    • To search: follow the same sequence of probes as would be used when inserting the element.
  • Chaining:
    • To insert: keep a linked list of elements in each slot. Upon collision, just add the new element to the list.
    • To search: search the linked list.
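Chaining can be sketched as a small table class. This is an illustrative sketch with several assumptions: the class name `ChainedHashTable` is made up for the example, Python lists stand in for the linked lists on the slide, and the hash function is simply Python's built-in `hash()` reduced modulo the table size m.

```python
class ChainedHashTable:
    """Hypothetical hash table that resolves collisions by chaining."""

    def __init__(self, m=8):
        self.m = m
        # One chain per slot; a Python list stands in for a linked list.
        self.slots = [[] for _ in range(m)]

    def _h(self, key):
        # Assumed hash function: built-in hash reduced to the range 0..m-1.
        return hash(key) % self.m

    def insert(self, key, value):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # key already present: overwrite
                return
        chain.append((key, value))       # collision or empty slot: extend the chain

    def search(self, key):
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        idx = self._h(key)
        self.slots[idx] = [(k, v) for k, v in self.slots[idx] if k != key]

table = ChainedHashTable(m=4)
table.insert(1, "a")
table.insert(5, "b")  # 1 and 5 collide when m = 4
```

Each operation costs time proportional to the length of one chain, so the O(1) expected time depends on the hash function spreading keys evenly across the slots.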

Review: Graphs
• A graph G = (V, E), where V = set of vertices, E = set of edges.
  • Dense graph: |E| ≈ |V|^2
  • Sparse graph: |E| ≈ |V|
• Undirected graph:
  • Edge (u,v) = edge (v,u)
  • No self-loops
• Directed graph:
  • Edge (u,v) goes from vertex u to vertex v, notated u→v
• A weighted graph associates weights with either the edges or the vertices.

Graphs: Adjacency Matrix
[Figure: a graph on vertices 1, 2, 3, 4 with edges labelled a, b, c, d.]
• Assume V = {1, 2, …, n}.
• An adjacency matrix represents the graph as an n x n matrix A:
  • A[i, j] = 1 if edge (i, j) ∈ E (or the weight of the edge),
  • A[i, j] = 0 if edge (i, j) ∉ E.

Graphs: Adjacency List
[Figure: the example graph on vertices 1, 2, 3, 4.]
• An adjacency list represents the graph as an array of linked lists.
• For each vertex v ∈ V, store a list of vertices adjacent to v.
• Example:
  • Adj[1] = {2,3}
  • Adj[2] = {3}
  • Adj[3] = {}
  • Adj[4] = {3}
• Variation: can also keep a list of edges coming into each vertex.
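Both representations can be built from the same edge list. The sketch below uses the example from this slide, read as a directed graph with edges 1→2, 1→3, 2→3, and 4→3; the function names are assumptions for the example, and a dictionary of Python lists stands in for the array of linked lists.

```python
def to_adjacency_list(n, edges):
    """Map each vertex 1..n to the list of vertices it points to."""
    adj = {v: [] for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].append(v)
    return adj

def to_adjacency_matrix(n, edges):
    """n x n matrix with a[i-1][j-1] = 1 when edge (i, j) exists."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u - 1][v - 1] = 1
    return a

edges = [(1, 2), (1, 3), (2, 3), (4, 3)]
adj = to_adjacency_list(4, edges)
matrix = to_adjacency_matrix(4, edges)
```

The list stores one entry per edge, while the matrix always stores n * n cells regardless of how many edges exist.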

Graphs: Storage
• Adjacency matrix takes O(V^2) storage.
  • Usually too much storage for large graphs.
  • But can be very efficient for small graphs.
• Adjacency list takes O(V+E) storage:
  • The degree of a vertex v = # of incident edges.
  • For directed graphs, the # of items in the adjacency lists is Σ out-degree(v) = |E|, which takes Θ(V + E) storage.
  • For undirected graphs, the # of items in the adjacency lists is Σ degree(v) = 2|E| (handshaking lemma), which also takes Θ(V + E) storage.
• Most large interesting graphs are sparse.
  • E.g., planar graphs, in which no edges cross, have |E| = O(|V|) by Euler’s formula.
  • So the adjacency list is often a more appropriate representation.

Review: Graph Searching
• Given: a graph G = (V, E), directed or undirected.
• Goal: systematically explore every vertex and every edge.
• General idea: build a tree on the graph.
  • Pick a vertex as the root.
  • Choose certain edges to produce a tree.
  • Note: might also build a forest if the graph is not connected.

Breadth-First Search
• General idea:
  • Expand the frontier of explored vertices across the breadth of the frontier.
  • Pick a source vertex to be the root.
  • Find (“discover”) its children, then their children, etc.
• Associate vertex “colours”:
  • White vertices have not been discovered.
    • All vertices start out white.
  • Grey vertices are discovered but not fully explored.
    • They may be adjacent to white vertices.
  • Black vertices are discovered and fully explored.
    • They are adjacent only to black and grey vertices.
• Explore vertices by scanning the FIFO queue of grey vertices.

BFS and Shortest-path
• BFS can be thought of as being like Dijkstra’s algorithm for shortest paths, except that every edge has the same weight.
• BFS calculates the shortest-path distance from the source node to every reachable vertex.
  • Shortest-path distance δ(s,v) = minimum number of edges from s to v, or ∞ if v is not reachable from s.
  • Proof should be in the book.
• BFS builds a breadth-first tree, in which paths to the root represent shortest paths in G.
  • Thus we can use BFS to calculate the shortest path from one vertex to another in O(V+E) time.
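BFS with vertex colours, distances, and the breadth-first tree can be sketched as below. This is a sketch under assumptions: the graph is given as a dictionary of adjacency lists, the function name `bfs` and the `parent` map (which encodes the breadth-first tree) are illustrative choices, and the example graph reuses the adjacency-list example from the earlier slide.

```python
from collections import deque

def bfs(adj, s):
    """Return (dist, parent) for a breadth-first search from source s."""
    colour = {v: "white" for v in adj}        # undiscovered
    dist = {v: float("inf") for v in adj}     # infinity = not reachable
    parent = {v: None for v in adj}
    colour[s] = "grey"                        # discovered, not fully explored
    dist[s] = 0
    queue = deque([s])                        # FIFO queue of grey vertices
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if colour[v] == "white":
                colour[v] = "grey"
                dist[v] = dist[u] + 1
                parent[v] = u
                queue.append(v)
        colour[u] = "black"                   # fully explored
    return dist, parent

graph = {1: [2, 3], 2: [3], 3: [], 4: [3]}
dist, parent = bfs(graph, 1)
```

Each vertex is enqueued at most once and each adjacency list is scanned once, which gives the O(V+E) running time. Following `parent` pointers from a vertex back to the source traces a shortest path in reverse.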

Depth-First Search
• General idea:
  • Explore “deeper” in the graph whenever possible.
  • Edges are explored out of the most recently discovered vertex v that still has unexplored edges.
  • When all of v’s edges have been explored, backtrack to the vertex from which v was discovered.
• Like BFS, associate vertex “colours”:
  • Vertices are initially white.
  • Then coloured grey when discovered.
  • Then coloured black when finished.
• Explore vertices by scanning the stack of grey vertices.
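DFS with the same colouring scheme can be sketched recursively, where the call stack plays the role of the stack of grey vertices. The function name `dfs`, the dictionary-of-lists graph format, and the two returned orderings are assumptions made for this example.

```python
def dfs(adj):
    """Return the discovery order and finishing order of a full DFS."""
    colour = {v: "white" for v in adj}
    discovered, finished = [], []

    def visit(u):
        colour[u] = "grey"            # discovered
        discovered.append(u)
        for v in adj[u]:
            if colour[v] == "white":
                visit(v)              # go deeper before trying other edges
        colour[u] = "black"           # all edges explored: backtrack
        finished.append(u)

    # Restart from every undiscovered vertex, producing a forest if needed.
    for u in adj:
        if colour[u] == "white":
            visit(u)
    return discovered, finished

graph = {1: [2, 3], 2: [3], 3: [], 4: [3]}
discovered, finished = dfs(graph)
```

For very deep graphs, the recursion could be replaced by an explicit stack to stay within Python's recursion limit.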

DFS And Cycles
• An undirected graph is acyclic iff a DFS yields no back edges.
  • If acyclic, no back edges (because a back edge implies a cycle).
  • If no back edges, acyclic:
    • No back edges implies only tree edges (Why?).
    • Only tree edges implies we have a tree or a forest, which by definition is acyclic.
• Thus, we can run DFS to find whether a graph has a cycle.
• We can actually determine if cycles exist in O(V) time:
  • In an undirected acyclic forest, |E| ≤ |V| - 1.
  • So count the edges: if we ever see |V| distinct edges, we must have seen a back edge along the way.
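The back-edge test can be sketched for an undirected graph as follows. This sketch assumes a simple graph (no parallel edges) stored as symmetric adjacency lists, so every edge {u, v} appears in both Adj[u] and Adj[v]; the function name `has_cycle` is an illustrative choice. An edge to an already-visited vertex other than the DFS parent is treated as a back edge.

```python
def has_cycle(adj):
    """True if the undirected graph given by symmetric adjacency lists has a cycle."""
    visited = set()

    def visit(u, parent):
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                if visit(v, u):       # tree edge: keep exploring
                    return True
            elif v != parent:
                return True           # back edge to an earlier vertex: cycle
        return False

    # Start a DFS in every component that has not been visited yet.
    return any(u not in visited and visit(u, None) for u in adj)

tree = {1: [2, 3], 2: [1], 3: [1]}
triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
forest = {1: [2], 2: [1], 3: [4], 4: [3]}
```

The search returns as soon as the first back edge is found, so on a graph with a cycle it never needs to scan the full edge set.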

Remarks (1)
• Clearly, data structures and algorithms are closely related:
  • Selecting the most efficient data structure and algorithm will almost always be the best way to proceed.
• However, many factors must be considered to produce a good implementation:
  • The obvious solution isn’t always the best.
  • Sometimes it makes sense to have multiple data structures, each with different properties, to represent a single object.
• Factors to be considered are:
  • The memory footprint implied by a given representation.
  • The cost of operations in that representation.
  • The cost of converting to another representation.
  • The amount of computation expected using a given representation.

Remarks (2)
• When it comes to the implementation of an algorithm, the main point is that constant factors matter.
  • Mapping algorithms and data structures in a way that matches the architecture’s characteristics is VERY important!
• Often it is necessary to restructure a program, not functionally but behaviourally, to get better performance.
  • However, restructuring code can be a bit more involved than just performing optimisations.
• So, the bottom line is to think about trade-offs that could change the quality of an implementation.
  • Direct, obvious algorithm translations don’t always mean good performance.
  • Best performance comes from considering the many aspects of execution, e.g., memory access, processor characteristics, language overheads.

Good Luck!
