
Data Structures Specification and Implementation

CSE 5350/7350 Introduction to Algorithms. Data Structures Specification and Implementation. Textbook readings: Cormen: Part III, Chapters 10-14 Mihaela Iridon , Ph.D. mihaela@engr.smu.edu. Objectives. Understand what dynamic sets are Learn basic techniques for Representing &


Presentation Transcript


  1. CSE 5350/7350 Introduction to Algorithms Data Structures Specification and Implementation Textbook readings: Cormen: Part III, Chapters 10-14 Mihaela Iridon, Ph.D. mihaela@engr.smu.edu Data Structures

  2. Objectives • Understand what dynamic sets are • Learn basic techniques for representing and manipulating finite dynamic sets • Elementary Data Structures • Stacks, queues, heaps, linked lists • More Complex Data Structures • Hash tables, binary search trees • Data Structures in C#.NET 2.0

  3. High-Level Structure (1) • Arrays • System.Collections.ArrayList • System.Collections.Generic.List • Queue • System.Collections.Generic.Queue • Stack • System.Collections.Generic.Stack

  4. High-Level Structure (2) • Hashtable • System.Collections.Hashtable • System.Collections.Generic.Dictionary • Trees • Binary Trees, BST, Self-Balancing BST • Linked Lists • System.Collections.Generic.LinkedList • Graphs

  5. Dynamic Data Sets • Definition • Why dynamic • General examples • Data structures and the .NET framework • “An Extensive Examination of Data Structures Using C# 2.0” by Scott Mitchell: http://msdn2.microsoft.com/en-us/library/ms364091(VS.80).aspx

  6. Data Structure Design • Impact on efficiency/running time • The data structure used by an algorithm can greatly affect the algorithm's performance • It is important to have a rigorous method for comparing the efficiency of various data structures

  7. Example: file extension search

       // Requires: using System; using System.IO;
       public bool DoesExtensionExist(string[] fileNames, string extension)
       {
           for (int i = 0; i < fileNames.Length; i++)
           {
               // Case-insensitive comparison of each file's extension
               if (String.Compare(Path.GetExtension(fileNames[i]), extension, true) == 0)
                   return true;
           }
           return false; // If we reach here, we didn't find the extension
       }

     Search is O(n).

  8. The Array • Linear • Simple • Direct access • Homogeneous • Most widely used

  9. The Array (2) • The contents of an array are stored in contiguous memory. • All elements of an array must be of the same type or of a derived type; hence arrays are referred to as homogeneous data structures. • Array elements can be accessed directly: to reach the i-th element, one line of code suffices: arrayName[i].

  10. Array Operations • Allocation • Accessing • Declaring an array in C#: string[] myArray; (initially the myArray reference is null) • Creating an array in C#: myArray = new string[5];

  11. Array Allocation • string[] myArray = new string[someIntegerSize]; • This allocates a contiguous block of memory on the CLR-managed heap

  12. Array Accessing • Accessing an element at index i: O(1) • Searching through an array • Unsorted: O(n) • Sorted: O(log n) with binary search • Array class: static method: • Array.BinarySearch(Array input, object val)
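
The two search costs on this slide can be contrasted in a few lines. The following is a minimal sketch, not part of the original deck; the class and method names (`ArraySearchSketch`, `FindSorted`, `FindUnsorted`) are made up for illustration, and only `Array.BinarySearch` comes from the framework.

```csharp
using System;

public static class ArraySearchSketch
{
    // Sorted input: binary search, O(log n).
    // Returns the index of value, or -1 if it is absent.
    public static int FindSorted(int[] sorted, int value)
    {
        // Array.BinarySearch returns a negative number when the value is missing.
        int index = Array.BinarySearch(sorted, value);
        return index >= 0 ? index : -1;
    }

    // Unsorted input: every element may need to be inspected, O(n).
    public static int FindUnsorted(int[] items, int value)
    {
        for (int i = 0; i < items.Length; i++)
        {
            if (items[i] == value) return i;
        }
        return -1;
    }
}
```

Binary search is only valid when the array is already sorted; on unsorted data its result is undefined, which is why the linear scan remains the fallback.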

  13. Array Resizing • When the size needs to change: • Must create a new array instance • Copy old array into new array: Array1.CopyTo(Array2, 0) • Time-consuming (the copy is O(n)) • Also, inserting into an array is problematic
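
A rough sketch of the resize steps listed on this slide, assuming a string array that must grow by one element. The helper name `Append` is hypothetical; `CopyTo` is the call the slide mentions.

```csharp
using System;

public static class ArrayResizeSketch
{
    // Returns a new, larger array holding the old elements followed by item.
    public static string[] Append(string[] old, string item)
    {
        string[] bigger = new string[old.Length + 1]; // step 1: new array instance
        old.CopyTo(bigger, 0);                        // step 2: copy every element, O(n)
        bigger[old.Length] = item;
        return bigger;
    }
}
```

Growing by a single slot makes every append O(n), which is the cost that the self-resizing collections later in the deck reduce by growing in larger steps.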

  14. Multi-Dimensional Arrays • Rectangular • n x n • n x n x n x … • Accessing: O(1) • Searching: O(n^k) for k dimensions • Jagged/Ragged • n_1 x n_2 x n_3 x …

  15. Goals • Type-safe • Performant • Reusable • Example: payroll application

  16. System.Collections.ArrayList • Can hold any data type (hybrid) • Internally: an array of object • Automatic resizing • Not type-safe: casting errors are detected only at runtime • Boxing/unboxing: extra level of indirection, which affects performance • Loose homogeneity

  17. Generics • Remedy for Typing and Performance • Type-safe collections • Reusability • Example: public class MyTypeSafeList<T> { T[] innerArray = new T[0]; }
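
The one-line `MyTypeSafeList<T>` on this slide can be fleshed out a little to show why generics give both type safety and reuse. This is only a sketch: the `Add` method, the indexer, the `count` field, and the starting capacity of 4 are assumptions added for illustration, not the framework's `List<T>` implementation.

```csharp
using System;

public class MyTypeSafeList<T>
{
    private T[] innerArray = new T[0];
    private int count = 0; // number of slots actually in use (assumed field)

    public int Count { get { return count; } }

    public void Add(T item)
    {
        if (count == innerArray.Length)
        {
            // Grow geometrically so repeated Add calls stay cheap on average.
            int newCapacity = innerArray.Length == 0 ? 4 : innerArray.Length * 2;
            Array.Resize(ref innerArray, newCapacity);
        }
        innerArray[count++] = item;
    }

    public T this[int index]
    {
        get
        {
            if (index < 0 || index >= count)
                throw new ArgumentOutOfRangeException("index");
            return innerArray[index]; // no cast and no unboxing needed
        }
    }
}
```

With `MyTypeSafeList<int>`, adding a string is a compile-time error rather than a runtime cast failure, and the values are stored without boxing.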

  18. List • Homogeneous • Self-Re-dimensioning Array • System.Collections.Generic.List • Example: List<string> studentNames = new List<string>(); studentNames.Add("John"); … string name = studentNames[3]; studentNames[2] = "Mike";

  19. List Methods • Contains() • IndexOf() • BinarySearch() • Find() • FindAll() • Sort() • Asymptotic Running Time: same as array but with extra overhead

  20. Ordered Requests Processing • First-come, first-served (FIFO) • Priority-based processing • Inefficient to use List<T> • The list will continue to grow (internally, the capacity is doubled every time it runs out) • Solution: circular list/array • Problem: initial size??
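
The circular-array idea can be sketched as follows. This is an illustrative fixed-capacity version, assuming the caller picks the size up front (the very "initial size" problem the slide raises); the class name `CircularQueueSketch` and its members are hypothetical.

```csharp
using System;

public class CircularQueueSketch<T>
{
    private readonly T[] slots;
    private int head;   // index of the oldest item
    private int count;  // number of stored items

    public CircularQueueSketch(int capacity)
    {
        slots = new T[capacity];
    }

    public int Count { get { return count; } }

    public void Enqueue(T item)
    {
        if (count == slots.Length)
            throw new InvalidOperationException("Queue is full.");
        int tail = (head + count) % slots.Length; // wrap around the end of the array
        slots[tail] = item;
        count++;
    }

    public T Dequeue()
    {
        if (count == 0)
            throw new InvalidOperationException("Queue is empty.");
        T item = slots[head];
        slots[head] = default(T);         // release the reference
        head = (head + 1) % slots.Length; // wrap around the end of the array
        count--;
        return item;
    }
}
```

Slots freed by `Dequeue` are reused by later `Enqueue` calls, so the array never has to grow just because requests keep arriving and leaving.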

  21. Queue • System.Collections.Generic.Queue • Operations: • Enqueue() • Dequeue() • Contains() • ToArray() • Peek() • Does not allow random access • Type-safe; maximizes space utilization

  22. Queue (continued) • Applications: • Web servers • Print queues • Rate of growth: • Specified in the constructor • Default: double initial size

  23. Stack • LIFO • System.Collections.Generic.Stack • Operations: • Push() • Pop() • Doubles in size when more space is needed • Applications: • CLR call stack (function invocation)
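
As a small illustration of LIFO behavior with `Stack<T>`, the sketch below checks whether parentheses in a string are balanced. The example problem and the names `StackSketch` and `IsBalanced` are additions for illustration; only `Push`, `Pop`, and `Count` are framework members.

```csharp
using System.Collections.Generic;

public static class StackSketch
{
    // Returns true when every ')' closes the most recently opened '('.
    public static bool IsBalanced(string text)
    {
        Stack<char> open = new Stack<char>();
        foreach (char c in text)
        {
            if (c == '(')
            {
                open.Push(c);
            }
            else if (c == ')')
            {
                if (open.Count == 0) return false; // nothing left to close
                open.Pop();                        // last opened, first closed
            }
        }
        return open.Count == 0; // leftovers mean unclosed parentheses
    }
}
```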

  24. Limitations of Ordinal Indexing • Ideal access time: O(1) • If index is unknown • O(n) if not sorted • O(log n) if sorted • Example: SSN: 10^9 possible combinations • Solution: compress the ordinal indexing domain with a hash function; e.g. use only 4 digits

  25. Hash Table • Hashing: • Math transformation of one representation into another representation • Hash table: • The array that uses hashing to compress the indexer space • Cryptography (information security) • Hash function: • Non-injective (not a one-to-one function) • “Fingerprint” of initial data

  26. Goals • Fast access to items in large amounts of data • As few collisions as possible • Collision avoidance • Avalanche effect: • Minor changes to input → major changes to output

  27. Collision Resolution (1) • Probability to map to a given location: 1/k (k = size = number of slots) • (1) Linear Probing: Is H[i] empty? • YES: place item at location i • NO: i = i + 1; repeat • Deficiency: clustering • Access and Insertion: no longer O(1)
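
The linear-probing loop from this slide can be sketched as below. It is a simplified illustration for integer keys with a modulo hash, without deletion or resizing; the class `LinearProbingSketch` and its members are hypothetical.

```csharp
using System;

public class LinearProbingSketch
{
    private readonly int?[] slots; // null marks an empty slot

    public LinearProbingSketch(int size)
    {
        slots = new int?[size];
    }

    // Assumed toy hash: non-negative key modulo the table size.
    private int Hash(int key)
    {
        return (key & 0x7FFFFFFF) % slots.Length;
    }

    // Returns the slot index where key ended up.
    public int Insert(int key)
    {
        int i = Hash(key);
        for (int probes = 0; probes < slots.Length; probes++)
        {
            if (slots[i] == null || slots[i] == key)
            {
                slots[i] = key;          // YES: slot is free, place the item
                return i;
            }
            i = (i + 1) % slots.Length;  // NO: try the next slot
        }
        throw new InvalidOperationException("Table is full.");
    }

    public bool Contains(int key)
    {
        int i = Hash(key);
        for (int probes = 0; probes < slots.Length; probes++)
        {
            if (slots[i] == null) return false; // an empty slot ends the probe run
            if (slots[i] == key) return true;
            i = (i + 1) % slots.Length;
        }
        return false;
    }
}
```

Keys that hash to the same slot pile up in adjacent positions, which is the clustering the slide warns about: each collision lengthens the run that later inserts and lookups must walk.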

  28. Collision Resolution (2) • (2) Quadratic Probing • Check s + 1^2 • Check s - 1^2 • Check s + 2^2 • Check s - 2^2 • … • Check s +/- i^2 • Clustering is a problem as well

  29. Collision Resolution (3) • (3) Rehashing: used by Hashtable (C#) • System.Collections.Hashtable • Operations: • Add(key, item) • ContainsKey() • Keys (property) • ContainsValue() • Values (property) • Key, Value: any type → not type-safe

  30. Hashtable Data Type – Example

       using System;
       using System.Collections;

       public class HashtableDemo
       {
           private static Hashtable employees = new Hashtable();

           public static void Main()
           {
               // Add some values to the Hashtable, indexed by a string key
               employees.Add("111-22-3333", "Scott");
               employees.Add("222-33-4444", "Sam");
               employees.Add("333-44-55555", "Jisun");

               // Access a particular key
               if (employees.ContainsKey("111-22-3333"))
               {
                   string empName = (string) employees["111-22-3333"];
                   Console.WriteLine("Employee 111-22-3333's name is: " + empName);
               }
               else
                   Console.WriteLine("Employee 111-22-3333 is not in the hash table...");
           }
       }

  31. Hashtable • Key = any type • Key is transformed into an index via the GetHashCode() function • Object class defines GetHashCode() • H(key) = [GetHash(key) + 1 + (((GetHash(key) >> 5) + 1) % (hashsize - 1))] % hashsize • Values = 0 .. hashsize - 1

  32. Collision Resolution (3, cont’d) • Rehashing = double hashing • Set of hash functions: H1, H2, …, Hn • Hk(key) = [GetHash(key) + k * (1 + (((GetHash(key) >> 5) + 1) % (hashsize - 1)))] % hashsize • Hashsize must be PRIME
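
The formula for Hk can be turned into a small function to see which slots successive probes visit. This is a sketch of the formula as written on the slide, not the framework's internal code; the name `Probe` is hypothetical, and the raw hash code is passed in as a parameter in place of GetHash(key).

```csharp
public static class DoubleHashingSketch
{
    // Slot visited by the k-th probe (k = 1, 2, ...) for a given hash code.
    public static int Probe(int hashCode, int k, int hashsize)
    {
        long h = hashCode & 0x7FFFFFFF;                    // keep the value non-negative
        long step = 1 + (((h >> 5) + 1) % (hashsize - 1)); // step is in 1 .. hashsize - 1
        return (int)((h + k * step) % hashsize);
    }
}
```

For example, with a hash code of 100 and hashsize = 11, the step is 1 + ((3 + 1) % 10) = 5, so probes 1, 2, and 3 land on slots 6, 0, and 5. Because the step is never a multiple of a prime hashsize, the sequence visits every slot before repeating, which is why the slide insists on a prime table size.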

  33. Hashtable • Load factor = maximum allowed ratio (# items / # slots) • Optimal: 0.72 • Expanding the hashtable takes 2 costly steps: • Roughly double the # of slots (current prime → next prime about twice as large) • Rehash • High load factor → dense hashtable • Less space • More probes on collision: 1/(1 - LF) • If LF = 0.72 → expected # of probes = 1/0.28 ≈ 3.6 → O(1)

  34. Hashtable • Costly to expand • Set the size in constructor if size is known • Asymptotic running times: • Access: O(1) • Add, Remove: O(1) • Search: O(1)

  35. System.Collections.Generic.Dictionary • Type-safe • Strongly typed KEYS + VALUES • Operations: • Add(key, value) • ContainsKey(key) • Collision Resolution: CHAINING • Uses linked lists from an entry where collision occurs

  36. Chaining in Dictionary Data Type

  37. Dictionary Example

       // General form
       Dictionary<keyType, valueType> variableName = new Dictionary<keyType, valueType>();

       Dictionary<int, Employee> employeeData = new Dictionary<int, Employee>();

       // Add some employees (Add takes the key and the value as two arguments)
       employeeData.Add(455110189, new Employee("Scott Mitchell"));
       employeeData.Add(455110191, new Employee("Jisun Lee"));
       ...

       // See if employee with SSN 123-45-6789 works here
       if (employeeData.ContainsKey(123456789)) ...

  38. Chaining in the Dictionary type • Efficiency: • Add: O(1) • Remove: O(n/m) • Search: O(n/m) • Where: n = number of elements stored, m = number of buckets/slots • Implemented such that n ≤ m at all times: the total # of chained elements can never exceed the number of buckets
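
A toy version of chaining makes the O(1) add and O(n/m) search costs concrete. This sketch is not how `Dictionary<TKey, TValue>` is implemented internally; the class `ChainedTableSketch`, its fixed bucket count, and the int-to-string mapping are assumptions chosen to keep the example short.

```csharp
using System.Collections.Generic;

public class ChainedTableSketch
{
    // One chain (list of key/value pairs) per bucket.
    private readonly List<KeyValuePair<int, string>>[] buckets;

    public ChainedTableSketch(int bucketCount)
    {
        buckets = new List<KeyValuePair<int, string>>[bucketCount];
        for (int i = 0; i < bucketCount; i++)
            buckets[i] = new List<KeyValuePair<int, string>>();
    }

    private int BucketOf(int key)
    {
        return (key & 0x7FFFFFFF) % buckets.Length;
    }

    // O(1): append to the chain without scanning it (duplicates are not checked).
    public void Add(int key, string value)
    {
        buckets[BucketOf(key)].Add(new KeyValuePair<int, string>(key, value));
    }

    // O(n/m) on average: only the chain of one bucket is scanned.
    public bool TryGet(int key, out string value)
    {
        foreach (KeyValuePair<int, string> pair in buckets[BucketOf(key)])
        {
            if (pair.Key == key)
            {
                value = pair.Value;
                return true;
            }
        }
        value = null;
        return false;
    }
}
```

When the number of stored elements stays at or below the number of buckets, the average chain length n/m is at most 1, so search and removal behave like O(1) in practice.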

  39. Trees • = set of linked nodes where no cycle exists • (GT) a connected acyclic graph • Nodes: • Root • Leaf • Internal • |E| = ? • Forest = { trees }

  40. Popular Tree-Type Data Structures • BST: Binary Search Tree • Heap • Self-balancing binary search trees • AVL • Red-black • Radix tree • …

  41. Binary Trees • Code example for defining a tree data object • Tree Traversal (L = left subtree, Ro = root, R = right subtree) • In-order: L Ro R • Pre-order: Ro L R • Post-order: L R Ro • Θ(n)
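
The three traversal orders differ only in where the root is visited relative to its subtrees. The following sketch assumes a minimal node type with integer values; `TreeNode` and `TraversalSketch` are illustrative names, not classes from the deck.

```csharp
using System.Collections.Generic;

public class TreeNode
{
    public int Value;
    public TreeNode Left;
    public TreeNode Right;

    public TreeNode(int value) { Value = value; }
}

public static class TraversalSketch
{
    // In-order: left subtree, root, right subtree.
    public static void InOrder(TreeNode node, List<int> output)
    {
        if (node == null) return;
        InOrder(node.Left, output);
        output.Add(node.Value);
        InOrder(node.Right, output);
    }

    // Pre-order: root, left subtree, right subtree.
    public static void PreOrder(TreeNode node, List<int> output)
    {
        if (node == null) return;
        output.Add(node.Value);
        PreOrder(node.Left, output);
        PreOrder(node.Right, output);
    }

    // Post-order: left subtree, right subtree, root.
    public static void PostOrder(TreeNode node, List<int> output)
    {
        if (node == null) return;
        PostOrder(node.Left, output);
        PostOrder(node.Right, output);
        output.Add(node.Value);
    }
}
```

Each node is visited exactly once in every order, which gives the Θ(n) running time stated on the slide.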

  42. Binary Tree Data Structure

  43. Tree Operations • Search: Recursive: O(h) • h = height of the tree • Max & Min Search: search right/left • Successor & Predecessor Search • Insertion (easy: always add a new leaf) & Deletion (more complicated as it may cause the tree structure to change) • Running time: • function of the tree topology

  44. Binary Search Tree • Improves the search time (and lookup time) over the binary tree in general • BST property: • for any node n, every descendant node's value in the left subtree of n is less than the value of n, and every descendant node's value in the right subtree is greater than the value of n
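
The BST property is what lets search discard one subtree at every step. Below is a minimal sketch of insertion and lookup for integer keys, without balancing or deletion; `BstNode` and `BstSketch` are hypothetical names.

```csharp
public class BstNode
{
    public int Value;
    public BstNode Left;
    public BstNode Right;

    public BstNode(int value) { Value = value; }
}

public static class BstSketch
{
    // Inserts value as a new leaf and returns the (possibly new) subtree root.
    public static BstNode Insert(BstNode root, int value)
    {
        if (root == null) return new BstNode(value);
        if (value < root.Value) root.Left = Insert(root.Left, value);
        else if (value > root.Value) root.Right = Insert(root.Right, value);
        return root; // equal values are ignored to preserve the strict ordering
    }

    // O(h): follows a single root-to-leaf path.
    public static bool Contains(BstNode root, int value)
    {
        BstNode current = root;
        while (current != null)
        {
            if (value == current.Value) return true;
            current = value < current.Value ? current.Left : current.Right;
        }
        return false;
    }
}
```

Inserting keys in sorted order with this code produces a chain rather than a bushy tree, so the height h, and therefore the search cost, degrades to O(n).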

  45. Non-BST vs BST • Non-BST • BST

  46. Linear Search Time in BST • The search time for a BST depends upon its topology.

  47. BST continued • Perfectly balanced BST: • Search: O(log n) [height = log n] • Sub-linear search running time • Balanced Binary Tree: • Exhibits a good ratio: breadth/depth • Self-balancing trees

  48. The Heap • Specialized tree-based data structure that satisfies the heap property: if B is a child node of A, then key(A) ≥ key(B). [max-heap] • Operations: • delete-max or delete-min: removing the root node of a max- or min-heap, respectively • increase-key or decrease-key: updating a key within a max- or min-heap, respectively • insert: adding a new key to the heap • merge: joining two heaps to form a valid new heap containing all the elements of both
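
Two of the listed operations, insert and delete-max, can be sketched with an array-backed binary max-heap. The binary, array-based layout (children of index i at 2i + 1 and 2i + 2) is an assumption made for this example; the slide itself does not prescribe a representation, and `MaxHeapSketch` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public class MaxHeapSketch
{
    private readonly List<int> items = new List<int>();

    public int Count { get { return items.Count; } }

    // insert: add at the end, then sift up until the parent is not smaller.
    public void Insert(int key)
    {
        items.Add(key);
        int i = items.Count - 1;
        while (i > 0)
        {
            int parent = (i - 1) / 2;
            if (items[parent] >= items[i]) break; // heap property holds
            Swap(i, parent);
            i = parent;
        }
    }

    // delete-max: remove the root, move the last item up, then sift it down.
    public int DeleteMax()
    {
        if (items.Count == 0)
            throw new InvalidOperationException("Heap is empty.");
        int max = items[0];
        int last = items.Count - 1;
        items[0] = items[last];
        items.RemoveAt(last);
        int i = 0;
        while (true)
        {
            int left = 2 * i + 1, right = left + 1, largest = i;
            if (left < items.Count && items[left] > items[largest]) largest = left;
            if (right < items.Count && items[right] > items[largest]) largest = right;
            if (largest == i) break; // both children are not larger
            Swap(i, largest);
            i = largest;
        }
        return max;
    }

    private void Swap(int a, int b)
    {
        int tmp = items[a];
        items[a] = items[b];
        items[b] = tmp;
    }
}
```

Both operations touch one root-to-leaf path at most, so each runs in O(log n) for this representation.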

  49. Max Heap Example

  50. Linked Lists • No resizing necessary • Search: O(n) • Insertion • O(1) if unsorted • O(n) if sorted • Access: O(n) • System.Collections.Generic.LinkedList • Doubly-linked; type-safe (values are typed via Generics) • Element: LinkedListNode
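
A short usage sketch of `LinkedList<T>` and `LinkedListNode<T>` shows where the costs on this slide come from. The sample names and the `Demo` method are made up for illustration.

```csharp
using System.Collections.Generic;

public static class LinkedListSketch
{
    public static string Demo()
    {
        LinkedList<string> names = new LinkedList<string>();
        names.AddLast("Ann");                             // O(1): no resizing needed
        names.AddLast("Cy");

        LinkedListNode<string> node = names.Find("Cy");   // O(n): walks the list
        names.AddBefore(node, "Bob");                     // O(1) once the node is known
        names.AddFirst("Al");                             // O(1)

        return string.Join(",", names);                   // "Al,Ann,Bob,Cy"
    }
}
```

Keeping the list sorted costs O(n) per insertion because the correct position must first be found by walking the nodes, even though the splice itself is constant time.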
