# Data Structures Specification and Implementation

CSE 5350/7350

Introduction to Algorithms

### Data Structures Specification and Implementation

Cormen: Part III, Chapters 10-14

Mihaela Iridon, Ph.D.

mihaela@engr.smu.edu

Data Structures

Objectives
• Understand what dynamic sets are
• Learn basic techniques for representing and manipulating finite dynamic sets
• Elementary Data Structures
• Stacks, queues, heaps, linked lists
• More Complex Data Structures
• Hash tables, binary search trees
• Data Structures in C#.NET 2.0

High-Level Structure (1)
• Arrays
• System.Collections.ArrayList
• System.Collections.Generic.List<T>
• Queue
• System.Collections.Generic.Queue<T>
• Stack
• System.Collections.Generic.Stack<T>

High-Level Structure (2)
• Hashtable
• System.Collections.Hashtable
• System.Collections.Generic.Dictionary<TKey, TValue>
• Trees
• Binary Trees, BST, Self-Balancing BST
• Graphs

Dynamic Data Sets

Definition

Why dynamic

General examples

Data structures and the .NET framework

“An Extensive Examination of Data Structures Using C# 2.0” – Scott Mitchell

http://msdn2.microsoft.com/en-us/library/ms364091(VS.80).aspx

Data Structure Design

Impact on efficiency/running time

The data structure used by an algorithm can greatly affect the algorithm's performance

It is important to have a rigorous method for comparing the efficiency of different data structures

Example: file extension search

    // Requires: using System; using System.IO;
    public bool DoesExtensionExist(string[] fileNames, string extension)
    {
        // Compare each file's extension with the target, ignoring case
        for (int i = 0; i < fileNames.Length; i++)
            if (String.Compare(Path.GetExtension(fileNames[i]), extension, true) == 0)
                return true;

        return false; // If we reach here, we didn't find the extension
    }

The search runs in O(n) time, where n is the number of file names

The Array

Linear

Simple

Direct Access

Homogeneous

Most widely used

The Array (2)

The contents of an array are stored in contiguous memory.

All of the elements of an array must be of the same type or of a derived type; hence arrays are referred to as homogeneous data structures.

Array elements can be accessed directly: to read or write the i-th element, a single expression is enough: arrayName[i].

Array Operations
• Allocation
• Accessing
• Declaring an array in C#:

string[] myArray;

(initially myArray reference is null)

• Creating an array in C#:

myArray = new string[5];

Array Allocation
• string[] myArray = new string[someIntegerSize];
• This allocates a contiguous block of memory on the heap (CLR-managed)

Array Accessing
• Accessing an element at index i: O(1)
• Searching through an array
• Unsorted: O(n)
• Sorted: O(log n)
• Array class: static method:
• Array.BinarySearch(Array input, object val)
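
The difference between the two search costs can be sketched as follows. This is a minimal illustration, not part of the original slides: the class and method names (ArraySearchDemo, LinearSearch, SortedSearch) are made up for the example, and only Array.Sort and Array.BinarySearch are framework calls.

```csharp
using System;

public static class ArraySearchDemo
{
    // Linear scan: works on unsorted data, O(n).
    public static int LinearSearch(string[] items, string target)
    {
        for (int i = 0; i < items.Length; i++)
            if (items[i] == target)
                return i;
        return -1;
    }

    // Binary search: O(log n), but the array must already be sorted.
    // Array.BinarySearch returns a negative number when the value is absent.
    public static int SortedSearch(string[] sortedItems, string target)
    {
        return Array.BinarySearch(sortedItems, target);
    }

    public static void Main()
    {
        string[] names = { "Dana", "Alice", "Carol", "Bob" };
        Console.WriteLine(LinearSearch(names, "Carol"));

        Array.Sort(names);   // sorting first is what makes binary search valid
        Console.WriteLine(SortedSearch(names, "Carol"));
        Console.WriteLine(SortedSearch(names, "Zed") < 0);
    }
}
```

Sorting costs O(n log n), so binary search pays off only when the same array is searched many times.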

Array Resizing
• When the size needs to change:
• Must create a new array instance
• Copy old array into new array:

Array1.CopyTo(Array2, 0)

• Time consuming
• Also, inserting into an array is problematic
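
The resizing steps above can be sketched as a small helper. The names ArrayResizeDemo and Grow are hypothetical; the only framework call is Array.CopyTo, which the slide already mentions.

```csharp
using System;

public static class ArrayResizeDemo
{
    // Returns a larger copy of 'source'; the old array is left to the garbage collector.
    public static string[] Grow(string[] source, int newSize)
    {
        if (newSize < source.Length)
            throw new ArgumentException("newSize must not be smaller than the source length");

        string[] bigger = new string[newSize];   // step 1: new array instance
        source.CopyTo(bigger, 0);                // step 2: O(n) copy of every element
        return bigger;
    }

    public static void Main()
    {
        string[] names = { "a", "b", "c" };
        names = Grow(names, 6);
        names[3] = "d";                          // room for new elements now
        Console.WriteLine(names.Length);
    }
}
```

Because every resize copies all n elements, growing one slot at a time would make n insertions cost O(n^2) overall.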

Multi-Dimensional Arrays
• Rectangular
• n x n
• n x n x n x …
• Accessing: O(1)
• Searching: O(n^k), where k is the number of dimensions
• Jagged/Ragged
• n1 x n2 x n3 x …

Goals

Type-safe

Performant

Reusable

Example: payroll application

System.Collections.ArrayList

Can hold any data type: (hybrid)

Internally: array object

Automatic resizing

Not type-safe: casting errors are detected only at runtime

Boxing/unboxing: an extra level of indirection that affects performance

Loose homogeneity
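
A short sketch of what "not type-safe" means in practice. The class and method names (ArrayListDemo, SumAsInts) are illustrative assumptions; ArrayList and InvalidCastException are the real framework types.

```csharp
using System;
using System.Collections;

public static class ArrayListDemo
{
    // Sums the elements, assuming (without any compile-time check) that all are ints.
    public static int SumAsInts(ArrayList items)
    {
        int total = 0;
        foreach (object item in items)
            total += (int)item;   // unboxing; throws InvalidCastException for a non-int
        return total;
    }

    public static void Main()
    {
        ArrayList mixed = new ArrayList();
        mixed.Add(1);             // the int is boxed into an object
        mixed.Add(2);
        Console.WriteLine(SumAsInts(mixed));

        mixed.Add("three");       // compiles fine: ArrayList stores object
        try
        {
            SumAsInts(mixed);
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("Cast failed at runtime");
        }
    }
}
```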

Generics
• Remedy for Typing and Performance
• Type-safe collections
• Reusability
• Example:

    public class MyTypeSafeList<T>
    {
        T[] innerArray = new T[0];
    }

List
• Homogeneous
• Self-Re-dimensioning Array
• System.Collections.Generic.List<T>

    List<string> studentNames = new List<string>();

    // Indexing works as with an array; the index must already exist in the list
    string name = studentNames[3];
    studentNames[2] = "Mike";

List Methods
• Contains()
• IndexOf()
• BinarySearch()
• Find()
• FindAll()
• Sort()
• Asymptotic Running Time: same as array but with extra overhead
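
The methods listed above can be tried out with a few lines. This is only a usage sketch: ListMethodsDemo and ShortNames are names invented for the example, and lambda syntax (newer than C# 2.0) is used for brevity.

```csharp
using System;
using System.Collections.Generic;

public static class ListMethodsDemo
{
    // Returns the names with at most maxLength characters (FindAll is an O(n) scan).
    public static List<string> ShortNames(List<string> names, int maxLength)
    {
        return names.FindAll(s => s.Length <= maxLength);
    }

    public static void Main()
    {
        List<string> names = new List<string> { "Mike", "Ann", "Zoe", "Bartholomew" };

        Console.WriteLine(names.Contains("Ann"));              // linear scan, O(n)
        Console.WriteLine(names.IndexOf("Zoe"));               // linear scan, O(n)
        Console.WriteLine(names.Find(s => s.Length > 4));      // first match, O(n)

        names.Sort();
        // BinarySearch is only meaningful on a sorted list; O(log n)
        Console.WriteLine(names.BinarySearch("Mike") >= 0);

        Console.WriteLine(ShortNames(names, 3).Count);
    }
}
```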

Ordered Requests Processing

First-come, First-serve (FIFO)

Priority-based processing

Inefficient to use List<T>

List will continue to grow (internally, the capacity is doubled every time it runs out of room)

Solution: circular list/array

Problem: what initial size to choose?
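
The circular-array idea can be sketched as a fixed-capacity buffer whose indices wrap around. CircularQueue is a hypothetical class written for this illustration; it is not the framework's Queue<T>, and it simply refuses new items when full instead of growing.

```csharp
using System;

// Hypothetical fixed-capacity FIFO buffer; slots freed by Dequeue are reused by Enqueue.
public class CircularQueue<T>
{
    private readonly T[] slots;
    private int head;    // index of the oldest element
    private int count;   // number of stored elements

    public CircularQueue(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException("capacity");
        slots = new T[capacity];
    }

    public int Count { get { return count; } }

    public void Enqueue(T item)
    {
        if (count == slots.Length)
            throw new InvalidOperationException("Queue is full");
        int tail = (head + count) % slots.Length;   // wrap past the end of the array
        slots[tail] = item;
        count++;
    }

    public T Dequeue()
    {
        if (count == 0)
            throw new InvalidOperationException("Queue is empty");
        T item = slots[head];
        slots[head] = default(T);                   // release the reference
        head = (head + 1) % slots.Length;
        count--;
        return item;
    }
}
```

With capacity 3, enqueuing a, b, c, dequeuing a, and then enqueuing d stores d in slot 0, so the array never has to grow or shift.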

Queue
• System.Collections.Generic.Queue<T>
• Operations:
• Enqueue()
• Dequeue()
• Contains()
• ToArray()
• Peek()
• Does not allow random access
• Type-safe; maximizes space utilization

Queue (continued)
• Applications:
• Web servers
• Print queues
• Rate of growth:
• A growth factor can be specified in the constructor of the non-generic System.Collections.Queue
• Default: capacity doubles when more space is needed
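
A print queue, one of the applications named above, might be modeled like this. PrintQueueDemo, ServeAll, and the file names are invented for the example; Enqueue, Dequeue, Peek, and Contains are the Queue<T> operations listed earlier.

```csharp
using System;
using System.Collections.Generic;

public static class PrintQueueDemo
{
    // Removes every job in arrival order and returns the order in which they were served.
    public static List<string> ServeAll(Queue<string> jobs)
    {
        List<string> served = new List<string>();
        while (jobs.Count > 0)
            served.Add(jobs.Dequeue());   // always removes the oldest job
        return served;
    }

    public static void Main()
    {
        Queue<string> jobs = new Queue<string>();
        jobs.Enqueue("report.pdf");
        jobs.Enqueue("photo.png");
        jobs.Enqueue("notes.txt");

        Console.WriteLine(jobs.Peek());                  // oldest job, not removed
        Console.WriteLine(jobs.Contains("photo.png"));   // linear scan
        foreach (string job in ServeAll(jobs))
            Console.WriteLine(job);
    }
}
```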

Stack
• LIFO
• System.Collections.Generic.Stack<T>
• Operations:
• Push()
• Pop()
• Doubles in size when more space is needed
• Applications:
• CLR call stack (function invocation)
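
LIFO order is what makes a stack suitable for matching nested constructs, much as the call stack matches calls with returns. The bracket checker below is an illustrative example that does not come from the slides; BracketChecker and IsBalanced are made-up names.

```csharp
using System;
using System.Collections.Generic;

public static class BracketChecker
{
    // Returns true when every closing bracket matches the most recently opened one.
    public static bool IsBalanced(string text)
    {
        Stack<char> open = new Stack<char>();
        foreach (char c in text)
        {
            if (c == '(' || c == '[' || c == '{')
            {
                open.Push(c);
            }
            else if (c == ')' || c == ']' || c == '}')
            {
                if (open.Count == 0) return false;   // nothing left to close
                char last = open.Pop();              // most recent opener (LIFO)
                if ((c == ')' && last != '(') ||
                    (c == ']' && last != '[') ||
                    (c == '}' && last != '{'))
                    return false;
            }
        }
        return open.Count == 0;                      // leftovers were never closed
    }

    public static void Main()
    {
        Console.WriteLine(IsBalanced("a[i] = f(x, {1, 2});"));
        Console.WriteLine(IsBalanced("(]"));
    }
}
```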

Limitations of Ordinal Indexing
• Ideal access time: O(1)
• If index is unknown
• O(n) if not sorted
• O(log n) if sorted
• Example: SSN: 10^9 possible combinations
• Solution: compress the ordinal indexing domain with a hash function; e.g. use only 4 digits

Hash Table
• Hashing:
• Mathematical transformation of one representation into another
• Hash table:
• An array that uses hashing to compress the index space
• Cryptography (information security)
• Hash function:
• Non-injective (not a one-to-one function)
• “Fingerprint” of initial data

Goals
• Fast access of items in large amounts of data
• As few collisions as possible
• Collision avoidance
• Avalanche effect:
• Minor changes to the input cause major changes to the output

Collision Resolution (1)
• Probability to map to a given location:

1/k (k = size = number of slots)

• (1) Linear Probing

Is H[i] empty?

• YES: place item at location i
• NO: i = i + 1; repeat
• Deficiency: clustering
• Access and Insertion: no longer O(1)
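
The probing loop described above can be sketched for a table of non-negative integer keys. LinearProbingSet is a hypothetical class; using key % size as the hash function and omitting deletion are simplifying assumptions made for the example.

```csharp
using System;

// Hypothetical open-addressing set of non-negative int keys (no deletion support).
public class LinearProbingSet
{
    private readonly int?[] slots;   // null marks an empty slot

    public LinearProbingSet(int size)
    {
        slots = new int?[size];
    }

    public void Add(int key)
    {
        if (key < 0) throw new ArgumentOutOfRangeException("key");
        int i = key % slots.Length;                   // assumed hash function
        for (int probes = 0; probes < slots.Length; probes++)
        {
            if (slots[i] == null) { slots[i] = key; return; }   // empty: place item here
            if (slots[i] == key) return;                        // already stored
            i = (i + 1) % slots.Length;                         // occupied: try next slot
        }
        throw new InvalidOperationException("Table is full");
    }

    public bool Contains(int key)
    {
        if (key < 0) return false;
        int i = key % slots.Length;
        for (int probes = 0; probes < slots.Length; probes++)
        {
            if (slots[i] == null) return false;   // an empty slot ends the probe run
            if (slots[i] == key) return true;
            i = (i + 1) % slots.Length;
        }
        return false;
    }
}
```

In a table of 7 slots, the keys 3, 10, and 17 all hash to slot 3 and end up in slots 3, 4, and 5. That run of occupied slots is the clustering mentioned above, and it is why lookups can take more than one step.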

Collision Resolution (2)
• (2) Quadratic Probing, starting from slot s:
• Check s + 1^2
• Check s - 1^2
• Check s + 2^2
• Check s - 2^2
• Check s +/- i^2
• Clustering is a problem as well
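
The order in which slots are checked can be generated with a few lines. QuadraticProbing, Wrap, and ProbeSequence are names made up for this sketch, and wrapping indices modulo the table size is an assumption the slide does not spell out.

```csharp
using System;
using System.Collections.Generic;

public static class QuadraticProbing
{
    // Maps any integer into the range 0 .. tableSize - 1.
    private static int Wrap(int index, int tableSize)
    {
        return ((index % tableSize) + tableSize) % tableSize;
    }

    // Returns the slots visited: s, s + 1^2, s - 1^2, ..., s + maxStep^2, s - maxStep^2.
    public static List<int> ProbeSequence(int s, int tableSize, int maxStep)
    {
        List<int> slots = new List<int>();
        slots.Add(Wrap(s, tableSize));
        for (int i = 1; i <= maxStep; i++)
        {
            slots.Add(Wrap(s + i * i, tableSize));
            slots.Add(Wrap(s - i * i, tableSize));
        }
        return slots;
    }

    public static void Main()
    {
        // Home slot 3 in a table of 11 slots
        Console.WriteLine(string.Join(", ", ProbeSequence(3, 11, 2)));   // 3, 4, 2, 7, 10
    }
}
```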

Collision Resolution (3)
• (3) Rehashing: used by Hashtable (C#)
• System.Collections.Hashtable
• Operations:
• ContainsKey()
• Keys (property)
• ContainsValue()
• Values (property)
• Key, Value: any type, so not type-safe

Hashtable Data Type – Example

    using System;
    using System.Collections;

    public class HashtableDemo
    {
        private static Hashtable employees = new Hashtable();

        public static void Main()
        {
            // Add some values to the Hashtable, indexed by a string key

            // Access a particular key
            if (employees.ContainsKey("111-22-3333"))
            {
                string empName = (string) employees["111-22-3333"];
                Console.WriteLine("Employee 111-22-3333's name is: " + empName);
            }
            else
                Console.WriteLine("Employee 111-22-3333 is not in the hash table...");
        }
    }

Hashtable
• Key = any type
• Key is transformed into an index via GetHashCode() function
• Object class defines GetHashCode()
• H(key) = [GetHash(key) + 1 + (((GetHash(key) >> 5) + 1) % (hashsize - 1))] % hashsize
• Values = 0 .. hashsize - 1

Collision Resolution (3 – cont’d)
• Rehashing = double hashing
• Set of hash functions: H1, H2, …, Hn
• Hk(key) = [GetHash(key) + k * (1 + (((GetHash(key) >> 5) + 1) % (hashsize - 1)))] % hashsize
• hashsize must be prime
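
The family of hash functions Hk can be written out directly. This is a sketch of the formula above, not the framework's actual source: DoubleHashing and Probe are hypothetical names, and clearing the sign bit of the hash code is an assumption made so that the shift and modulo work on a non-negative value.

```csharp
using System;

public static class DoubleHashing
{
    // Computes Hk(key) for a given hash code, probe number k >= 0, and table size.
    public static int Probe(int hashCode, int k, int hashSize)
    {
        if (k < 0) throw new ArgumentOutOfRangeException("k");
        if (hashSize < 2) throw new ArgumentOutOfRangeException("hashSize");

        uint h = (uint)hashCode & 0x7FFFFFFF;                        // assumed: drop the sign bit
        uint step = 1 + (((h >> 5) + 1) % (uint)(hashSize - 1));     // in 1 .. hashSize - 1
        return (int)((h + (ulong)k * step) % (uint)hashSize);
    }

    public static void Main()
    {
        // Successive probes for one key in a table with 11 slots
        for (int k = 1; k <= 4; k++)
            Console.WriteLine(Probe(1234, k, 11));
    }
}
```

The step is always between 1 and hashsize - 1. When hashsize is prime, the step therefore shares no factor with it, so hashsize consecutive probes visit every slot exactly once. This is the reason for the primality requirement.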

Hashtable
• Load factor = maximum allowed ratio (# items / # slots)
• Optimal: 0.72
• Expanding the hashtable takes 2 costly steps:
• Grow # slots (current prime → next prime that is about twice as large)
• Rehash
• High load factor → dense hashtable
• Less space
• More probes on collision: expected # probes = 1/(1 - LF)
• If LF = 0.72 → expected # probes = 1/0.28 ≈ 3.6 → O(1)

Hashtable
• Costly to expand
• Set the size in constructor if size is known
• Asymptotic running times:
• Access: O(1)
• Search: O(1)

System.Collections.Generic.Dictionary<TKey, TValue>
• Type-safe
• Strongly typed KEYS + VALUES
• Operations:
• ContainsKey(key)
• Collision Resolution: CHAINING
• Uses linked lists from an entry where collision occurs

Dictionary Example

    Dictionary<keyType, valueType> variableName =
        new Dictionary<keyType, valueType>();

    Dictionary<int, Employee> employeeData = new Dictionary<int, Employee>();
    ...
    // See if employee with SSN 123-45-6789 works here
    if (employeeData.ContainsKey(123456789))
        ...

Chaining in the Dictionary type
• Efficiency:
• Remove: O(n/m)
• Search: O(n/m)

Where:

n = number of elements stored in the hash table

m = number of buckets/slots

• Implemented such that n ≤ m at all times
• The total # of chained elements can never exceed the number of buckets
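
Chaining can be illustrated with one linked list per bucket. ChainedMap is a hypothetical teaching class, not the way Dictionary is implemented internally, and it keeps a fixed number of buckets rather than resizing.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical string-to-int map that resolves collisions by chaining.
public class ChainedMap
{
    private readonly LinkedList<KeyValuePair<string, int>>[] buckets;

    public ChainedMap(int bucketCount)
    {
        buckets = new LinkedList<KeyValuePair<string, int>>[bucketCount];
    }

    private int BucketOf(string key)
    {
        return (key.GetHashCode() & 0x7FFFFFFF) % buckets.Length;
    }

    public void Put(string key, int value)
    {
        int b = BucketOf(key);
        if (buckets[b] == null)
            buckets[b] = new LinkedList<KeyValuePair<string, int>>();

        // Walk the chain: overwrite an existing key, otherwise append.
        for (var node = buckets[b].First; node != null; node = node.Next)
        {
            if (node.Value.Key == key)
            {
                node.Value = new KeyValuePair<string, int>(key, value);
                return;
            }
        }
        buckets[b].AddLast(new KeyValuePair<string, int>(key, value));
    }

    public bool TryGet(string key, out int value)
    {
        var chain = buckets[BucketOf(key)];
        if (chain != null)
        {
            foreach (KeyValuePair<string, int> pair in chain)   // O(chain length), about n/m on average
            {
                if (pair.Key == key) { value = pair.Value; return true; }
            }
        }
        value = 0;
        return false;
    }
}
```

Even with a single bucket the map stays correct; every operation just degrades to a linear scan of one long chain, which is the n/m cost with m = 1.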

Trees
• = set of linked nodes where no cycle exists
• (GT) a connected acyclic graph
• Nodes:
• Root
• Leaf
• Internal
• |E| = ?
• Forest = { trees }

Popular Tree-Type Data Structures
• BST: Binary Search Tree
• Heap
• Self-balancing binary search trees
• AVL
• Red-black

Binary Trees
• Code example for defining a tree data object
• Tree Traversal
• In-order: L Ro R
• Pre-order: Ro L R
• Post-order: L R Ro
• Θ(n)
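
One way to define a tree data object and the three traversals is sketched below. TreeNode and Traversal are illustrative names, and storing int values is an assumption; the slide's own code example did not survive in this transcript.

```csharp
using System;
using System.Collections.Generic;

public class TreeNode
{
    public int Value;
    public TreeNode Left;
    public TreeNode Right;

    public TreeNode(int value, TreeNode left = null, TreeNode right = null)
    {
        Value = value;
        Left = left;
        Right = right;
    }
}

public static class Traversal
{
    // In-order: left subtree, root, right subtree
    public static void InOrder(TreeNode node, List<int> visited)
    {
        if (node == null) return;
        InOrder(node.Left, visited);
        visited.Add(node.Value);
        InOrder(node.Right, visited);
    }

    // Pre-order: root, left subtree, right subtree
    public static void PreOrder(TreeNode node, List<int> visited)
    {
        if (node == null) return;
        visited.Add(node.Value);
        PreOrder(node.Left, visited);
        PreOrder(node.Right, visited);
    }

    // Post-order: left subtree, right subtree, root
    public static void PostOrder(TreeNode node, List<int> visited)
    {
        if (node == null) return;
        PostOrder(node.Left, visited);
        PostOrder(node.Right, visited);
        visited.Add(node.Value);
    }

    public static void Main()
    {
        //     2
        //    / \
        //   1   3
        TreeNode root = new TreeNode(2, new TreeNode(1), new TreeNode(3));
        List<int> order = new List<int>();
        InOrder(root, order);
        Console.WriteLine(string.Join(" ", order));   // 1 2 3
    }
}
```

Each traversal visits every node exactly once, which is where the Θ(n) bound comes from.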

Tree Operations
• Search: Recursive: O(h)
• h = height of the tree
• Max & Min Search: search right/left
• Successor & Predecessor Search
• Insertion (easy: always add a new leaf) & Deletion (more complicated as it may cause the tree structure to change)
• Running time:
• function of the tree topology

Binary Search Tree
• Improves the search time (and lookup time) over the binary tree in general
• BST property:
• for any node n, every descendant node's value in the left subtree of n is less than the value of n, and every descendant node's value in the right subtree is greater than the value of n
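
The BST property translates into a simple descent for both insertion and search. The sketch below uses made-up names (BstNode, BinarySearchTree), stores int keys, rejects duplicates, and does no balancing.

```csharp
using System;

public class BstNode
{
    public int Key;
    public BstNode Left;
    public BstNode Right;

    public BstNode(int key) { Key = key; }
}

public class BinarySearchTree
{
    private BstNode root;

    // Adds the key as a new leaf; returns false if the key is already present.
    public bool Insert(int key)
    {
        if (root == null) { root = new BstNode(key); return true; }
        BstNode current = root;
        while (true)
        {
            if (key == current.Key) return false;
            if (key < current.Key)
            {
                if (current.Left == null) { current.Left = new BstNode(key); return true; }
                current = current.Left;       // smaller keys live in the left subtree
            }
            else
            {
                if (current.Right == null) { current.Right = new BstNode(key); return true; }
                current = current.Right;      // larger keys live in the right subtree
            }
        }
    }

    // Follows one root-to-leaf path: O(h), where h is the height of the tree.
    public bool Contains(int key)
    {
        BstNode current = root;
        while (current != null)
        {
            if (key == current.Key) return true;
            current = key < current.Key ? current.Left : current.Right;
        }
        return false;
    }

    // Number of nodes on the longest root-to-leaf path.
    public int Height() { return Height(root); }

    private static int Height(BstNode node)
    {
        if (node == null) return 0;
        return 1 + Math.Max(Height(node.Left), Height(node.Right));
    }
}
```

Inserting 3, 1, 4, 2, 5 gives a tree of height 3, while inserting the same keys in sorted order (1, 2, 3, 4, 5) gives height 5: the tree degenerates into a linked list and search becomes linear.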

Non-BST vs BST
• Non-BST
• BST

Linear Search Time in BST

The search time for a BST depends upon its topology.

BST continued
• Perfectly balanced BST:
• Search: O(log n) [ height = log n]
• Sub-linear search running time
• Balanced Binary Tree:
• Exhibits a good ratio: breadth/depth
• Self-balancing trees

The Heap
• Specialized tree-based data structure that satisfies the heap property: if B is a child node of A, then key(A) ≥ key(B). [max-heap]
• Operations:
• delete-max or delete-min: removing the root node of a max- or min-heap, respectively
• increase-key or decrease-key: updating a key within a max- or min-heap, respectively
• insert: adding a new key to the heap
• merge: joining two heaps to form a valid new heap containing all the elements of both
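
The insert and delete-max operations can be sketched with an array-backed binary max-heap. Using a binary heap stored in a List<int> is an assumption made for the example (the slide describes heaps in general), and MaxHeap is a made-up class name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical binary max-heap: the children of index i sit at 2i + 1 and 2i + 2.
public class MaxHeap
{
    private readonly List<int> items = new List<int>();

    public int Count { get { return items.Count; } }

    // Appends the key, then sifts it up until its parent is at least as large.
    public void Insert(int key)
    {
        items.Add(key);
        int i = items.Count - 1;
        while (i > 0)
        {
            int parent = (i - 1) / 2;
            if (items[parent] >= items[i]) break;
            Swap(i, parent);
            i = parent;
        }
    }

    // Removes the root, moves the last item to the top, then sifts it down.
    public int DeleteMax()
    {
        if (items.Count == 0)
            throw new InvalidOperationException("Heap is empty");

        int max = items[0];
        int last = items.Count - 1;
        items[0] = items[last];
        items.RemoveAt(last);

        int i = 0;
        while (true)
        {
            int left = 2 * i + 1, right = left + 1, largest = i;
            if (left < items.Count && items[left] > items[largest]) largest = left;
            if (right < items.Count && items[right] > items[largest]) largest = right;
            if (largest == i) break;
            Swap(i, largest);
            i = largest;
        }
        return max;
    }

    private void Swap(int a, int b)
    {
        int tmp = items[a];
        items[a] = items[b];
        items[b] = tmp;
    }
}
```

Both operations move along a single root-to-leaf path, so each takes O(log n) time in a heap of n keys.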

Max Heap Example

Example of max-heap:

Linked List

• No resizing necessary
• Search: O(n)
• Insertion
• O(1) if unsorted
• O(n) if sorted
• Access: O(n)
• Doubly-linked; type-safe (value type supplied via generics)

Skip List

Linked list with a self-balancing BST-like property

The elements are sorted

Height = log n

Problems with insert & delete

Solution: randomized distribution

Overall: O(log n)

Worst case: O(n), but there is only a very slim chance of reaching the worst case

Skip List Examples

Graphs
• A collection of interconnected nodes
• A graph or undirected graph G is an ordered pair G := (V, E) that is subject to the following conditions:
• V is a set, whose elements are called vertices or nodes,
• E is a set of pairs (unordered) of distinct vertices, called edges or lines.
• Edges (1):
• Directed or undirected
• Weighted or unweighted

Graph (cont’d)
• Sparse: |E| << |E_max| or |E| ≤ n^2
• Representation:
• (Packed Edge List)
• Problems applicable to graphs:
• Minimum spanning tree (Kruskal, Prim)
• Shortest Path (Dijkstra)

Distance Graph Example

Graph Representation

Minimum Spanning Tree

Spanning tree of a connected, undirected graph = a subset of the edges that connects all the nodes and does not introduce a cycle. A minimum spanning tree is a spanning tree whose total edge weight is as small as possible.
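
Kruskal's algorithm, one of the two methods this deck names for the problem, can be sketched over a plain edge list. Edge and Kruskal are hypothetical names, vertices are assumed to be numbered 0 .. vertexCount - 1, and the union-find helper is a simplified version without union by rank.

```csharp
using System;
using System.Collections.Generic;

public class Edge
{
    public int U;
    public int V;
    public double Weight;

    public Edge(int u, int v, double weight) { U = u; V = v; Weight = weight; }
}

public static class Kruskal
{
    // Finds the representative of x's component, shortening the path as it goes.
    private static int Find(int[] parent, int x)
    {
        while (parent[x] != x)
        {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    // Returns the edges of a minimum spanning tree (or forest, if the graph is disconnected).
    public static List<Edge> MinimumSpanningTree(int vertexCount, List<Edge> edges)
    {
        List<Edge> sorted = new List<Edge>(edges);
        sorted.Sort((a, b) => a.Weight.CompareTo(b.Weight));   // cheapest edges first

        int[] parent = new int[vertexCount];
        for (int i = 0; i < vertexCount; i++) parent[i] = i;   // every vertex starts alone

        List<Edge> tree = new List<Edge>();
        foreach (Edge e in sorted)
        {
            int ru = Find(parent, e.U);
            int rv = Find(parent, e.V);
            if (ru == rv) continue;          // same component: the edge would close a cycle
            parent[ru] = rv;                 // merge the two components
            tree.Add(e);
            if (tree.Count == vertexCount - 1) break;
        }
        return tree;
    }
}
```

For a graph with 4 vertices and edges (0,1) weight 1, (1,2) weight 2, (0,2) weight 3, (2,3) weight 4, and (1,3) weight 5, the sketch keeps the edges of weight 1, 2, and 4 for a total of 7. Edge (0,2) is skipped because vertices 0 and 2 are already connected.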

Kruskal’s Algorithm

Prim’s Algorithm
