CSE 326: Data Structures Part 10 Advanced Data Structures

CSE 326: Data Structures Part 10: Advanced Data Structures Henry Kautz Autumn Quarter 2002

Outline • Multidimensional search trees • Range Queries • k-D Trees • Quad Trees • Randomized Data Structures & Algorithms • Treaps • Primality testing • Local search for NP-complete problems

Multi-D Search ADT • Dictionary operations: create, destroy, find, insert, delete, range queries • Each item has k keys for a k-dimensional search tree • Searches can be performed on one, some, or all the keys or on ranges of the keys [Figure: example 2-D tree over the points 5,2 2,5 8,4 4,2 3,6 9,1 4,4 1,9 8,2 5,7]

Applications of Multi-D Search • Astronomy (simulation of galaxies) - 3 dimensions • Protein folding in molecular biology - 3 dimensions • Lossy data compression - 4 to 64 dimensions • Image processing - 2 dimensions • Graphics - 2 or 3 dimensions • Animation - 3 to 4 dimensions • Geographical databases - 2 or 3 dimensions • Web searching - 200 or more dimensions

Range Query A range query is a search in a dictionary in which the exact key may not be entirely specified. Range queries are the primary interface with multi-D data structures.

Range Query Examples: Two Dimensions • Search for items based on just one key • Search for items based on ranges for all keys • Search for items based on a function of several keys: e.g., a circular range query

Range Querying in 1-D • Find everything in the rectangle… [Figure: x axis]

Range Querying in 1-D with a BST • Find everything in the rectangle… [Figure: x axis]
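A 1-D range query on a BST can be sketched as a pruned in-order traversal. This is a minimal illustration, not code from the slides: the names `Node`, `insert`, and `range_query` are assumptions, and the tree is a plain unbalanced BST.

```python
class Node:
    """Plain (unbalanced) BST node holding a single key."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Standard BST insert: smaller keys go left, all others go right."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def range_query(root, low, high, out):
    """Append every key in [low, high] to out, in sorted order."""
    if root is None:
        return
    if low < root.key:             # left subtree only holds keys < root.key
        range_query(root.left, low, high, out)
    if low <= root.key <= high:
        out.append(root.key)
    if root.key <= high:           # right subtree only holds keys >= root.key
        range_query(root.right, low, high, out)


root = None
for k in [5, 2, 8, 1, 9, 4, 7, 3, 6]:
    root = insert(root, k)

hits = []
range_query(root, 3, 7, hits)
print(hits)
```

Subtrees that cannot overlap the interval are never entered, so the work is proportional to the height plus the number of keys reported.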

1-D Range Querying in 2-D [Figure: axes x and y]

2-D Range Querying in 2-D [Figure: axes x and y]

k-D Trees • Split on the next dimension at each succeeding level • If building in batch, choose the median along the current dimension at each level (guarantees logarithmic height and a balanced tree) • In general, add as in a BST • k-D tree node fields: keys, value, dimension (the dimension that this node splits on), left, right
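The batch construction described above might look like the following sketch. The names `KDNode`, `build`, and `height` are assumptions for illustration, and the tie handling (moving the split to the first point that shares the median coordinate) is a design choice of this sketch so that everything in the left subtree is strictly smaller along the split dimension; with many duplicate coordinates it can cost some balance.

```python
class KDNode:
    """k-D tree node: a point (keys), its split dimension, and two children."""
    def __init__(self, keys, dim, left=None, right=None):
        self.keys = keys
        self.dim = dim
        self.left = left
        self.right = right


def build(points, depth=0):
    """Build a k-D tree in batch by splitting at the median of the current dimension."""
    if not points:
        return None
    dim = depth % len(points[0])          # cycle through the dimensions level by level
    pts = sorted(points, key=lambda p: p[dim])
    mid = len(pts) // 2
    # Assumption of this sketch: keep ties on the right so left keys are strictly smaller.
    while mid > 0 and pts[mid - 1][dim] == pts[mid][dim]:
        mid -= 1
    return KDNode(pts[mid], dim,
                  build(pts[:mid], depth + 1),
                  build(pts[mid + 1:], depth + 1))


def height(node):
    """Number of nodes on the longest root-to-leaf path."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


POINTS = [(5, 2), (2, 5), (8, 4), (4, 2), (3, 6),
          (9, 1), (4, 4), (1, 9), (8, 2), (5, 7)]
root = build(POINTS)
print(root.keys, height(root))
```

Sorting at every level makes this O(n log² n); it is meant to show the median split, not to be the fastest construction.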

Find in a k-D Tree
find(<x1,x2, …, xk>, root) finds the node which has the given set of keys in it or returns null if there is no such node
Node find(keyVector keys, Node root) {
  if (root == NULL) return NULL;   // check for an empty tree before reading root.dimension
  int dim = root.dimension;
  if (root.keys == keys) return root;
  else if (keys[dim] < root.keys[dim]) return find(keys, root.left);
  else return find(keys, root.right);
}
runtime:
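The find routine above can be sketched in runnable form as follows. `KDNode`, `insert`, and `find` are illustrative names, and the insert follows the "add as in a BST" rule with the same tie-breaking as find (smaller goes left, everything else goes right); the sample points are the ones that appear on the example slides.

```python
class KDNode:
    """k-D tree node storing a point and the dimension it splits on."""
    def __init__(self, keys, dim):
        self.keys = keys
        self.dim = dim
        self.left = None
        self.right = None


def insert(root, keys, depth=0):
    """Insert as in a BST, comparing on the split dimension of each node."""
    if root is None:
        return KDNode(keys, depth % len(keys))
    if keys[root.dim] < root.keys[root.dim]:
        root.left = insert(root.left, keys, depth + 1)
    else:
        root.right = insert(root.right, keys, depth + 1)
    return root


def find(root, keys):
    """Return the node holding exactly these keys, or None if absent."""
    while root is not None:
        if root.keys == keys:
            return root
        if keys[root.dim] < root.keys[root.dim]:
            root = root.left
        else:
            root = root.right
    return None


root = None
for p in [(5, 2), (2, 5), (8, 4), (4, 4), (1, 9),
          (8, 2), (5, 7), (4, 2), (3, 6), (9, 1)]:
    root = insert(root, p)

print(find(root, (3, 6)) is not None)   # a stored point
print(find(root, (0, 10)) is not None)  # a point that was never inserted
```

Each step descends one level, so the cost is proportional to the depth of the tree.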

Find Example • find(<3,6>) • find(<0,10>) [Figure: 2-D tree over the points 5,2 2,5 8,4 4,4 1,9 8,2 5,7 4,2 3,6 9,1]

Building a 2-D Tree (1/4) [Figure: axes x and y]

Building a 2-D Tree (2/4) [Figure: axes x and y]

k-D Tree [Figure: points a through m in the plane and the corresponding k-D tree]

2-D Range Querying in 2-D Trees • Search every partition that intersects the rectangle. • Check whether each node (including leaves) falls into the range. [Figure: axes x and y]

Range Query in a 2-D Tree
print_range(int xlow, int xhigh, int ylow, int yhigh, Node root) {
  if (root == NULL) return;
  // Report this node if it lies inside the query rectangle.
  if (xlow <= root.x && root.x <= xhigh && ylow <= root.y && root.y <= yhigh)
    print(root);
  // Recurse only into the sides that can intersect the rectangle.
  if ((root.dim == "x" && xlow <= root.x) || (root.dim == "y" && ylow <= root.y))
    print_range(xlow, xhigh, ylow, yhigh, root.left);
  if ((root.dim == "x" && root.x <= xhigh) || (root.dim == "y" && root.y <= yhigh))
    print_range(xlow, xhigh, ylow, yhigh, root.right);
}
runtime: O(N)

Range Query in a k-D Tree
print_range(int low[MAXD], int high[MAXD], Node root) {
  if (root == NULL) return;
  bool inrange = true;
  for (int i = 0; i < MAXD; i++) {
    if (root.coord[i] < low[i]) inrange = false;
    if (high[i] < root.coord[i]) inrange = false;
  }
  if (inrange) print(root);
  // Recurse only into the sides that can intersect the range.
  if (low[root.dim] <= root.coord[root.dim])
    print_range(low, high, root.left);
  if (root.coord[root.dim] <= high[root.dim])
    print_range(low, high, root.right);
}
runtime: O(N)
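A runnable version of the k-D range query might look like this sketch. It collects matches in a list instead of printing; `KDNode`, `insert`, and `range_query` are assumed names, and the tree is built by plain BST-style insertion.

```python
class KDNode:
    """k-D tree node storing a point and the dimension it splits on."""
    def __init__(self, keys, dim):
        self.keys = keys
        self.dim = dim
        self.left = None
        self.right = None


def insert(root, keys, depth=0):
    """Insert as in a BST: smaller on the split dimension goes left, others right."""
    if root is None:
        return KDNode(keys, depth % len(keys))
    if keys[root.dim] < root.keys[root.dim]:
        root.left = insert(root.left, keys, depth + 1)
    else:
        root.right = insert(root.right, keys, depth + 1)
    return root


def range_query(root, low, high, out):
    """Append every point p with low[i] <= p[i] <= high[i] for all i."""
    if root is None:
        return
    if all(lo <= c <= hi for lo, c, hi in zip(low, root.keys, high)):
        out.append(root.keys)
    d = root.dim
    if low[d] <= root.keys[d]:     # the range reaches into the left side
        range_query(root.left, low, high, out)
    if root.keys[d] <= high[d]:    # the range reaches into the right side
        range_query(root.right, low, high, out)


POINTS = [(5, 2), (2, 5), (8, 4), (4, 2), (3, 6),
          (9, 1), (4, 4), (1, 9), (8, 2), (5, 7)]
root = None
for p in POINTS:
    root = insert(root, p)

hits = []
range_query(root, (3, 2), (5, 6), hits)
print(sorted(hits))
```

In the worst case the range covers everything and all N nodes are visited, which is where the O(N) bound comes from.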

Other Shapes for Range Querying • Search every partition that intersects the shape (circle). • Check whether each node (including leaves) falls into the shape. [Figure: axes x and y]
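One way to sketch a circular range query is to prune with the circle's extent along the split dimension and then test each visited node against the circle itself. The names below are assumptions, and the pruning test is a simplification: it may visit a partition the circle does not actually touch, but it never skips one that it does.

```python
class KDNode:
    """k-D tree node storing a point and the dimension it splits on."""
    def __init__(self, keys, dim):
        self.keys = keys
        self.dim = dim
        self.left = None
        self.right = None


def insert(root, keys, depth=0):
    """Insert as in a BST: smaller on the split dimension goes left, others right."""
    if root is None:
        return KDNode(keys, depth % len(keys))
    if keys[root.dim] < root.keys[root.dim]:
        root.left = insert(root.left, keys, depth + 1)
    else:
        root.right = insert(root.right, keys, depth + 1)
    return root


def dist2(p, q):
    """Squared Euclidean distance (avoids floating-point square roots)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def circle_query(root, center, radius, out):
    """Append every point within `radius` of `center`."""
    if root is None:
        return
    if dist2(root.keys, center) <= radius * radius:
        out.append(root.keys)
    d = root.dim
    if center[d] - radius <= root.keys[d]:   # circle extends into the left side
        circle_query(root.left, center, radius, out)
    if root.keys[d] <= center[d] + radius:   # circle extends into the right side
        circle_query(root.right, center, radius, out)


POINTS = [(5, 2), (2, 5), (8, 4), (4, 2), (3, 6),
          (9, 1), (4, 4), (1, 9), (8, 2), (5, 7)]
root = None
for p in POINTS:
    root = insert(root, p)

hits = []
circle_query(root, (4, 4), 2, hits)
print(sorted(hits))
```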


k-D Trees Can Be Inefficient (but not when built in batch!) • insert(<5,0>), insert(<6,9>), insert(<9,3>), insert(<6,5>), insert(<7,7>), insert(<8,6>) • suck factor: O(n) [Figure: the resulting tree over 5,0 6,9 9,3 6,5 7,7 8,6]
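The effect of insertion order can be checked with a small experiment. This sketch assumes the tie-breaking rule used by find (smaller goes left, everything else goes right) and a simple median-split batch build; `insert`, `build`, and `height` are illustrative names.

```python
class KDNode:
    """k-D tree node storing a point and the dimension it splits on."""
    def __init__(self, keys, dim, left=None, right=None):
        self.keys = keys
        self.dim = dim
        self.left = left
        self.right = right


def insert(root, keys, depth=0):
    """One-at-a-time insertion, as in a BST."""
    if root is None:
        return KDNode(keys, depth % len(keys))
    if keys[root.dim] < root.keys[root.dim]:
        root.left = insert(root.left, keys, depth + 1)
    else:
        root.right = insert(root.right, keys, depth + 1)
    return root


def build(points, depth=0):
    """Batch build: split at the median of the current dimension."""
    if not points:
        return None
    dim = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[dim])
    mid = len(pts) // 2
    return KDNode(pts[mid], dim,
                  build(pts[:mid], depth + 1),
                  build(pts[mid + 1:], depth + 1))


def height(node):
    """Number of nodes on the longest root-to-leaf path."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


ORDER = [(5, 0), (6, 9), (9, 3), (6, 5), (7, 7), (8, 6)]

incremental = None
for p in ORDER:
    incremental = insert(incremental, p)

batch = build(ORDER)
print(height(incremental), height(batch))
```

With this insertion order every new point lands below the previous one, so the incremental tree is a single chain, while the batch-built tree over the same points stays shallow.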

Quad Trees • Split on all (two) dimensions at each level • Split key space into equal size partitions (quadrants) • Add a new node by adding to a leaf, and, if the leaf is already occupied, split until only one node per leaf • quad tree node fields: keys (x, y), value, Center, Quadrants (0,0; 1,0; 0,1; 1,1)

Find in a Quad Tree
find(<x, y>, root) finds the node which has the given pair of keys in it, or returns the quadrant where the point should be if there is no such node
Node find(Key x, Key y, Node root) {
  if (root == NULL) return NULL;        // Empty tree
  if (root.isLeaf()) return root;       // Key may not actually be here
  int quad = getQuadrant(x, y, root);   // Compares against center; always makes the same choice on ties
  return find(x, y, root.quadrants[quad]);
}
runtime: O(depth)
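The quadrant descent, together with the split-on-collision insert described earlier, can be sketched as below. Several details are assumptions of this sketch rather than anything stated on the slides: each node stores its center and half-width, all four children are created (as empty leaves) on a split, and ties against the center go to the upper/right quadrant.

```python
class QuadNode:
    """Square region with a center; a leaf holds at most one point."""
    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half
        self.point = None
        self.quadrants = None          # None means this node is a leaf

    def is_leaf(self):
        return self.quadrants is None


def get_quadrant(x, y, node):
    """0 = lower-left, 1 = lower-right, 2 = upper-left, 3 = upper-right."""
    return (1 if x >= node.cx else 0) + (2 if y >= node.cy else 0)


def find(x, y, root):
    """Return the leaf whose region contains (x, y); the key may not be there."""
    if root is None:
        return None
    while not root.is_leaf():
        root = root.quadrants[get_quadrant(x, y, root)]
    return root


def split(node):
    """Turn a leaf into an internal node with four empty leaf children."""
    h = node.half / 2
    node.quadrants = [QuadNode(node.cx + dx * h, node.cy + dy * h, h)
                      for dy in (-1, 1) for dx in (-1, 1)]


def insert(x, y, root):
    """Add a point; if the target leaf is occupied, split until the two separate."""
    leaf = find(x, y, root)
    if leaf.point is None or leaf.point == (x, y):
        leaf.point = (x, y)
        return
    old, leaf.point = leaf.point, None
    while True:
        split(leaf)
        q_old = get_quadrant(old[0], old[1], leaf)
        q_new = get_quadrant(x, y, leaf)
        if q_old != q_new:
            leaf.quadrants[q_old].point = old
            leaf.quadrants[q_new].point = (x, y)
            return
        leaf = leaf.quadrants[q_old]   # both fell in the same quadrant: split again


root = QuadNode(8, 8, 8)               # key space [0, 16) x [0, 16)
for p in [(10, 2), (5, 6), (12, 12)]:
    insert(p[0], p[1], root)

print(find(10, 2, root).point)
```

As the comment in the slide code warns, a successful descent only identifies the leaf where the key would live; the caller still has to compare the stored point with the query.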

Find Example • find(<10,2>) (i.e., c) • find(<5,6>) (i.e., d) [Figure: points a through g and the corresponding quad tree]

Building a Quad Tree (1/5) [Figure: axes x and y]

Quad Tree Example [Figure: points a through g and the corresponding quad tree]


Quad Trees Can Suck • suck factor: O(log (1/minimum distance between nodes)) [Figure: two nearby points a and b]
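The logarithmic dependence on the minimum distance can be observed directly. In this sketch (all names are assumptions) two points are placed a distance of roughly 2^-j apart in a unit key space, and the quad tree has to split j times before they land in different quadrants. Powers of two are used so that the floating-point arithmetic is exact.

```python
class QuadNode:
    """Square region with a center; a leaf holds at most one point."""
    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half
        self.point = None
        self.quadrants = None          # None means this node is a leaf


def quadrant(p, node):
    """Quadrant index of point p; ties against the center go up/right."""
    return (1 if p[0] >= node.cx else 0) + (2 if p[1] >= node.cy else 0)


def insert(root, p):
    """Descend to a leaf; if it is occupied, split until the points separate."""
    node = root
    while node.quadrants is not None:
        node = node.quadrants[quadrant(p, node)]
    if node.point is None:
        node.point = p
        return
    old, node.point = node.point, None
    while True:
        h = node.half / 2
        node.quadrants = [QuadNode(node.cx + dx * h, node.cy + dy * h, h)
                          for dy in (-1, 1) for dx in (-1, 1)]
        if quadrant(old, node) != quadrant(p, node):
            node.quadrants[quadrant(old, node)].point = old
            node.quadrants[quadrant(p, node)].point = p
            return
        node = node.quadrants[quadrant(p, node)]


def depth(node):
    """Number of levels below this node."""
    if node.quadrants is None:
        return 0
    return 1 + max(depth(q) for q in node.quadrants)


def depth_for_gap(j):
    """Depth of a two-point quad tree whose points differ by 2**-j per axis."""
    root = QuadNode(0.5, 0.5, 0.5)     # key space [0, 1) x [0, 1)
    insert(root, (0.0, 0.0))
    insert(root, (2.0 ** -j, 2.0 ** -j))
    return depth(root)


for j in (1, 4, 10):
    print(j, depth_for_gap(j))
```

Only two points are stored, yet the depth grows with log(1/distance), independent of n.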

2-D Range Querying in Quad Trees [Figure: axes x and y]

2-D Range Query in a Quad Tree
print_range(int xlow, int xhigh, int ylow, int yhigh, Node root) {
  if (root == NULL) return;
  // Report this node if it lies inside the query rectangle.
  if (xlow <= root.x && root.x <= xhigh && ylow <= root.y && root.y <= yhigh)
    print(root);
  // Recurse only into the quadrants that can intersect the rectangle.
  if (xlow <= root.x && ylow <= root.y)
    print_range(xlow, xhigh, ylow, yhigh, root.lower_left);
  if (xlow <= root.x && root.y <= yhigh)
    print_range(xlow, xhigh, ylow, yhigh, root.upper_left);
  if (root.x <= xhigh && ylow <= root.y)
    print_range(xlow, xhigh, ylow, yhigh, root.lower_right);
  if (root.x <= xhigh && root.y <= yhigh)
    print_range(xlow, xhigh, ylow, yhigh, root.upper_right);
}
runtime: O(N)


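Delete Example • delete(<10,2>) (i.e., c) • Find and delete the node. • If its parent now has just one child, delete the parent too (the remaining child takes its place). • Propagate! [Figure: points a through g and the quad tree before and after the delete]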

Nearest Neighbor Search • getNearestNeighbor(<1,4>) • Find a nearby node (do a find). • Do a circular range query. • As you get results, tighten the circle. • Continue until no closer node in query. • Works on k-D Trees, too! [Figure: points a through g and the corresponding quad tree]
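On a k-D tree, the "tighten the circle" idea can be sketched as a single recursive search: descend toward the query first, then visit the other side of a split only if the current best circle still crosses the splitting line. The names are assumptions, and this folds the initial find and the shrinking circular query into one pass rather than running them as separate steps.

```python
class KDNode:
    """k-D tree node storing a point and the dimension it splits on."""
    def __init__(self, keys, dim):
        self.keys = keys
        self.dim = dim
        self.left = None
        self.right = None


def insert(root, keys, depth=0):
    """Insert as in a BST: smaller on the split dimension goes left, others right."""
    if root is None:
        return KDNode(keys, depth % len(keys))
    if keys[root.dim] < root.keys[root.dim]:
        root.left = insert(root.left, keys, depth + 1)
    else:
        root.right = insert(root.right, keys, depth + 1)
    return root


def dist2(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(root, target, best=None):
    """Return the stored point closest to target (None for an empty tree)."""
    if root is None:
        return best
    if best is None or dist2(root.keys, target) < dist2(best, target):
        best = root.keys                       # tighten the circle
    diff = target[root.dim] - root.keys[root.dim]
    near, far = (root.left, root.right) if diff < 0 else (root.right, root.left)
    best = nearest(near, target, best)         # side containing the target first
    if diff * diff < dist2(best, target):      # circle still crosses the split
        best = nearest(far, target, best)
    return best


POINTS = [(5, 2), (2, 5), (8, 4), (4, 2), (3, 6),
          (9, 1), (4, 4), (1, 9), (8, 2), (5, 7)]
root = None
for p in POINTS:
    root = insert(root, p)

print(nearest(root, (1, 4)))
```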

Quad Trees vs. k-D Trees • k-D Trees: density balanced trees; number of nodes is O(n), where n is the number of points; height of the tree is O(log n) with batch insertion; supports insert, find, nearest neighbor, range queries • Quad Trees: number of nodes is O(n(1 + log(Δ/n))), where n is the number of points and Δ is the ratio of the width (or height) of the key space to the smallest distance between two points; height of the tree is O(log n + log Δ); supports insert, delete, find, nearest neighbor, range queries

To Do • Read (a little) about k-D trees in Weiss 12.6

CSE 326: Data Structures Part 10, continued: Randomized Data Structures Henry Kautz Autumn Quarter 2002

Pick a Card Warning! The Queen of Spades is a very unlucky card!

Randomized Data Structures • We’ve seen many data structures with good average case performance on random inputs, but bad behavior on particular inputs • Binary Search Trees • Instead of randomizing the input (since we cannot!), consider randomizing the data structure • No bad inputs, just unlucky random numbers • Expected case good behavior on any input

What’s the Difference? • Deterministic with good average time • If your application happens to always use the “bad” case, you are in big trouble! • Randomized with good expected time • Once in a while you will have an expensive operation, but no inputs can make this happen all the time • Kind of like an insurance policy for your algorithm!

Treap Dictionary Data Structure • Treaps have the binary search tree properties: the binary tree property and the search tree property • Treaps also have the heap-order property, on randomly assigned priorities [Figure: example treap, heap in yellow and search tree in blue; nodes labeled priority, key: (2,9), (6,7), (4,18), (7,8), (9,15), (10,30), (15,12)]

Treap Insert • Choose a random priority • Insert as in normal BST • Rotate up until heap order is restored (maintaining BST property while rotating) [Figure: three stages of insert(15); nodes labeled priority, key: (2,9), (6,7), (7,8), (14,12), and the new node (9,15)]
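The three insert steps can be sketched as follows, assuming a min-heap on priorities (smaller priority is closer to the root, matching the example figures). `TreapNode`, `insert`, and the helper checks are illustrative names; the optional `rng` parameter is an assumption of this sketch so that runs can be made repeatable.

```python
import random


class TreapNode:
    """BST on key, min-heap on priority."""
    def __init__(self, key, priority):
        self.key = key
        self.priority = priority
        self.left = None
        self.right = None


def rotate_right(n):
    """Lift n.left above n; the in-order sequence is unchanged."""
    l = n.left
    n.left, l.right = l.right, n
    return l


def rotate_left(n):
    """Lift n.right above n; the in-order sequence is unchanged."""
    r = n.right
    n.right, r.left = r.left, n
    return r


def insert(root, key, rng=random):
    """Insert as in a BST, then rotate the new node up while heap order is violated."""
    if root is None:
        return TreapNode(key, rng.random())    # random priority
    if key < root.key:
        root.left = insert(root.left, key, rng)
        if root.left.priority < root.priority:
            root = rotate_right(root)
    else:
        root.right = insert(root.right, key, rng)
        if root.right.priority < root.priority:
            root = rotate_left(root)
    return root


def inorder(n):
    return [] if n is None else inorder(n.left) + [n.key] + inorder(n.right)


def is_heap(n):
    """True if every child has a priority no smaller than its parent's."""
    if n is None:
        return True
    kids = [c for c in (n.left, n.right) if c is not None]
    return all(c.priority >= n.priority for c in kids) and all(is_heap(c) for c in kids)


def height(n):
    return 0 if n is None else 1 + max(height(n.left), height(n.right))


root = None
for k in range(1, 201):                        # sorted input: the bad case for a plain BST
    root = insert(root, k)
print(height(root))
```

A plain BST fed the same sorted keys would be a chain of 200 nodes; the treap's shape depends only on the random priorities, not on the insertion order.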

Tree + Heap… Why Bother? • Insert data in sorted order into a treap; what shape tree comes out? • insert(7), insert(8), insert(9), insert(12) [Figure: the treap after each insert; nodes labeled priority, key: (6,7), (7,8), (2,9), (15,12)]

Treap Delete • Find the key • Increase its priority to ∞ • Rotate it to the fringe • Snip it off [Figure: delete(9) on the treap (2,9), (6,7), (7,8), (9,15), (15,12), nodes labeled priority, key, shown as a sequence of left and right rotations]
