Data Structures & Algorithms Overview: Recursion, Lists, Trees, Hashing, GIS

CS 315 Lec 2, Jan 29 • Goals: • Finish the course overview • Introduction to recursion • Reading for the week: • Chapter 1: sections 1.1, 1.2 and 1.3

Images stored in 2-dim arrays • We will work on 2-d arrays by manipulating images: • Each pixel is represented by a red value, a green value and a blue value (any color is a combination of these three). (255, 255, 255) represents white, (255, 0, 0) represents red, etc. • pic(i, j)->Blue represents the blue component of the pixel in row i, column j of pic, and so on. • Some basic operations on images: • open, read, write • rotate, copy a sub-image • filter (remove blemishes) • extract features (identify where buildings are in an aerial photograph)
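A minimal sketch of the 2-d array view of an image, assuming an image is a plain list of rows and each pixel is an (r, g, b) tuple. The helper names (`make_image`, `copy_subimage`, `rotate_clockwise`) are made up for illustration and are not the course library's API.

```python
# Assumption: an image is a list of rows; each pixel is an (r, g, b) tuple.
WHITE = (255, 255, 255)
RED = (255, 0, 0)

def make_image(rows, cols, color):
    """Create a rows x cols image filled with one color."""
    return [[color for _ in range(cols)] for _ in range(rows)]

def copy_subimage(pic, top, left, height, width):
    """Copy the height x width block whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in pic[top:top + height]]

def rotate_clockwise(pic):
    """Rotate by 90 degrees: new row j is old column j read bottom to top."""
    rows, cols = len(pic), len(pic[0])
    return [[pic[rows - 1 - i][j] for i in range(rows)] for j in range(cols)]

pic = make_image(2, 3, WHITE)
pic[0][0] = RED                    # top-left pixel becomes red
rotated = rotate_clockwise(pic)    # 3 rows x 2 columns; red is now top-right
```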

Linked lists: order is important • Linked lists: • Store a sequence of items in non-consecutive memory locations. • Searching for a key is not easy (even if the list is sorted). • Inserting next to a given item is easy. • In a doubly linked list, inserting before or after a given item is easy. • The number of items need not be known in advance (dynamic).
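The points above can be sketched with a singly linked list. This is an illustrative sketch; the class and function names are assumptions, not course code. Inserting next to a known node touches only two pointers, while searching has to walk the list from the head.

```python
class Node:
    def __init__(self, key, next=None):
        self.key = key
        self.next = next

def insert_after(node, key):
    """Insert a new key right after a given node: constant time."""
    node.next = Node(key, node.next)
    return node.next

def search(head, key):
    """Linear scan from the head; returns the node or None."""
    while head is not None and head.key != key:
        head = head.next
    return head

def to_list(head):
    """Collect the keys in order (for inspection only)."""
    keys = []
    while head is not None:
        keys.append(head.key)
        head = head.next
    return keys

head = Node(1)
insert_after(head, 3)
insert_after(head, 2)   # the list is now 1 -> 2 -> 3
```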

Stacks and queues • Stacks: • insert and delete at the same end. • equivalently, the last element inserted is the first one to be deleted. • very useful for solving many problems, e.g. processing arithmetic expressions • Queues: • insert at one end, delete at the other end. • equivalently, the first element inserted is the first one to be deleted.
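As one hedged illustration of a stack processing an arithmetic expression, the sketch below evaluates a postfix expression. The postfix form and the function name `eval_postfix` are choices made here for brevity, not something the slide specifies. A `collections.deque` shows the first-in, first-out behavior of a queue.

```python
from collections import deque
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def eval_postfix(tokens):
    """Evaluate a postfix expression using a stack (a Python list)."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()          # last pushed operand comes off first
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(float(tok))
    return stack.pop()

value = eval_postfix("3 4 + 2 *".split())   # (3 + 4) * 2

queue = deque()
queue.append("a")            # insert at one end
queue.append("b")
first_out = queue.popleft()  # delete at the other end
```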

Non-linear data structures • Trees • Binary search trees, expression trees • Quad-trees • [Figure: a tree node with fields Lptr, key, Rptr; example key 15] • Main purpose of a binary search tree: supports dictionary operations efficiently
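A minimal sketch of a binary search tree node with a left pointer, a key and a right pointer, plus insert and search. The names are illustrative, and the tree is not balanced, so each operation takes time proportional to the height of the tree.

```python
class TreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None    # plays the role of Lptr
        self.right = None   # plays the role of Rptr

def insert(root, key):
    """Insert key into the subtree rooted at root; return the new root."""
    if root is None:
        return TreeNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root             # duplicate keys are ignored

def contains(root, key):
    """Follow one path from the root down, going left or right by key."""
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False

def inorder(root):
    """In-order traversal visits the keys in sorted order."""
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)

root = None
for k in [15, 6, 20, 3, 9]:
    root = insert(root, k)
```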

Priority queue • The key with the highest priority is the one that gets deleted next (with deleteMin, the smallest key has the highest priority). • Equivalently, it supports the following operations: • insert • deleteMin • Useful in solving many problems: • fast sorting (heap sort) • shortest path, minimum spanning tree, scheduling, etc.
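A short sketch of the two operations using Python's standard `heapq` module, where `heappush` acts as insert and `heappop` as deleteMin. Sorting by inserting everything and then repeatedly deleting the minimum is the idea behind heap sort. The function name `heap_sort` is chosen here for illustration.

```python
import heapq

def heap_sort(items):
    """Sort by inserting every item, then deleting the minimum repeatedly."""
    heap = []
    for x in items:
        heapq.heappush(heap, x)                  # insert
    return [heapq.heappop(heap)                  # deleteMin
            for _ in range(len(heap))]

sorted_keys = heap_sort([5, 1, 4, 2, 3])
```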

Hashing • Supports dictionary operations very efficiently (most of the time). • Main advantages: • simple design, easy to implement • on average very fast • Main drawback: • not good in the worst case
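One simple way to realize a hash table is separate chaining, sketched below. The slide does not say which collision strategy the course uses, so treat the design and the class name `ChainedHashTable` as assumptions. If many keys land in the same bucket, a search degrades to a linear scan, which is the worst case mentioned above.

```python
class ChainedHashTable:
    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # The hash value picks the bucket; colliding keys share a list.
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # overwrite an existing key
                return
        bucket.append((key, value))

    def search(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

    def delete(self, key):
        bucket = self._bucket(key)
        bucket[:] = [(k, v) for k, v in bucket if k != key]

table = ChainedHashTable()
table.insert("a", 1)
table.insert("b", 2)
table.insert("a", 3)    # replaces the earlier value for "a"
```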

What data structure to use? Example 1: There are many billions of web pages accessible to a search engine. When you type a query into the Google search page, you get an instantaneous response. What kind of data structure is used here? • The details are quite complicated, but the main data structure used is quite simple.

Data structure used: inverted index • Array of lists: each array entry contains a word and a pointer to a list of all the web pages that contain that word. • [Figure: the array entry for "Data structure" with a list of page numbers such as 38, 97, 297, 145, 876] • Question: How do we find the array index from a keyword? Hashing is used.
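A toy sketch of an inverted index. A Python `dict` is itself a hash table, so looking up a word hashes it to find its entry, which mirrors the idea on the slide. The page numbers and page texts below are made up for illustration, and a real search engine index is far more elaborate.

```python
from collections import defaultdict

def build_inverted_index(pages):
    """Map each word to the sorted list of page ids that contain it."""
    index = defaultdict(list)
    for page_id in sorted(pages):
        for word in set(pages[page_id].lower().split()):
            index[word].append(page_id)
    return index

# Hypothetical page ids and contents.
pages = {
    38: "data structure basics",
    97: "tree data",
    145: "hashing",
}
index = build_inverted_index(pages)
hits = index.get("data", [])    # pages that contain the word "data"
```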

Example 2: The entire landscape of the world is being digitized (there is a whole new branch that combines information technology and geography, called GIS: Geographic Information System). What kind of data structure should be used to store all this information? [Figure: snapshot from Google Earth]

Some general issues related to GIS • How much memory do we need? Can this be stored in one computer? • Building the database is done in the background (off-line processing). • How fast can the queries be answered? • Responding to a query is called on-line processing. • If each square mile is represented by a 1024 by 1024 pixel image, how much storage do we need to store the map of the United States?

Calculate the memory needed • Very rough estimate of the memory needed: • Area of USA is 4 x 10^6 sq miles (roughly) • Each square mile needs 10^6 pixels (roughly) • Each pixel usually requires 32 bits. • Thus the total memory needed • = 4 x 10^6 x 32 x 10^6 = 128 x 10^12 bits = 128,000 gigabits • (A standard desktop has ~200 gigabits of memory.) • Need about 640 such computers to store the data
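The rough estimate can be checked with a few lines of arithmetic. The constants are the slide's rounded figures, and a gigabit is taken as 10^9 bits. The variable names are chosen here for illustration.

```python
AREA_SQ_MILES = 4 * 10**6        # rough area of the USA
PIXELS_PER_SQ_MILE = 10**6       # 1024 x 1024, rounded down to 10^6
BITS_PER_PIXEL = 32
DESKTOP_GIGABITS = 200           # the slide's figure for one desktop

total_bits = AREA_SQ_MILES * PIXELS_PER_SQ_MILE * BITS_PER_PIXEL
total_gigabits = total_bits // 10**9
computers_needed = total_gigabits // DESKTOP_GIGABITS

print(total_bits, total_gigabits, computers_needed)
```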

What data structure to store the images? • Each 1024 x 1024 image can be stored in a two-dimensional array (the standard way to store all kinds of images: bmp, jpg, png, etc.). The actual images are stored in secondary memory (hard disks on several servers, either in a central location or distributed). • The number of images would be roughly 4 x 10^6. A set of pointers to these images can be stored in a 1 (or 2) dimensional array. • When you click on a point on the map, its index in the array is calculated. • From that index, the image is accessed and sent over the network to the requesting client.
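One way the index calculation might look, as a sketch under stated assumptions: tiles are one square mile each, laid out row by row in a 1-dimensional array, and the clicked point is given in miles from the top-left corner of the map. The function names and the layout are hypothetical; the slide does not specify them.

```python
TILE_PIXELS = 1024   # each tile is a 1024 x 1024 image

def tile_index(x_miles, y_miles, tiles_per_row):
    """Index of the tile containing the point, in a row-major 1-d array."""
    col = int(x_miles)
    row = int(y_miles)
    return row * tiles_per_row + col

def pixel_in_tile(x_miles, y_miles):
    """Pixel (column, row) of the point inside its own tile."""
    px = int((x_miles - int(x_miles)) * TILE_PIXELS)
    py = int((y_miles - int(y_miles)) * TILE_PIXELS)
    return px, py

idx = tile_index(2.5, 1.25, tiles_per_row=2000)   # row 1, column 2
offset = pixel_in_tile(2.5, 1.25)
```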

Overview of the projects 1) Generate all the poker hands • More generally, given a set of N items and a number k <= N, generate all possible combinations or permutations of k items. • (concepts: recursion, arrays, lists)
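A minimal recursive sketch for generating all combinations of k items: each combination either contains the first item or it does not. This is one standard formulation, not necessarily the one the project requires, and the function name is chosen here. For poker, the number of 5-card hands from 52 cards is C(52, 5) = 2,598,960.

```python
import math

def combinations(items, k):
    """Return all k-item combinations of items, as a list of lists."""
    if k == 0:
        return [[]]                 # exactly one way to choose nothing
    if len(items) < k:
        return []                   # not enough items left
    first, rest = items[0], items[1:]
    with_first = [[first] + c for c in combinations(rest, k - 1)]
    without_first = combinations(rest, k)
    return with_first + without_first

pairs = combinations(["A", "B", "C", "D"], 2)
poker_hands = math.comb(52, 5)      # count only; generating them all is slower
```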

Overview of the projects 2) Image manipulation (concepts: arrays, library, analysis of algorithms) (2a) Filtering [Figure: an image before and after filtering]

(2b) Labeling an image

(2c) Recursive image generation

3) Bounding box construction: OCR is one of the early success stories in software applications. Scan a printed page and recognize the characters in it. First step: bounding box construction.
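A hedged sketch of the simplest form of bounding box construction: given a binary image (1 for ink, 0 for background) that contains a single character, find the smallest rectangle enclosing all the ink. Separating multiple characters on a page is a further step not shown here, and the function name is illustrative.

```python
def bounding_box(image):
    """Return (top, left, bottom, right) of the nonzero pixels, or None."""
    if not image:
        return None
    rows = [r for r, row in enumerate(image) if any(row)]
    cols = [c for c in range(len(image[0]))
            if any(row[c] for row in image)]
    if not rows:
        return None                 # blank image, no ink at all
    return rows[0], cols[0], rows[-1], cols[-1]

glyph = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
box = bounding_box(glyph)
```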

4) Spelling checker: Given a text file T, identify all the misspelled words in T. Idea: build a hash table H of all the words in a dictionary, and search for each word of the text T in the table H. (hashing, string processing)
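The idea can be sketched with a Python `set`, which is hash-based, standing in for the table H. The tokenization (split on whitespace, strip surrounding punctuation, lowercase) is a simplifying assumption, and the function name is made up for this sketch.

```python
import string

def misspelled_words(text, dictionary_words):
    """Return the words of text that are not found in the dictionary."""
    table = {w.lower() for w in dictionary_words}     # hash table H
    result = []
    for raw in text.split():
        word = raw.strip(string.punctuation).lower()
        if word and word not in table:                # search H for the word
            result.append(word)
    return result

errors = misspelled_words("The quick brwon fox.",
                          ["the", "quick", "brown", "fox"])
```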

Last semester, peg-solitaire was the project that used hashing.

5) Image compression/decompression • Run-length coding: 1111111111100000001110111111111 can be coded as: 101121112112121001 • A similar idea can be used in an image: if all the pixels of a subimage are the same, then we can store that subimage using a single pixel. • Divide the image into quadrants and recursively apply this idea.
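Both ideas can be sketched briefly. The run-length encoder below emits (symbol, count) pairs, which is one possible output format and not necessarily the coding used on the slide. The quadrant recursion assumes a square image whose side is a power of two; a uniform block becomes a single pixel value and any other block becomes a list of four sub-blocks. All names are illustrative.

```python
from itertools import groupby

def run_length_encode(bits):
    """Encode a string as a list of (symbol, run length) pairs."""
    return [(symbol, len(list(run))) for symbol, run in groupby(bits)]

def run_length_decode(runs):
    """Rebuild the original string from (symbol, run length) pairs."""
    return "".join(symbol * count for symbol, count in runs)

def compress(img, top=0, left=0, size=None):
    """Quadrant recursion: one value for a uniform block, else 4 children."""
    if size is None:
        size = len(img)
    first = img[top][left]
    if all(img[r][c] == first
           for r in range(top, top + size)
           for c in range(left, left + size)):
        return first                                   # store a single pixel
    half = size // 2
    return [compress(img, top, left, half),            # top-left
            compress(img, top, left + half, half),     # top-right
            compress(img, top + half, left, half),     # bottom-left
            compress(img, top + half, left + half, half)]  # bottom-right

runs = run_length_encode("1110011")
slide_bits = "1111111111100000001110111111111"
round_trip = run_length_decode(run_length_encode(slide_bits))

image = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
tree = compress(image)
```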

6) Geometric computation problem – given a set of rectangles, determine the total area covered by them. Also draw the contour. Concept: binary search tree.
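For small inputs, the total covered area can be checked with a simple grid over the distinct x and y coordinates. This is a brute-force technique swapped in here for illustration, not the binary-search-tree approach the project is about. Rectangles are assumed to be axis-aligned and given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2.

```python
def union_area(rects):
    """Total area covered by axis-aligned rectangles (x1, y1, x2, y2)."""
    xs = sorted({x for x1, _, x2, _ in rects for x in (x1, x2)})
    ys = sorted({y for _, y1, _, y2 in rects for y in (y1, y2)})
    area = 0
    # Every grid cell between consecutive coordinates is either fully
    # inside some rectangle or fully outside all of them.
    for i in range(len(xs) - 1):
        for j in range(len(ys) - 1):
            cx1, cx2 = xs[i], xs[i + 1]
            cy1, cy2 = ys[j], ys[j + 1]
            if any(x1 <= cx1 and cx2 <= x2 and y1 <= cy1 and cy2 <= y2
                   for x1, y1, x2, y2 in rects):
                area += (cx2 - cx1) * (cy2 - cy1)
    return area

total = union_area([(0, 0, 2, 2), (1, 1, 3, 3)])   # two overlapping squares
```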
