CS 315 Data Structures B. Ravikumar Office: 116 I Darwin Hall Phone: 664 3335 E-mail: firstname.lastname@example.org Course Web site: http://ravi.cs.sonoma.edu/cs315sp08
Textbook for the course: Data Structures and Algorithm Analysis in C++ by Mark Allen Weiss • Lab: T 1:00 to 2:50 PM, Darwin Hall # 28
Course Goals • Learn to use fundamental data structures: • arrays, linked lists, stacks and queues • hash tables • priority queues • binary trees • others • Enhance your programming skills in C++: • recursion, classes, algorithm implementation • build projects using different data structures • Analytical and experimental analysis: • quantitative reasoning about the performance of algorithms • comparing different data structures
Goals for today’s lecture • Course outline • Discuss course work: • lab assignments • projects • tests, final exam • If time permits, we will start discussing recursion (to be continued in the lab).
Data Structures – key to software design • Data structures play a key role in every type of software. • A data structure determines how data is stored internally while solving a problem; it is chosen in order to: • Optimize the overall running time of a program • Optimize the response time (for queries) • Optimize the memory requirements • Optimize other resources (e.g., network) • Simplify software design • Make the solution extensible and more robust
Abstract vs. concrete data structures • An abstract data structure (often called an abstract data type, ADT) is a collection of data together with a set of operations supported to manipulate it. • Examples: • stack, queue • priority queue • dictionary • Concrete data structures are the implementations of abstract data structures: • arrays, linked lists, trees, heaps, hash tables • Main emphasis of the course: find the best mapping between abstract and concrete data structures.
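To make the abstract/concrete distinction concrete, here is a minimal C++ sketch, assuming a stack of ints: the abstract class `IntStack` plays the role of the ADT (operations only), while `ArrayStack` and `ListStack` are two concrete implementations. All class and function names are illustrative, not from the course materials.

```cpp
#include <cassert>
#include <vector>

// Abstract data structure (ADT): only the operations are specified.
class IntStack {
public:
    virtual ~IntStack() = default;
    virtual void push(int x) = 0;
    virtual int pop() = 0;              // remove and return the top item
    virtual bool empty() const = 0;
};

// Concrete implementation 1: a dynamic array (std::vector).
class ArrayStack : public IntStack {
    std::vector<int> items;
public:
    void push(int x) override { items.push_back(x); }
    int pop() override {
        assert(!items.empty());
        int top = items.back();
        items.pop_back();
        return top;
    }
    bool empty() const override { return items.empty(); }
};

// Concrete implementation 2: a singly linked list.
class ListStack : public IntStack {
    struct Node { int key; Node* next; };
    Node* head = nullptr;
public:
    ListStack() = default;
    ListStack(const ListStack&) = delete;             // avoid double deletion
    ListStack& operator=(const ListStack&) = delete;
    ~ListStack() override { while (!empty()) pop(); }
    void push(int x) override { head = new Node{x, head}; }
    int pop() override {
        assert(head != nullptr);
        Node* old = head;
        int top = old->key;
        head = old->next;
        delete old;
        return top;
    }
    bool empty() const override { return head == nullptr; }
};

// Client code sees only the ADT, so either concrete structure can be plugged in.
int pushThenPopTwice(IntStack& s) {
    s.push(1);
    s.push(2);
    int first = s.pop();    // 2: last in, first out
    int second = s.pop();   // 1
    return first * 10 + second;
}
```

Because `pushThenPopTwice` takes an `IntStack&`, swapping one concrete structure for the other requires no change to the client code; choosing between them is exactly the abstract-to-concrete mapping question.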
Abstract Data Structure (ADT) • A container supporting operations • Dictionary • primary operations: search, insert, delete • secondary operations: deleteMin, range search, successor, merge • Priority queue • primary operations: insert, deleteMin • secondary operations: merge, split, etc.
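One way to see the dictionary operations in action is the C++ standard library's `std::set`, which is typically implemented as a balanced binary search tree. The sketch below, assuming integer keys, relies on `insert`, `erase` and `count` for the primary operations and on `upper_bound`/`lower_bound` for successor and range search; the wrapper function names are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Primary operations map directly onto std::set:
//   insert -> d.insert(key), delete -> d.erase(key), search -> below.
bool search(const std::set<int>& d, int key) { return d.count(key) > 0; }

// Secondary operation: successor = smallest key strictly greater than key.
// Returns false if no successor exists.
bool successor(const std::set<int>& d, int key, int& out) {
    auto it = d.upper_bound(key);
    if (it == d.end()) return false;
    out = *it;
    return true;
}

// Secondary operation: range search = all keys in [lo, hi], in sorted order.
std::vector<int> rangeSearch(const std::set<int>& d, int lo, int hi) {
    std::vector<int> result;
    for (auto it = d.lower_bound(lo); it != d.end() && *it <= hi; ++it)
        result.push_back(*it);
    return result;
}

// Secondary operation: deleteMin removes and returns the smallest key.
int deleteMin(std::set<int>& d) {
    assert(!d.empty());
    int smallest = *d.begin();
    d.erase(d.begin());
    return smallest;
}
```

For example, with keys {1, 3, 5, 7}, the successor of 3 is 5 and a range search over [3, 7] returns 3, 5, 7.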
Linear data structures • Key properties of the (1-d) array: • a sequence of items is stored in consecutive memory locations. • the array provides constant-time access to the k-th element for any k (access it as Element[k]). • inserting at the end is easy: • if the current size is S, we can add x at the end with the single instruction: • Element[S++] = x; • deleting at the end is also easy. • inserting or deleting at any other position is expensive (all the items after that position must be shifted). • Even searching is expensive (linear time) unless the array is sorted.
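The array operations above can be sketched as a small C++ class, assuming a fixed capacity of 100 ints (an arbitrary choice) and keeping the slide's names `Element` and `S`; the class and method names are hypothetical. The comments note the cost of each operation.

```cpp
#include <cassert>
#include <cstddef>

class IntArray {
public:
    static constexpr std::size_t CAPACITY = 100;   // illustrative fixed capacity

    // O(1): append at the end, exactly the slide's Element[S++] = x.
    void insertAtEnd(int x) {
        assert(S < CAPACITY);
        Element[S++] = x;
    }

    // O(1): delete at the end.
    int deleteAtEnd() {
        assert(S > 0);
        return Element[--S];
    }

    // O(S - k): every item from position k onward shifts one slot right.
    void insertAt(std::size_t k, int x) {
        assert(S < CAPACITY && k <= S);
        for (std::size_t i = S; i > k; --i)
            Element[i] = Element[i - 1];
        Element[k] = x;
        ++S;
    }

    // O(S - k): every item after position k shifts one slot left.
    void deleteAt(std::size_t k) {
        assert(k < S);
        for (std::size_t i = k; i + 1 < S; ++i)
            Element[i] = Element[i + 1];
        --S;
    }

    // O(S) linear search (the array is not assumed to be sorted).
    // Returns S if x is not present.
    std::size_t find(int x) const {
        for (std::size_t i = 0; i < S; ++i)
            if (Element[i] == x) return i;
        return S;
    }

    // O(1) access to the k-th element.
    int get(std::size_t k) const { assert(k < S); return Element[k]; }
    std::size_t size() const { return S; }

private:
    int Element[CAPACITY] = {};
    std::size_t S = 0;   // current number of items
};
```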
Images are stored in 2-d arrays • We will work on 2-d arrays by manipulating images: • Each pixel is represented by a red value, a green value and a blue value, each in the range 0 to 255 (any color is a combination of these three). In (red, green, blue) order, (255, 255, 255) represents white, (255, 0, 0) represents red, etc. • pic(i, j)->Blue represents the blue component of the pixel in the i-th row and j-th column of pic, and so on. • Some basic operations on images: • open, read, write • rotate, mirror image • filter (remove blemishes) • extract features (e.g., identify where buildings are in an aerial photograph)
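A minimal sketch of two of the operations listed above, mirror and rotate, assuming an image is a `std::vector` of rows of a hypothetical `Pixel` struct with `Red`, `Green`, `Blue` fields (the course's image library may use different names). Here `pic[i][j]` plays the role of `pic(i, j)`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pixel type.
struct Pixel {
    unsigned char Red, Green, Blue;   // each component in 0..255
};

// pic[i][j] is the pixel in row i, column j.
using Image = std::vector<std::vector<Pixel>>;

// Mirror image: reverse the order of the pixels in every row.
Image mirror(const Image& pic) {
    Image out = pic;
    for (auto& row : out) std::reverse(row.begin(), row.end());
    return out;
}

// Rotate 90 degrees clockwise: an R x C image becomes C x R, and
// pixel (i, j) moves to (j, R - 1 - i).
Image rotateClockwise(const Image& pic) {
    std::size_t R = pic.size();
    std::size_t C = (R == 0) ? 0 : pic[0].size();
    Image out(C, std::vector<Pixel>(R));
    for (std::size_t i = 0; i < R; ++i) {
        assert(pic[i].size() == C);   // all rows must have the same length
        for (std::size_t j = 0; j < C; ++j)
            out[j][R - 1 - i] = pic[i][j];
    }
    return out;
}
```

Both functions return a new image and leave the input unchanged; each visits every pixel once, so they run in time proportional to the number of pixels.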
Linked lists (order is important) • Store a sequence of items in non-consecutive memory locations; each item points to the next one. • Searching for a key is not easy, even if the list is sorted (there is no constant-time access to the middle item, so binary search does not help). • Inserting next to a given item is easy. • In a doubly linked list, inserting before or after a given item is easy. • We don't need to know the number of items in advance (dynamic memory).
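A singly linked list can be sketched in C++ as follows, assuming integer keys; `Node`, `insertAfter`, `search`, `toVector` and `freeList` are illustrative names. It shows why inserting after a given node is O(1) while searching is O(n).

```cpp
#include <cassert>
#include <vector>

// Each item lives in its own node; nodes need not be adjacent in memory.
struct Node {
    int key;
    Node* next;
};

// O(1): insert x immediately after node p (no shifting, unlike an array).
Node* insertAfter(Node* p, int x) {
    assert(p != nullptr);
    p->next = new Node{x, p->next};
    return p->next;
}

// O(n): search must follow the links one by one, even if the keys are sorted.
Node* search(Node* head, int key) {
    for (Node* p = head; p != nullptr; p = p->next)
        if (p->key == key) return p;
    return nullptr;
}

// Copy the keys into a vector (handy for checking the list order).
std::vector<int> toVector(const Node* head) {
    std::vector<int> v;
    for (const Node* p = head; p != nullptr; p = p->next) v.push_back(p->key);
    return v;
}

// Free every node.
void freeList(Node* head) {
    while (head != nullptr) {
        Node* next = head->next;
        delete head;
        head = next;
    }
}
```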
Stacks and queues • Stacks: • insert and delete at the same end. • equivalently, the last element inserted is the first one to be deleted (LIFO). • very useful for solving many problems, e.g., processing arithmetic expressions • Queues: • insert at one end, delete at the other end. • equivalently, the first element inserted is the first one to be deleted (FIFO).
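As one classic example of processing arithmetic expressions with a stack, the sketch below evaluates a postfix expression; it assumes space-separated integer tokens, the four operators + - * /, and well-formed input. The two short helper functions contrast LIFO and FIFO order using `std::stack` and `std::queue`. All function names are hypothetical.

```cpp
#include <cassert>
#include <queue>
#include <sstream>
#include <stack>
#include <string>

// Evaluate a postfix (reverse Polish) expression such as "3 4 + 2 *".
long evalPostfix(const std::string& expr) {
    std::stack<long> s;
    std::istringstream in(expr);
    std::string tok;
    while (in >> tok) {
        if (tok == "+" || tok == "-" || tok == "*" || tok == "/") {
            assert(s.size() >= 2);
            long right = s.top(); s.pop();   // last operand pushed comes off first
            long left = s.top(); s.pop();
            if (tok == "+") s.push(left + right);
            else if (tok == "-") s.push(left - right);
            else if (tok == "*") s.push(left * right);
            else s.push(left / right);
        } else {
            s.push(std::stol(tok));          // operand: push it
        }
    }
    assert(s.size() == 1);
    return s.top();
}

// Insert the characters of s one at a time, then delete until empty.
// A stack returns them reversed (LIFO) ...
std::string throughStack(const std::string& s) {
    std::stack<char> st;
    for (char c : s) st.push(c);
    std::string out;
    while (!st.empty()) { out += st.top(); st.pop(); }
    return out;
}

// ... while a queue returns them in the original order (FIFO).
std::string throughQueue(const std::string& s) {
    std::queue<char> q;
    for (char c : s) q.push(c);
    std::string out;
    while (!q.empty()) { out += q.front(); q.pop(); }
    return out;
}
```

For instance, "3 4 + 2 *" corresponds to (3 + 4) * 2 and evaluates to 14.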
Non-linear data structures • Various versions of trees • Binary search trees • Height-balanced trees, etc. • [Figure: binary search tree in which each node has the fields Lptr, key, Rptr (example key 15)] • Main purpose of a binary search tree: to support dictionary operations efficiently.
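A minimal binary search tree sketch in C++, using the node fields from the slide (`Lptr`, `key`, `Rptr`) and assuming integer keys with duplicates ignored; the function names are illustrative. This plain BST is not height-balanced, so inserting keys in sorted order produces a chain; height-balanced trees avoid that.

```cpp
#include <cassert>

// Invariant: keys in the left subtree < key < keys in the right subtree.
struct TreeNode {
    TreeNode* Lptr;
    int key;
    TreeNode* Rptr;
};

// Insert key (duplicates ignored); returns the possibly new root.
TreeNode* insert(TreeNode* root, int key) {
    if (root == nullptr) return new TreeNode{nullptr, key, nullptr};
    if (key < root->key) root->Lptr = insert(root->Lptr, key);
    else if (key > root->key) root->Rptr = insert(root->Rptr, key);
    return root;
}

// Search follows a single root-to-leaf path, so it takes O(height) time.
bool search(const TreeNode* root, int key) {
    while (root != nullptr) {
        if (key == root->key) return true;
        root = (key < root->key) ? root->Lptr : root->Rptr;
    }
    return false;
}

// Height of the tree; an empty tree has height -1.
int height(const TreeNode* root) {
    if (root == nullptr) return -1;
    int l = height(root->Lptr), r = height(root->Rptr);
    return 1 + (l > r ? l : r);
}

// Free all nodes (post-order).
void destroy(TreeNode* root) {
    if (root == nullptr) return;
    destroy(root->Lptr);
    destroy(root->Rptr);
    delete root;
}
```

Inserting 15, 8, 20, 3, 10 gives a tree of height 2, but inserting 1, 2, 3, 4, 5 in that order gives height 4, the worst case for five keys.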
Priority queue • The key with the highest priority is the one that gets deleted next; with the deleteMin convention used here, the smallest key has the highest priority. • Equivalently, a priority queue supports the following operations: insert, deleteMin • Useful in solving many problems: fast sorting (heap sort), shortest paths, minimum spanning trees, scheduling, etc.
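A binary min-heap is the usual concrete structure for this ADT. The sketch below, assuming integer keys, stores the heap in a `std::vector` and implements `insert` and `deleteMin`; `heapSort` is a simple (not in-place) heap sort built on top of it. Names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Binary min-heap stored in an array: the children of index i are 2i+1 and
// 2i+2, and the smallest key is always at index 0.
class MinHeap {
    std::vector<int> a;
public:
    bool empty() const { return a.empty(); }

    // O(log n): append at the end, then move the key up while it is
    // smaller than its parent.
    void insert(int x) {
        a.push_back(x);
        std::size_t i = a.size() - 1;
        while (i > 0 && a[(i - 1) / 2] > a[i]) {
            std::swap(a[(i - 1) / 2], a[i]);
            i = (i - 1) / 2;
        }
    }

    // O(log n): remove the root, move the last key to the root, then move it
    // down while it is larger than its smaller child.
    int deleteMin() {
        assert(!a.empty());
        int minKey = a[0];
        a[0] = a.back();
        a.pop_back();
        std::size_t i = 0;
        while (true) {
            std::size_t smallest = i, l = 2 * i + 1, r = 2 * i + 2;
            if (l < a.size() && a[l] < a[smallest]) smallest = l;
            if (r < a.size() && a[r] < a[smallest]) smallest = r;
            if (smallest == i) break;
            std::swap(a[i], a[smallest]);
            i = smallest;
        }
        return minKey;
    }
};

// Heap sort: n inserts followed by n deleteMins, O(n log n) in total.
std::vector<int> heapSort(const std::vector<int>& v) {
    MinHeap h;
    for (int x : v) h.insert(x);
    std::vector<int> out;
    while (!h.empty()) out.push_back(h.deleteMin());
    return out;
}
```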
Hashing • Supports dictionary operations very efficiently (most of the time). • Main advantages: • simple to design and implement • very fast on average • Main drawback: poor worst-case performance.
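A minimal sketch of a hash table with separate chaining, assuming string keys, a default table size of 101 and `std::hash` as the hash function (all illustrative choices; the class name `HashSet` is hypothetical). Constructing it with a single bucket shows the worst case: every key lands in one list and search degrades to a linear scan.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <vector>

// Each bucket is a linked list of the keys that hash to it.
class HashSet {
    std::vector<std::list<std::string>> buckets;

    std::size_t bucketOf(const std::string& key) const {
        return std::hash<std::string>{}(key) % buckets.size();
    }
public:
    explicit HashSet(std::size_t tableSize = 101) : buckets(tableSize) {
        assert(tableSize > 0);   // bucketOf divides by the table size
    }

    // Expected O(1) if keys spread evenly; O(n) if they all share one bucket.
    bool search(const std::string& key) const {
        for (const auto& k : buckets[bucketOf(key)])
            if (k == key) return true;
        return false;
    }

    // Insert key unless it is already present.
    void insert(const std::string& key) {
        if (!search(key)) buckets[bucketOf(key)].push_back(key);
    }

    // Delete key if present.
    void remove(const std::string& key) {
        buckets[bucketOf(key)].remove(key);
    }
};
```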
What data structure to use? Example 1: There are more than 1 billion web pages. When you type a query into Google, you get an almost instantaneous response. What kind of data structure is used here? • The details are quite complicated, but the main data structure used is quite simple.
Data structure used: inverted index • An array of lists: each array entry contains a word and a pointer to a list of all the web pages that contain that word. • [Figure: the entry for "Data structure" points to a list of pages such as 38, 97, 145, 297, 876; this list is kept sorted] • Question: how do we find the array index for a given keyword? Hashing or some other clever data structure is needed.
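A toy version of the inverted index can be sketched in C++, assuming each page is identified by an integer and using `std::map` (typically a balanced search tree) in place of hashing for the word lookup. Keeping each page list sorted pays off for multi-word queries: two sorted lists can be intersected in one linear pass with `std::set_intersection`. This only illustrates the idea; it is not how Google actually implements its index, and all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <string>
#include <vector>

// Each word maps to a sorted list of page numbers.
using Index = std::map<std::string, std::vector<int>>;

// Record that `word` appears on page `page`, keeping the list sorted
// and free of duplicates.
void addOccurrence(Index& idx, const std::string& word, int page) {
    std::vector<int>& pages = idx[word];
    auto pos = std::lower_bound(pages.begin(), pages.end(), page);
    if (pos == pages.end() || *pos != page) pages.insert(pos, page);
}

// Pages containing both words: both lists are sorted, so a single
// merge-like pass finds the common entries.
std::vector<int> pagesWithBoth(const Index& idx, const std::string& w1, const std::string& w2) {
    auto a = idx.find(w1), b = idx.find(w2);
    if (a == idx.end() || b == idx.end()) return {};
    std::vector<int> result;
    std::set_intersection(a->second.begin(), a->second.end(),
                          b->second.begin(), b->second.end(),
                          std::back_inserter(result));
    return result;
}
```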
Example 2: The entire landscape of the world is being digitized (there is a whole new branch that combines information technology and geography, called GIS: Geographic Information Systems). What kind of data structure should be used to store all this information? [Figure: snapshot from Google Earth]
Some general issues related to GIS • How much memory do we need? Can the data be stored on one computer, or do we need a distributed database? • Building the database is done in the background (off-line processing). • How fast can queries be answered? • Answering queries is called on-line processing. • Suppose each square mile is represented by a 1024 by 1024 pixel image; how much storage do we need to store the terrain of the United States?
Calculate the memory needed • Very rough estimate of the memory needed: • Area of the USA is roughly $4 \times 10^6$ square miles. • Each square mile needs roughly $10^6$ pixels ($1024 \times 1024 \approx 10^6$). • Each pixel usually requires 32 bits. • Thus the total memory needed is $4 \times 10^6 \times 10^6 \times 32 = 128 \times 10^{12}$ bits $= 128{,}000$ gigabits. • (A standard desktop has about 200 gigabits of memory.) • So we need about $128{,}000 / 200 = 640$ such computers to store the data.
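The same arithmetic can be checked with a few lines of C++, assuming the slide's inputs (including its figure of 200 gigabits per desktop); the constant and function names are illustrative. `std::uint64_t` is used because the totals far exceed the range of a 32-bit integer.

```cpp
#include <cstdint>

// The slide's rough inputs.
constexpr std::uint64_t kSquareMiles = 4000000ULL;           // 4 x 10^6 square miles
constexpr std::uint64_t kRoughPixelsPerSqMile = 1000000ULL;  // 1024 x 1024, rounded to 10^6
constexpr std::uint64_t kBitsPerPixel = 32ULL;
constexpr std::uint64_t kBitsPerDesktop = 200000000000ULL;   // 200 gigabits (slide's assumption)

// Total bits = area x pixels per square mile x bits per pixel.
constexpr std::uint64_t totalBits(std::uint64_t pixelsPerSqMile) {
    return kSquareMiles * pixelsPerSqMile * kBitsPerPixel;
}

// Number of desktops needed, rounded up.
constexpr std::uint64_t desktopsNeeded(std::uint64_t bits) {
    return (bits + kBitsPerDesktop - 1) / kBitsPerDesktop;
}
```

With the rounded figure of $10^6$ pixels per square mile this gives 640 desktops; using the exact $1024 \times 1024 = 1{,}048{,}576$ pixels raises the total to about $134 \times 10^{12}$ bits, or 672 desktops.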
What data structure to store the images? • Each 1024 x 1024 image can be stored in a two-dimensional array (the standard in-memory representation for all kinds of images: bmp, jpg, png, etc.). The actual images are stored in secondary memory (hard disks on several servers, either in a central location or distributed). • The number of images would be roughly $4 \times 10^6$. A set of pointers to these images can be stored in a 1-d (or 2-d) array. • When you click on a point on the map, its index in the array is calculated. • From that index, the image is accessed and sent over the network to the requesting client.
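The index calculation can be sketched as follows, assuming (hypothetically) that the map is a grid of `numRows` x `numCols` one-square-mile tiles stored in row-major order, and that a click is given as its distance in miles east and south of the map's top-left corner. A file name stands in for the pointer to each image on disk; `TileMap`, `indexOf` and `tileAt` are illustrative names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Tile (r, c) is stored at position r * numCols + c of a 1-d array.
struct TileMap {
    std::size_t numRows, numCols;
    std::vector<std::string> tileFile;   // size numRows * numCols

    // (x, y) = distance in miles east and south of the top-left corner.
    std::size_t indexOf(double x, double y) const {
        assert(x >= 0 && y >= 0);
        std::size_t r = static_cast<std::size_t>(std::floor(y));
        std::size_t c = static_cast<std::size_t>(std::floor(x));
        assert(r < numRows && c < numCols);
        return r * numCols + c;
    }

    // O(1): compute the index, then fetch the tile's entry.
    const std::string& tileAt(double x, double y) const {
        return tileFile[indexOf(x, y)];
    }
};
```

For a 2 x 3 grid, a click at (2.9, 1.2) falls in row 1, column 2, so its index is 1 * 3 + 2 = 5.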
Overview of the projects: • 1) Generate all the poker hands. More generally, given a set of N items and a number k ≤ N, generate all possible combinations or permutations of k items. (concept: recursion, arrays, lists) • 2) Image manipulation (concept: arrays, library, analysis of algorithms) [Figure: sample image after filtering]
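A recursive sketch for the combinations part of project 1, assuming the N items are ints in a `std::vector`; `combine` and `combinations` are hypothetical names, and permutations are not covered. Each call either finishes a combination (base case) or tries every remaining item as the next element and then undoes that choice (backtracking).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Extend `current` with items chosen from items[start..], in increasing index
// order, until it holds k items; each finished combination is recorded.
void combine(const std::vector<int>& items, std::size_t k, std::size_t start,
             std::vector<int>& current, std::vector<std::vector<int>>& out) {
    if (current.size() == k) {      // base case: a complete combination
        out.push_back(current);
        return;
    }
    std::size_t need = k - current.size();
    // Try each remaining item as the next element (stop when too few remain).
    for (std::size_t i = start; i + need <= items.size(); ++i) {
        current.push_back(items[i]);
        combine(items, k, i + 1, current, out);   // recursive call
        current.pop_back();                        // undo the choice (backtrack)
    }
}

// All k-element combinations of items, in lexicographic order of positions.
std::vector<std::vector<int>> combinations(const std::vector<int>& items, std::size_t k) {
    assert(k <= items.size());
    std::vector<std::vector<int>> out;
    std::vector<int> current;
    combine(items, k, 0, current, out);
    return out;
}
```

For poker hands, the items would be the 52 cards (for example, numbered 0 to 51) with k = 5, giving C(52, 5) = 2,598,960 hands; storing them all takes a lot of memory, so it may be better to process each hand in the base case instead of saving it.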
3) Spelling checker: Given a text file T, identify all the misspelled words in T. Idea: build a hash table H of all the words in a dictionary, and search for each word of the text T in the table H. (concept: hashing, string processing) 4) Scheduling problem: A set of tasks should be assigned to several identical machines. What is the maximum number of jobs that can be assigned? Concept: priority queue (heap)
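The spelling checker idea from project 3 can be sketched with `std::unordered_set` as the hash table H. The sketch assumes that words are maximal runs of letters and that comparison is case-insensitive; `toLower` and `misspelled` are hypothetical names, and reading T and the dictionary from files is left out.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <unordered_set>
#include <vector>

// Lowercase a copy of s.
std::string toLower(std::string s) {
    for (char& c : s) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// Return the words of `text` that are not in `dictionary`, in order of appearance.
std::vector<std::string> misspelled(const std::vector<std::string>& dictionary,
                                    const std::string& text) {
    std::unordered_set<std::string> H;          // hash table of dictionary words
    for (const auto& w : dictionary) H.insert(toLower(w));

    std::vector<std::string> result;
    std::string word;
    // Append a sentinel space so the last word is also processed.
    for (char c : text + " ") {
        if (std::isalpha(static_cast<unsigned char>(c))) {
            word += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        } else if (!word.empty()) {
            if (H.count(word) == 0) result.push_back(word);   // expected O(1) lookup
            word.clear();
        }
    }
    return result;
}
```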
5) Geometric computation problem – given a set of rectangles, determine the total area covered by them. Also report all the intersections between them. Concept: binary search tree.
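Before tackling the binary search tree approach the project calls for, it can help to have a simple baseline to test against. The sketch below, assuming axis-parallel rectangles stored in a hypothetical `Rect` struct, reports intersecting pairs by brute force (O(n^2)) and computes the covered area by coordinate compression (O(n^3)); it is not the BST-based sweep-line method.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Axis-parallel rectangle [x1, x2] x [y1, y2] with x1 < x2 and y1 < y2.
struct Rect { double x1, y1, x2, y2; };

// Two rectangles intersect if they overlap in both x and y (positive area).
bool intersects(const Rect& a, const Rect& b) {
    return a.x1 < b.x2 && b.x1 < a.x2 && a.y1 < b.y2 && b.y1 < a.y2;
}

// Brute force: check every pair of rectangles, O(n^2).
std::vector<std::pair<std::size_t, std::size_t>> allIntersections(const std::vector<Rect>& r) {
    std::vector<std::pair<std::size_t, std::size_t>> out;
    for (std::size_t i = 0; i < r.size(); ++i)
        for (std::size_t j = i + 1; j < r.size(); ++j)
            if (intersects(r[i], r[j])) out.emplace_back(i, j);
    return out;
}

// Total covered area (area of the union) via coordinate compression: the
// distinct x and y edges cut the plane into cells; add every covered cell once.
double unionArea(const std::vector<Rect>& r) {
    std::vector<double> xs, ys;
    for (const Rect& q : r) {
        xs.push_back(q.x1); xs.push_back(q.x2);
        ys.push_back(q.y1); ys.push_back(q.y2);
    }
    std::sort(xs.begin(), xs.end());
    xs.erase(std::unique(xs.begin(), xs.end()), xs.end());
    std::sort(ys.begin(), ys.end());
    ys.erase(std::unique(ys.begin(), ys.end()), ys.end());

    double area = 0;
    for (std::size_t i = 0; i + 1 < xs.size(); ++i)
        for (std::size_t j = 0; j + 1 < ys.size(); ++j)
            for (const Rect& q : r)   // is cell [xs[i], xs[i+1]] x [ys[j], ys[j+1]] covered?
                if (q.x1 <= xs[i] && xs[i + 1] <= q.x2 && q.y1 <= ys[j] && ys[j + 1] <= q.y2) {
                    area += (xs[i + 1] - xs[i]) * (ys[j + 1] - ys[j]);
                    break;
                }
    return area;
}
```

Rectangles that only touch along an edge are not counted as intersecting here, since `intersects` uses strict inequalities.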