Data Organization

Data Organization. Lots of Data Quickly Found. Image courtesy: http://www.flickr.com/photos/juhansonin/4734829999/sizes/o/in/photostream//. Organization. Are you an organized person? How much time would it take you to find your keys? a file on your computer? a phone number?

whitby

Presentation Transcript


  1. Data Organization • Lots of Data Quickly Found • Image courtesy: http://www.flickr.com/photos/juhansonin/4734829999/sizes/o/in/photostream//

  2. Organization • Are you an organized person? How much time would it take you to find • your keys? • a file on your computer? • a phone number? • your homework paper? • Computers must keep track of literally trillions of items. Organization is essential for finding and processing this data.

  3. Naming • Theodor Geisel (Dr. Seuss) wrote over 45 children's books. • Too Many Daves: Did I ever tell you that Mrs. McCave had twenty-three sons and she named them all Dave? Well, she did. And that wasn't a smart thing to do. You see, when she wants one and calls out, "Yoo-Hoo! Come into the house, Dave!" she doesn't get one. All twenty-three Daves of hers come on the run! This makes things quite difficult at the McCaves'.

  4. Naming • Unique: a name must refer to exactly one thing; never more. 23 Daves! • One item should not have two names. Dr. Seuss or Theodor Geisel? • Descriptive: the name should describe its purpose or nature. Which is better? • zqiy.qcl • The Star Spangled Banner.mp3 • The name should be related to the location of the data if possible.

  5. Case Study in Naming : URL • A URL is unique: referring to exactly one web site; never two. • One web site rarely has two URLs. • Most URLs are descriptive. • http://en.wikipedia.org/wiki/Coral_snake • Location: the above URL describes where to find the web page on the server.
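As a rough illustration of how a URL encodes location, the sketch below splits the slide's example URL into its parts with Python's standard `urllib.parse` module. The variable names (`scheme`, `host`, `path`) are chosen here for illustration and are not from the slides.

```python
from urllib.parse import urlparse

# The example URL from the slide.
url = "http://en.wikipedia.org/wiki/Coral_snake"
parts = urlparse(url)

scheme = parts.scheme  # how to talk to the server
host = parts.netloc    # which server holds the page
path = parts.path      # where the page lives on that server

print(scheme)  # http
print(host)    # en.wikipedia.org
print(path)    # /wiki/Coral_snake
```

The host and path together play the role of the "location" part of the name, while the last path segment (`Coral_snake`) is the descriptive part.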

  6. Lists • A list is a sequence of items. Order matters. • Example: Most expensive paintings • The Card Players by Paul Cezanne • No. 5, 1948 by Jackson Pollock • Woman III by Willem de Kooning • Portrait of Adele Bloch-Bauer I by Gustav Klimt • Portrait of Dr. Gachet by Vincent van Gogh • Any item can be identified by its position • Woman III is the 3rd item in the list • When we identify an item by its position, we are using indexing • Indexing associates a unique number with an item in a list. • The index is unique, refers to exactly one item, and is related to the location of the item.

  7. Lists • Conventions: • If X is a list, we will denote the ith item as X[i]. • Most computing systems use i=0 to denote the first item in the list. • Assume that the previous list is named Paintings. • Paintings[2] refers to Woman III • Paintings[0] refers to The Card Players
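The indexing convention can be tried directly in Python, whose lists also start at index 0. This is a small sketch using the paintings list from the previous slide; the lowercase name `paintings` follows Python naming habits rather than the slide's `Paintings`.

```python
# The list of paintings, in the order given on the slide.
paintings = [
    "The Card Players",
    "No. 5, 1948",
    "Woman III",
    "Portrait of Adele Bloch-Bauer I",
    "Portrait of Dr. Gachet",
]

first = paintings[0]  # index 0 is the first item
third = paintings[2]  # index 2 is the 3rd item

print(first)  # The Card Players
print(third)  # Woman III
```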

  8. Storage • Computer memory is linear • Memory is a one-dimensional arrangement of storage units • Each storage unit is numbered with an address (or index) • Each storage unit can hold one item of data • Might consider memory to be a list of storage units.

  9. Arrays • An array is the simplest way to store a list. • A section of memory is used to store the list items sequentially in memory

  10. Arrays • Can we store more than one array in memory? • The name is an anchor • The name is a memory location • Indexing through the name is an offset • Arrays cannot resize • How to add to the paintings list?

  11. Array Retrieval • Item retrieval • A[i] means: get the item at memory location (A + i) • Item insertion • Add an item as the most expensive. Erase the list. Re-write the new list.
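The anchor-plus-offset idea can be sketched by treating a Python list as linear memory. Everything here is a simplified model: the memory size of 16, the anchor addresses 4 and 10, and the helper names `store_array` and `get` are assumptions made for illustration.

```python
# Linear memory: 16 storage units, addressed 0..15.
memory = [None] * 16

def store_array(memory, anchor, items):
    """Write items into consecutive storage units starting at the anchor."""
    for offset, item in enumerate(items):
        memory[anchor + offset] = item

def get(memory, anchor, i):
    """A[i] means: get the item at memory location (A + i)."""
    return memory[anchor + i]

# Two arrays share the same memory; each name is just an anchor address.
A = 4   # hypothetical anchor of the first array
B = 10  # hypothetical anchor of the second array
store_array(memory, A, ["The Card Players", "No. 5, 1948", "Woman III"])
store_array(memory, B, [7, 8, 9])

print(get(memory, A, 2))  # Woman III
print(get(memory, B, 0))  # 7
```

Because retrieval is a single addition followed by one memory read, it takes the same time for any index.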

  12. Array Deletion • Delete item i from array A • Move A[i+1] to A[i] • Move A[i+2] to A[i+1] • etc… • How efficient is this? • How many items must move?
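One way to explore the efficiency question is to count the moves. The sketch below assumes a fixed-capacity array whose first `count` slots are in use; the function name `delete_at` and the move counter are illustrative additions, not part of the slides.

```python
def delete_at(array, count, i):
    """Delete item i by shifting later items one slot toward the front.

    Returns (new_count, moves), where moves is how many items were moved.
    """
    moves = 0
    for j in range(i, count - 1):
        array[j] = array[j + 1]  # move A[j+1] to A[j]
        moves += 1
    array[count - 1] = None      # the last used slot is now free
    return count - 1, moves

array = ["a", "b", "c", "d", "e"]
count, moves = delete_at(array, 5, 1)

print(array)  # ['a', 'c', 'd', 'e', None]
print(moves)  # 3
```

In this model, every item after position i moves once, so deleting near the front of a long array is the most expensive case.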

  13. Array Insertion • Insert an item at index 0 of A • Move A[0] to A[1] • Move A[1] to A[2] • Move A[2] to A[3] • etc. • How efficient is this? • How many items move?
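A sketch of insertion at index 0, again assuming a fixed-capacity array with `count` slots in use. The name `insert_front` is hypothetical. Note one practical detail: the moves are carried out starting from the back of the array, because performing them literally in the order listed would overwrite A[1] before it had been copied.

```python
def insert_front(array, count, item):
    """Insert item at index 0 of a fixed-capacity array.

    Returns (new_count, moves), where moves is how many items were moved.
    """
    if count >= len(array):
        raise OverflowError("array is full; arrays cannot resize")
    moves = 0
    for j in range(count, 0, -1):  # back to front, so nothing is overwritten
        array[j] = array[j - 1]    # move A[j-1] to A[j]
        moves += 1
    array[0] = item
    return count + 1, moves

array = ["b", "c", "d", None]      # capacity 4, 3 slots in use
count, moves = insert_front(array, 3, "a")

print(array)  # ['a', 'b', 'c', 'd']
print(moves)  # 3
```

Under these assumptions, inserting at the front moves every item already stored.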

  14. Arrays • Advantages • direct (fast) access to data • efficiently uses available memory • Disadvantages • requires an index to access data • size of the data is fixed • adding/removing from the middle is 'hard'

  15. GeoCache Metaphor • GeoCaching races: • start with a GPS coordinate • Go to the location • Find the cache (treasure chest) • Find an ‘item’ • Find the next GPS coordinate • Find the 'next' cache with the new coordinate • The first location allows you to access all items in the race

  16. Geo Caching

  17. Linked Lists • Lists can be linked together in memory • A ‘node’ (analogous to the treasure chest) is a pair of adjacent memory locations • The 1st part of the node is the item • The 2nd part of the node is the memory location of the next node • If the next memory location is zero, you are at the end.
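The node layout described above can be modeled with a Python list standing in for memory. The addresses (5, 9, 1), the stored letters, and the function name `traverse` are made up for illustration; address 0 is left unused so that a link of 0 can mean "end".

```python
END = 0  # a link of zero marks the end of the list

# Memory with addresses 0..11. Each node occupies two adjacent locations:
# the item, then the address of the next node.
memory = [0] * 12
memory[5], memory[6] = "A", 9     # node at 5: item "A", next node at 9
memory[9], memory[10] = "B", 1    # node at 9: item "B", next node at 1
memory[1], memory[2] = "C", END   # node at 1: item "C", end of list
anchor = 5                        # address of the first node

def traverse(memory, anchor):
    """Follow the links from the anchor and collect the items in list order."""
    items = []
    address = anchor
    while address != END:
        items.append(memory[address])   # 1st part of the node: the item
        address = memory[address + 1]   # 2nd part: location of the next node
    return items

print(traverse(memory, anchor))  # ['A', 'B', 'C']
```

The nodes sit in memory in the order C, A, B, yet the links recover the list order A, B, C; as with the first geocache, the anchor alone is enough to reach every item.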

  18. Linked List

  19. Links • Consider storing a list of numbers in memory • What list does the array contain? • Assume that the anchor is "104" • Assume that the 'end link' is zero • What if the value at location 109 were set to 104? • What if the value at location 105 were set to 0?

  20. Linked List Retrieval • Given an index i, how to find the ith item in the list? • Must chain through i-1 items • Deleting an item • Once we find the item to delete we change the value held in one memory location. Which one? • Adding an item • Find an empty pair of memory locations and create a node • Insert the item to store into the first part of the node • Insert the memory location of the next thing into the second part of the node • Change the 2nd part of the previous node to reference the newly created node.
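The three operations can be sketched on a list-as-memory model in which each node is an (item, next address) pair and a link of 0 means "end". The function names, the addresses, and the choice to count `i` from 0 (so that reaching item i follows i links) are assumptions made for this example.

```python
END = 0  # a link of zero marks the end of the list

def get(memory, anchor, i):
    """Return item i (counting from 0) by chaining through the links."""
    address = anchor
    for _ in range(i):
        address = memory[address + 1]
    return memory[address]

def insert_after(memory, prev, free, item):
    """Create a node at the empty pair starting at `free`, placed after `prev`."""
    memory[free] = item                  # 1st part: the item to store
    memory[free + 1] = memory[prev + 1]  # 2nd part: location of the next node
    memory[prev + 1] = free              # previous node now references the new node

def delete_after(memory, prev):
    """Unlink the node that follows `prev` by changing one memory location."""
    doomed = memory[prev + 1]
    memory[prev + 1] = memory[doomed + 1]

memory = [0] * 12
memory[5], memory[6] = "A", 9     # node at 5 links to node at 9
memory[9], memory[10] = "C", END  # node at 9 is the last node
anchor = 5

insert_after(memory, 5, 1, "B")   # list is now A, B, C
print(get(memory, anchor, 1))     # B

delete_after(memory, 5)           # list is now A, C
print(get(memory, anchor, 1))     # C
```

In this sketch, deletion changes only the link held in the node before the deleted one, and no items are moved; the cost is in finding that node first.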

  21. Inserting into Linked List

  22. Graphs • A graph is a mathematical abstraction • Node • sometimes called a vertex • an item in the graph • Arc • a directed connection between two nodes • written as (N1, N2) meaning from N1 to N2 • A graph is a set of nodes and a set of arcs • G = (V, E) • V is a set of nodes • E is a set of arcs

  23. Example • Example: • V = {A,B,C,D,E} • E = {(A,E), (A,B), (B,A), (B,D), (C,E), (D,C), (E,B), (E,C), (E,D)} • G = (V, E)
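The example translates almost symbol for symbol into Python sets and tuples. This is only one possible representation; the check `arcs_valid` is an added sanity test, not part of the slide.

```python
# Nodes and arcs exactly as listed on the slide.
V = {"A", "B", "C", "D", "E"}
E = {("A", "E"), ("A", "B"), ("B", "A"), ("B", "D"), ("C", "E"),
     ("D", "C"), ("E", "B"), ("E", "C"), ("E", "D")}
G = (V, E)

# Every arc should connect two nodes that are actually in V.
arcs_valid = all(first in V and second in V for (first, second) in E)

print(len(V))      # 5
print(len(E))      # 9
print(arcs_valid)  # True
```

Arcs are directed, so ("A", "B") and ("B", "A") are two different arcs, and this graph contains both.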

  24. Graphs Model the Real World

  25. Graphs Model the Real World

  26. Graphs Model the Real World • Graphs are truly ubiquitous in computational thought because they are able to capture the essence of a wide variety of real-world problems and their solutions. • Games: each node represents the board after a player has moved and each arc represents one player's move. • Chemical structures: each node is associated with an atom and each arc represents a bond between atoms. • Electrical circuits: each node represents an electrical connection between two components and each arc represents an electrical component such as a resistor or capacitor. • The national power grid: each node represents a transformer and each arc represents a power line that connects transformers. • Computer networks (e.g., the Internet): each node represents ? and each arc ? • A university's curriculum: each node represents ? each arc ?

  27. Graphs : Definitions • Adjacency: Assume that U and V are vertices in some graph. Vertex U is adjacent to vertex V if there is an arc (U, V) in the graph. • Loop: any arc such that the first and second nodes of the arc are the same. • In-degree: the in-degree of a vertex V is the number of arcs in the graph having V as the second vertex. • Out-degree: the out-degree of a vertex V is the number of arcs in the graph having V as the first vertex. • Order: the number of vertices. • Size: the number of arcs. • Path: a path is a sequence of vertices such that for every pair of adjacent vertices in the sequence there is a corresponding arc in the graph. Also, a sequence containing a single vertex is a path. • Path Length: the number of arcs in the path. • Cycle: a cycle is a path where the length is greater than zero and the first and last vertex are the same. A graph without any cycles is known as an acyclic graph.
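These definitions are short enough to write as one-line Python functions. The sketch below applies them to the graph from the earlier example (V = {A, B, C, D, E}); the function names are illustrative, and the graph is repeated here so the block stands on its own.

```python
V = {"A", "B", "C", "D", "E"}
E = {("A", "E"), ("A", "B"), ("B", "A"), ("B", "D"), ("C", "E"),
     ("D", "C"), ("E", "B"), ("E", "C"), ("E", "D")}

def is_adjacent(u, v, arcs):
    """U is adjacent to V if there is an arc (U, V)."""
    return (u, v) in arcs

def in_degree(v, arcs):
    """Number of arcs having v as the second vertex."""
    return sum(1 for (_, second) in arcs if second == v)

def out_degree(v, arcs):
    """Number of arcs having v as the first vertex."""
    return sum(1 for (first, _) in arcs if first == v)

def has_loop(arcs):
    """True if some arc starts and ends at the same node."""
    return any(first == second for (first, second) in arcs)

def is_path(sequence, arcs):
    """Every pair of neighbors in the sequence must be joined by an arc."""
    if len(sequence) == 0:
        return False
    return all((a, b) in arcs for a, b in zip(sequence, sequence[1:]))

def path_length(sequence):
    """Number of arcs in the path."""
    return len(sequence) - 1

def is_cycle(sequence, arcs):
    """A path of length greater than zero whose first and last vertex match."""
    return (is_path(sequence, arcs)
            and path_length(sequence) > 0
            and sequence[0] == sequence[-1])

print(len(V), len(E))                         # 5 9  (order and size)
print(is_adjacent("A", "E", E))               # True
print(is_adjacent("E", "A", E))               # False
print(out_degree("A", E), in_degree("A", E))  # 2 1
print(is_cycle(["A", "B", "A"], E))           # True
```

Because arcs are directed, adjacency is not symmetric: the arc (A, E) exists in this graph but (E, A) does not.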

  28. Graphs • The order? • The size? • Is A adjacent to E? • Is E adjacent to A? • Out-degree of A? • In-degree of A? • Is there a loop? • Is [A, E, C, E] a path? • Is [A, B, A] a path? • Is the graph acyclic?

  29. Storing Graphs in Memory • They can be stored using array-like techniques. We will discuss a linking strategy. • Similar to lists but each node may have an out-degree other than 1. A ‘node’ stores the information associated with a single vertex V. • The vertex contents • The out-degree (an integer number we’ll call N) • N addresses of the adjacent nodes.
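A minimal sketch of this linking strategy, using a Python list as memory. Each node is written as its contents, then its out-degree N, then N addresses. The function names `store_graph` and `neighbors`, the packing of nodes back to back from address 0, and the sorted order of the links are all choices made here for illustration.

```python
def store_graph(vertices, arcs):
    """Lay out each vertex as [contents, N, address_1, ..., address_N]."""
    successors = {v: sorted(w for (u, w) in arcs if u == v) for v in vertices}

    # First pass: decide where each node will start.
    address = {}
    next_free = 0
    for v in vertices:
        address[v] = next_free
        next_free += 2 + len(successors[v])

    # Second pass: write contents, out-degree, and links into memory.
    memory = [None] * next_free
    for v in vertices:
        a = address[v]
        memory[a] = v
        memory[a + 1] = len(successors[v])
        for k, w in enumerate(successors[v]):
            memory[a + 2 + k] = address[w]
    return memory, address

def neighbors(memory, a):
    """Read the contents of every node linked from the node at address a."""
    n = memory[a + 1]
    return [memory[memory[a + 2 + k]] for k in range(n)]

vertices = ["A", "B", "C", "D", "E"]
arcs = {("A", "E"), ("A", "B"), ("B", "A"), ("B", "D"), ("C", "E"),
        ("D", "C"), ("E", "B"), ("E", "C"), ("E", "D")}
memory, address = store_graph(vertices, arcs)

print(memory[0:4])                        # ['A', 2, 4, 14]
print(neighbors(memory, address["E"]))    # ['B', 'C', 'D']
```

Unlike the linked list, no end marker is needed: the stored out-degree says how many links to read.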

  30. Draw the graph

  31. Trees • [Figure: three example graphs labeled "not a tree", "not a tree", and "a tree"] • A tree is a type of graph that models hierarchical data. • Has exactly one node with in-degree zero. This node is referred to as the ‘root’. A tree may have no nodes at all; this empty tree is the one exception to the ‘one-node’ rule given here. • Every node other than the root has an in-degree of one • There is a path from the root to every other vertex
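The three rules can be checked mechanically. The sketch below is one possible test, assuming the graph is given as a set of nodes and a set of (from, to) arcs whose endpoints are all in the node set; the function name `is_tree` and the small test graphs are invented for this example.

```python
def is_tree(V, E):
    """Check the three tree rules for a graph with nodes V and arcs E."""
    if not V:
        return True  # the empty tree is the stated exception

    in_degree = {v: 0 for v in V}
    children = {v: [] for v in V}
    for first, second in E:
        in_degree[second] += 1
        children[first].append(second)

    # Rule 1: exactly one node with in-degree zero (the root).
    roots = [v for v in V if in_degree[v] == 0]
    if len(roots) != 1:
        return False
    root = roots[0]

    # Rule 2: every node other than the root has an in-degree of one.
    if any(in_degree[v] != 1 for v in V if v != root):
        return False

    # Rule 3: there is a path from the root to every other vertex.
    seen = set()
    stack = [root]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(children[v])
    return seen == set(V)

print(is_tree({"R", "X", "Y", "Z"}, {("R", "X"), ("R", "Y"), ("X", "Z")}))  # True
print(is_tree({"R", "S", "X"}, {("R", "X"), ("S", "X")}))                    # False
print(is_tree({"R", "X", "Y"}, {("X", "Y"), ("Y", "X")}))                    # False
```

The last graph passes the first two rules (R is the only node with in-degree zero, and X and Y each have in-degree one) but fails the third, which shows why the reachability rule is needed.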

  32. Trees are drawn upside-down

  33. Examples : Org Chart

  34. Examples : Family Tree

  35. Examples : Taxonomy

  36. Examples : Linguistics

  37. Examples: File System

  38. Example • Consider storing the following tree in memory • We have 30 memory slots • Each slot can hold either a 'letter' or a 'number' • Each 'node' is formatted as • node value • number of children N • N links
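The tree on the slide is not reproduced in this transcript, so the sketch below uses a made-up five-node tree to show the node format (value, number of children N, N links) in 30 memory slots. The names `store_tree` and `read_values`, and the decision to place each node's children directly after it, are assumptions for illustration.

```python
def store_tree(node, memory, start):
    """Write node at address start as [value, N, link_1, ..., link_N].

    node is a pair (value, list_of_child_nodes).
    Returns the next free address after this node and all its descendants.
    """
    value, children = node
    size = 2 + len(children)
    if start + size > len(memory):
        raise MemoryError("not enough memory slots for this tree")
    memory[start] = value               # a 'letter'
    memory[start + 1] = len(children)   # a 'number': how many children
    next_free = start + size
    for k, child in enumerate(children):
        memory[start + 2 + k] = next_free  # link: address of the child node
        next_free = store_tree(child, memory, next_free)
    return next_free

def read_values(memory, address):
    """Follow the links and list node values, each parent before its children."""
    values = [memory[address]]
    for k in range(memory[address + 1]):
        values += read_values(memory, memory[address + 2 + k])
    return values

# Hypothetical tree: A has children B and C; B has children D and E.
tree = ("A", [("B", [("D", []), ("E", [])]), ("C", [])])
memory = [None] * 30
used = store_tree(tree, memory, 0)

print(used)          # 14
print(memory[:14])   # ['A', 2, 4, 12, 'B', 2, 8, 10, 'D', 0, 'E', 0, 'C', 0]
print(read_values(memory, 0))  # ['A', 'B', 'D', 'E', 'C']
```

In this layout each node costs two slots plus one slot per child, so the five-node example uses 14 of the 30 slots.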

  39. Example

  40. Example
