
Lecture 04: Data Storage

Lecture 04: Data Storage. September 16, 2010 COMP 150-12 Topics in Visual Analytics. Lecture Outline. Examples Computer graphics Data structure vs. hardware rendering RDBMS 3 rd normal form. Data Storage and Retrieval Define interactivity Memory Data Representations and Structures



  1. Lecture 04: Data Storage. September 16, 2010. COMP 150-12: Topics in Visual Analytics

  2. Lecture Outline • Data Storage and Retrieval • Define interactivity • Memory • Data Representations and Structures • Storage vs. speed • Examples • Computer graphics: data structure vs. hardware rendering • RDBMS: 3rd normal form

  3. Assumption about Bottlenecks • In most visual analysis tools, the size of the data is usually the largest source of delay. • This can occur in: • Data processing • Data retrieval • Data transformation • Etc. • Rendering can also be the bottleneck. However, that is often still related to the amount of data that needs to be rendered.

  4. Speed of Data Transfer • 12.8 GB/s • 16 GB/s • Ethernet 100Base-T: 100 Mb/s = 0.0125 GB/s • SQL queries: ~1000/s • SATA: 0.5 GB/s • Hard drive: 0.06 GB/s
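
To make the bandwidth figures on this slide concrete, the sketch below estimates how long moving a dataset would take over each labeled channel. The function name `transfer_seconds`, the constant names, and the 1 GB example size are illustrative assumptions, not part of the lecture; the rates are the ones quoted on the slide, and real throughput is usually lower than such peak figures.

```cpp
#include <cassert>

// Idealized transfer time: size divided by peak bandwidth.
// Ignores latency, protocol overhead, and contention (assumption).
double transfer_seconds(double size_gb, double rate_gb_per_s) {
    assert(rate_gb_per_s > 0.0);
    return size_gb / rate_gb_per_s;
}

// Rates quoted on the slide, in GB/s.
constexpr double kEthernet100BaseT = 0.0125;
constexpr double kSata = 0.5;
constexpr double kHardDrive = 0.06;
```

For a 1 GB dataset, `transfer_seconds(1.0, kEthernet100BaseT)` gives 80 seconds, SATA gives 2 seconds, and the hard drive gives roughly 17 seconds, before any query or rendering cost is added.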

  5. Ideal Retrieval Time • Jakob Nielsen’s Alertbox: www.useit.com/alertbox • Powers of “Ten” • <= 0.1 second • < 1 second • < 10 seconds • < 1 minute • < 10 minutes • < 1 hour • < 10 hours • < 1 day • > 1 day • Card, Robertson, and Mackinlay (1991). The information visualizer: An information workspace. ACM CHI'91 Conf.

  6. Ideal Retrieval Time Atkinson-Shiffrin memory model Image courtesy: http://www.dynamicflight.com/avcfibook/learning_process/

  7. Ideal Retrieval Time: <= 0.1 second • Correlates to “sensory memory”, which lasts for several tenths of a second. • Also known as “the perceptual processing time constant”. • Movie frames are shot at 16 frames per second (fps). • Retrieval + rendering time at 0.1 second = 10 fps (compare that with most 3D video games). • A visual trace is generally retained in sensory memory for 0.25 second. • Implications for image comparison?

  8. Ideal Retrieval Time: < 1 second • Sensory memory starts to decay. • Also known as “the immediate response time constant”. • A person can make an unprepared response to some stimulus within about a second. Beyond that, they make a backchannel response to indicate interest (as either listener or speaker). • Limit of an animation sequence. • Much longer than that, the user gets bored.

  9. Ideal Retrieval Time: < 10 seconds • Sensory memory begins to transfer into short-term memory (STM). • If a sentence is spoken, sensory memory would hold the sound and STM would hold the words. • Also known as “the unit task time constant”: the rough amount of time needed to complete a simple task, for example, pick up a mouse, move it to the menu, find the right element, and click. • Approximate limit for users keeping their attention on the task. • Re-orientation is sometimes necessary. • Only acceptable during natural breaks in the user’s work.

  10. Ideal Retrieval Time • Most interactive systems need to stay under 10 seconds. • Beyond 10 seconds the user’s mind starts wandering and does not retain enough information in STM. • The flow of thought can be broken after 10 seconds. • From a web experience perspective, a user will often leave the site.

  11. Ideal Retrieval Time • Starts to push the limit of STM, or working memory. • Retrieval times in the range of minutes require a “progress bar”. • Certain automated computations require minutes to complete. • In this range, users are sometimes still willing to wait for the results. • Really need to justify the cost of time in the design process!

  12. Ideal Retrieval Time • Transitions from STM to long-term memory (LTM). • What memory is transferred from STM to LTM is not clear. • LTM is subjective and related to mental models. • It is highly unreliable that the user will be able to “pick up where they left off”. • Need to start considering computational “tricks” like pre-computation, caching, pre-fetching, etc.
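
Of the tricks named on this slide, caching is the simplest to sketch. The example below is a minimal memoizing wrapper, assuming a hypothetical expensive query keyed by an integer id; `QueryCache` and its members are illustrative names rather than anything from the lecture, and a real system would also need an eviction policy.

```cpp
#include <cassert>
#include <functional>
#include <unordered_map>
#include <utility>

// Minimal memoizing cache: trades memory for repeated-retrieval speed.
class QueryCache {
public:
    explicit QueryCache(std::function<double(int)> slow_query)
        : slow_query_(std::move(slow_query)) {}

    // Returns the cached result when present; otherwise runs the slow
    // query once and stores its result.
    double get(int id) {
        auto it = cache_.find(id);
        if (it != cache_.end()) return it->second;
        ++misses_;
        double value = slow_query_(id);
        cache_.emplace(id, value);
        return value;
    }

    // Number of times the slow query actually ran.
    int misses() const { return misses_; }

private:
    std::function<double(int)> slow_query_;
    std::unordered_map<int, double> cache_;
    int misses_ = 0;
};
```

Calling `get` twice with the same id runs the slow query only once, so the second retrieval costs a hash lookup. The cache grows without bound here, which is exactly the memory cost being traded for speed.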

  13. Questions?

  14. Data Representations and Structures • How do we accomplish fast retrieval time? • Better data structure • Memory / storage vs. speed • Better system functionalities

  15. Better Data Structure • Consider the Data Cube (OLAP) vs. SQL example from before • Same amount of data transfer • Representation and structure allow faster analysis of the same data from different perspectives
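
One way to read this slide: a data cube pre-aggregates a measure along every combination of dimensions, so a change of perspective becomes a lookup rather than a fresh scan. The sketch below assumes a hypothetical two-dimensional sales table (region, year); `Sale`, `SalesCube`, and the wildcard values `"*"` and `0` are illustrative choices, not the example used in the lecture.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Sale { std::string region; int year; double amount; };

// Pre-aggregated totals for every grouping of (region, year).
// "*" and 0 act as "all regions" / "all years" wildcards (assumption).
struct SalesCube {
    std::map<std::pair<std::string, int>, double> totals;

    explicit SalesCube(const std::vector<Sale>& rows) {
        for (const Sale& s : rows) {
            totals[{s.region, s.year}] += s.amount;  // by region and year
            totals[{s.region, 0}]      += s.amount;  // by region only
            totals[{"*", s.year}]      += s.amount;  // by year only
            totals[{"*", 0}]           += s.amount;  // grand total
        }
    }

    // Any perspective is answered by one lookup instead of a rescan.
    double total(const std::string& region, int year) const {
        auto it = totals.find({region, year});
        return it == totals.end() ? 0.0 : it->second;
    }
};
```

With the cube built once, `cube.total("east", 0)` and `cube.total("*", 2010)` answer "by region" and "by year" questions from the same stored structure. The price is up to four stored cells per distinct (region, year) pair, and the number of groupings doubles with each added dimension.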

  16. Other Types of Data Structures • An overview of some existing data structure taxonomies: • 1996: Shneiderman • 1999: Card, Mackinlay, Shneiderman • 2004: Ware • 2005: Thomas, Cook • 2010: Ward, Grinstein, Keim

  17. Other Types of Data Structures • Ben Shneiderman (1996): • 1-D: documents, source code, sequential lists • 2-D: maps, floor plans, grids • 3-D: physical objects • N-D: multi-attribute data • Temporal: time-varying data • Tree/Hierarchy: file systems, org charts • Network: arbitrary relationships between objects, social networks • Shneiderman. “The eyes have it: A task by data type taxonomy for information visualizations”, IEEE Symposium on Visual Languages, 1996

  18. Other Types of Data Structures • Stu Card, Jock Mackinlay, Ben Shneiderman (1999): • Data Table: (see last lecture) • Spatial (Scientific) • Geographic • Documents • Time • Database • Hierarchies • Networks • World Wide Web • Card, Mackinlay, Shneiderman. “Readings in Information Visualization: Using Vision To Think”, Morgan Kaufmann, 1999

  19. Other Types of Data Structures • Colin Ware (2004): • Entities: objects of interest, such as people, hurricanes, a school of fish • Relationships: structures that relate entities, such as “part-of” (a wheel is part of a car), structural and physical (components that make up a house), conceptual (store and customers), causal (events that cause another), temporal (time lapse) • Ware. “Information Visualization: Perception for Design”, Morgan Kaufmann, 2004

  20. Other Types of Data Structures • Jim Thomas and Kris Cook (2005): • Numeric Data: quantitative results from sensors • Language Data: human language • Image and Video • Structural Characteristics: loosely vs. highly structured (free text and images vs. transactions and RDBMS) • Geospatial Characteristics • Temporal Characteristics • Thomas and Cook. “Illuminating the Path”, IEEE, 2005

  21. Other Types of Data Structures • Matt Ward, Georges Grinstein, Daniel Keim (2010): • Scalars, vectors, and tensors: 1-n dimensions • Geometry and Grids: requiring coordinates • Temporal: time stamps • Topology: how data records are connected • Examples: • MRI: density (scalar) + 3D grid • CFD: 3D grid + temporal + 3D vectors + topology • Financial: temporal + n-D tensor • CAD: 3D grid + topology • Remote Sensing: 2D or 3D grid + temporal + connectivity • Census: 2D grid + temporal + n-D tensor • Social Network: n-D tensor + connectivity + (temporal) + (2D grid) • Ward, Grinstein, Keim. “Interactive Data Visualization”, AK Peters, 2010

  22. How To Model Your Problem?

  23. How To Model Your Problem? • As a network / topology • What operators can we apply to this?

  24. How To Model Your Problem? • As a table • What operators can we apply to this?

  25. How To Model Your Problem? • As a set of 2D vectors • What operators can we apply to this?

  26. How To Model Your Problem? • As a geometry • What operators can we apply to this?

  27. How To Model Your Problem? • As an image • What operators can we apply to this?

  28. How To Model Your Problem? • No single right way. It is heavily task dependent. • The key is finding a problem isomorph that maps a particular problem onto an existing, efficient data structure (which is not always obvious) • Example: Google
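
To illustrate how the choice of model changes which operators are cheap, the sketch below stores the same hypothetical "who links to whom" data twice: as a table of rows and as an adjacency list. All names (`Link`, `Graph`, `count_links_from`, `to_graph`, `two_hops`) are assumptions made for this illustration and do not describe any particular system.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

struct Link { int from; int to; };  // one table row

// Table model: filtering and counting rows are natural operators.
std::size_t count_links_from(const std::vector<Link>& table, int node) {
    std::size_t n = 0;
    for (const Link& l : table) if (l.from == node) ++n;
    return n;
}

// Network model: neighbours are one lookup away, which suits traversal.
using Graph = std::unordered_map<int, std::vector<int>>;

Graph to_graph(const std::vector<Link>& table) {
    Graph g;
    for (const Link& l : table) g[l.from].push_back(l.to);
    return g;
}

// True if `to` can be reached from `from` in exactly two hops.
bool two_hops(const Graph& g, int from, int to) {
    auto a = g.find(from);
    if (a == g.end()) return false;
    for (int mid : a->second) {
        auto b = g.find(mid);
        if (b == g.end()) continue;
        for (int next : b->second) if (next == to) return true;
    }
    return false;
}
```

The two-hop question could also be answered on the table with a self-join, but each hop would then scan every row. Recasting the rows as a graph makes that operator direct, at the cost of building and storing a second structure.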

  29. Questions?

  30. Can I Have a Flexible Structure? • An age-old question: why don’t we have one structure that stores all of these structures? • Answer: too expensive to store!

  31. An Example: Triangle Strip • (Figure: a triangle strip with labeled vertices) • Image Source: Wikipedia. “Triangle Strip”

  32. A General Structure (b = bytes; floats and pointers are 4 bytes each)
      class Vertex {
          float position[3];        // float * 3 = 12b
          Edge* list_of_edges[n];   // ptr * 4 = 16b
          Face* list_of_faces[m];   // ptr * 3 = 12b
      };                            // Total: 40b * 6 = 240b
      class Edge {
          Vertex* vertices[2];      // ptr * 2 = 8b
          Face* faces[2];           // ptr * 2 = 8b
      };                            // Total: 16b * 9 = 144b
      class Face {
          Vertex* vertices[3];      // ptr * 3 = 12b
          Edge* edges[3];           // ptr * 3 = 12b
      };                            // Total: 24b * 4 = 96b

  33. A General Structure • Vertex: 40b * 6 = 240b • Edge: 16b * 9 = 144b • Face: 24b * 4 = 96b • Grand Total: 480 bytes

  34. A Task-Specific Structure
      • 6 vertices * 3 floats each = 6 * 3 * 4 bytes = 72b
          glBegin(GL_TRIANGLE_STRIP);
          glVertex3f(A.x, A.y, A.z);  // vertex 1
          glVertex3f(B.x, B.y, B.z);  // vertex 2
          glVertex3f(C.x, C.y, C.z);  // vertex 3
          glVertex3f(D.x, D.y, D.z);  // vertex 4
          glVertex3f(E.x, E.y, E.z);  // vertex 5
          glVertex3f(F.x, F.y, F.z);  // vertex 6
          glEnd();
      • Ratio: 480b / 72b ≈ 6.67, so the general structure is almost 7 times larger
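
The saving comes from the strip making connectivity implicit: triangle i is formed by vertices i, i+1, and i+2, so no edge or face records need to be stored. A minimal sketch of that indexing rule follows. `strip_triangles` and `strip_bytes` are illustrative helpers, not OpenGL calls, and the sketch ignores the winding-order flip that renderers apply to alternate triangles.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Expands a triangle strip of `vertex_count` vertices into explicit
// index triples. A strip of n >= 3 vertices encodes n - 2 triangles.
std::vector<std::array<std::size_t, 3>> strip_triangles(std::size_t vertex_count) {
    std::vector<std::array<std::size_t, 3>> tris;
    for (std::size_t i = 0; i + 2 < vertex_count; ++i) {
        tris.push_back(std::array<std::size_t, 3>{{i, i + 1, i + 2}});
    }
    return tris;
}

// Storage for the strip itself: three floats per vertex
// (4 bytes each on common platforms).
constexpr std::size_t strip_bytes(std::size_t vertex_count) {
    return vertex_count * 3 * sizeof(float);
}
```

For the six-vertex strip, `strip_triangles(6)` yields four triangles, matching the four `Face` records of the general structure, while `strip_bytes(6)` is 72 wherever `float` is 4 bytes. The trade-off is that the strip can only answer the one question it was built for, namely "which triangles do I draw, in order?"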

  35. Questions?

  36. Memory / Storage vs. Speed • Theoretical problem in computer science • Typically, faster speed means more memory and storage • For example, sorting:

  37. Image Source: Wikipedia. “Sorting Algorithm”

  38. Memory / Storage vs. Speed • Notice that with additional use of memory, the algorithm is either faster or has additional properties that might be desirable (such as stability) • In Assignment 1, notice the same thing: • Fastest retrieval is to duplicate all elements from parent to child, but memory consumption is non-trivial • More memory-efficient algorithms would require (recursively) looking to the parent node for information
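
The two strategies on this slide can be sketched side by side. The code below is a hypothetical illustration and not the Assignment 1 specification: `LeanNode` and `FatNode` are invented names, and attributes are assumed to be integer values keyed by string.

```cpp
#include <cassert>
#include <map>
#include <string>

// Memory-lean variant: a node stores only its own attributes and
// defers to its parent for anything it does not define.
struct LeanNode {
    const LeanNode* parent = nullptr;
    std::map<std::string, int> own;

    // Walks up the tree, so cost grows with depth. Returns `fallback`
    // when no ancestor defines the key.
    int lookup(const std::string& key, int fallback) const {
        for (const LeanNode* n = this; n != nullptr; n = n->parent) {
            auto it = n->own.find(key);
            if (it != n->own.end()) return it->second;
        }
        return fallback;
    }
};

// Fast variant: every node holds a full copy of inherited attributes,
// so lookup is a single map access at the cost of duplicated storage.
struct FatNode {
    std::map<std::string, int> all;

    static FatNode child_of(const FatNode& parent,
                            const std::map<std::string, int>& own) {
        FatNode c;
        c.all = parent.all;  // duplicate the parent's data
        for (const auto& kv : own) c.all[kv.first] = kv.second;  // override
        return c;
    }
};
```

A `LeanNode` child sees a change to its parent immediately because nothing was copied. Every `FatNode` descendant would have to be refreshed after the same change.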

  39. Problems with Duplicating Data • Imagine an update to the root node that needs to be propagated. • Others? • Example: in databases, maintaining 3rd normal form

  40. 2NF vs. 3NF Image Source: Wikipedia. “Third Normal Form”

  41. Comparison • What are the advantages of 2nd normal form? • What are the advantages of 3rd normal form? • Can we go further? • Should we go further?
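
As one possible starting point for these questions, the sketch below contrasts a table with a transitive dependency against its 3NF decomposition. The order/customer/city schema and all names are hypothetical, chosen only for illustration; they are not taken from the Wikipedia figure.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// 2NF-style table: `city` depends on `customer`, not on the key
// `order_id`, so it is repeated in every order row.
struct OrderRow2NF { int order_id; std::string customer; std::string city; };

// 3NF-style split: each fact is stored once.
struct OrderRow3NF { int order_id; std::string customer; };
using CustomerCity = std::map<std::string, std::string>;

// Updating a city in the 2NF table must touch every matching row;
// returns how many rows were rewritten.
int move_customer(std::vector<OrderRow2NF>& orders,
                  const std::string& customer, const std::string& city) {
    int touched = 0;
    for (OrderRow2NF& r : orders) {
        if (r.customer == customer) { r.city = city; ++touched; }
    }
    return touched;
}

// In the 3NF layout the same update is a single write, but reading an
// order's city costs an extra lookup (a join). Returns an empty string
// for an unknown customer.
std::string city_of(const OrderRow3NF& order, const CustomerCity& cities) {
    auto it = cities.find(order.customer);
    return it == cities.end() ? std::string() : it->second;
}
```

The 2NF layout answers "which city does this order ship to?" without a join, while the 3NF layout cannot end up with two different cities recorded for the same customer. This is the same storage versus speed tension as in the earlier slides.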

  42. Questions?
