
Lecture 03: Data Foundations

September 14, 2010. COMP 150-12 Topics in Visual Analytics.


Presentation Transcript


  1. Lecture 03: Data Foundations • September 14, 2010 • COMP 150-12 Topics in Visual Analytics

  2. Lecture Outline • Data Foundations • Basic Data Types: Nominal, Ordinal, Scale / Quantitative (Interval, Ratio) • Dimensionality: Scalars, Vectors, Matrices, Tensors • Data Operations • Metadata • Structure vs. Value: Value, Derived Value, Derived Structure, Structure

  3. Data Definition • A typical dataset in visualization consists of n records: (r_1, r_2, r_3, …, r_n) • Each record r_i consists of m (m ≥ 1) observations or variables: (v_1, v_2, v_3, …, v_m) • A variable may be either independent or dependent • An independent variable (iv) is not controlled or affected by another variable, for example, time in a time-series dataset • A dependent variable (dv) is affected by variation in one or more associated independent variables, for example, temperature in a region • Formal definition: r_i = (iv_1, iv_2, iv_3, …, iv_mi, dv_1, dv_2, dv_3, …, dv_md), where m = mi + md
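As a rough illustration of the record definition above, the sketch below models one record r_i as a pair of tuples, one for independent and one for dependent variables. The `Record` class, its field names, and the hour/temperature values are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Record:
    """One record r_i: independent variables followed by dependent variables."""
    independent: Tuple[float, ...]  # iv_1 .. iv_mi
    dependent: Tuple[float, ...]    # dv_1 .. dv_md

    @property
    def m(self) -> int:
        # Total number of variables: m = mi + md
        return len(self.independent) + len(self.dependent)

    def as_tuple(self) -> Tuple[float, ...]:
        # Flattened form (iv_1, ..., iv_mi, dv_1, ..., dv_md)
        return self.independent + self.dependent


# Hypothetical time series: hour of day (independent), temperature (dependent)
dataset = [
    Record(independent=(0.0,), dependent=(12.5,)),
    Record(independent=(6.0,), dependent=(10.0,)),
    Record(independent=(12.0,), dependent=(18.5,)),
]
```

Here n is `len(dataset)` and each record has m = 2 variables.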

  4. Basic Data Types: Nominal • Def: A set of unordered, non-numeric values • For example: • Categorical (finite) data: {apple, orange, pear}, {red, green, blue} • Arbitrary (infinite) data: {“12 Main St. Boston MA”, “45 Wall St. New York NY”, …}, {“John Smith”, “Jane Doe”, …}

  5. Basic Data Types: Ordinal • Def: A tuple (an ordered set) • For example: • Numeric: <2, 4, 6, 8> • Binary: <0, 1> • Non-numeric: <G, PG, PG-13, R>

  6. Basic Data Types: Scale / Quantitative • Def: A numeric range • Interval: ordered numeric elements on a scale that can be mathematically manipulated, but cannot be compared as ratios • For example: date, current time (Sept 14, 2010 cannot be described as a ratio of Jan 1, 2011) • Ratio: a scale where there exists an “absolute zero” • For example: height, weight

  7. Basic Data Types (Formal) • Nominal (N) {…} • Ordinal (O) <…> • Scale / Quantitative (Q) […] • Q → O • [0, 100] → <F, D, C, B, A> • O → N • <F, D, C, B, A> → {C, B, F, D, A} • N → O (??) • {John, Mike, Bob} → <Bob, John, Mike> • {red, green, blue} → <blue, green, red>?? • O → Q (??) • Hashing? • Bob + John = ?? • Source: Readings in Information Visualization: Using Vision to Think. Card, Mackinlay, Shneiderman, 1999
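The two unproblematic conversions on this slide, Q → O and O → N, can be sketched as follows. This is only an illustration: the cut points 60/70/80/90 and the function names are assumptions, since the slide gives the range [0, 100] and the grade order but no thresholds.

```python
from typing import FrozenSet, Iterable, Tuple

# Ordinal scale from the slide, lowest to highest.
GRADES: Tuple[str, ...] = ("F", "D", "C", "B", "A")


def score_to_grade(score: float) -> str:
    """Q -> O: bin a score in [0, 100] into an ordered letter grade.

    The cut points below are assumed for illustration only.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must lie in [0, 100]")
    cutoffs = (60, 70, 80, 90)
    rank = sum(score >= c for c in cutoffs)  # number of cut points reached
    return GRADES[rank]


def grades_as_nominal(grades: Iterable[str]) -> FrozenSet[str]:
    """O -> N: discard the ordering and keep only the labels."""
    return frozenset(grades)
```

Going the other way (N → O or O → Q) requires inventing an order or a magnitude that the data does not carry, which is why the slide marks those conversions with question marks.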

  8. Operations on Basic Data Types • What are the operations that we can perform on these data types? • Nominal (N) • = and ≠ • Ordinal (O) • >, <, ≥, ≤ • Scale / Quantitative (Q) • everything else (+, -, *, /, etc.) • Consider a distance function
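One way to think about the distance function mentioned above is to define it separately for each data type, using only the operations that type supports. The sketch below is one common choice, not necessarily the one intended in the lecture; all function names are hypothetical.

```python
from typing import Hashable, Sequence


def nominal_distance(a: Hashable, b: Hashable) -> int:
    """Nominal values support only = and !=, so distance is 0 or 1."""
    return 0 if a == b else 1


def ordinal_distance(a: Hashable, b: Hashable, order: Sequence) -> int:
    """Ordinal values have ranks, so distance is the gap between ranks."""
    return abs(order.index(a) - order.index(b))


def quantitative_distance(a: float, b: float) -> float:
    """Quantitative values support arithmetic, so distance is |a - b|."""
    return abs(a - b)


# Ordinal example from the earlier slide.
RATINGS = ("G", "PG", "PG-13", "R")
```

Note that the ordinal version quietly treats ranks as evenly spaced numbers, which is itself an O → Q assumption.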

  9. Questions?

  10. Dimensionality • Scalar • A single value • Vector • A collection of scalars • Matrix • A collection of vectors • Tensor • A collection of matrices

  11. Dimensionality (Programming) • Scalar • 0-dimensional array • Vector • 1-dimensional array • Matrix • 2-dimensional array • Tensor • 3 or more dimensional array
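The programming view above can be illustrated with plain nested Python lists. The helper `ndim` is a hypothetical name and assumes regular (non-ragged) nesting; it simply counts how many levels of list wrap the first element.

```python
def ndim(value) -> int:
    """Count nesting depth, assuming a regular (non-ragged) nested list."""
    depth = 0
    while isinstance(value, list):
        depth += 1
        if not value:  # empty list: cannot descend further
            break
        value = value[0]
    return depth


scalar = 7                                      # 0-dimensional
vector = [1, 2, 3]                              # 1-dimensional array
matrix = [[1, 2], [3, 4]]                       # 2-dimensional array
tensor = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]   # 3-dimensional array
```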

  12. Dimensionality (Technically) • Scalar • 0th order tensor • Vector • 1st order tensor • Matrix • 2nd order tensor • Tensor • n-d tensor

  13. Example, OLAP • OLAP = Online Analytical Processing • Often referred to as “data cube” or “hypercube” • Image from Wikipedia: OLAP Cube

  14. OLAP Operations • Slice • Selects a subset of the original n-dimensional cube • Result set could be of any dimensionality • Roll up (consolidate) • Creates a hierarchy based on the dataset • Same as clustering • Drill down • Expands a cluster • Pivot • Changes the orientation of the cube • Combine with the 4 basic SQL commands: • SELECT, UPDATE, INSERT, DELETE • Adapted from Wikipedia: OLAP Cube
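A minimal sketch of slice, roll up, and pivot, assuming the cube is stored as a dictionary keyed by coordinate tuples. The dimension names, the sales figures, and the function names are hypothetical, and each function shows only the simplest case (fixing or summing over a single dimension), not a full OLAP engine.

```python
from collections import defaultdict
from typing import Dict, Tuple

Cube = Dict[Tuple[str, ...], float]

# Hypothetical 3-dimensional cube: (region, product, year) -> sales
DIMS = ("region", "product", "year")
cube: Cube = {
    ("East", "apple", "2009"): 10.0,
    ("East", "pear", "2009"): 4.0,
    ("East", "apple", "2010"): 12.0,
    ("West", "apple", "2010"): 7.0,
}


def slice_cube(cube: Cube, axis: int, value: str) -> Cube:
    """Slice: fix one dimension to a single value and drop that axis."""
    return {
        key[:axis] + key[axis + 1:]: cell
        for key, cell in cube.items()
        if key[axis] == value
    }


def roll_up(cube: Cube, axis: int) -> Cube:
    """Roll up: consolidate by summing over one dimension."""
    out: Dict[Tuple[str, ...], float] = defaultdict(float)
    for key, cell in cube.items():
        out[key[:axis] + key[axis + 1:]] += cell
    return dict(out)


def pivot(cube: Cube, order: Tuple[int, ...]) -> Cube:
    """Pivot: reorder the dimensions without changing any cell value."""
    return {tuple(key[i] for i in order): cell for key, cell in cube.items()}
```

Drill down is the reverse of roll up: returning from the consolidated cube to the finer-grained one, which in this sketch means going back to the original `cube`.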

  15. Examples – Roll up and Drill down

  16. OLAP vs. SQL • Often used in business intelligence • Allows for quick change in perspective of the same data • For example, consider the above case implemented in SQL vs. OLAP • Considered as an abstract representation of RDBMS • Supported by many commercial databases • Uses a language called MDX MDX example from Wikipedia: MDX

  17. Application • Related to your homework 1 • A powerful data representation for analysis. • Is the basis of Tableau Software

  18. Questions?

  19. Metadata • Introduced by Lisa Tweedie at CHI 1997 (“Characterizing Interactive Externalizations”) • Defined as “data about data” • Extends Bertin's original concept of data values and data structures • Values (low-level): variables relevant to a problem • Structures (high-level): relations that characterize the data as a whole (e.g. links, equations, constraints)

  20. What is Metadata?

  21. Metadata – 4 Relationships • Values → Derived Values • Values → Derived Structure • Structure → Derived Values • Structure → Derived Structure • Derived Values example: average • Derived Structure example: sorting a list of variables

  22. Values → Derived Values → Derived Structure • Values: a (text) document corpus • Derived values: compute the similarities between the documents • Derived Structure: apply multi-dimensional scaling to plot the documents in a spatial view.
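The Values → Derived Values step above can be sketched with a bag-of-words cosine similarity. This is an assumed similarity measure chosen for illustration (the slide does not say which one is used), the three-document corpus is made up, and the multi-dimensional scaling step is not shown.

```python
import math
from collections import Counter
from typing import List


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between two documents using raw word counts."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(c * c for c in ca.values()))
    norm_b = math.sqrt(sum(c * c for c in cb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def similarity_matrix(corpus: List[str]) -> List[List[float]]:
    """Derived values: pairwise similarities for the whole corpus."""
    return [[cosine_similarity(a, b) for b in corpus] for a in corpus]


# Values: a hypothetical three-document corpus
corpus = [
    "visual analytics data",
    "visual data types",
    "social network graph",
]
sim = similarity_matrix(corpus)
```

A layout method such as multi-dimensional scaling would then take `sim` (or the corresponding distances) as input to produce the derived structure.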

  23. Values → Derived Values → Derived Structure IN-SPIRE by PNNL

  24. Structure → Derived Structure → Derived Values • Structure: a tabular layout of individuals’ relationships with each other • Derived Structure: convert the tabular structure to a graph • Derived Values: compute centrality to identify the importance of the individual in this social network
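The three steps above can be sketched as follows, using degree centrality as one simple choice of centrality measure (the slide does not specify which one). The relationship table and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Structure: a hypothetical table with one row per relationship
table = [
    ("Bob", "John"),
    ("Bob", "Mike"),
    ("Bob", "Jane"),
    ("John", "Mike"),
]


def table_to_graph(rows: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Derived structure: undirected adjacency sets built from the table."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b in rows:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def degree_centrality(graph: Dict[str, Set[str]]) -> Dict[str, float]:
    """Derived values: fraction of the other nodes each node is linked to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


graph = table_to_graph(table)
centrality = degree_centrality(graph)
most_central = max(centrality, key=centrality.get)
```

In this made-up table, Bob is linked to everyone else and therefore ranks as the most central individual.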

  25. Structure → Derived Structure → Derived Values Image taken from: http://beth.typepad.com/beths_blog/2009/12

  26. Questions?

  27. An Example (Raw Data)

  28. (Structure → Derived Structure, Values → Derived Values)

  29. (Structure → Derived Values)

  30. Values → Derived Values

  31. Structure → Derived Structure

  32. Structure → Derived Values

  33. Analysis Flow (diagram) • Tables: Table 1 through Table 6 • Step labels: Create attribute (Age, Income) • Pivot by Professions • Operation: Mean • Create classes (Avg_Age, Avg_Income) • Pivot by Avg_Income
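One step of this flow, pivoting by profession and taking the mean of age and income, might look like the sketch below. The rows, the column names, and the `pivot_mean` helper are assumptions made for illustration; the original tables are images and are not reproduced here.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence

# Hypothetical raw rows with the attributes named in the diagram
people = [
    {"profession": "engineer", "age": 30, "income": 80000},
    {"profession": "engineer", "age": 40, "income": 100000},
    {"profession": "teacher", "age": 50, "income": 60000},
]


def pivot_mean(rows: List[dict], key: str, fields: Sequence[str]) -> Dict[str, dict]:
    """Group rows by `key` and average each field within a group."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {
        group: {f"avg_{field}": mean(r[field] for r in members) for field in fields}
        for group, members in groups.items()
    }


summary = pivot_mean(people, "profession", ("age", "income"))
```

The resulting averages are derived values; regrouping or re-sorting the table by them would be a further derived structure.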

  34. Application • An analyst can continue such a process, backtrack, or branch from any given point. • Related to your homework 1!

  35. Questions?
