
Visual Analytics Review






Presentation Transcript

  1. Visual Analytics Review IAT 355 Lyn Bartram

  2. Overview • Topics (in no particular order) • Data models and analytics • Information visualization techniques: Types and components • Interaction • Perception • Cognition • Navigation and Scent • Presentation and screen space IAT 355 Introduction

  3. Overview and definitions

  4. Information visualization • Visual metaphors for data that is not inherently spatial, such as for exploring text-based document databases • More abstract • Assigns structure and position to information that has none • Text • Statistics • Finance/Business • Internet • Software

  5. Visual analytics • Analytical reasoning supported by interactive visual interfaces • The intersection of visualization and data analysis • Biology • National security

  6. Visual thinking • Visual thinking involves: • Constructing visual queries on displays • Visual search strategies through eye movements and attention to relevant patterns • Visual notification and attention “redirection” to new patterns and events • A well-structured balance of elements and tasks

  7. Data Analytics

  8. Data We Use • Data Models • Types • Metadata • Aggregates • Descriptive Statistics • Distribution • Clusters Show Me the Numbers! : Data

  9. Data models • Take raw data and transform it into a more workable form • Main idea: build a model • Individual items are called cases or records • Cases have attributes: each attribute holds the case's value for some variable or factor • In vis terms, an attribute is a dimension

  10. How many dimensions? • Data sets with 1, 2, or 3 dimensions are common • Number of variables per class • 1 - Univariate data • 2 - Bivariate data • 3 - Trivariate data • >3 - Hypervariate data • These are the fun and interesting ones! But hard!
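
As a minimal sketch, suppose each case is stored as a Python dict whose keys are its attributes. Under that assumption, a data set can be classified by how many variables its cases carry. The record layout, the car data, and the function name `classify_by_variables` are illustrative and do not come from the slides.

```python
def classify_by_variables(records):
    """Classify a data set by how many distinct variables (attributes) its cases have."""
    variables = set()
    for record in records:
        variables.update(record.keys())  # each key is one attribute, i.e. one dimension
    n = len(variables)
    if n == 0:
        raise ValueError("no variables found")
    return {1: "univariate", 2: "bivariate", 3: "trivariate"}.get(n, "hypervariate")


# Hypothetical cases: each dict is one case (record).
cars = [
    {"price": 21000, "mileage": 32, "cylinders": 4, "origin": "JP"},
    {"price": 35000, "mileage": 24, "cylinders": 6, "origin": "US"},
]
print(classify_by_variables(cars))  # hypervariate
```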

  11. Data Types (measurements) • Nominal: categorical (values are only equal or not equal to other values) • Example: gender, student number • No relation among values other than membership in the set • Ordinal: sequential (obeys the < and > relations; an ordered set) • Example: size of car, speed settings on road • Example: mild, medium, hot, suicide • Distance between values is not uniform

  12. Data Types 2 • Interval: relative measurements, no fixed zero point • Data is numerical, not categorical. Rank order among values is explicit, with an equal distance between points in the data set: -2, -1, 0, +1, +2 • Can say “how much more than” (differences are meaningful) but not “twice as much as” • Example: height above sea level, hours in a day • Ratio: interval data with an absolute zero • Can say “twice as much as” • Example: account balance, temperature in kelvins
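
The distinctions between the four levels of measurement fit in a small lookup table. This sketch follows the common nominal/ordinal/interval/ratio hierarchy. The names `SCALES` and `can_say_twice_as_much` are illustrative, and real analyses also depend on context.

```python
# Operations that are meaningful at each level of measurement, plus the
# strongest measure of central tendency each level supports.
SCALES = {
    "nominal":  {"ops": {"=", "!="}, "center": "mode"},
    "ordinal":  {"ops": {"=", "!=", "<", ">"}, "center": "median"},
    "interval": {"ops": {"=", "!=", "<", ">", "+", "-"}, "center": "mean"},
    "ratio":    {"ops": {"=", "!=", "<", ">", "+", "-", "*", "/"}, "center": "mean"},
}


def can_say_twice_as_much(scale):
    """Ratio statements ("twice as much as") require a true zero, i.e. ratio data."""
    return "/" in SCALES[scale]["ops"]


print(can_say_twice_as_much("interval"))  # False: 20 degrees C is not "twice as hot" as 10 degrees C
print(can_say_twice_as_much("ratio"))     # True: 20 K is twice 10 K
```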

  13. Dimensions • Data dimensions are classified as: • Quantitative (i.e., numerical) • Continuous (e.g. pH of a sample, patient cholesterol levels) • Discrete (e.g. number of bacteria colonies in a culture) • Categorical • Nominal (e.g. gender, blood group) • Ordinal (ranked, e.g. mild, moderate, or severe illness) • Ordinal variables are often re-coded as quantitative

  14. Metadata • Descriptive information about the data • Might be something as simple as the type of a variable, or could be more complex • For times when the table itself just isn’t enough • Example: if variable1 is “l”, then variable3 can only be 3, 7 or 16 • Missing values, uncertainty or importance are all examples of metadata

  15. Primary types of data analysis • Qualitative • Descriptive: used to describe the distribution of a single variable or the relationship between two nominal variables (mean, frequencies, cross-tabulation) • Inferential: used to establish relationships among variables; parametric tests assume random sampling and a normal distribution • Nonparametric: used to test relationships in small samples or in data sets that are not normally distributed, without assuming normality

  16. Descriptive Statistics • Range • Min/Max • Average • Median • Mode • Distribution Statistics • Variance • Error • Standard Deviation • Histograms and Normal Distributions

  17. Range, Min, Max • The Range • The difference between the maximum and minimum values in a data set • A larger range usually (but not always) indicates a larger spread in the values, because a single extreme value can inflate it • Example data set: (73, 66, 69, 67, 49, 60, 81, 71, 78, 62, 53, 87, 74, 65, 74, 50, 85, 45, 63, 100)
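
For the slide's 20 values, the minimum, the maximum, and the range each take one expression. The sketch below uses plain Python built-ins only.

```python
# The 20 values from the slide.
scores = [73, 66, 69, 67, 49, 60, 81, 71, 78, 62,
          53, 87, 74, 65, 74, 50, 85, 45, 63, 100]

lo, hi = min(scores), max(scores)
value_range = hi - lo  # range = maximum - minimum
print(lo, hi, value_range)  # 45 100 55
```

A single extra value such as 200 would more than double this range while barely changing the bulk of the data. That is why the range is only a rough indicator of spread.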

  18. Average = measure of centrality • Measures of location indicate where on the number line the data lie • Common measures of location are: (i) the arithmetic mean, (ii) the median, and (iii) the mode

  19. The data may or may not be symmetrical around its average value [Figure: two distributions on a 0 to 10 axis, each marked at 4.8] • The mean is sensitive to outliers and skewed data

  20. The Median • The middle value in a sorted data set: half the values are greater and half are less than the median • Another measure of central location in the data set • (45, 49, 50, 53, 60, 62, 63, 65, 66, 67, 69, 71, 73, 74, 74, 78, 81, 85, 87, 100) Median: 68, the average of the two middle values (67 and 69) • (1, 2, 4, 7, 8, 9, 9) Median: 7
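
The median rule is to sort the values and take the middle one, or average the two middle values when the count is even. Below is a minimal sketch of that rule, cross-checked against Python's `statistics.median`. The function name `median` is this sketch's own.

```python
import statistics

scores = [73, 66, 69, 67, 49, 60, 81, 71, 78, 62,
          53, 87, 74, 65, 74, 50, 85, 45, 63, 100]


def median(values):
    """Middle value of the sorted data; mean of the two middle values when the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median(scores))                              # 68.0 (average of 67 and 69)
print(median([1, 2, 4, 7, 8, 9, 9]))               # 7
print(sum(scores) / len(scores))                   # 68.6 (the arithmetic mean, for comparison)
print(median(scores) == statistics.median(scores))  # True
```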

  21. [Figure: a distribution on a 0 to 10 axis, marked at 6.25] • The Median • May or may not be close to the mean • Comparing the mean and the median indicates the skewness of a distribution
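
The slides do not say which formula they use to combine mean and median. One common choice is Pearson's second (median) skewness coefficient, 3 × (mean - median) / s, where s is the standard deviation. Positive values suggest a longer right tail. This sketch uses the sample standard deviation, and the two data sets are hypothetical.

```python
import statistics


def pearson_median_skewness(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by N - 1)
    return 3 * (mean - median) / sd


right_skewed = [1, 1, 2, 2, 3, 10]  # hypothetical values with a long right tail
symmetric = [1, 2, 3, 4, 5]         # hypothetical symmetric values

print(pearson_median_skewness(right_skewed) > 0)  # True: the mean is pulled above the median
print(pearson_median_skewness(symmetric))         # 0.0
```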

  22. The Mode • The most frequently occurring value • Another measure of central location in the data set • (45, 49, 50, 53, 60, 62, 63, 65, 66, 67, 69, 71, 73, 74, 74, 78, 81, 85, 87, 100) • Mode: 74 • Generally not very meaningful unless a large percentage of the values are the same number
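
The sketch below finds the mode with `statistics.multimode` (Python 3.8 or later), which returns every value tied for the highest count. It also shows what fraction of the data the mode covers.

```python
import statistics

scores = [73, 66, 69, 67, 49, 60, 81, 71, 78, 62,
          53, 87, 74, 65, 74, 50, 85, 45, 63, 100]

modes = statistics.multimode(scores)          # all values tied for the highest count
share = scores.count(modes[0]) / len(scores)  # fraction of the data equal to the mode
print(modes, share)  # [74] 0.1
```

Here the mode covers only 2 of the 20 values (10%). This illustrates why the slide calls it not very meaningful for this data set.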

  23. When do we use what? • It depends on how the data are distributed • Note: in a symmetrical, unimodal distribution, mean = median = mode • Rule of thumb: • Use the mean if the data are normally distributed and the variance is within constraints • Use the median to reduce the effect of outliers
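
This example shows why the median resists outliers. It appends one extreme value to the slide's 20 values and compares how far each measure moves. The value 500 is hypothetical.

```python
import statistics

scores = [73, 66, 69, 67, 49, 60, 81, 71, 78, 62,
          53, 87, 74, 65, 74, 50, 85, 45, 63, 100]
with_outlier = scores + [500]  # one hypothetical extreme value appended

print(statistics.mean(scores), statistics.median(scores))              # 68.6 68.0
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # mean about 89.1, median 69
```

One value shifts the mean by roughly 20 points, while the median moves by 1.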

  24. Summary http://statistics.laerd.com/statistical-guides/measures-central-tendency-mean-mode-median.php

  25. Data distribution • Measures of dispersion characterise how spread out the distribution is, i.e., how variable the data are. • Commonly used measures of dispersion include: • Range • Variance & Standard deviation • Coefficient of Variation (or relative standard deviation) • Inter-quartile range

  26. Measures of variance • Variance • One measure of dispersion (deviation from the mean) of a data set: the average of the squared deviations of each datum from the mean. The larger the variance, the more the data are spread out around the mean • Standard Deviation • The square root of the variance, expressed in the same units as the data; a typical distance of a datum from the mean • An outlier is a datum that does not appear to belong with the other data
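
The population variance is the average of the squared deviations from the mean, and the standard deviation is its square root. The sketch computes both by hand for the slide's data and cross-checks them against `statistics.pvariance` and `statistics.pstdev`. The function name `population_variance` is this sketch's own.

```python
import math
import statistics

scores = [73, 66, 69, 67, 49, 60, 81, 71, 78, 62,
          53, 87, 74, 65, 74, 50, 85, 45, 63, 100]


def population_variance(values):
    """Average of the squared deviations from the mean (divides by N)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


var = population_variance(scores)
sd = math.sqrt(var)  # standard deviation: back in the data's own units

# Cross-check against the standard library; statistics.variance / stdev
# would instead give the sample versions (dividing by N - 1).
print(math.isclose(var, statistics.pvariance(scores)))  # True
print(math.isclose(sd, statistics.pstdev(scores)))      # True
print(round(var, 2))                                    # 181.04
```

Whether to divide by N or N - 1 depends on whether the data are treated as the whole population or as a sample.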

  27. Inter-quartile range • The Median divides a distribution into two halves. • The first and third quartiles (denoted Q1 and Q3) are defined as follows: • 25% of the data lie below Q1 (and 75% is above Q1), • 25% of the data lie above Q3 (and 75% is below Q3) • The inter-quartile range (IQR) is the difference between the first and third quartiles, i.e. IQR = Q3 - Q1
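
This sketch computes the quartiles and IQR with `statistics.quantiles` (Python 3.8 or later). Several quartile conventions exist, and this uses the module's default "exclusive" method, so other tools may report slightly different Q1 and Q3.

```python
import statistics

scores = [73, 66, 69, 67, 49, 60, 81, 71, 78, 62,
          53, 87, 74, 65, 74, 50, 85, 45, 63, 100]

# quantiles(n=4) returns the three cut points [Q1, median, Q3].
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 60.5 68.0 77.0 16.5
```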

  28. Box-plots • A box-plot is a visual description of the distribution based on the five-number summary: • Minimum • Q1 • Median • Q3 • Maximum • If a data point is below the lower limit or above the upper limit (commonly Q1 - 1.5 × IQR and Q3 + 1.5 × IQR), it is considered to be an outlier • Useful for comparing large data sets
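
The slide does not define the lower and upper limits. A widely used convention is Tukey's fences, Q1 - k × IQR and Q3 + k × IQR with k = 1.5, which this sketch assumes. The function name `tukey_outliers` and the extra value 150 are illustrative.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return (outliers, (lower, upper)) using fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default "exclusive" quartiles
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [x for x in values if x < lower or x > upper]
    return outliers, (lower, upper)


scores = [73, 66, 69, 67, 49, 60, 81, 71, 78, 62,
          53, 87, 74, 65, 74, 50, 85, 45, 63, 100]
print(tukey_outliers(scores))          # ([], (35.75, 101.75)): 100 is inside the fences
print(tukey_outliers(scores + [150]))  # ([150], (33.25, 107.25))
```

Adding a point changes the quartiles and therefore the fences. This is why the limits shift slightly between the two calls.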

  29. Distribution is important for Aggregation • Visualization helps us see relations, or the trends of them, as visual patterns • A lot of what we visualize are descriptive statistics • Example: mean income vs. median income • Need to ensure that the univariate units of visualization are valid • Rule: check your core units/variables. If they are descriptive, look at the distribution

  30. Example: job losses in US over time

  31. Example: job losses in US over time

  32. Show Me the Numbers! : Data

  33. 2D Visualization Classes

  34. Types of Symbolic Displays (Kosslyn 89) • Graphs • Charts • Maps • Diagrams

  35. Types of Symbolic Displays • Graphs • at least two scales required • values associated by a symmetric “paired with” relation • Examples: scatter-plot, bar-chart, layer-graph

  36. Graphs • Encode quantitative information using position and magnitude of geometric objects. • Examples: scatter plots, bar charts.

  37. Types of Symbolic Displays • Charts • discrete relations among discrete entities • structure relates entities to one another • lines and relative position serve as links • Examples: • Family tree • Flow chart • Network diagram

  38. Map • Internal relations determined (in part) by the spatial relations of what is pictured • Grid: geometric metadata • Locations identified by labels • Nominal metadata • Examples: • Map of census data • Topographic maps IAT 355

  39. Choropleth Map • Areas are filled with different colors to indicate the value of some attribute of each region
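
Choosing a fill per region requires a mapping from attribute values to colors. This sketch assumes an equal-interval classification, which is one of several common schemes alongside quantiles and natural breaks. The region names, values, palette, and the function name `equal_interval_classes` are all hypothetical.

```python
def equal_interval_classes(values, n_classes):
    """Assign each value a class index 0..n_classes-1 using equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    classes = []
    for v in values:
        idx = int((v - lo) / width) if width > 0 else 0
        classes.append(min(idx, n_classes - 1))  # the maximum falls in the top class
    return classes


# Hypothetical regions and attribute values (e.g., population density).
regions = {"A": 12.0, "B": 48.0, "C": 95.0, "D": 30.0}
palette = ["#eff3ff", "#9ecae1", "#3182bd"]  # light-to-dark sequential fills (illustrative)

classes = equal_interval_classes(list(regions.values()), len(palette))
fills = {name: palette[c] for name, c in zip(regions, classes)}
print(fills)
```

A light-to-dark sequential palette suits a single ordered attribute. Equal intervals can leave some classes nearly empty when the data are skewed.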

  40. Diagrams • Schematic pictures of objects or entities • Parts are symbolic (unlike photographs) • how-to illustrations • figures in a manual From Gleitman, Henry. Psychology. W.W. Norton and Company, Inc. New York, 1995

  41. Graph Components • Framework (spatial substrate) • Measurement types, scale • Geometric Metadata • Content • Marks, lines, points • Data • Labels • Title, axes, ticks • Nominal Metadata

  42. Marks • Things that occur in space • Points • Lines • Areas • Volumes

  43. Graphical Properties • Size, shape, color, orientation...

  44. What goes where • In univariate representations, we often think of the data case as being shown along one dimension and its value (quantity) along another [Figure, left: Y axis is quantitative; the graph shows change in Y over a continuous range of X] [Figure, right: Y axis is quantitative; the graph shows the value of Y for 4 cases]

  45. Bivariate Data • Representations • Scatter plot • Each mark is a data case • Want to see the relationship between two variables • What is the pattern? • Note that both variables are continuous data [Figure: scatter plot with axes Price and Mileage]
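
A scatter plot shows the pattern visually. A numeric complement that is often reported alongside it is Pearson's correlation coefficient r, which ranges from -1 to +1 and measures only the linear part of the relationship. The sketch below uses hypothetical mileage and price values, and `pearson_r` is this sketch's own name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical cars: higher mileage tends to go with a lower price (in $1000s).
mileage = [20, 25, 30, 35, 40]
price = [30, 27, 22, 20, 15]
print(round(pearson_r(mileage, price), 3))  # strongly negative, close to -1
```

An r near 0 does not rule out a strong curved pattern. Plotting the data remains the first step.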

  46. Multivariate: Project data onto other graphical variables • E.g., use a blob attribute (such as size or color) for another variable [Figure: two scatter plots with axes Price and Mileage]
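
Mapping a third variable onto mark size requires a scaling rule. This sketch assumes the common choice of making the mark's area, rather than its radius, proportional to the value, so the radius grows with the square root. It also assumes non-negative values. The engine-size data and the names `radius_for` and `r_max` are hypothetical.

```python
import math


def radius_for(value, v_max, r_max=20.0):
    """Radius that makes a circle's AREA proportional to the value (area = pi * r^2)."""
    if value < 0 or v_max <= 0:
        raise ValueError("size encoding assumes non-negative values and a positive maximum")
    return r_max * math.sqrt(value / v_max)


engine_size = [1.0, 2.0, 4.0]  # hypothetical third variable for three cars
radii = [radius_for(v, max(engine_size)) for v in engine_size]
print([round(r, 2) for r in radii])  # [10.0, 14.14, 20.0]
```

Scaling the radius linearly would make a value twice as large look about four times as big, because viewers tend to judge circle size by area.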

  47. Alternative • Represent each variable on its own line • Small multiples

  48. Data projection • Fundamentally, we have 2 display dimensions • For data sets with >2 variables, we must project the data down to 2D • Come up with a visual mapping that locates each dimension in the 2D plane • Compare computer graphics 3D->2D projections IAT 355: Multivariate Data
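
One simple way to project N variables down to 2D is a linear projection. Each case's screen x and y are dot products of its values with two chosen direction vectors. This sketch assumes the directions are picked by hand. The case values, vectors, and the name `project_to_2d` are hypothetical.

```python
def project_to_2d(point, axis_u, axis_v):
    """Linear projection: screen x and y are dot products with two direction vectors."""
    if not (len(point) == len(axis_u) == len(axis_v)):
        raise ValueError("point and axes must have the same number of dimensions")
    x = sum(p * u for p, u in zip(point, axis_u))
    y = sum(p * v for p, v in zip(point, axis_v))
    return (x, y)


# A hypothetical 4-variable case and two projection directions.
case = [3.0, 1.0, 4.0, 1.5]
u = [1.0, 0.0, 0.0, 0.0]  # screen x shows variable 0 only
v = [0.0, 0.5, 0.5, 0.0]  # screen y blends variables 1 and 2
print(project_to_2d(case, u, v))  # (3.0, 2.5)
```

Choosing unit vectors that each select a single variable amounts to showing a subset of the dimensions. Techniques such as principal component analysis instead derive the two directions from the data.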

  49. What is Multivariate Data? • Each data point has N variables or observations • Each observation can be: • nominal or ordinal • discrete or continuous • scalar, vector, or tensor • May or may not have spatial, temporal, or other connectivity attribute This slide courtesy of Matt Ward, UC Berkeley

  50. Methods for Visualizing Multivariate Data • Dimensional Subsetting • Dimensional Reorganization (dimensional re-ordering) • Dimensional Embedding • Dimensional Reduction This slide courtesy of Matt Ward, UC Berkeley
