GGobi Dr. Yan Liu Department of Biomedical, Industrial and Human Factors Engineering Wright State University
Introduction
• Overview
  • An open-source visualization program for exploring high-dimensional data
• Basic plots
  • Scatterplot, scatterplot matrix, parallel coordinates, time series, histogram
• Interaction
  • Tour, linking and brushing, dynamic selection, probing, zooming, etc.
• Interface
  • Graphical user interface (GUI) when using GGobi as a stand-alone tool
  • Command-line interface when using GGobi with R (rggobi package)
Load Dataset
• File >> Open
• Two supported file formats
  • XML (Extensible Markup Language)
  • CSV (comma-separated values)
First Two Windows after Loading a Dataset
• The two windows are the GGobi Console and the Scatterplot (XY Plot)
• Move the mouse cursor over a control to see a tooltip that explains its function
• Console controls
  • Click the variable buttons to select which variables are mapped to the X (horizontal) and Y (vertical) axes
  • When the cycling box is checked, all possible XY plots (changing the variables mapped to the X and/or Y coordinates) are displayed automatically, one after another
  • Select whether to fix the variable mapped to the X or Y axis when cycling the plot
  • Adjust the cycling speed
  • Specify the direction of cycling (according to the variable list)
Menu bar
• File: open an existing file, open a new console, save data, close the current console, or quit the application
• Display: open a new plot window (2D scatterplot, scatterplot matrix, parallel coordinates, time series, or bar chart)
• View: specify the projection (1D, 1D tour, 2D, 2D tour, etc.) for the current display
• Interaction: specify the interaction (scaling, highlighting, moving, etc.) with the current display
• Tools: open other windows to manipulate characteristics of the data and the view
Open a New Plot
• Display >> New Barchart opens a new bar chart of the dataset
• The currently active plot is surrounded by a narrow white band
• The GGobi console corresponds to the currently active plot
• Click a plot (in the plot region) to make it active
Tour
• A d-dimensional grand tour is a continuous geometric transformation of a d-dimensional coordinate system such that all possible orientations of the coordinate axes are eventually achieved
• Allows an in-depth study of high-dimensional data
• Types of tour in GGobi
  • 1D tour
  • 2D tour
  • Rotation: 2D tour with three variables
  • 2x1D tour
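To make the idea of touring concrete, here is a minimal Python sketch, not GGobi's implementation. It draws one random 2-D projection plane in d dimensions (using a Gram-Schmidt step) and projects the data onto it. A real grand tour interpolates smoothly between many such planes; this sketch only shows a single frame. The names `random_frame` and `project` are illustrative assumptions.

```python
import math
import random

def random_frame(d, rng):
    """Return two orthonormal d-vectors spanning a random 2-D projection plane."""
    def unit(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    a = unit([rng.gauss(0, 1) for _ in range(d)])
    b = [rng.gauss(0, 1) for _ in range(d)]
    dot = sum(x * y for x, y in zip(a, b))
    # Gram-Schmidt step: remove the component of b along a, then normalize.
    b = unit([y - dot * x for x, y in zip(a, b)])
    return a, b

def project(rows, frame):
    """Project each d-dimensional row onto the 2-D frame."""
    a, b = frame
    return [(sum(x * u for x, u in zip(row, a)),
             sum(x * v for x, v in zip(row, b))) for row in rows]

rng = random.Random(0)
data = [[rng.random() for _ in range(5)] for _ in range(4)]  # 4 records, 5 variables
frame = random_frame(5, rng)
points = project(data, frame)  # one scatterplot frame of the tour
```

Repeating this with a sequence of nearby frames gives the moving scatterplot that the 2D tour displays.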
1D Tour
• View >> 1D Tour
• Generates a continuous sequence of 1-D projections of the active variable space
  • The active variable space is the subset of attributes currently selected
  • The circles of active variables are drawn with a bold outline
  • A de-selected variable fades out gradually to maintain continuity of motion
• The projected data are displayed as an average shifted histogram (ASH)
  • An ASH is built by generating a set of histograms with different origins and then averaging them
• Manually select the variable to manipulate
  • Click on the purple Manip button and then click on the variable circle
  • Horizontal mouse motions in the plot window then alter the coefficient of the manipulation attribute (from -1 to 1)
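The average shifted histogram described above can be sketched in a few lines of Python. This is an illustrative version under simple assumptions (equal bin width `h`, `m` histograms whose origins are shifted by `h/m`), not the routine GGobi uses; the function name `ash_density` is hypothetical.

```python
import math

def ash_density(data, x, h, m, origin=0.0):
    """Average m histogram density estimates at x; bin origins are shifted by h/m."""
    n = len(data)
    total = 0.0
    for j in range(m):
        start = origin + j * h / m              # origin of the j-th histogram
        k = math.floor((x - start) / h)         # index of the bin containing x
        lo, hi = start + k * h, start + (k + 1) * h
        count = sum(1 for v in data if lo <= v < hi)
        total += count / (n * h)                # histogram density in that bin
    return total / m

data = [0.1, 0.2, 0.25, 0.9]
single = ash_density(data, 0.2, h=0.5, m=1)     # an ordinary histogram
averaged = ash_density(data, 0.2, h=0.5, m=2)   # average of two shifted histograms
```

Increasing `m` averages more histograms and gives a smoother estimate, which is what the smoothness control on the 1D tour console adjusts.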
1D Tour (Cont.)
• Console and plot elements
  • Axes for the tour (can be removed by Options >> show axes)
  • Select the attributes for the tour
  • Current manipulation attribute
  • Adjust the tour speed
  • Stop/start the tour
  • Initiate the tour (the coefficient of the manipulation attribute starts from 1)
  • Start the tour from a randomly selected coefficient of the manipulation attribute
  • Control the smoothness of the ASH (a smoother display averages more histograms)
2D Tour
• View >> 2D Tour
• Generates a continuous sequence of 2-D projections of the active variable space
• The projected data are displayed as a scatterplot
• Many features are similar to those in the 1D tour
• Console and plot elements
  • Axes for the tour (can be removed by Options >> show axes)
  • Select the attributes for the tour
  • Manipulation modes
    • Oblique: unconstrained manipulation
    • Horizontal/Vertical: constrains manipulation along the horizontal/vertical axis
    • Radial: constrains manipulation to the current direction of the variable, keeping the angle fixed
    • Angular: rotates the variable axis in the plane of the plot window, keeping the length of the axis fixed
2D Tour with Three Variables
• View >> Rotation
• A 2D tour restricted to three variables
• The three axes are individually represented by toggle buttons labeled X, Y, and Z
2x1D Tour
• View >> 2x1D
• Generates two independent continuous sequences of 1D projections, one for each of two active variable spaces
• Plots one projection horizontally and the other vertically, producing a scatterplot
• Console and plot elements
  • Axes for the tour
  • Select the horizontal and vertical attributes for the tour
  • Manipulation modes
    • Combined: changes both the horizontal and vertical manipulation variable coefficients
    • EqualComb: constrains the horizontal and vertical changes to be equal
    • Horizontal/Vertical: constrains manipulation along the horizontal/vertical axis
Scaling of Axes
• Interaction >> Scale
• Changes the view of the data rather than transforming it
• Using the sliders on the console
  • Zoom vertically/horizontally
  • Pan vertically/horizontally
  • Control whether to hold the aspect ratio constant during zooming
  • More scale controls
• Direct manipulation in the plot window
  • Left mouse button for panning (moving the data freely around the window)
  • Right or middle mouse button for zooming (move up/down to zoom in/out along the vertical axis; move right/left to zoom in/out along the horizontal axis)
Brush
• Interaction >> Brush
• Interactively paint (highlight) points
• More powerful when linking multiple plots
• The “brush”
  • All points inside the brush are affected
  • Press the left mouse button to drag it around the plotting window
  • Press the right or middle button to resize or reshape the brush
• Console controls
  • Choose the color, glyph shape and size, and line type of the edge of the “brush”
  • Persistent option: if checked, the brushed points remain highlighted after the “brush” is moved away
  • Undo the most recent persistent brushing changes
Brush (Cont.)
• Specify characteristics of “brushed” points
  • by “Shadow”: brushed points are drawn in a color that is very close to the background color, so that these points are de-emphasized yet provide context for the rest of the data
• Specify characteristics of “brushed” line segments
Brush menu options
• “Exclude shadowed points/edges”: exclude shadowed points/edges from the plot; the view of the plot is rescaled without them
• “Include shadowed points/edges”: redraw the shadowed points/edges and include them in the rescaling
• “Unshadow all points/edges”: restore the points/edges to their usual colors
• “Reset brush”: restore the brush to its default size and position
• “Brush on”: if unchecked, the brush can move freely around the plotting window without changing the covered points (useful for positioning the brush quickly before painting)
• “Update brushing continuously”: update linked brushing with every mouse motion; if unchecked, linked views are updated only when the mouse button is released
Two linked views
• Linking rules
  • by “Case ID”: when points in one view are painted, the points corresponding to the same records (cases) in the other view are also painted
  • by “Area”: points that have the same value of the variable Area are painted in both views
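The two linking rules can be illustrated with a small Python sketch. It is only a model of the behavior described on the slide, not GGobi code; the record layout and the function names `linked_by_case` and `linked_by_variable` are assumptions made for the example.

```python
# Hypothetical records; "Area" stands for the categorical linking variable.
records = [
    {"Area": "north", "x": 1.0},
    {"Area": "south", "x": 2.0},
    {"Area": "north", "x": 3.0},
    {"Area": "east", "x": 4.0},
]

def linked_by_case(brushed):
    """Case-ID linking: the same records are painted in every view."""
    return set(brushed)

def linked_by_variable(records, brushed, variable):
    """Variable linking: paint every record sharing a value with a brushed record."""
    values = {records[i][variable] for i in brushed}
    return {i for i, rec in enumerate(records) if rec[variable] in values}

by_case = linked_by_case({0})                       # only record 0
by_area = linked_by_variable(records, {0}, "Area")  # every "north" record
```

Brushing a single point therefore paints one record under case linking, but a whole group under linking by a categorical variable.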
Identification of Points
• Interaction >> Identify
• Specify how to label each record (by which attribute)
  • by “Record Label” by default (supplied by the user)
  • by “Record Number” (if no record label is supplied)
  • by a specific attribute
• Remove labels in the plot
• Label all records in the plot
• Recenter the plot on the record selected in the plotting window
Variable Manipulation Tool
• Tools >> Variable Manipulation
• Displays some statistics of the attributes
  • For a continuous (real) attribute: the variable transformation (if any); the min, max, mean, and median of the raw data; and the # of missing values
  • For a categorical attribute: the # of values for each level of the attribute
• Select subsets of attributes to be plotted when launching parallel coordinates or a scatterplot matrix
  • Hold CTRL to select multiple non-consecutive attributes; hold Shift to select consecutive attributes
Variable Manipulation Tool (Cont.)
• Set variable ranges
  • Change the variable ranges used for projecting data into the plot window
• Rescale the plot using the user-specified ranges
• Clone the selected variables and add the new variables to the console and data table
• Create a new variable
  • Its values are set to either the row numbers (1 to # of records) or a set of integers reflecting the groups defined by brushing
• Rename the selected variable
Variable Transformation Tool
• Tools >> Variable Transformation
• Stage 0: adjust the domain of variables
  • Change the minimum to 0 or 1, or take the negative
• Stage 1: data-independent transformations, preserving user-defined limits
  • Box-Cox family of power transformations: T(Y) = (Y^λ - 1)/λ, where λ is the transformation parameter
  • Take the base-10 logarithm
  • Inverse
  • Take the absolute value
  • Scale to a specific range [a, b]
• Stage 2: data-dependent transformations
  • Standardize: T(Y) = (Y - sample mean)/sample standard deviation
  • Normal score
  • Z-score
  • Sort
  • Rank, etc.
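The two formulas on this slide can be written as a short Python sketch. Two details are assumptions not stated on the slide: the λ = 0 case is taken as the natural logarithm (the usual limit of the Box-Cox family), and the standard deviation is the sample (n - 1) version.

```python
import math
import statistics

def box_cox(y, lam):
    """Box-Cox transformation T(Y) = (Y^lam - 1)/lam for positive y."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive values")
    if lam == 0:
        return math.log(y)  # assumed limiting case as lam -> 0
    return (y ** lam - 1) / lam

def standardize(values):
    """T(Y) = (Y - sample mean) / sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample (n - 1) standard deviation
    return [(v - mean) / sd for v in values]

transformed = [box_cox(y, 0.5) for y in [1.0, 4.0, 9.0]]
scores = standardize([1.0, 2.0, 3.0])
```

The positivity requirement is one reason a stage 0 domain adjustment (such as changing the minimum to 1) comes before the stage 1 transformations.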
Jittering
• Tools >> Variable Jittering
• Adds random noise to the selected variables to reduce the problem of overlapping data points
• Select uniform or normal random jitter
• Choose the degree of jittering (0 to 1)
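A minimal Python sketch of jittering follows. The slide does not give GGobi's formula, so the scaling used here (noise multiplied by the degree and by the variable's range) is an assumption chosen for illustration, and `jitter` is a hypothetical name.

```python
import random

def jitter(values, degree, kind="uniform", rng=None):
    """Add uniform or normal noise, scaled by degree (0 to 1) and the data range."""
    rng = rng or random.Random()
    span = max(values) - min(values)
    out = []
    for v in values:
        # Assumed noise model: uniform on [-1, 1] or standard normal.
        noise = rng.uniform(-1, 1) if kind == "uniform" else rng.gauss(0, 1)
        out.append(v + degree * span * noise)
    return out

ratings = [1.0, 1.0, 2.0, 2.0, 3.0]           # heavily overlapping values
spread = jitter(ratings, degree=0.1, rng=random.Random(1))
```

With a degree of 0 the data are returned unchanged; larger degrees separate tied points at the cost of displaying values that differ from the recorded ones.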
Color Schemes
• Tools >> Color Schemes
• Select a color scheme and use it to color points
  • The selected color scheme cannot contain fewer colors than are currently in use
  • The colors placed first in the color scheme are used first to color data points
• Four types of color schemes
  • Diverging: used when the midpoint of the variable has an important meaning
  • Sequential: used to highlight a continuous progression of values (for continuous variables)
  • Spectral: uses different colors in the spectrum (3-11 colors)
  • Qualitative: used for categorical variables
Automatic Brushing
• Tools >> Automatic Brushing
• Paints data according to the values of a specific variable
• Controls
  • Select the variable
  • Values of the chosen variable that define the boundaries between colors (can be adjusted by moving the sliders)
  • # of points in each color
  • Define bin boundaries
    • “constant bin width”: the range (difference between the max. and min. values) of each bin is (almost) the same
    • “constant bin count”: the # of points in each bin is (almost) the same
  • Control how the display responds as the sliders are moved
    • continuously (only when the data size is small)
    • on mouse release
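The difference between the two binning rules is easiest to see on skewed data. The Python sketch below is an illustration of the general idea, not GGobi's code; the function names are hypothetical, and each function returns the bin (color) index assigned to every value.

```python
def equal_width_bins(values, k):
    """Constant bin width: split the range [min, max] into k equal intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)  # all values identical: a single bin
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), k - 1) for v in values]

def equal_count_bins(values, k):
    """Constant bin count: assign bins by rank so each holds about n/k points."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins

values = [1, 2, 3, 4, 100, 200]
by_width = equal_width_bins(values, 2)
by_count = equal_count_bins(values, 2)
```

On this example, constant width puts five of the six points in the first color because of the outliers, while constant count splits them three and three.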
Select Subset of Data
• Tools >> Case Subsetting and Sampling
• Random sample without replacement
  • Specify the # of records to be randomly sampled
• Consecutive block
  • Specify the first record and the # of records in the block
• Limits
  • Use the limits defined by the user in the variable manipulation table to define the subset
Select Subset of Data (Cont.)
• Every nth case
  • The first record and every (1 + kN)th record (k = 1, 2, …; N: the interval size) are sampled
• Sticky label
  • All cases with “sticky” labels (formed through identification) are in the subset
• Row label
  • Type in a string, and specify where it should fall (or not fall) in the record labels and whether the matching cases should be included or ignored
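Three of the subsetting rules (random sample, consecutive block, and every nth case) can be sketched in Python as index selections. This is an illustrative model using 0-based record indices, whereas the slides count records from 1; the function names are hypothetical.

```python
import random

def random_sample(n_records, size, rng):
    """Random sample without replacement: `size` distinct record indices."""
    return sorted(rng.sample(range(n_records), size))

def consecutive_block(first, size):
    """Consecutive block: `size` records starting at index `first`."""
    return list(range(first, first + size))

def every_nth(n_records, first, interval):
    """Every nth case: `first`, then every `interval`-th record after it."""
    return list(range(first, n_records, interval))

rng = random.Random(42)
sample = random_sample(10, 3, rng)
block = consecutive_block(2, 3)
stepped = every_nth(10, 0, 3)
```

Each function returns the indices of the records kept in the subset, so the rules can be compared on the same dataset.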
Save Display
• Tools >> Save Display Description
• Saves the description of the display as an XML file