Worcester Polytechnic Institute
XmdvTool: Interactive Visual Data Exploration System for High-dimensional Data Sets
http://davis.wpi.edu/~xmdv
Matthew O. Ward, Elke A. Rundensteiner, Jing Yang, Punit Doshi, Geraldine Rosario, Allen R. Martin, Ying-Huey Fua, Daniel Stroe
This work was partially funded by NSF Grants IIS-9732897, IRIS-9729878 and IIS-0119276.
XmdvTool Features
• Hierarchical visualization and interaction tools for exploring very large high-dimensional data sets to discover patterns, trends and outliers
• Applications:
  • Bioterrorism Detection
  • Bioinformatics and Drug Discovery
  • Space Science
  • Geology and Geochemistry
  • Systems Monitoring and Performance Evaluation
  • Economics and Business
  • Simulation Design and Analysis
• Multi-platform support (Unix, Linux, Windows)
• Public domain software: http://davis.wpi.edu/~xmdv
Xmdv: Main Features
• Scale-up to High Dimensions: Visual Hierarchical Dimension Reduction
• Scale-up to Large Data Sets: Interactive Hierarchical Displays, Database Backend with MinMax Encoding, Semantic Caching and Adaptive Prefetching
• Interlinked Multi-Displays: Parallel Coordinates, Glyphs, Scatterplot Matrices, Dimensional Stacking
• Visual Interaction Tools: N-Dimensional Brushes, Structure-Based Brushing, InterRing
Scale-Up for Large Number of Dimensions
Solution to High Dimensional Datasets:
• Group Similar Dimensions into Dimension Hierarchy
• Navigate Dimension Hierarchy by InterRing
• Form Lower Dimensional Spaces by Dimension Clusters
• Convey Dimension Cluster Information by Dissimilarity Display
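The first step above, grouping similar dimensions into a hierarchy, could be sketched roughly as follows. This is only an illustration: the slides do not say which dissimilarity measure or clustering algorithm XmdvTool uses, so the measure (1 minus absolute Pearson correlation), the average-linkage merging, and the function names `dissimilarity` and `build_dimension_hierarchy` are all assumptions.

```python
from itertools import combinations
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length value lists (0.0 if degenerate)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def dissimilarity(xs, ys):
    """Assumed measure: strongly correlated dimensions count as similar."""
    return 1.0 - abs(pearson(xs, ys))


def build_dimension_hierarchy(columns):
    """columns: dict of dimension name -> list of values.

    Repeatedly merges the two closest clusters (average linkage) and
    returns the hierarchy as nested tuples of dimension names.
    """
    clusters = [(name, [name]) for name in columns]  # (subtree, member names)

    def dist(a, b):
        return mean(dissimilarity(columns[p], columns[q])
                    for p in a[1] for q in b[1])

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        a, b = clusters[i], clusters[j]
        merged = ((a[0], b[0]), a[1] + b[1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]
```

Each internal node of the returned tree is a dimension cluster; picking one representative per cluster at some depth gives a lower-dimensional space of the kind the slide describes.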
Visual Hierarchical Dimension Reduction Process
[Figure panels: a 42-dimensional data set; a 4-dimensional subspace; the dimension hierarchy. Interaction tool: InterRing]
InterRing - Dimension Hierarchy Navigation and Manipulation
• Roll-up/Drill-down
• Rotate
• Zoom in/out
• Modify
• Distort
Dissimilarity Display
• Three Axes Method
• Diagonal Plot Method
• Axis Width Method
• Mean-Band Method
Scale-up for Large Number of Records
Solution to Large Scale Datasets:
• Group Similar Records into Data Hierarchy
• Navigate Data Hierarchy by Structure-Based Brushing
• Represent Data Clusters by Mean-Band Method
• Provide Database Backend Support using MinMax Tree, Caching, Prefetching
Interactive Hierarchical Display
[Figure panels: 2D example; hierarchical clustering; structure-based brushing]
Interactive Hierarchical Display
[Figure panels: flat display; hierarchical display. Mean-Band Method in Parallel Coordinates]
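As a rough illustration of what a mean-band summary of one cluster might contain, the sketch below reduces a cluster's records to a mean and a low/high extent per dimension. The slides only name the method, so treating the band as the min-to-max extent around the mean is an assumption, as is the function name `mean_band`.

```python
def mean_band(records):
    """Summarize one cluster for a mean-band style display.

    records: list of equal-length tuples, one per data record.
    Returns one dict per dimension with the cluster mean and the
    band extent (assumed here to be the min and max of the cluster).
    """
    dims = list(zip(*records))  # transpose: one tuple of values per dimension
    return [{"mean": sum(d) / len(d), "low": min(d), "high": max(d)}
            for d in dims]
```

In parallel coordinates, the mean values would form the cluster's polyline and the low/high values the band drawn around it, so one cluster replaces many overlapping record lines.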
Scalability of Data Access
• Approach
  • Attach database system to visualization front-end
  • MinMax hierarchy encoding
    • Key idea: avoid recursive processing
    • Pre-computed
  • Caching
    • Key idea: reduce response time and network traffic
  • Prefetching
    • Key idea: use application hints and predict user patterns
    • Performed during idle time
Scalability of Data Access: MinMax Hierarchy Encoding
• Pre-compute object positions
  • level-of-detail (L)
  • extent values (x, y)
  • preserve tree structure
• New query semantics
  • objects are now rectangles
  • select objects that touch L
  • select objects that touch (x, y)
  • structure-based brush = intersection of two selections
  • query = (x, y, L)
[Figure: hierarchy nodes drawn as rectangles, with extent values (x, y) on one axis and level of detail L on the other]
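The encoding idea can be sketched as follows, under stated assumptions. Each node is flattened into a row holding a leaf-position extent and a level-of-detail interval, so a brush query (x, y, L) becomes two range tests on flat rows with no tree recursion at query time. The half-open intervals, the convention that a node spans from its own level up to its parent's level, and the names `minmax_encode` and `brush` are illustrative choices, not taken from the slides.

```python
def minmax_encode(root):
    """Off-line step: flatten a tree into rectangle rows.

    Each node is a dict {"name", "level", "children"}; leaves omit "children".
    A node's extent [x_min, x_max) covers the positions of its leaves, and
    its level interval [l_min, l_max) runs from its own level to its parent's.
    """
    rows = []

    def visit(node, parent_level, first_leaf):
        next_leaf = first_leaf
        children = node.get("children", [])
        if children:
            for child in children:
                next_leaf = visit(child, node["level"], next_leaf)
        else:
            next_leaf += 1  # a leaf occupies exactly one position
        rows.append({"name": node["name"],
                     "x_min": first_leaf, "x_max": next_leaf,
                     "l_min": node["level"], "l_max": parent_level})
        return next_leaf

    visit(root, float("inf"), 0)
    return rows


def brush(rows, x, y, level):
    """On-line step: rows touching extent [x, y) AND touching level L."""
    return sorted(r["name"] for r in rows
                  if r["x_min"] < y and x < r["x_max"]      # touches (x, y)
                  and r["l_min"] <= level < r["l_max"])     # touches L
```

Because both tests are simple range comparisons, the same selection could be expressed as a single range query against a database table of these rows, which is what makes a database back end practical here.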
Scalability of Data Access: Caching
• Purpose
  • reduce response time and network traffic
• Issues
  • visual query cannot directly translate into object IDs
  • high-level cache specification to avoid complete scans
• Semantic caching
  • queries are cached rather than objects
  • minimize cost of cache lookup
  • dynamically adapt cached queries to patterns of queries
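A minimal sketch of "queries are cached rather than objects": the cache below stores the (x, y, L) range of each answered query with its rows and answers a new query locally when it falls inside a cached range. It is deliberately simplified and assumes things the slides do not state: only full containment is handled (no remainder queries for partial overlap), there is no replacement policy, and `SemanticCache` and the `fetch` callback are hypothetical names.

```python
class SemanticCache:
    """Caches query ranges and their results instead of individual objects."""

    def __init__(self, fetch):
        self.fetch = fetch        # backend call: fetch(x, y, level) -> [(pos, obj), ...]
        self.entries = []         # cached queries: (x, y, level, rows)
        self.backend_calls = 0    # counts round trips to the database

    def query(self, x, y, level):
        # Lookup compares query descriptions, not object IDs.
        for cx, cy, cl, rows in self.entries:
            if cl == level and cx <= x and y <= cy:
                return [r for r in rows if x <= r[0] < y]
        # Miss: go to the backend and remember the query itself.
        self.backend_calls += 1
        rows = self.fetch(x, y, level)
        self.entries.append((x, y, level, rows))
        return rows
```

A lookup costs one comparison per cached query rather than one per cached object, which is the point of keeping the cache description at the query level.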
Scalability of Data Access: Prefetching
• Strategy
  • Speculative (no specific hints)
    • navigation remains local
    • both user and data set influence exploration
  • Adaptive (strategy changes over time)
    • evolves as more knowledge becomes available
  • Non-pure (interruptible prefetching)
    • leave buffer in consistent state
• Requirements
  • non-pure prefetching + large transactions and small object size + semantic caching: small granularity (object level)
  • speculative, non-pure prefetcher: cache replacement policy + guessing method
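One way to picture interruptible ("non-pure") prefetching at small granularity is a generator that fetches one small chunk per idle tick and commits each chunk to the buffer as a whole. This is a hedged sketch only; the chunking scheme, the dict buffer, and the names `prefetch_steps` and `fetch` are assumptions made for illustration.

```python
def prefetch_steps(buffer, fetch, x, y, chunk=2):
    """Prefetch positions [x, y) in small chunks.

    Each next() call fetches one chunk and commits it in a single update,
    so stopping between calls leaves the buffer in a consistent state.
    fetch(a, b) is assumed to return a dict of position -> object.
    """
    for start in range(x, y, chunk):
        rows = fetch(start, min(start + chunk, y))
        buffer.update(rows)   # commit the whole chunk at once
        yield start           # hand control back; the caller may stop here
```

The front end would call `next()` while the user is idle and simply close the generator when a real query arrives; everything fetched so far remains usable.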
Scalability of Data Access: Experimental Evaluation
• Conclusions:
  • Caching reduces response time by 80%
  • Prefetching further reduces response time by 30%
  • Designing better prefetching strategies might help further reduce response time
Scalability of Data Access: Prefetching
[Figure: prefetching strategies. Diagram labels: current navigation window; hot regions; positions (m-1), m, (m+1); window positions m(n-2), m(n-1), m(n), m(n+1)]
• Localized Speculative Strategies
• Random Strategy
• Direction Strategy
• Focus Strategy
• Vector Strategies
• Mean Strategy
• Exponential Weight Average Strategy
• Data Set Driven Strategy
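The slide names the strategies without defining them, so the following is only one plausible reading: each predictor takes the history of navigation window positions m(n-2), m(n-1), m(n), ... and guesses m(n+1). Positions are reduced to single numbers, and the formulas, the `alpha` parameter, and the function names are assumptions rather than the published definitions.

```python
def moves(history):
    """Step-to-step movements of the navigation window."""
    return [b - a for a, b in zip(history, history[1:])]


def predict_direction(history):
    """Assume the user repeats the most recent move."""
    return history[-1] + (history[-1] - history[-2])


def predict_mean(history):
    """Assume the next move equals the mean of all past moves."""
    steps = moves(history)
    return history[-1] + sum(steps) / len(steps)


def predict_ewa(history, alpha=0.5):
    """Exponentially weighted average of past moves; recent moves count more."""
    steps = moves(history)
    avg = steps[0]
    for s in steps[1:]:
        avg = alpha * s + (1 - alpha) * avg
    return history[-1] + avg
```

For the history `[0, 2, 6, 8]` the moves are 2, 4, 2: the direction guess is 10, the mean guess is about 10.67, and the EWA guess with `alpha=0.5` is 10.5. The predicted position is where the prefetcher would load data next.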
Xmdv System Implementation
[Figure: system architecture. Off-line process labels: Flat Data, Loader, MinMax Labeling, Hierarchical Data, Schema Info, DB. On-line process labels: User, GUI, Translator, Rewriter, Queries, Exploration Variables, Memory, Buffer, Estimator, Prefetcher, Library (Random, Direction, Focus, Mean, EWA)]
• Tools
  • C/C++
  • TCL/TK
  • OpenGL
  • Oracle 8i
  • Pro*C
Publications (available at http://davis.wpi.edu/~xmdv)
• Jing Yang, Matthew O. Ward and Elke A. Rundensteiner, "InterRing: An Interactive Tool for Visually Navigating and Manipulating Hierarchical Structures", InfoVis 2002, to appear
• Punit R. Doshi, Elke A. Rundensteiner, Matthew O. Ward and Daniel Stroe, "Prefetching For Visual Data Exploration", Technical Report WPI-CS-TR-02-07, 2002
• Jing Yang, Matthew O. Ward and Elke A. Rundensteiner, "Interactive Hierarchical Displays: A General Framework for Visualization and Exploration of Large Multivariate Data Sets", Computers and Graphics Journal, 2002, to appear
• Daniel Stroe, Elke A. Rundensteiner and Matthew O. Ward, "Scalable Visual Hierarchy Exploration", Database and Expert Systems Applications, pages 784-793, Sept. 2000
• Ying-Huey Fua, Matthew O. Ward and Elke A. Rundensteiner, "Hierarchical Parallel Coordinates for Exploration of Large Datasets", IEEE Proc. of Visualization, pages 43-50, Oct. 1999
• Ying-Huey Fua, Matthew O. Ward and Elke A. Rundensteiner, "Navigating Hierarchies with Structure-Based Brushes", IEEE Proceedings of Visualization, pages 43-50, Oct. 1999