
Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department

Hierarchy Navigation Framework: Supporting Scalable Interactive Exploration over Large Databases. Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department Worcester Polytechnic Institute IDEAS’05 Thank you to NSF for several IDM grants for XMDV project.


Presentation Transcript


  1. Hierarchy Navigation Framework: Supporting Scalable Interactive Exploration over Large Databases Nishant Mehta, Elke A. Rundensteiner and Matt Ward Computer Science Department Worcester Polytechnic Institute IDEAS’05 Thank you to NSF for several IDM grants for XMDV project.

  2. XmdvTool: Multivariate Data Visualization • Example: Cars data set (sample record values shown: 8, 130, 3504, 18) • Dataset with 4096 points in XmdvTool 6.0 • Parallel coordinate display

  3. Hierarchical Displays [Fua:1999] • Cars data set • Structure-based brush components: b: level of detail, d: focus area, e: focus extents [Figure: cluster tree with nodes A to J built over the base data points]

  4. Hierarchical Displays

  5. Problems: Hierarchical Display • Achieved: • Screen-space solution to clutter problem • But the data handling problem remains: • Cluster tree size greater than initial tree • Cluster tree may not fit into main memory • Structure-based brush semantics involve recursive searches over cluster tree

  6. Goal • Overall Goal: • Scale hierarchical displays to support navigation over large hierarchies • Subgoals: • Support navigation over large-scale persistent data • Store hierarchies on disk • Map navigation operations to efficient queries • Meet interactive response requirements

  7. Overview of Approach • Support navigation operations over large-scale persistent data: Hierarchy Encoding, Spatial Indexing • Meet interactive response requirements: Caching, Prefetching

  8. Hierarchy Encoding • Problem: Structure-based brush • Selection semantics involve recursive search • Recursive search over secondary storage is slow • Solution: Hierarchy encoding • Push recursive processing into precomputation step • Precompute label for each node in hierarchy • Map recursive search to equivalent non-recursive one [Diagram: Hierarchical Data, Labeling, Database]

  9. Structure-Based Brush Semantics [Fua:1999] • Node selection is based on 2 steps: • Horizontal Selection • Subtree (e1, e2) • Vertical Selection • Level of detail (lod) [Figure: cluster tree with nodes A to J, each annotated with its lod value]

  10. Horizontal Selection • Aim • Select subtree that user is interested in viewing • Approach • Brush focus extents (e1,e2) select a set of base points • Propagate selection: select parent(n) if n is selected • Example: (e1,e2) = (2/6, 11/12), lod=0.4 [Figure: cluster tree with selected leaves and selected clusters highlighted]

  11. Non-Recursive Horizontal Selection • Offline • Precompute intervals (hmin, hmax) for each node • Interval of parent includes interval of child • Online • Search for nodes that intersect brush interval (e1,e2) • Example intervals: A (0,1); B (0,2/6); E (2/6,1); C (0,1/6); D (1/6,2/6); F (2/6,5/6); J (5/6,1); G (2/6,3/6); H (3/6,4/6); I (4/6,5/6) • Example brush: (e1,e2) = (2/6, 11/12), lod=0.4
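The offline labeling and the online interval test on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the XmdvTool implementation: the `TREE` dictionary and both function names are hypothetical, the tree shape is read off the slide's figure, and giving every leaf an equal-width slice of [0,1] is an assumption that happens to reproduce the intervals shown. The inclusive intersection test is the one stated on slide 14.

```python
from fractions import Fraction

# Hypothetical encoding of the example tree: parent -> ordered children.
TREE = {"A": ["B", "E"], "B": ["C", "D"], "E": ["F", "J"], "F": ["G", "H", "I"]}


def label_horizontal(tree, root):
    """Offline step: assign an interval (hmin, hmax) to every node."""
    leaves = []

    def collect(node):
        children = tree.get(node, [])
        if not children:
            leaves.append(node)  # leaves are visited left to right
        for child in children:
            collect(child)

    collect(root)

    # Assumption: each leaf gets an equal-width slice of [0, 1].
    width = Fraction(1, len(leaves))
    extents = {leaf: (i * width, (i + 1) * width) for i, leaf in enumerate(leaves)}

    def fill(node):
        children = tree.get(node, [])
        if children:
            for child in children:
                fill(child)
            # A parent's interval covers the intervals of all its children.
            extents[node] = (
                min(extents[c][0] for c in children),
                max(extents[c][1] for c in children),
            )

    fill(root)
    return extents


def horizontal_select(extents, e1, e2):
    """Online step: flat scan for nodes whose interval intersects (e1, e2)."""
    return sorted(
        node for node, (hmin, hmax) in extents.items() if hmin <= e2 and hmax >= e1
    )


if __name__ == "__main__":
    ext = label_horizontal(TREE, "A")
    print(horizontal_select(ext, Fraction(2, 6), Fraction(11, 12)))
```

With the slide's brush, B and D are returned as well, because their right edge touches e1 = 2/6 and the inclusive comparison counts touching as intersecting.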

  12. Vertical Selection • Aim • Select points at desired lod (lod handle of SBB) • Approach • Explore each branch starting at root to find the first node n with lod(n) <= lod(brush) • SBB: (e1,e2) = (2/6, 11/12), lod=0.4 [Figure: cluster tree with nodes A to J, each annotated with its lod value]

  13. Non-Recursive Vertical Selection • Node n satisfies vertical selection criteria iff: lod(n) <= lod(brush) < lod(parent(n)) • Each node n has vertical extents (vmin,vmax) = (lod(n), lod(parent(n))), so the test becomes vmin <= lod(brush) < vmax • Example extents: A (0.6,∞); B (0.2,0.6); E (0.5,0.6); C (0,0.2); D (0,0.2); F (0.3,0.5); J (0,0.5); G (0,0.3); H (0,0.3); I (0,0.3) • SBB: (e1,e2) = (2/6, 11/12), lod(brush) = 0.4

  14. Non-Recursive Selection • Selects all nodes that satisfy: • hmin <= e2 and hmax >= e1 • vmin <= lod(brush) < vmax • Node labels (hmin,hmax), (vmin,vmax): A (0,1), (0.6,∞); B (0,2/6), (0.2,0.6); E (2/6,1), (0.5,0.6); C (0,1/6), (0,0.2); D (1/6,2/6), (0,0.2); F (2/6,5/6), (0.3,0.5); J (5/6,1), (0,0.5); G (2/6,3/6), (0,0.3); H (3/6,4/6), (0,0.3); I (4/6,5/6), (0,0.3) • SBB: (e1,e2) = (2/6, 11/12), lod=0.4
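The combined predicate can be tried out on the example tree with a short Python sketch. It is illustrative only: the `NODES` table is a hand transcription of the labels on the slides, `build_labels` and `select` are hypothetical names, and the vertical extents are derived as (lod(n), lod(parent(n))) with an unbounded upper extent for the root.

```python
import math
from fractions import Fraction

# name: (parent, lod, hmin, hmax), transcribed by hand from the example tree.
NODES = {
    "A": (None, "3/5", "0", "1"),
    "B": ("A", "1/5", "0", "2/6"),
    "E": ("A", "1/2", "2/6", "1"),
    "C": ("B", "0", "0", "1/6"),
    "D": ("B", "0", "1/6", "2/6"),
    "F": ("E", "3/10", "2/6", "5/6"),
    "J": ("E", "0", "5/6", "1"),
    "G": ("F", "0", "2/6", "3/6"),
    "H": ("F", "0", "3/6", "4/6"),
    "I": ("F", "0", "4/6", "5/6"),
}


def build_labels(nodes):
    """Precompute (hmin, hmax, vmin, vmax) for every node."""
    labels = {}
    for name, (parent, lod, hmin, hmax) in nodes.items():
        vmin = Fraction(lod)
        # The root has no parent, so its upper vertical extent is unbounded.
        vmax = Fraction(nodes[parent][1]) if parent else math.inf
        labels[name] = (Fraction(hmin), Fraction(hmax), vmin, vmax)
    return labels


def select(labels, e1, e2, lod):
    """Flat, non-recursive filter: horizontal overlap and vertical containment."""
    return sorted(
        name
        for name, (hmin, hmax, vmin, vmax) in labels.items()
        if hmin <= e2 and hmax >= e1 and vmin <= lod < vmax
    )


if __name__ == "__main__":
    labels = build_labels(NODES)
    print(select(labels, Fraction(2, 6), Fraction(11, 12), Fraction(2, 5)))
```

For the brush (2/6, 11/12) at lod 0.4 the filter returns B, F and J. B qualifies only because its right edge touches e1 under the inclusive comparison; a strict comparison on that side would leave F and J.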

  15. 2D Hierarchy Map [Figure: the cluster tree beside its 2D hierarchy map; horizontal axis: extents from 0 to 1 in steps of 1/6, vertical axis: lod from 0 to 1.0; each node A to J occupies the rectangle given by its (hmin,hmax) and (vmin,vmax) labels; the brush is marked between e1 and e2] • SBB: (e1,e2) = (2/6, 11/12), lod=0.4

  16. Properties of 2D Hierarchy Map • Progressive Tree Structure • Space Filling • Non-Overlapping [Figure: 2D hierarchy map of nodes A to J; horizontal axis from 0 to 1 in steps of 1/6, vertical axis: lod from 0 to 1.0]

  17. Navigation Operations in 2D Hierarchy Map [Figure: brush and selected nodes on the 2D hierarchy map]

  18. Spatial Index • Query Q searches for nodes intersecting the structure-based brush • Q is a spatial range query over spatial objects • A spatial index (R-Tree index) can speed up these searches [Figure: brush on the 2D hierarchy map]

  19. Next • Caching and Prefetching

  20. User Trace Characteristics [Doshi:2003] • Caching • Locality of exploration • Contiguous queries have similar answers • Prefetching • Presence of idle time • Predictability of user movements (user inertia) [Figure: brush on the 2D hierarchy map]

  21. Cache Design • Purpose • Minimize system latency • Design Issues • Cache Organization • Cache Lookup Policy • Cache Replacement Policy • Computation of Remainder Queries

  22. Cache Organization • Contiguous chunk of main memory that stores recently fetched nodes • Each node has a descriptor • Horizontal and vertical extents [Figure: 2D hierarchy map in database beside 2D hierarchy map of cache contents; cells marked empty or occupied]

  23. Cache Lookup • Aim: • Find nodes in cache that lie in current brush • Cache Lookup • Sequential scan, or • Main memory spatial index • Main Memory Index • Advantage • Faster cache lookup • Disadvantage • Frequent index updates [Figure: brush over cache contents on the 2D hierarchy map; cells marked empty, occupied, or selected]

  24. Cache Replacement Policy • Aim: • Make room for new nodes • Replace node with least probability of being referenced • Approach • Exploit general user trace characteristics • Temporal locality (contiguous queries have similar answers): LRU • Spatial locality (locality of exploration): Distance

  25. Distance Replacement Policy • Idea • Replace object furthest away (in 2D space) from current brush • Realization: • Maintain brush store • Select victim brush with max distance from current brush • Replace individual cached nodes in victim brush • Distance: length of the line segment that joins the centers of 2 brushes
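A rough Python sketch of the distance policy follows. Everything named here is hypothetical: the `Brush` class, modelling a brush center as the point ((e1+e2)/2, lod) in the 2D hierarchy map, and keeping any node that the current brush still needs are assumptions made for illustration, not details given on the slides.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Brush:
    e1: float
    e2: float
    lod: float
    nodes: set = field(default_factory=set)  # cached nodes fetched for this brush

    def center(self):
        # Assumption: the brush center is the midpoint of its extents at its lod.
        return ((self.e1 + self.e2) / 2.0, self.lod)


def brush_distance(a, b):
    """Length of the line segment joining the centers of two brushes."""
    (ax, ay), (bx, by) = a.center(), b.center()
    return math.hypot(ax - bx, ay - by)


def evict_farthest(cache, brush_store, current):
    """Pick the stored brush farthest from `current` and evict its nodes."""
    victim = max(brush_store, key=lambda b: brush_distance(b, current))
    brush_store.remove(victim)
    # Assumption: nodes the current brush also needs stay in the cache.
    freed = victim.nodes - current.nodes
    cache.difference_update(freed)
    # Remaining brushes must not keep references to evicted nodes.
    for brush in brush_store:
        brush.nodes -= freed
    return victim, freed


if __name__ == "__main__":
    current = Brush(0.4, 0.6, 0.4, {"F"})
    store = [Brush(0.0, 0.2, 0.9, {"C", "D"}), Brush(0.5, 0.7, 0.5, {"F", "J"})]
    cache = {"C", "D", "F", "J"}
    victim, freed = evict_farthest(cache, store, current)
    print(sorted(freed), sorted(cache))
```

Evicting per brush rather than per node keeps the bookkeeping small: only one distance per stored brush has to be computed when room is needed.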

  26. Distance Replacement Policy [Figure: brush store holding brushes b1 to b4; cache contents and database contents shown as 2D hierarchy maps with the current brush; cells marked empty, occupied, or selected]

  27. Computation of Remainder Queries • For each user request cache may contain: • All nodes requested • A subset of nodes requested • None of nodes requested [Figure: brush and remainder brush over cache contents on the 2D hierarchy map; cells marked empty, occupied, or selected]

  28. Computation of Remainder Queries • Focus extents (e1,e2) of brush define an interval • Horizontal extents of cached nodes also form intervals • Remainder query consists of a set of remainder brushes • Remainder brush: part of brush interval not occupied by cached nodes [Figure: current brush from e1 to e2 over cache contents, with the remainder brush marked; cells marked empty, occupied, or selected]
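Computing remainder brushes amounts to subtracting the cached nodes' horizontal intervals from the brush interval. The Python sketch below shows one way to do that with a single sweep; the function name and the plain list-of-tuples representation are hypothetical, and it assumes the cached intervals passed in already satisfy the brush's vertical criterion.

```python
from fractions import Fraction


def remainder_brushes(e1, e2, cached):
    """Return the parts of (e1, e2) not covered by cached (hmin, hmax) intervals."""
    remainders = []
    cursor = e1  # left edge of the part of the brush not yet accounted for
    for hmin, hmax in sorted(cached):
        if hmax <= cursor:
            continue  # interval lies entirely left of the uncovered part
        if hmin >= e2:
            break  # interval lies entirely right of the brush
        if hmin > cursor:
            remainders.append((cursor, hmin))  # gap before this cached node
        cursor = max(cursor, hmax)
        if cursor >= e2:
            break
    if cursor < e2:
        remainders.append((cursor, e2))  # uncovered tail of the brush
    return remainders


if __name__ == "__main__":
    # Brush (2/6, 11/12) with only node F, interval (2/6, 5/6), in the cache.
    brush = (Fraction(2, 6), Fraction(11, 12))
    print(remainder_brushes(*brush, [(Fraction(2, 6), Fraction(5, 6))]))
```

The three cases from the previous slide fall out directly: an empty result means every requested node was cached, a result equal to the whole brush means none were, and anything in between is a partial hit that is sent to the database as remainder queries.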

  29. Prefetcher [Doshi:2003] • Aim: • Predict and prefetch future user requests into cache • Increase hit ratio or minimize latency • Motivation • Presence of idle time • Predictable user movements • Working Model: [Diagram components: User, GUI, Front End, User Requests, User Log, Prediction Model, Prefetcher, Cache Manager]

  30. Directional Prefetcher • Prediction Model • Uses recent history of user requests • Prefetches in direction of last user movement [Figure: direction strategy; the brush extent e2 moves between time t and t+1, and the prefetch region lies further along the same direction]
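One simple reading of the direction strategy is linear extrapolation of the last brush movement. The Python sketch below is an assumption-laden illustration: `predict_next` is a hypothetical name, the history holds only horizontal extents, and stepping by exactly the last displacement (clamped to [0,1]) is a choice made here, not something the slide specifies.

```python
from fractions import Fraction


def clamp(x, lo=0, hi=1):
    """Keep an extent inside the horizontal range of the hierarchy map."""
    return max(lo, min(hi, x))


def predict_next(history):
    """Extrapolate the next brush (e1, e2) from the last two user requests.

    `history` is a list of (e1, e2) extents, oldest first. Returns None when
    there is no movement to extrapolate from.
    """
    if len(history) < 2:
        return None
    (p1, p2), (c1, c2) = history[-2], history[-1]
    d1, d2 = c1 - p1, c2 - p2  # last movement of each brush edge
    if d1 == 0 and d2 == 0:
        return None  # the brush did not move, so no direction is known
    # Assumption: the user keeps moving by the same amount (user inertia).
    return (clamp(c1 + d1), clamp(c2 + d2))


if __name__ == "__main__":
    trace = [(Fraction(1, 6), Fraction(2, 6)), (Fraction(2, 6), Fraction(3, 6))]
    print(predict_next(trace))
```

The predicted extents would then be handed to the cache as an ordinary brush request during idle time, so that the nodes are already resident if the user continues in the same direction.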

  31. System Architecture [Diagram components: Offline process (Flat Data, Hierarchical Data, Labeling, Loader, Database with Spatial Index); Front End (User GUI); Backend Controller (Request, Answer); Cache Manager (Cache Memory, Cached Nodes, Cache Index, Cache Lookup: Spatial Index or Seq. Scan, Rep. Policy: LRU or Distance, Delta Calculator, Delta query); Prefetch Controller (Direction Prefetcher, Prefetch Request, Start/Stop)]

  32. System Implementation • Implemented as backend to XmdvTool 6.0 • Language: C++ • Database: Oracle with Oracle Spatial Extension • Libraries: • Spatial Index Library (UC Riverside) • OTL (Oracle.. Template library) • ZThread

  33. Evaluation • Goal: • Effectiveness of proposed techniques in isolation and in combination • Workloads: • Real Datasets • D1: out5d, size = 20,000, dimensions = 5 • D2: uvw, flow simulation data, size = 200,000, dimensions = 6 • Input • A set of 4 half-hour real user traces collected in [Doshi:2003apr] for dataset D1 • A set of 4 half-hour synthetic user traces for dataset D2 • User Trace • Sequence of user requests • Each user request: (position of SBB, time)

  34. Evaluation Metrics • Latency for User Trace • Latency Reduction Ratio (lrr) • Li = Latency for request i. • Ti = Number of nodes in request i • Base Configuration • No Index at the database

  35. Experimental Results: Brief Summary • Spatial Index on the database used alone • lrr ≈ 33% for Data Set D1 • lrr ≈ 72% for Data Set D2 • Cache • lrr ≈ 58% for Data Set D1 (Cache Size = 10%) • lrr ≈ 94% for Data Set D2 (Cache Size = 2%) • Comparison of Replacement Policies • Distance replacement policy performs as well as or better than LRU • Increase in hit ratio ≈ 7%, increase in lrr ≈ 2% for Data Set D2 • Main Memory Index • We need spatial index structures that support high update rates (e.g. LR-Tree [Bozanis:2003]) • Prefetcher and Cache • lrr ≈ 63% for Data Set D1 • lrr ≈ 96% for Data Set D2

  36. Related Work • Visualization-database integrated systems • ADR [Kurc:2001] • Tioga [Stonebaker:1993] • USD [Johnson:1992] • Caching • Semantic Caching [dar:1996] or • Predicate Caching [keller:1996] • Hierarchy Encoding • Nested Interval Method [Celko:2004] • Dietz’s numbering scheme [dietz:1982] • Dewey Order Encoding [tatxmlorder:2002]

  37. Conclusions • Hierarchy encoding technique • Maps tree structures to 2-dimensional spaces • Maps visual exploration operations to spatial range queries • Designed cache to reduce response time • Replacement Policy: Distance or LRU • Cache Lookup: Sequential or Spatial Index • Integrated direction-based prefetcher • Implemented in the freeware XmdvTool • Conducted a performance study

  38. References • [Doshi:2003] P. Doshi et al. Prefetching for Visual Data Exploration • [Doshi:2003apr] P. Doshi et al. A strategy selection framework for adaptive prefetching in data visualization • [Bozanis:2003] P. Bozanis et al. LR-Tree: a logarithmic decomposable spatial index method • [Celko:2004] J. Celko. Joe Celko’s Trees and Hierarchies in SQL for Smarties • [Teuhola:1996] J. Teuhola. Path signatures to speed up recursion in relational databases • [Stonebaker:1993] M. Stonebraker et al. Providing data management support for scientific visualization applications • [dar:1996] S. Dar et al. Semantic Data Caching and Replacement • [keller:1996] A.M. Keller et al. A predicate-based caching scheme for client-server database architectures • [Kurc:2001] T. Kurc et al. Exploration and visualization of large datasets with the active data repository • [Johnson:1992] M. Goldner et al. USD - a database management system for scientific research • [Fua:1999] Y.H. Fua et al. Navigating hierarchies with structure-based brushes • [dietz:1982] P.F. Dietz. Maintaining order in a linked list • [tatxmlorder:2002] I. Tatarinov et al. Storing and Querying Ordered XML Using a Relational Database System • [Stroe:2000] I. Stroe. Scalable Visual Hierarchy Exploration
