Query, Analysis, and Visualization of Hierarchically Structured Data using Polaris


Chris Stolte, Diane Tang, Pat Hanrahan

July 2002

Motivation
  • Large databases have become very common
    • Corporate data warehouses
      • Amazon, Walmart,…
    • Scientific projects:
      • Human Genome Project
      • Sloan Digital Sky Survey
  • Need tools to extract meaning from these databases
    • Programmatic data mining/statistical analysis
    • Visual exploration and analysis
Hierarchical Structure
  • Challenge: these databases are very large
    • Queries can not visit every record
    • Visualizations can not display every record
  • Analysts have augmented databases with hierarchical structure
    • Provide meaningful levels of abstraction
    • Leveraged by both computer and analyst
    • Derived from semantics or programmatic analysis
  • Tools need to take advantage of these hierarchies
Contributions
  • Interactive tool for analysis of data warehouses with hierarchical structure
    • Based on Polaris*
      • Rapid construction of table-based visualizations
      • Algebraic formalism
      • Analysis of flat relational databases
    • To support hierarchies, we need to extend:
      • User interface
      • Algebraic formalism
      • Generation of data queries

* C. Stolte, D. Tang, and P. Hanrahan. Polaris: A System for Query, Analysis, and Visualization of Multi-dimensional Relational Databases. In IEEE Transactions on Visualization and Computer Graphics, January 2002.

Outline
  • Review of Polaris
    • Demo
    • Formalism
  • Hierarchies and Data Cubes
  • Extensions to Polaris
    • Demo
    • Formalism
  • Discussion
Schema: Denormalized Relation
  • Ordinal fields (categorical): Market, State, Year, Quarter, Month, Product Type, Product
  • Quantitative fields (metrics): Profit, Sales, Payroll, Marketing, Inventory, Margin, COGS, ...
  • Hypothetical nation-wide coffee chain data (courtesy Visual Insights)

Polaris Review
  • Provide an interface for rapidly and incrementally generating table-based graphical displays
  • Users construct visualizations via a drag-and-drop interface
  • Queries are automatically generated
  • Interface is simple and expressive because it is built upon a formalism
Polaris Formalism
  • UI interpreted as visual specification that defines:
    • table configuration
    • type of graphic in each pane
    • encoding of data as visual properties of marks
    • data transformations
  • Specification automatically compiled into necessary queries & drawing commands
Specifying Table Configurations
  • Interface: define table configuration by dropping fields on shelves
  • Formalism: shelf content interpreted as expressions in table algebra
Table Algebra
  • Operands are the database fields
    • each operand interpreted as a set {…}
    • quantitative and ordinal fields interpreted differently
  • Three operators:
    • concatenation (+), cross (x), nest (/)
Table Algebra: Operands
  • Ordinal fields: interpret domain as a set that partitions table into rows and columns:

Quarter = {(Qtr1),(Qtr2),(Qtr3),(Qtr4)} 

  • Quantitative fields: treat domain as single element set and encode spatially as axes:

Profit = {(Profit[-410,650])} 
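
The two set interpretations above can be sketched in Python. This is only an illustration, not the Polaris implementation: the helper names `ordinal` and `quantitative`, and the use of tuples and strings to stand in for set entries, are assumptions made for this sketch.

```python
from typing import List, Tuple

Entry = Tuple[str, ...]

def ordinal(values: List[str]) -> List[Entry]:
    """Ordinal field: one single-element entry per domain value, in domain order."""
    return [(v,) for v in values]

def quantitative(name: str, low: int, high: int) -> List[Entry]:
    """Quantitative field: a single entry naming the field and its range (an axis)."""
    return [(f"{name}[{low},{high}]",)]

quarter = ordinal(["Qtr1", "Qtr2", "Qtr3", "Qtr4"])
profit = quantitative("Profit", -410, 650)

print(quarter)  # four entries, one per quarter
print(profit)   # a single entry
```

An ordinal field therefore contributes as many rows or columns as it has domain values, while a quantitative field always contributes exactly one entry, drawn as an axis.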

Concatenation (+) operator
  • Ordered union of set interpretations:

Quarter + ProductType

= {(Qtr1),(Qtr2),(Qtr3),(Qtr4)} + {(Coffee), (Espresso)}

= {(Qtr1),(Qtr2),(Qtr3),(Qtr4),(Coffee),(Espresso)}

Profit + Sales = {(Profit[-310,620]),(Sales[0,1000])}
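
A minimal Python sketch of the concatenation examples follows. The function name `concat` is hypothetical, and dropping entries that already appear in the left operand is an assumed reading of "ordered union"; the slide does not say how duplicates are treated.

```python
def concat(a, b):
    """Ordered union: all entries of `a`, then entries of `b` not already listed."""
    result = list(a)
    for entry in b:
        if entry not in result:
            result.append(entry)
    return result

quarter = [("Qtr1",), ("Qtr2",), ("Qtr3",), ("Qtr4",)]
product_type = [("Coffee",), ("Espresso",)]
profit = [("Profit[-310,620]",)]
sales = [("Sales[0,1000]",)]

print(concat(quarter, product_type))  # quarters followed by product types
print(concat(profit, sales))          # two axes side by side
```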

Cross (x) operator
  • Cross-product of set interpretations:

Quarter x ProductType =

{(Qtr1,Coffee), (Qtr1, Tea), (Qtr2, Coffee), (Qtr2, Tea), (Qtr3, Coffee), (Qtr3, Tea), (Qtr4, Coffee), (Qtr4,Tea)}

ProductType x Profit =
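
The right-hand side of the ProductType x Profit example is missing from this transcript. The sketch below only shows what the stated cross-product definition would yield for hypothetical operand values; the function name `cross` and the Profit range (borrowed from the operand slide) are assumptions.

```python
from itertools import product

def cross(a, b):
    """Cross-product: every entry of `a` joined with every entry of `b`, `a` varying slowest."""
    return [x + y for x, y in product(a, b)]

quarter = [("Qtr1",), ("Qtr2",), ("Qtr3",), ("Qtr4",)]
product_type = [("Coffee",), ("Tea",)]
profit = [("Profit[-410,650]",)]  # assumed range, for illustration only

print(cross(quarter, product_type))  # eight (quarter, product type) entries
print(cross(product_type, profit))   # one Profit axis per product type
```

Crossing an ordinal field with a quantitative field keeps the number of entries equal to the ordinal domain size, since the quantitative operand has a single entry.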

Nest (/) operator
  • Quarter x Month
    • would create twelve entries for each quarter, including meaningless ones such as (Qtr1, December)
  • Quarter / Month
    • would only create three entries per quarter
    • based on the tuples present in the database, not on semantics
    • can be expensive to compute
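
As a rough sketch of why nest depends on the data, the Python below keeps only the (Quarter, Month) pairs that actually occur in a list of records. The function name `nest`, the record layout, and the first-seen ordering are assumptions for illustration; the scan over every record also hints at why the operator can be expensive.

```python
def nest(records, outer, inner):
    """Return the (outer, inner) pairs that occur in at least one record, in first-seen order."""
    pairs = []
    for record in records:  # visits every record in the relation
        pair = (record[outer], record[inner])
        if pair not in pairs:
            pairs.append(pair)
    return pairs

# Hypothetical records from the denormalized relation.
records = [
    {"Quarter": "Qtr1", "Month": "January"},
    {"Quarter": "Qtr1", "Month": "February"},
    {"Quarter": "Qtr1", "Month": "January"},
    {"Quarter": "Qtr2", "Month": "April"},
]

print(nest(records, "Quarter", "Month"))
```

A pair such as (Qtr1, December) never appears because no record contains it, not because the operator knows that December belongs to Qtr4.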
Data Cubes
  • Structure relation as n-dimensional cube
    • Each cube dimension corresponds to a dimension in the relation
    • Each cell summarizes all measures for those dimension values
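
One way to picture a cube cell is as an aggregate keyed by a tuple of dimension values. The sketch below assumes summation as the aggregate and uses hypothetical rows; `build_cube` is an illustrative name, not part of Polaris or any data cube product.

```python
from collections import defaultdict

def build_cube(rows, dimensions, measures):
    """Map each tuple of dimension values to the summed measures of its rows."""
    cube = defaultdict(lambda: {m: 0 for m in measures})
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        for m in measures:
            cube[key][m] += row[m]
    return dict(cube)

# Hypothetical rows of the relation.
rows = [
    {"Quarter": "Qtr1", "ProductType": "Coffee", "Profit": 10, "Sales": 100},
    {"Quarter": "Qtr1", "ProductType": "Coffee", "Profit": 5, "Sales": 40},
    {"Quarter": "Qtr2", "ProductType": "Tea", "Profit": 7, "Sales": 60},
]

cube = build_cube(rows, ["Quarter", "ProductType"], ["Profit", "Sales"])
print(cube[("Qtr1", "Coffee")])
```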
Hierarchies and Data Cubes
  • Each dimension in the cube is structured as a tree
  • Each level in tree corresponds to level of detail
  • Nodes correspond to domain values
Hierarchies and Data Cubes
  • Some hierarchies are known a priori
    • Provide semantic meaning
    • Time (day, month, year)
    • Location (city, state, country)
  • Others can be automatically generated
    • Classification algorithms
    • Clustering
  • Enable analyst to reason at high level of abstraction, then drill down
    • Interface must expose underlying hierarchical structure
Hierarchy Model
  • Our model assumes that hierarchies:
    • Can be modeled using star or snowflake schema
    • Have uniform depth
    • Have homogeneous node types
  • Other models relax these constraints
  • We chose to focus on the model commonly found in commercial data warehouse and data cube products
Schema: Star Schema
  • Dimension tables
    • Time: Year, Quarter, Month
    • Location: Market, State
    • Products: Product Type, Product Name
  • Fact table
    • State, Month, Product
    • Measures: Profit, Sales, Payroll, Marketing, Inventory, Margin, COGS, ...
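
To make the star schema concrete, here is a small sketch using Python's built-in `sqlite3` module. The table names (`time_dim`, `facts`), the month key format, and all rows are hypothetical; only the Time dimension and one measure are modeled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension table: one row per month, carrying its quarter and year.
CREATE TABLE time_dim (month TEXT PRIMARY KEY, quarter TEXT, year INTEGER);
-- Fact table: references the dimension by month and stores a measure.
CREATE TABLE facts (state TEXT, month TEXT REFERENCES time_dim(month),
                    product TEXT, profit REAL);
INSERT INTO time_dim VALUES ('1998-01', 'Qtr1', 1998),
                            ('1998-02', 'Qtr1', 1998),
                            ('1998-04', 'Qtr2', 1998);
INSERT INTO facts VALUES ('CA', '1998-01', 'Latte', 10),
                         ('CA', '1998-02', 'Latte', 5),
                         ('NY', '1998-04', 'Latte', 7);
""")

# Roll profit up from the Month level to the Quarter level via the dimension table.
rows = conn.execute("""
    SELECT t.quarter, SUM(f.profit)
    FROM facts AS f JOIN time_dim AS t ON f.month = t.month
    GROUP BY t.quarter
    ORDER BY t.quarter
""").fetchall()
print(rows)
conn.close()
```

The fact table stores only the finest level (Month); coarser levels such as Quarter are reached through the join.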

Extending the Formalism
  • Redefine operands as dimension levels and measures, not simply database fields
  • Need to define set interpretation of a dimension level
    • Domain is not a single ordered list
    • Composed of node values at particular level in hierarchy
    • Node values are uniquely defined by the path from root node
  • Possible definitions?
Set Interpretation: Option 1
  • Define set interpretation by listing each node value with unique path to root:

{1998.Qtr1.Jan, …, 1998.Qtr4.Dec}

(+) Provides unique set interpretation

(-) Limits expressiveness

    • Any table including “Months” must include “Year”
    • Not possible to summarize across years (e.g., Total Sales in January for all Years)
    • Such a summary is not a standard projection of the data cube, but it is very useful
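
Option 1 can be sketched as a depth-first walk that emits the full path to each leaf. The nested-dict representation of the Time hierarchy and the name `qualified_values` are assumptions made for illustration, and the sketch relies on the uniform-depth assumption of the hierarchy model.

```python
def qualified_values(tree, prefix=()):
    """Return dotted root-to-leaf paths in depth-first order."""
    paths = []
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            paths.extend(qualified_values(children, path))
        else:
            paths.append(".".join(path))
    return paths

# Hypothetical fragment of a Time hierarchy: Year -> Quarter -> Month.
time_hierarchy = {
    "1998": {
        "Qtr1": {"Jan": {}, "Feb": {}, "Mar": {}},
        "Qtr2": {"Apr": {}},
    },
}

print(qualified_values(time_hierarchy))
```

Every entry carries its year, which is what makes the interpretation unique and also what prevents a summary of January across all years.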
Set Interpretation: Option 2
  • Define set interpretation by listing each node value without path to root:

{Jan, Feb, …., Dec}

  • Order by depth first traversal
  • Consolidate non-unique values

This works, but how do we leverage the known relationships between dimension levels?
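
Option 2 can be sketched as collecting the node names at one level in depth-first order and consolidating repeats. As before, the nested-dict hierarchy and the name `level_values` are hypothetical.

```python
def level_values(tree, depth):
    """Names of the nodes at `depth` (top level is 0), depth-first, duplicates consolidated."""
    values = []

    def visit(subtree, current):
        for name, children in subtree.items():
            if current == depth:
                if name not in values:  # consolidate non-unique values
                    values.append(name)
            else:
                visit(children, current + 1)

    visit(tree, 0)
    return values

# Hypothetical Time hierarchy spanning two years.
time_hierarchy = {
    "1998": {"Qtr1": {"Jan": {}, "Feb": {}, "Mar": {}}, "Qtr2": {"Apr": {}}},
    "1999": {"Qtr1": {"Jan": {}, "Feb": {}, "Mar": {}}, "Qtr2": {"Apr": {}}},
}

print(level_values(time_hierarchy, 2))  # months, without their year or quarter
print(level_values(time_hierarchy, 1))  # quarters
```

Because "Jan" appears once regardless of year, a table built on this level can summarize January across all years.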

Dot (.) Operator
  • Nest isn’t aware of defined hierarchical relationships:
    • Year / Months might work, but only if all data is present
    • Inefficient
  • New operator: Dot (.)
    • Nest computed using the dimension table rather than the fact table
  • Sufficient to provide support for aggregation, drill down, and roll up in the algebra
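
The difference between nest and dot can be illustrated by running the same pair-collecting routine over two different tables. This is a sketch under assumed table layouts with hypothetical rows; `distinct_pairs` is an illustrative helper, not a Polaris function.

```python
def distinct_pairs(rows, parent, child):
    """Return the (parent, child) pairs present in `rows`, in first-seen order."""
    pairs = []
    for row in rows:
        pair = (row[parent], row[child])
        if pair not in pairs:
            pairs.append(pair)
    return pairs

# Small dimension table: defines the hierarchy, one row per month.
time_dim = [
    {"Quarter": "Qtr1", "Month": "Jan"},
    {"Quarter": "Qtr1", "Month": "Feb"},
    {"Quarter": "Qtr1", "Month": "Mar"},
]

# Fact rows (usually far more numerous): no sales were recorded in February.
facts = [
    {"Quarter": "Qtr1", "Month": "Jan", "Sales": 120},
    {"Quarter": "Qtr1", "Month": "Mar", "Sales": 90},
    {"Quarter": "Qtr1", "Month": "Jan", "Sales": 30},
]

nest_pairs = distinct_pairs(facts, "Quarter", "Month")    # nest: scans the fact rows
dot_pairs = distinct_pairs(time_dim, "Quarter", "Month")  # dot: scans the dimension table
print(nest_pairs)
print(dot_pairs)
```

The nest result loses February because no fact row mentions it, while the dot result reflects the defined hierarchy and only needs to read the much smaller dimension table.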
Generating Queries
  • Queries generated from specification.
  • Panes correspond to either a slice of a projection or an aggregation of a projection.
  • Multiple queries required if level-of-detail varies.
  • Algebraic manipulation can be used to determine minimal set of queries.
  • Interpreter generates SQL, MDX, or Rivet queries.
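
A heavily simplified sketch of query generation follows: it turns a list of dimension levels and measures into one SQL aggregation query. The function `build_query`, the table name `coffee`, and the choice of `SUM` are assumptions; the actual Polaris interpreter is not described in enough detail here to reproduce, and identifiers are assumed to be trusted since they are not quoted.

```python
def build_query(table, levels, measures, aggregate="SUM"):
    """Build a SQL query that aggregates `measures` grouped by `levels`."""
    columns = list(levels) + [f"{aggregate}({m}) AS {m}" for m in measures]
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if levels:
        sql += f" GROUP BY {', '.join(levels)}"
    return sql

# Two panes at different levels of detail need two separate queries.
by_quarter = build_query("coffee", ["Quarter"], ["Profit"])
by_month = build_query("coffee", ["Quarter", "Month"], ["Profit"])
print(by_quarter)
print(by_month)
```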
Related Visualization Projects
  • Formalisms for Graphics
    • Wilkinson’s Grammar of Graphics
    • Bertin’s Semiology of Graphics
    • Mackinlay’s APT
  • Visual Exploration of Databases
    • VQE, DeVise, Visage, DataSplash/Tioga-2,…
  • Visualization and Data Mining
    • MineSet, …
Data Mining and Visualization
  • Polaris is not solely for visual analysis
    • Precursor to algorithmic analysis to identify areas of interest
    • Validate results and establish trust and understanding
    • Incorporate decision trees and classification algorithms into data warehouses as hierarchies
Summary
  • Extended Polaris to fully support and expose hierarchical structure of data cubes
  • Extended not only interface but underlying algebraic formalism
Future Work
  • Use underlying formalism as basis for other visualization tools
    • Interactive pan-and-zoom systems
  • Visual presentation of metadata
    • Hierarchies are one example of rich, domain-specific metadata
    • As important to analysis as data itself
    • How to visualize this metadata?
  • Interactive visualization
  • Prefetching and caching