VisDB: Database exploration using Multidimensional Visualization

Daniel A. Keim, Hans-Peter Kriegel

Institute for Computer Science, University of Munich

Created By

  • Rohan Ladkhedkar

  • Ajinkya Raulkar

  • Vrushali Date

  • Anuja Surgude


Contents

  • Introduction to VisDB

  • Basic Idea of VisDB

  • Techniques used

    • Basic Visualization

    • Mapping Two Dimensions to the Axes

    • Grouping the Dimensions

  • Working

  • Hardware/Software

  • Future Scope

  • Conclusion

Introduction to VisDB

Typical difficulties faced with large databases:

  • Finding specific data items

  • Users may have no knowledge of the database system, query language, or data model

  • Locating interesting data spots

  • One-to-one (exact-match) queries return multiple data items with no feedback

Introduction to VisDB

  • Sorting the data items according to the user's query.

  • Visualizing as many data items as possible at the same time (out of, say, ten million) to give the user feedback on the query.

  • The resolution of current displays (1 to 3 million pixels) is an important consideration.

  • Interaction between the system and the user.

Basic Idea of VisDB

  • Supports the query specification process by visually representing the result.

  • Restricts the visualization to the dimensions that are of interest to the user, leaving out those that are not.

Basic Idea of VisDB

  • Each pixel of the screen is used to visualize the data items resulting from a query.

  • Approximate results are determined using distance functions.

  • These distances are then combined into a relevance factor, which is used for the mapping to colors.

Distance Function

  • The distance between each attribute value and the corresponding query value is determined.

  • The distance functions used are data type and application dependent.

  • In some cases, multiple distance functions can be used even for a single data type.

  • Distance functions by data type:

    • Numeric types (e.g. integers): numerical difference.

    • Ordinal types (e.g. grades): domain-specific distance functions.

    • Nominal types (e.g. professions): distance matrix.
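The three kinds of distance function listed above can be sketched as follows. This is only an illustration: the grade scale, the profession names, and the matrix values are made-up assumptions, not taken from VisDB.

```python
# Illustrative distance functions, one per attribute type.
# The grade scale and profession matrix are hypothetical examples.

def numeric_distance(value, query_value):
    """Numeric types: signed numerical difference."""
    return value - query_value

# Ordinal type: a domain-specific ordering (assumed grade scale).
GRADE_ORDER = ["A", "B", "C", "D", "F"]

def ordinal_distance(value, query_value, order=GRADE_ORDER):
    """Ordinal types: difference of ranks in a domain-specific order."""
    return order.index(value) - order.index(query_value)

# Nominal type: distances come from a hand-built symmetric matrix
# (assumed values), keyed by unordered pairs.
PROFESSION_DISTANCE = {
    frozenset(["doctor", "nurse"]): 1,
    frozenset(["doctor", "engineer"]): 3,
    frozenset(["nurse", "engineer"]): 3,
}

def nominal_distance(value, query_value, matrix=PROFESSION_DISTANCE):
    """Nominal types: look the pair up in a distance matrix."""
    if value == query_value:
        return 0
    return matrix[frozenset([value, query_value])]
```

Keeping the sign of the numeric and ordinal distances is a design choice of this sketch; it preserves the direction of the deviation from the query value.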

Combining Distances into Relevance Factor

  • Combine the independently calculated distances of the different selection predicates.

  • The combined value should have a global meaning.

  • User interaction is required:

    Obtain weighting factors (wj, j ∈ {1, ..., #sp}, where #sp is the number of selection predicates) from the user, in order of importance.

  • Normalization of all distances:

    Linear transformation of the range [dmin, dmax] for each predicate to a fixed range, e.g. [0, 255].

Combining Distances into Relevance Factor

  • For combining the normalized distances, numerical mean functions are used:

    • Weighted arithmetic mean for 'AND'-connected condition parts.

    • Weighted geometric mean for 'OR'-connected condition parts.

  • The relevance factor is the inverse of the combined distance value.
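A minimal sketch of normalization and combination, assuming absolute (non-negative) distances and the example range [0, 255]. The function names are hypothetical, and the slides do not give the exact formula for the "inverse", so `relevance` below simply subtracts from the top of the range as one possible reading.

```python
import math

def normalize(distances, lo=0.0, hi=255.0):
    """Linearly map distances from [d_min, d_max] to [lo, hi]."""
    d_min, d_max = min(distances), max(distances)
    if d_max == d_min:
        return [lo for _ in distances]
    scale = (hi - lo) / (d_max - d_min)
    return [lo + (d - d_min) * scale for d in distances]

def combine_and(dists, weights):
    """Weighted arithmetic mean, for AND-connected predicates."""
    return sum(w * d for w, d in zip(weights, dists)) / sum(weights)

def combine_or(dists, weights):
    """Weighted geometric mean, for OR-connected predicates."""
    total = sum(weights)
    return math.prod(d ** (w / total) for w, d in zip(weights, dists))

def relevance(combined, hi=255.0):
    """Assumed inverse: a larger distance gives a smaller relevance."""
    return hi - combined
```

With the geometric mean, a single fulfilled predicate (distance 0) pulls the combined distance to 0, which matches the meaning of 'OR'; the arithmetic mean stays at 0 only if every predicate is fulfilled, which matches 'AND'.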

Reducing the amount of data to be displayed

  • Adequate heuristics are required to:

    • Reduce the amount of data.

    • Determine the data items whose distances are to be displayed.

  • For this, the α-quantile is used: the lowest value ξα such that at least a fraction α of the distance values are less than or equal to it.
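The formula from the slide is not reproduced in this transcript. As a rough sketch, the standard empirical α-quantile can be used to pick a cutoff distance; the function names are hypothetical and this is not necessarily the heuristic VisDB implements.

```python
import math

def alpha_quantile(values, alpha):
    """Lowest value x such that at least a fraction alpha of values are <= x."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(values)
    k = math.ceil(alpha * len(ordered))  # number of items that must be covered
    return ordered[k - 1]

def select_for_display(distances, alpha):
    """Keep only the distances at or below the alpha-quantile cutoff."""
    cutoff = alpha_quantile(distances, alpha)
    return [d for d in distances if d <= cutoff]
```

For example, with distances 40, 10, 30, 20 and α = 0.5, the cutoff is 20 and only the two closest items are kept for display.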

Techniques Used

  • Three techniques are used:

    • Basic Visualization Technique

    • Mapping Two Dimensions to the Axes

    • Grouping the Dimensions for each Data Item

1. Basic Visualization Technique

  • Sorts the data according to relevance with respect to the query.

  • Then maps the relevance factors to colors.

  • Sorting is needed to avoid sprinkled images, which are not clear to the user.

  • The highest relevance factors are placed in the middle of the window.

  • Approximate answers form a rectangular spiral around this region (100% correct answers are colored yellow).
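The spiral arrangement described above can be sketched as follows: items are sorted by relevance and assigned to pixel offsets that spiral outward from the window center. The function names and the turning direction are assumptions made for illustration.

```python
def spiral_positions(n):
    """Return n (x, y) offsets spiralling outward from the center (0, 0)."""
    positions = []
    x = y = 0
    dx, dy = 1, 0              # start by moving right
    run_length = 1             # steps to take before the next turn
    while len(positions) < n:
        for _turn in range(2):             # two runs share each run length
            for _step in range(run_length):
                if len(positions) == n:
                    return positions
                positions.append((x, y))
                x, y = x + dx, y + dy
            dx, dy = -dy, dx               # turn by 90 degrees
        run_length += 1
    return positions

def arrange(relevances):
    """Map each spiral offset to an item index, most relevant item first."""
    order = sorted(range(len(relevances)),
                   key=lambda i: relevances[i], reverse=True)
    return dict(zip(spiral_positions(len(order)), order))
```

The first nine offsets fill the 3x3 square around the center, the next sixteen complete the 5x5 square, and so on, which gives the rectangular shape.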

1. Basic Visualization Technique

  • Colors range from yellow in the middle through green, blue, and red to black.

  • These color ranges denote the distance from the correct answers.
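As an illustration of such a color scale, the sketch below interpolates linearly in RGB between the five named colors. The RGB values and the linear interpolation are assumptions; the actual VisDB color map may be constructed differently.

```python
# Assumed RGB values for the color sequence named on the slide.
COLOR_STOPS = [
    (255, 255, 0),   # yellow: correct answers (distance 0)
    (0, 255, 0),     # green
    (0, 0, 255),     # blue
    (255, 0, 0),     # red
    (0, 0, 0),       # black: largest distance
]

def distance_to_color(distance, d_max=255.0):
    """Map a normalized distance in [0, d_max] to an (r, g, b) tuple."""
    t = min(max(distance / d_max, 0.0), 1.0) * (len(COLOR_STOPS) - 1)
    i = min(int(t), len(COLOR_STOPS) - 2)   # index of the lower color stop
    frac = t - i                            # position between the two stops
    a, b = COLOR_STOPS[i], COLOR_STOPS[i + 1]
    return tuple(round(ca + (cb - ca) * frac) for ca, cb in zip(a, b))
```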

1. Basic Visualization Technique

  • Multidimensional visualization:

    A separate window is generated for each selection predicate of the query.

Question 1:

  • Which color denotes 100% correct answers in the Basic Visualization Technique?

  • 1) Red

  • 2) Yellow

  • 3) Green

  • 4) White

  • 5) Blue

Answer 1:

  • Correct answer: 2

2. Mapping Two Dimensions to Axes

  • Conventional 2D and 3D visualizations, although useful, are not pursued because:

    • They can show only a limited number of data items.

    • Such systems already exist.

  • Improvement: include feedback on the direction of the distance in the visualization.

2. Mapping Two Dimensions to Axes

  • Assign two dimensions to the axes.

  • Arrange the relevance factors according to the direction of the distance.

  • For one dimension:

    • Negative distances to the left

    • Positive distances to the right

  • For the other dimension:

    • Negative distances to the bottom

    • Positive distances to the top
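A minimal sketch of this arrangement, assuming signed distances for the two chosen dimensions: the signs select a quadrant of the window, and within each quadrant the items are ordered by relevance. The names are hypothetical, and treating a distance of zero as positive is an arbitrary choice of this sketch.

```python
from collections import defaultdict

def quadrant(dist_x, dist_y):
    """Pick a window quadrant from the signs of two signed distances."""
    horizontal = "left" if dist_x < 0 else "right"
    vertical = "bottom" if dist_y < 0 else "top"
    return horizontal, vertical

def group_by_quadrant(items):
    """items: iterable of (item_id, dist_x, dist_y, relevance) tuples.

    Returns a dict mapping each occupied quadrant to its item ids,
    most relevant first.
    """
    groups = defaultdict(list)
    for item_id, dist_x, dist_y, rel in items:
        groups[quadrant(dist_x, dist_y)].append((rel, item_id))
    return {q: [item_id for _rel, item_id in sorted(members, reverse=True)]
            for q, members in groups.items()}
```

Quadrants that receive no items do not appear in the result at all, so their part of the window would stay unused.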

Problems in this method

  • A corner of the window may be completely empty.

  • Worst case: two diagonally opposite corners of the window may be completely empty, so that only half as many data items can be presented.

  • Maximizing the number of displayed data items conflicts with arrangements that assign dimensions to the axes.

Question 2:

  • For the first dimension, negative distances are arranged:

  • 1) at the bottom

  • 2) to the right

  • 3) at the top

  • 4) to the left

Answer 2:

  • Correct answer: 4

3. Grouping the Dimensions for each Data Item

  • All dimensions for one data item are grouped together in one area.

  • Visualizations generated using this arrangement consist of only one window.

  • Shape is not used to distinguish data items; the ordering criterion and the arrangement of the data items are also different.

  • 2x2 pixels per dimension are needed, as opposed to 1 pixel per dimension in the previous two methods.

  • The grouping arrangement is only suitable for focused search on smaller data sets, because only one-fourth of the data items can be displayed on screen at one time.

  • It still provides more visualizations for data sets with larger dimensionality.

  • In the other two techniques, the pixels for each dimension of a data item are related only by their position.
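The one-fourth figure follows from simple pixel arithmetic. The sketch below is a rough count that ignores window borders and any additional overall-result window; the screen size and number of dimensions are made-up example values.

```python
def displayable_items(screen_pixels, dimensions, pixels_per_dimension):
    """Rough upper bound on the number of data items that fit on screen."""
    return screen_pixels // (dimensions * pixels_per_dimension)

# Hypothetical example: a 1,000,000-pixel display and 5 dimensions.
one_pixel = displayable_items(1_000_000, 5, 1)      # 1 pixel per dimension
grouped = displayable_items(1_000_000, 5, 2 * 2)    # 2x2 pixels per dimension
```

Under these assumptions the grouped arrangement fits 50,000 items where the other two techniques fit 200,000, that is, one-fourth as many.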


Working

  • The VisDB screen is divided into a Visualization portion on the left and a Query Modification portion on the right.

  • In the Visualization portion, the resulting data set, including a certain percentage of approximate answers, is displayed using one of the visualization methods.

  • The Query Modification portion provides sliders for modifying the selection predicates and weighting factors, as well as some other options.

Working contd..

  • Different kinds of sliders are provided.

  • Examples: sliders for numbers, sliders for discrete types, and sliders for non-metric types (ordinal and nominal data types).

  • Other parameters listed are:

    • Number of results

    • Query range

    • Weighting factors

    • Data values for selected tuple

    • Data values corresponding to some selected color range

Working contd..

  • Changing the percentage of data being displayed may completely change the visualization, because the distance values are normalized according to the new range.

  • Normal mode: the system recalculates the visualization after each modification of the query.

  • Auto-Recalculate Off mode: queries are recalculated only on demand.
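The difference between the two modes can be sketched with a small class. `QuerySession` and its attributes are hypothetical names used for illustration and do not describe the actual VisDB code, which was written in C++.

```python
class QuerySession:
    """Holds query parameters and recomputes a result eagerly or on demand."""

    def __init__(self, compute, auto_recalculate=True):
        self.compute = compute                  # function: params dict -> result
        self.auto_recalculate = auto_recalculate
        self.params = {}
        self.result = None
        self.stale = False                      # True if result lags behind params

    def modify(self, **changes):
        """Apply a query modification, e.g. from a slider."""
        self.params.update(changes)
        if self.auto_recalculate:
            self.recalculate()                  # normal mode: recompute at once
        else:
            self.stale = True                   # off mode: wait for a request

    def recalculate(self):
        """Recompute the result from the current parameters."""
        self.result = self.compute(dict(self.params))
        self.stale = False
```

With `auto_recalculate=False`, several slider changes can be made in a row and the possibly expensive recomputation runs once, when `recalculate()` is called.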

Question 3:

  • Into which two sections is VisDB mainly divided?

  • 1) Visualization Portion

  • 2) Grouping Dimensions

  • 3) Query Modification

  • 4) Coloration of Relevance Factors

Answer 3:

  • Correct answer: 1 and 3

Question 4:

  • In which mode does the system recalculate the visualization after each modification of the query?

  • 1) Normal Mode

  • 2) Auto Recalculate Mode

  • 3) Visual Mode

  • 4) None of the above

Answer 4:

  • Correct answer: 1

Hardware/Software

  • Software used

    • C++

    • MOTIF

  • Hardware used

    • HP 7xx machines running X Windows

  • The current version is main-memory based and allows interactive database exploration for databases containing 50,000 data items.

Future Scope

  • Automatic generation of queries that correspond to some specific region in one of the visualization windows.

  • Generate time series of visualizations corresponding to queries that are changed incrementally.

  • Applying VisDB to many different application domains, each having its own parameters, distance functions, query requirements, and so on.


Conclusion

  • VisDB allows visualization of the largest amount of data that can be displayed at one time on current displays.

  • It provides valuable feedback when querying the database.

  • It allows the user to find results that would otherwise remain hidden in the database.