
VisDB: Database exploration using Multidimensional Visualization

Daniel A. Keim, Hans-Peter Kriegel

Institute for Computer Science, University of Munich


Created By

  • Rohan Ladkhedkar

  • Ajinkya Raulkar

  • Vrushali Date

  • Anuja Surgude


Contents

  • Introduction to VisDB

  • Basic Idea of VisDB

  • Techniques used

    • Basic Visualization

    • Mapping 2D to Axis

    • Grouping the Dimensions

  • Working

  • Hardware/Software

  • Future Scope

  • Conclusion


Introduction to VisDB

Typical difficulties faced with large databases:

  • Finding a specific data item

  • Users may have no knowledge of the database system, query language, or data model

  • Intersection data spots

  • 1 to 1 queries provide multiple data items with no feedback


Introduction to VisDB

  • Sorting the data items according to the user's query.

  • Visualizing as many data items as possible at the same time (for a database of, say, ten million items) to give the user feedback on the query.

  • The resolution of current displays (1 to 3 million pixels) is also an important consideration.

  • Interaction of the system with the user.


Basic Idea of VisDB

  • Support the query specification process by visually representing the result.

  • Restrict the visualization to the dimensions that are of interest to the user.


Basic Idea of VisDB

  • Each pixel of the screen is used to visualize a data item resulting from a query.

  • Approximate results are determined using distance functions.

  • These distances are then combined into a relevance factor, which is used for the mapping.


Distance Function

  • The distance between each attribute value and the corresponding query value is determined.

  • The distance functions used are data type and application dependent.

  • In some cases, multiple distance functions can be used even for a single data type.

  • Distance functions by data type:

    • Number types (e.g. Integer) – numerical difference

    • Ordinal types (e.g. Grades) – domain-specific distance functions

    • Nominal types (e.g. Professions) – distance matrix
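The three kinds of distance function listed above can be sketched as follows. This is a minimal illustration, not the VisDB implementation: the grade ordering, the profession names, and the matrix values are hypothetical examples chosen only to show the shape of each function.

```python
# Hypothetical example domains; none of these values come from VisDB itself.
GRADE_ORDER = ["A", "B", "C", "D", "F"]  # ordinal type: position defines order

PROFESSION_DISTANCE = {  # nominal type: hand-specified distance matrix
    ("engineer", "teacher"): 3,
    ("engineer", "technician"): 1,
    ("teacher", "technician"): 2,
}


def numeric_distance(value, query_value):
    """Number types: signed numerical difference (the sign gives the direction)."""
    return value - query_value


def ordinal_distance(value, query_value, order=GRADE_ORDER):
    """Ordinal types: one possible domain-specific distance, the difference in rank."""
    return order.index(value) - order.index(query_value)


def nominal_distance(value, query_value, matrix=PROFESSION_DISTANCE):
    """Nominal types: look the pair up in a symmetric distance matrix."""
    if value == query_value:
        return 0
    if (value, query_value) in matrix:
        return matrix[(value, query_value)]
    return matrix[(query_value, value)]
```

For example, `numeric_distance(7, 10)` is negative because the attribute value lies below the query value, which is the directional information the axis-mapping technique later relies on.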


Combining Distances into Relevance Factor

  • Combine the independently calculated distances of the different selection predicates.

  • The combined value should have a global meaning.

  • User interaction required.

    Obtain weighting factors (wj, j ∈ {1, …, #sp}) from the user, reflecting the order of importance of the selection predicates.

  • Normalization of all distances.

    Linear transformation of the range [dmin, dmax] of each predicate to a fixed range,

    e.g. [0, 255].
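The normalization step can be sketched as a linear rescaling per predicate. This is an illustrative sketch under assumptions: it works on absolute distances, and it maps a predicate whose distances are all equal to 0, a choice the slides do not specify.

```python
def normalize_distances(distances, target_max=255):
    """Linearly map the absolute distances of one predicate onto [0, target_max]."""
    magnitudes = [abs(d) for d in distances]
    d_min, d_max = min(magnitudes), max(magnitudes)
    if d_max == d_min:
        # Assumption: a constant column carries no ranking information.
        return [0.0 for _ in magnitudes]
    scale = target_max / (d_max - d_min)
    return [(m - d_min) * scale for m in magnitudes]
```

After this step every predicate contributes values on the same scale, so the weighting factors alone decide how much each predicate influences the combined distance.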


Combining Distances into Relevance Factor

  • The normalized distances are combined using numerical mean functions:

    • Weighted arithmetic mean for ‘AND’-connected condition parts.

    • Weighted geometric mean for ‘OR’-connected condition parts.

  • The relevance factor is the inverse of the combined distance value.
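A minimal sketch of the combination step follows. The function names are hypothetical, and the relevance factor is computed here as the complement within the normalized range (255 minus the distance), which is only one possible reading of "inverse"; the slides do not give the exact formula.

```python
import math


def weighted_arithmetic_mean(distances, weights):
    """Combined distance for 'AND'-connected condition parts."""
    total = sum(weights)
    return sum(w * d for w, d in zip(weights, distances)) / total


def weighted_geometric_mean(distances, weights):
    """Combined distance for 'OR'-connected condition parts."""
    total = sum(weights)
    return math.prod(d ** (w / total) for w, d in zip(weights, distances))


def relevance_factor(combined_distance, max_distance=255):
    """Assumed interpretation of 'inverse': small distance means high relevance."""
    return max_distance - combined_distance
```

With distances `[0, 100]` and equal weights, the arithmetic mean is 50 (one failed predicate penalizes an 'AND'), while the geometric mean is 0 (one satisfied predicate is enough for an 'OR').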


Formula for calculating combined distance
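The formula shown on this slide is not preserved in the transcript. Using the standard definitions of the two means, and assuming normalized distances d_j and weighting factors w_j scaled to sum to 1 (the scaling is an assumption), the combined distance could be written as:

```latex
d_{\mathrm{AND}} = \sum_{j=1}^{\#sp} w_j \, d_j ,
\qquad
d_{\mathrm{OR}} = \prod_{j=1}^{\#sp} d_j^{\,w_j} ,
\qquad
\text{with } \sum_{j=1}^{\#sp} w_j = 1 .
```

The arithmetic mean is zero only when every d_j is zero, whereas the geometric mean is zero as soon as one d_j is zero, which matches the meaning of ‘AND’ and ‘OR’.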


Reducing the amount of data to be displayed

  • Adequate heuristics are required to:

    • Reduce the amount of data.

    • Determine the data items whose distances are to be displayed.

  • Hence the α-quantile is defined as the lowest value ξα such that:
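The condition itself is not preserved in the transcript. Under the usual definition, ξα is the lowest value such that at least a fraction α of the distances are less than or equal to it. The sketch below uses that assumed definition to pick the data items to display; the function names are hypothetical.

```python
import math


def alpha_quantile(distances, alpha):
    """Lowest value xi such that at least a fraction alpha of distances are <= xi."""
    if not distances:
        raise ValueError("distances must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(distances)
    # Small tolerance guards against floating-point noise in alpha * n.
    rank = max(1, math.ceil(alpha * len(ordered) - 1e-9))
    return ordered[rank - 1]


def items_to_display(distances, alpha):
    """Indices of the data items whose distance does not exceed the alpha-quantile."""
    threshold = alpha_quantile(distances, alpha)
    return [i for i, d in enumerate(distances) if d <= threshold]
```

Choosing α from the number of available pixels divided by the number of data items would keep the displayed subset within the screen, which is the stated purpose of the heuristic.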


Techniques Used

  • Three techniques are used:

    • Basic Visualization Technique

    • Mapping two dimensions to the axes

    • Grouping the dimensions for each data item


1. Basic Visualization Technique

  • Sorts the data items according to their relevance with respect to the query.

  • Then maps the relevance factors to colors.

  • Sorting is needed to avoid sprinkled images, which are hard for the user to interpret.

  • The data items with the highest relevance factors are placed in the middle of the window.

  • Approximate answers form a rectangular spiral around this region (100% correct answers are colored yellow).
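The arrangement described above can be sketched as a generator of spiral positions combined with a sort by relevance. This is an illustrative sketch, not the VisDB code: the starting direction and turning direction of the spiral are assumptions, and positions are given as offsets from the window center.

```python
def spiral_positions(count):
    """Yield `count` (x, y) offsets along a rectangular spiral starting at (0, 0)."""
    x = y = 0
    dx, dy = 1, 0            # assumed starting direction: one step to the right
    step_length = 1
    produced = 0
    while produced < count:
        for _ in range(2):   # two legs share each step length
            for _ in range(step_length):
                if produced == count:
                    return
                yield (x, y)
                produced += 1
                x, y = x + dx, y + dy
            dx, dy = -dy, dx  # turn by 90 degrees
        step_length += 1


def arrange_by_relevance(relevance_factors):
    """Map each spiral position to the index of the data item placed there."""
    order = sorted(
        range(len(relevance_factors)),
        key=lambda i: relevance_factors[i],
        reverse=True,        # highest relevance goes to the center
    )
    return dict(zip(spiral_positions(len(order)), order))
```

Because the items are sorted before placement, relevance decreases monotonically along the spiral, which is what prevents the sprinkled appearance.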


1. Basic Visualization Technique

  • Colors range from yellow in the middle through green, blue, and red to black.

  • These color ranges denote the distance from the correct answers.
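One way to realize such a color scale is piecewise linear interpolation between the named colors. The RGB anchor values and the even spacing of the color stops below are assumptions for illustration; the actual VisDB color scale is not given in the slides.

```python
# Assumed RGB anchors for the colors named on the slide, evenly spaced.
COLOR_STOPS = [
    (255, 255, 0),  # yellow: distance 0, i.e. a correct answer
    (0, 255, 0),    # green
    (0, 0, 255),    # blue
    (255, 0, 0),    # red
    (0, 0, 0),      # black: largest distance
]


def distance_to_color(distance, max_distance=255):
    """Interpolate an (r, g, b) color for a normalized distance."""
    t = min(max(distance / max_distance, 0.0), 1.0)
    scaled = t * (len(COLOR_STOPS) - 1)
    index = min(int(scaled), len(COLOR_STOPS) - 2)
    fraction = scaled - index
    start, end = COLOR_STOPS[index], COLOR_STOPS[index + 1]
    return tuple(round(s + (e - s) * fraction) for s, e in zip(start, end))
```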


1. Basic Visualization Technique

  • Multidimensional Visualization -

    A separate window is generated for each selection predicate of the query.


Question 1:

  • Which color denotes 100% correct answers in the Basic Visualization Technique?

  • 1) Red

  • 2) Yellow

  • 3) Green

  • 4) White

  • 5) Blue


Answer 1:

  • Correct answer: 2


2. Mapping Two Dimensions to Axes

  • Although 2D and 3D visualizations are useful, they were not pursued because:

    • They can show only a limited number of data items.

    • Such systems already exist.

  • Improvement – including feedback on the direction of the distance in the visualization.


2. Mapping Two Dimensions to Axes

  • Assign two dimensions to the axes.

  • Arrange the relevance factors according to the direction of the distance.

  • For one dimension:

    • Negative distances to the left

    • Positive distances to the right

  • For the other dimension:

    • Negative distances to the bottom

    • Positive distances to the top
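The directional arrangement can be sketched by assigning each data item to a quadrant from the signs of its two distances. This is a simplified illustration with hypothetical function names; placing zero distances on the right/top side and ordering items within a quadrant by the sum of absolute distances are both assumptions.

```python
def quadrant(dx, dy):
    """Return the (horizontal, vertical) side for signed distances dx and dy."""
    horizontal = "left" if dx < 0 else "right"   # assumption: zero counts as right
    vertical = "bottom" if dy < 0 else "top"     # assumption: zero counts as top
    return horizontal, vertical


def group_by_quadrant(signed_distances):
    """Group item indices by quadrant, closest items first within each quadrant."""
    groups = {}
    for index, (dx, dy) in enumerate(signed_distances):
        groups.setdefault(quadrant(dx, dy), []).append(index)
    for members in groups.values():
        members.sort(
            key=lambda i: abs(signed_distances[i][0]) + abs(signed_distances[i][1])
        )
    return groups
```

A quadrant that receives no items stays empty on screen, which is the source of the unused-corner problem this technique has.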


2D Arrangement of One Dimension


Problems in this method

  • A corner of the window may be completely empty.

  • Worst case: two diagonally opposite corners of the window are completely empty, so only half as many data items can be presented.

  • Maximizing the number of displayed data items conflicts with arrangements that assign multiple dimensions to the axes.


Question 2:

  • For one dimension, negative distances are arranged

  • 1) at the bottom

  • 2) to the right

  • 3) at the top

  • 4) to the left


Answer 2:

  • Correct answer: 4


3. Grouping the Dimensions for each Data Item

  • All dimensions for one data item are grouped together in one area.

  • Visualizations generated using this arrangement consist of only one window.

  • We do not focus on shape to distinguish data items, and the criterion and arrangement of the data items are also different.

  • 2x2 pixels per dimension are needed, as opposed to 1 pixel per dimension in the previous two methods.


Grouping arrangement for 5 Dimensional Data


Contd…

  • The grouping arrangement is only suitable for focused search on smaller data sets because only one-fourth of the data items can be displayed on screen at one time.

  • It still provides more visualizations for data sets with larger dimensionality.

  • In the other two techniques, the pixels for each dimension of a data item are related only by their position.
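The one-fourth figure follows directly from the pixel budget per dimension. The short calculation below is only a back-of-the-envelope sketch: it assumes a fixed total pixel count shared by all dimensions and ignores window borders and the overall-relevance display.

```python
def displayable_items(total_pixels, dimensions, pixels_per_dimension):
    """Number of data items that fit when each item needs a fixed pixel budget."""
    return total_pixels // (dimensions * pixels_per_dimension)


# Hypothetical screen of one million pixels and five dimensions.
one_pixel = displayable_items(1_000_000, 5, 1)       # previous two techniques
grouped = displayable_items(1_000_000, 5, 2 * 2)     # grouping: 2x2 per dimension
```

Here `grouped` is exactly one quarter of `one_pixel`, independent of the number of dimensions.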


Working

  • The interface is divided into the Visualization portion on the left and the Query Modification portion on the right.

  • In the Visualization portion, the resulting data set, including a certain percentage of approximate answers, is displayed using one of the visualization methods.

  • In the Query Modification portion, sliders for modifying the selection predicates and weighting factors are provided, along with some other options.


Working contd..

  • Different kinds of sliders are available.

  • Examples: sliders for numbers, sliders for discrete types, and sliders for non-metric types (ordinal and nominal data types)

  • Other parameters listed are

    • Number of results

    • Query range

    • Weighting factors

    • Data values for selected tuple

    • Data values corresponding to some selected color range


Working contd..

  • Changing the percentage of data being displayed may completely change the visualization because the distance values are normalized according to the new range.

  • Normal Mode - The system recalculates the visualization after each modification of the query.

  • Auto-Recalculate Off Mode – Queries are recalculated only on demand.
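The difference between the two modes can be sketched with a small session object. The class, its attribute names, and the `render` callback are all hypothetical and only illustrate the control flow, not the actual VisDB interface code.

```python
class QuerySession:
    """Tracks query modifications and decides when to recalculate."""

    def __init__(self, render, auto_recalculate=True):
        self.render = render                  # callback that redraws the visualization
        self.auto_recalculate = auto_recalculate
        self.query = {}
        self.dirty = False

    def modify(self, predicate, value):
        """Change one selection predicate, e.g. by moving a slider."""
        self.query[predicate] = value
        self.dirty = True
        if self.auto_recalculate:             # Normal Mode
            self.recalculate()

    def recalculate(self):
        """Redraw if the query changed; called on demand when auto mode is off."""
        if self.dirty:
            self.render(dict(self.query))
            self.dirty = False
```

With `auto_recalculate=False`, several slider changes can be made before a single call to `recalculate()`, which avoids repeated expensive redraws.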


Question 3:

  • Into which two sections is VisDB mainly divided?

  • 1) Visualization Portion

  • 2) Grouping Dimensions

  • 3) Query Modification

  • 4) Coloration of Relevance Factors


Answer 3:

  • Correct answer: 1 and 3


Question 4

  • In which mode does the system recalculate the visualization after each modification of the query?

  • 1) Normal Mode

  • 2) Auto Recalculate Mode

  • 3) Visual Mode

  • 4) None of the above


Answer 4:

  • Correct answer: 1


Example (1000 Data Items)


Example (1000 Data Items)


Example (7000 Data Items)


Example (7000 Data Items)


Hardware/Software

  • Software used

    • C++

    • MOTIF

  • Hardware used

    • X Window System on HP 7xx machines. The current version is main-memory based and allows interactive database exploration for databases containing 50,000 data items.


Future Scope

  • Automatic generation of queries that correspond to some specific region in one of the visualization windows.

  • Generate time series of visualizations corresponding to queries that are changed incrementally.

  • Apply VisDB to many different application domains, each with its own parameters, distance functions, and query requirements.


Conclusion

  • VisDB allows visualization of the largest amount of data that can be displayed at one time on current displays.

  • Provides valuable feedback when querying the database.

  • Allows the user to find results that would otherwise remain hidden in the database.


Thank you

