VisDB: Database exploration using Multidimensional Visualization

Daniel A. Keim, Hans-Peter Kriegel

Institute for Computer Science, University of Munich

- Rohan Ladkhedkar
- Ajinkya Raulkar
- Vrushali Date
- Anuja Surgude

- Introduction to VisDB
- Basic Idea of VisDB
- Techniques used
- Basic Visualization
- Mapping 2D to Axis
- Grouping the Dimensions

- Working
- Hardware/Software
- Future Scope
- Conclusion

Typical difficulties faced with large databases:

- Finding a specific data item
- Users may have no knowledge of database systems, query languages or the data model
- Finding interesting data spots
- 1 to 1 queries provide multiple data items with no feedback

- Sorting the data items according to the user's query.
- Visualizing as many data items as possible (say, ten million) at the same time to give the user feedback on the query.
- The resolution of current displays (1 to 3 million pixels) is an important consideration.
- Interaction of the system with the user.

- Support the query specification process by visually representing the result.
- Restrict the visualization to the dimensions that are of interest to the user.

- Each pixel of the screen is used to visualize the data items resulting from a query.
- Approximate results are determined using distance functions.
- These distances are then combined into a relevance factor, which is used for the mapping.

- The distance between each attribute value and the corresponding query value is determined.
- The distance functions used are data type and application dependent.
- In some cases, multiple distance functions can be used even for a single data type.
- Distance functions by data type:
- Number types (integer): numerical difference
- Ordinal types (grades): domain-specific distance functions
- Nominal types (professions): distance matrix
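
The three kinds of distance function listed above can be sketched in Python. This is only an illustration: the function names, the grade ordering and the profession matrix are hypothetical, and the signed results are an assumption made so that the direction of the distance is available to the later techniques.

```python
from typing import Dict, Sequence, Tuple


def numeric_distance(value: float, query: float) -> float:
    """Number types: plain numerical difference (kept signed, an assumption)."""
    return value - query


def ordinal_distance(value: str, query: str, order: Sequence[str]) -> int:
    """Ordinal types: one possible domain-specific choice, the rank difference."""
    return order.index(value) - order.index(query)


def nominal_distance(value: str, query: str,
                     matrix: Dict[Tuple[str, str], float]) -> float:
    """Nominal types: look the pair up in a symmetric distance matrix."""
    if value == query:
        return 0.0
    if (value, query) in matrix:
        return matrix[(value, query)]
    return matrix[(query, value)]  # raises KeyError if the pair is unknown


# Hypothetical example data
grades = ["A", "B", "C", "D"]
professions = {("doctor", "nurse"): 1.0, ("doctor", "lawyer"): 3.0}

print(numeric_distance(7, 10))
print(ordinal_distance("C", "A", grades))
print(nominal_distance("nurse", "doctor", professions))
```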

- Combine the independently calculated distances of the different selection predicates.
- The combined distance should have a global meaning.
- User interaction is required.
Obtain weighting factors (w_j, j ∈ 1, ..., #sp) from the user, reflecting the order of importance of the selection predicates.

- Normalization of all distances.
Linear transformation of the range [dmin, dmax] for each predicate to a fixed range, e.g. [0, 255].
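
A minimal sketch of such a linear normalization, assuming the magnitudes of the distances are what gets normalized and that the target range is [0, 255]. The function name and the handling of the degenerate case (all distances equal) are illustrative choices, not taken from the slides.

```python
from typing import List, Sequence


def normalize(distances: Sequence[float], target_max: float = 255.0) -> List[float]:
    """Linearly map the range [dmin, dmax] of one predicate onto [0, target_max]."""
    magnitudes = [abs(d) for d in distances]  # assumption: ignore the sign here
    d_min, d_max = min(magnitudes), max(magnitudes)
    if d_max == d_min:
        # All distances are equal; map them to 0 to avoid dividing by zero.
        return [0.0 for _ in magnitudes]
    scale = target_max / (d_max - d_min)
    return [(d - d_min) * scale for d in magnitudes]


print(normalize([2, 4, 6]))
```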

- For combining the normalized distances we use numerical mean functions such as:
- Weighted arithmetic mean for ‘AND’-connected condition parts.
- Weighted geometric mean for ‘OR’-connected condition parts.
- The relevance factor is the inverse of the distance value.
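
The combination step can be sketched as follows. The function names are hypothetical, and reading "inverse" as `max_distance - distance` is an assumption, since the slides do not give the exact formula.

```python
import math
from typing import Sequence


def combine_and(distances: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted arithmetic mean, used for AND-connected condition parts."""
    return sum(w * d for w, d in zip(weights, distances)) / sum(weights)


def combine_or(distances: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted geometric mean, used for OR-connected condition parts."""
    if any(d == 0 for d in distances):
        # One fulfilled predicate is enough for an OR: the product is zero.
        return 0.0
    log_sum = sum(w * math.log(d) for w, d in zip(weights, distances))
    return math.exp(log_sum / sum(weights))


def relevance(distance: float, max_distance: float = 255.0) -> float:
    """Assumed reading of 'inverse': small distance gives high relevance."""
    return max_distance - distance


print(combine_and([0, 100], [1, 1]))
print(combine_or([0, 100], [1, 1]))
print(relevance(55.0))
```

With equal weights, a data item that fulfils one of two predicates exactly gets a combined distance of 50 under AND but 0 under OR, which matches the intuition behind the two connectives.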

- Adequate heuristics are required to:
- Reduce amount of data
- Determine data items whose distances are to be displayed.
- Hence α-quantile is defined as lowest value ξα such that:
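
Under the standard definition, the α-quantile is the lowest value ξα at which the cumulative distribution of the distances reaches α. A minimal Python sketch for an empirical list of distances, assuming that this standard definition is the one intended; the function name is hypothetical.

```python
import math
from typing import List, Sequence


def alpha_quantile(distances: Sequence[float], alpha: float) -> float:
    """Lowest value q such that at least a fraction alpha of distances is <= q."""
    if not distances:
        raise ValueError("distances must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(distances)
    k = math.ceil(alpha * len(ordered))  # number of items that must be covered
    return ordered[k - 1]


def items_to_display(distances: Sequence[float], alpha: float) -> List[float]:
    """Keep only the data items whose distance does not exceed the quantile."""
    threshold = alpha_quantile(distances, alpha)
    return [d for d in distances if d <= threshold]


print(items_to_display([40, 10, 30, 20], 0.5))
```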

- Three techniques are used:
- Basic visualization technique
- Mapping two dimensions to the axes
- Grouping the dimensions for each data item

- The data items are sorted according to their relevance with respect to the query.
- The relevance factors are then mapped to colors.
- Sorting is needed to avoid sprinkled images, which are hard for the user to read.
- The highest relevance factors are placed at the center of the window.
- Approximate answers form a rectangular spiral around this region (100% correct answers are shown in yellow).
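
The spiral arrangement described above can be sketched as follows. This is an illustrative implementation, not the authors' code: the function names, the starting direction and the turning direction of the spiral are assumptions.

```python
from typing import List, Sequence, Tuple

Position = Tuple[int, int]


def spiral_positions(n: int, width: int, height: int) -> List[Position]:
    """Return up to n pixel positions on a rectangular spiral from the center."""
    n = min(n, width * height)  # cannot place more items than pixels
    positions: List[Position] = []
    x, y = width // 2, height // 2
    dx, dy = 1, 0
    step_len = 1
    while len(positions) < n:
        for _ in range(2):  # two legs share each step length
            for _ in range(step_len):
                if 0 <= x < width and 0 <= y < height:
                    positions.append((x, y))
                    if len(positions) == n:
                        return positions
                x, y = x + dx, y + dy
            dx, dy = -dy, dx  # turn by 90 degrees
        step_len += 1
    return positions


def place_by_relevance(relevances: Sequence[float], width: int,
                       height: int) -> List[Tuple[Position, float]]:
    """Sort by relevance (highest first) and assign spiral positions.

    Items that do not fit in the window are dropped, lowest relevance first.
    """
    ordered = sorted(relevances, reverse=True)
    return list(zip(spiral_positions(len(ordered), width, height), ordered))


print(place_by_relevance([10, 255, 100], 5, 5))
```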

- Colors range from yellow in the middle through green, blue and red to black.
- These color ranges denote the distance from the correct answers.
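
One simple way to realize such a color scale is piecewise-linear interpolation between the named colors. The sketch below assumes normalized distances in [0, 255] and plain RGB anchor colors; the actual VisDB color scale is not specified on the slides, so the anchor values and the function name are hypothetical.

```python
from typing import Tuple

RGB = Tuple[int, int, int]

# Assumed anchors: yellow, green, blue, red, black
ANCHORS = [(255, 255, 0), (0, 255, 0), (0, 0, 255), (255, 0, 0), (0, 0, 0)]


def distance_to_color(distance: float, max_distance: float = 255.0) -> RGB:
    """Map a normalized distance to a color between yellow (0) and black (max)."""
    t = min(max(distance / max_distance, 0.0), 1.0)  # clamp to [0, 1]
    scaled = t * (len(ANCHORS) - 1)
    i = min(int(scaled), len(ANCHORS) - 2)  # index of the lower anchor
    frac = scaled - i
    lo, hi = ANCHORS[i], ANCHORS[i + 1]
    return tuple(round(a + (b - a) * frac) for a, b in zip(lo, hi))


print(distance_to_color(0))
print(distance_to_color(255))
```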

- Multidimensional visualization: a separate window is generated for each selection predicate of the query.

- 100% correct answers are denoted by which color in Basic Visualization Technique?
- Red
- Yellow
- Green
- White
- Blue

- Correct answer: 2

- 2D and 3D visualizations are useful, but they are not pursued here for two reasons:
- They can show only a limited number of data items.
- Systems of this kind already exist.

- Improvement: include feedback on the direction of the distance in the visualization.

- Assign two dimensions to the axes
- Arrange the relevance factor according to the direction of the distance.
- For one dimension, negative distances are arranged to the left and positive distances to the right.
- For the other dimension, negative distances are arranged at the bottom and positive ones at the top.
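
The direction rule above can be sketched as a small function that takes the signed distances of the two dimensions assigned to the axes and returns the region of the window in which the data item is placed. The names and the treatment of a zero distance as "center" are assumptions.

```python
from typing import Tuple


def axis_side(signed_distance: float, negative_label: str,
              positive_label: str) -> str:
    """Pick the side of one axis from the sign of the distance."""
    if signed_distance < 0:
        return negative_label
    if signed_distance > 0:
        return positive_label
    return "center"  # assumption: exact matches stay in the middle


def placement(distance_x: float, distance_y: float) -> Tuple[str, str]:
    """Region of the window for a data item with two signed distances."""
    return (axis_side(distance_x, "left", "right"),
            axis_side(distance_y, "bottom", "top"))


print(placement(-3, 2))
```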

- A corner of the window may be completely empty.
- Worst case: two diagonally opposite corners of the window may be completely empty, so that only half of the data items can be presented.
- Maximizing the number of displayed data items conflicts with arrangements that assign multiple dimensions to the axes.

- In 1 Dimension Negative distances are arranged
- 1) at the bottom
- 2) to the right
- 3) at the top
- 4) to the left

- Correct answer: 4

- All dimensions for one data item are grouped together in one area.
- Visualizations generated using this arrangement consist of only one window.
- We do not rely on shape to distinguish data items, and the criterion and arrangement of the data items are also different.
- 2x2 pixels per dimension are needed, as opposed to 1 pixel per dimension in the previous two methods.

- The grouping arrangement is only suitable for focused searches on smaller data sets, because only one-fourth of the data items can be displayed on screen at one time.
- But it still provides more visualizations for data sets with larger dimensionality.
- In the other two techniques the pixels for each dimension of a data item are related only by their position.
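
The one-fourth figure follows from the pixel budget. A rough sketch of the arithmetic, assuming every data item consumes a fixed number of pixels per visualized dimension and ignoring window borders and other overhead; the function name and the example numbers (a one-million-pixel display, five dimensions) are hypothetical.

```python
def displayable_items(screen_pixels: int, dimensions: int,
                      pixels_per_dimension: int) -> int:
    """Upper bound on the number of data items that fit on the screen."""
    if min(screen_pixels, dimensions, pixels_per_dimension) <= 0:
        raise ValueError("all arguments must be positive")
    return screen_pixels // (dimensions * pixels_per_dimension)


# 1 pixel per dimension (first two techniques) vs. 2x2 = 4 pixels (grouping)
print(displayable_items(1_000_000, 5, 1))
print(displayable_items(1_000_000, 5, 4))
```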

- The interface is divided into a visualization portion on the left and a query modification portion on the right.
- In the visualization portion, the resulting data set, including a certain percentage of approximate answers, is displayed using one of the visualization methods.
- The query modification portion provides sliders for modifying the selection predicates and weighting factors, as well as some other options.

- Different kinds of sliders are available.
- Examples: sliders for numbers, sliders for discrete types, sliders for non-metric types (ordinal and nominal data types)
- Other parameters listed are
- Number of results
- Query range
- Weighting factors
- Data values for selected tuple
- Data values corresponding to some selected color range

- Changing the percentage of data being displayed may completely change the visualization, because the distance values are normalized according to the new range.
- Normal mode: the system recalculates the visualization after each modification of the query.
- Auto-recalculate off mode: queries are only recalculated on demand.
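
The difference between the two modes can be sketched with a small class. This is a hypothetical model of the behaviour described above, not the VisDB implementation; the class, attribute and method names are invented for illustration.

```python
from typing import Any, Dict


class QuerySession:
    """Tracks query modifications and when the visualization is recalculated."""

    def __init__(self, auto_recalculate: bool = True) -> None:
        self.auto_recalculate = auto_recalculate  # True corresponds to normal mode
        self.predicates: Dict[str, Any] = {}
        self.recalculations = 0
        self.stale = False  # True if the display no longer matches the query

    def modify(self, predicate: str, value: Any) -> None:
        """Change one selection predicate, e.g. by moving a slider."""
        self.predicates[predicate] = value
        if self.auto_recalculate:
            self.recalculate()
        else:
            self.stale = True

    def recalculate(self) -> None:
        """Recompute the visualization (only counted here)."""
        self.recalculations += 1
        self.stale = False


normal = QuerySession(auto_recalculate=True)
normal.modify("age", 30)
normal.modify("salary", 50000)
print(normal.recalculations)
```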

- Into which two sections is the VisDB interface mainly divided?
- Visualization Portion
- Grouping Dimensions
- Query Modification
- Coloration of Relevance factors

- Correct answer: 1 and 3

- In which mode does the system recalculate the visualization after each modification of the query?
- Normal Mode
- Auto Recalculate Mode
- Visual Mode
- None of the above.

- Correct answer: 1

- Software used
- C++
- MOTIF

- Hardware used
- X-Windows on HP 7xx machines
- The current version is main-memory based and allows interactive database exploration for databases containing 50,000 data items.

- Automatic generation of queries that correspond to some specific region in one of the visualization windows.
- Generate time series of visualizations corresponding to queries that are changed incrementally.
- Applying to many different application domains each having its own parameters, distance functions, query requirements and so on.

- VisDB allows visualization of the largest amount of data that can be displayed at one time on current displays.
- It provides valuable feedback when querying the database.
- It allows the user to find results that would otherwise remain hidden in the database.

Thank you