iTree: Exploring Time-Varying Data using Indexable Tree. Yi Gu and Chaoli Wang, Michigan Technological University. Presented at IEEE Pacific Visualization Symposium, 28 February 2013, Sydney, Australia. Time-activity curve (TAC). Time-varying medical imaging data [Fang et al. 2007]
iTree: Exploring Time-Varying Data using Indexable Tree
Yi Gu and Chaoli Wang
Michigan Technological University
Presented at IEEE Pacific Visualization Symposium
28 February 2013
Sydney, Australia
Time-activity curve (TAC)
Time-varying medical imaging data [Fang et al. 2007]
Importance analysis
Multiscale data clustering
Temporal sequencing
Trend identification
What can iTree do for us?
Handle ever-growing size and complexity (efficient data compacting)
Index and query TACs adaptively (effective data indexing)
Interact with space-time data (intuitive visual exploration)
TAC-based time-varying data visualization
Keogh’s SIGKDD 2007 tutorial slide
[Figure: time series C over time steps 0 to 120, discretized into the symbols a, b, and c]
First convert the time series to a piecewise aggregate approximation (PAA) representation, then convert the PAA to symbols
It takes linear time [Lin et al. 2003]
breakpoints
A SAX word can be represented by symbols (e.g., a, b, c) or bits (e.g., 00, 01, 10 or 0^2, 1^2, 2^2)
baabccbc
word length: 8; bit cardinality: 2
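The symbol mapping can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `to_sax_word` is hypothetical, the breakpoints are the standard Gaussian breakpoints for three symbols from Lin et al., and the PAA values are made up so that the result matches the example word above.

```python
from bisect import bisect_right
import string


def to_sax_word(paa_values, breakpoints):
    """Map each PAA value to a letter using sorted breakpoints.

    Values below the first breakpoint map to 'a'; values at or above
    the last breakpoint map to the last letter of the alphabet.
    """
    letters = string.ascii_lowercase
    if len(breakpoints) + 1 > len(letters):
        raise ValueError("this sketch supports at most 26 symbols")
    return "".join(letters[bisect_right(breakpoints, v)] for v in paa_values)


# Illustrative input only: two breakpoints give the three symbols a, b, c.
example_breakpoints = [-0.43, 0.43]
example_paa = [0.1, -0.9, -0.5, 0.2, 1.0, 0.8, -0.1, 0.6]
example_word = to_sax_word(example_paa, example_breakpoints)
```

With word length 8 and three symbols, each symbol fits in 2 bits, which matches the bit cardinality quoted above.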
Handle time-varying data
Use groups of voxels over time intervals, going through each group voxel by voxel for the 1st time step, then the 2nd, and so on
Modify the original SAX/iSAX algorithms to
Better differentiate SAX words (effectiveness)
Improve computational performance (efficiency)
Make iSAX amenable for visual mapping (visualization)
PAA conversion
Convert a TAC T of length n to a PAA C of length w
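A minimal sketch of the PAA conversion: each of the w output values is the mean of one equal-length segment of the TAC. The function name `paa` is hypothetical, and the sketch assumes that w divides n evenly, which the slides do not state.

```python
def paa(series, w):
    """Reduce a series of length n to w segment means.

    Assumption of this sketch: n is a positive multiple of w, so every
    segment holds exactly n // w samples.
    """
    n = len(series)
    if w <= 0 or n == 0 or n % w != 0:
        raise ValueError("this sketch assumes n is a positive multiple of w")
    seg = n // w
    return [sum(series[i * seg:(i + 1) * seg]) / seg for i in range(w)]


# A TAC of length 6 reduced to a PAA of length 3.
example_paa = paa([1, 3, 5, 7, 2, 4], 3)
```

The conversion visits each sample once, which is consistent with the linear-time cost cited earlier.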
SAX for time-varying volume data (1)
Transfer function based breakpoint identification
H’: histogram after logarithm and normalization of the original histogram
H: new histogram by multiplying H’ by the opacity value
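The two histogram steps can be sketched as below. The slides do not say how the breakpoints are then read off H, so the second function is an assumption: it places breakpoints so that each symbol covers roughly equal mass of H. Normalizing by the maximum and using log(1 + count) are also assumptions, and both function names are hypothetical.

```python
import math


def weighted_histogram(counts, opacity):
    """Build H from raw bin counts and per-bin opacity values.

    H' is the log-scaled histogram normalized by its maximum
    (assumption), and H multiplies H' by the opacity of each bin.
    """
    logged = [math.log(1 + c) for c in counts]
    peak = max(logged)
    if peak == 0:
        raise ValueError("histogram is empty")
    h_prime = [v / peak for v in logged]
    return [hp * a for hp, a in zip(h_prime, opacity)]


def equal_mass_breakpoints(hist, num_symbols):
    """Return bin-edge indices that split hist into equal-mass parts.

    Assumption of this sketch, not stated on the slides. A breakpoint k
    means bins [0, k) fall below it. A single very heavy bin can yield
    repeated breakpoints.
    """
    total = sum(hist)
    breakpoints = []
    running = 0.0
    target = 1
    for bin_index, mass in enumerate(hist):
        running += mass
        while target < num_symbols and running >= total * target / num_symbols:
            breakpoints.append(bin_index + 1)
            target += 1
    return breakpoints
```

Bins that the transfer function makes transparent contribute little mass to H, so under this assumption they receive coarser symbol resolution than opaque value ranges.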
SAX for time-varying volume data (2)
[Figure panels: Before, After]
SAX word generation
Construct an alphabet Φ and transform C into an array of symbols Ĉ to form a SAX word
Distance between two symbols
Distance between two SAX words
Distance between two SAX words is the lower bound of the Euclidean distance defined based on the PAA representation
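The two distances are not spelled out in this transcript. The sketch below uses the standard SAX definitions from Lin et al. 2003, on the assumption that the slides follow them: adjacent symbols have distance 0, and otherwise the distance is the gap between the breakpoints that separate the two symbols. Symbols are written as integer indices (0 for a, 1 for b, and so on), and the function names are hypothetical.

```python
import math


def symbol_distance(r, c, breakpoints):
    """Distance between two symbol indices under sorted breakpoints.

    Symbol i covers the interval [breakpoints[i - 1], breakpoints[i]).
    Equal or adjacent symbols are at distance 0.
    """
    lo, hi = min(r, c), max(r, c)
    if hi - lo <= 1:
        return 0.0
    return breakpoints[hi - 1] - breakpoints[lo]


def sax_word_distance(word_q, word_s, breakpoints, n):
    """Standard SAX lower-bounding distance for TACs of original length n."""
    if len(word_q) != len(word_s):
        raise ValueError("SAX words must have the same length")
    w = len(word_q)
    total = sum(symbol_distance(q, s, breakpoints) ** 2
                for q, s in zip(word_q, word_s))
    return math.sqrt(n / w) * math.sqrt(total)


# Illustrative values: standard Gaussian breakpoints for four symbols.
example_breakpoints = [-0.67, 0.0, 0.67]
example_distance = sax_word_distance([0, 0], [3, 0], example_breakpoints, n=8)
```

Because only the gaps between non-adjacent symbols are counted, the result can never exceed the distance between the underlying PAA values, which is what makes it a lower bound.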
SAX for time-varying volume data (3)
SAX lower bounding
Exact (Euclidean) distance D(Q,S)
Lower bounding distance DLB(Q’,S’)
[Figure: raw data Q and S and their approximate representations Q’ and S’]
Lower bounding means that for all Q and S, we have DLB(Q’,S’) ≤ D(Q,S)
Keogh’s SIGKDD 2007 tutorial slide
SAX construction (in sec)
A word length of 8 to 12 and 16 to 32 quantization levels are appropriate for the quality and speed tradeoff
SAX construction takes less than 10 minutes, excluding I/O time
iSAX organizes SAX words hierarchically
A node represents a set of TACs with the same or similar SAX words
Split a node when the number of SAX words exceeds a certain threshold
How to split?
The original iSAX chooses the leftmost symbol with the smallest bit cardinality to split
We choose the symbol covering the largest value range to split
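The two split policies can be contrasted with a small sketch. It assumes each symbol of a node's SAX word is described by its current bit cardinality and by the (low, high) value interval it covers. Both function names are hypothetical, and breaking ties toward the leftmost symbol in the range-based policy is an assumption.

```python
def pick_split_original(bit_cardinalities):
    """Original iSAX policy: leftmost symbol with the smallest bit cardinality."""
    return bit_cardinalities.index(min(bit_cardinalities))


def pick_split_by_range(symbol_ranges):
    """Modified policy: symbol whose (low, high) interval is widest.

    Ties go to the leftmost symbol (assumption of this sketch).
    """
    widths = [high - low for low, high in symbol_ranges]
    return widths.index(max(widths))


# Illustrative node: all symbols have 1 bit, but the third covers most values.
example_bits = [1, 1, 1]
example_ranges = [(0, 20), (0, 20), (20, 100)]
original_choice = pick_split_original(example_bits)
range_choice = pick_split_by_range(example_ranges)
```

In this example the original policy splits position 0 while the range-based policy splits position 2, where the breakpoints are farthest apart.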
iSAX for time-varying volume data (1)
Comparison
Original breakpoint identification and symbol splitting
Our new breakpoint identification and symbol splitting
iSAX construction
Voxel IDs for each terminal node are saved into a file
Use the SAX word itself as the file name to facilitate search
Out-of-core acceleration strategy
Partition all voxels or groups into at most 2^w buckets and save each non-empty bucket into a file
Choose the file with the largest voxel/group count to split if its count is larger than a threshold δn
Continue this until no file is larger than δn
iSAX for time-varying volume data (2)
Approximate and exact search
Both take the PAA representation and a threshold δ as input
Approximate search only compares each file name with the SAX word converted from the input PAA and returns the files whose distance is less than δ
Exact search needs an additional step: compute the PAA-based distance to the input PAA and return those voxels that have a distance less than δ
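The two searches can be sketched as below, under several assumptions: an in-memory dict stands in for the index files, keyed by SAX word (a tuple of symbol indices) and holding (voxel ID, PAA) pairs. The word distance is the standard SAX symbol distance without the sqrt(n/w) factor, so it is directly comparable to the PAA distance. All names are hypothetical.

```python
import math
from bisect import bisect_right


def to_symbols(paa_values, breakpoints):
    """Convert PAA values to a tuple of symbol indices."""
    return tuple(bisect_right(breakpoints, v) for v in paa_values)


def word_distance(word_a, word_b, breakpoints):
    """Lower bound on the PAA distance, computed from two SAX words."""
    total = 0.0
    for a, b in zip(word_a, word_b):
        lo, hi = min(a, b), max(a, b)
        if hi - lo > 1:
            total += (breakpoints[hi - 1] - breakpoints[lo]) ** 2
    return math.sqrt(total)


def paa_distance(p, q):
    """Euclidean distance between two PAA vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def approximate_search(index, query_paa, delta, breakpoints):
    """Return the SAX words (file names) within delta of the query word."""
    query_word = to_symbols(query_paa, breakpoints)
    return [word for word in index
            if word_distance(word, query_word, breakpoints) < delta]


def exact_search(index, query_paa, delta, breakpoints):
    """Refine the approximate result with PAA-based distances per voxel."""
    hits = []
    for word in approximate_search(index, query_paa, delta, breakpoints):
        for voxel_id, voxel_paa in index[word]:
            if paa_distance(voxel_paa, query_paa) < delta:
                hits.append(voxel_id)
    return hits


# Illustrative index: made-up voxels grouped under three SAX words.
example_breakpoints = [-0.67, 0.0, 0.67]
example_index = {
    (0, 0): [(1, [-1.0, -1.0])],
    (3, 3): [(2, [1.0, 1.0])],
    (1, 0): [(3, [-0.5, -0.9]), (4, [-0.1, -2.5])],
}
example_files = approximate_search(example_index, [-1.0, -1.0], 1.0, example_breakpoints)
example_voxels = exact_search(example_index, [-1.0, -1.0], 1.0, example_breakpoints)
```

Since the word distance never exceeds the PAA distance, filtering by file name first cannot discard a voxel that the exact step would have accepted.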
iSAX for time-varying volume data (3)
From iSAX (internal) hierarchy to iTree (external)
The number of non-empty children of the root is fairly large
Solution: level promoting
iSAX has a large number of hierarchy levels with a small fanout (2)
Solution: sibling grouping
Sibling nodes are not arranged according to their similarity
Solution: sibling reordering
Resulting properties
The height of the iTree is determined by the maximal bit cardinality for representing any symbol in the SAX words
The iTree is balanced: no node has an excessively large fanout
Neighboring sibling nodes have a higher degree of similarity in terms of spatial closeness and temporal trend
iTree (1)
iTree drawing and focus+context visualization
Hyperbolic layout [Lamping and Rao 1996]
Accommodate a large number of nodes
Allow focus+context interaction
Add the time ring to indicate the time dimension
Query in multiple coordinated views (volume view, iTree view and SAX view)
iTree (2)
iSAX/iTree construction (in sec)
The number of nodes is reduced by an order of magnitude from iSAX to iTree
Brute-force/approx./exact search (in sec)
Brute-force search does not use any indexing scheme but simply goes over the PAA representation of the data to identify similar voxels
The time cost of approx. search does not increase much from the current interval to all time steps, since it only uses the names of the index files for distance computation
iTree
Data organization, visual representation and user interaction framework for timevarying data analysis and visualization
Applicable for tackling big timevarying data sets
Limitations
Breakpoint identification depends on input transfer function
Blockwise TACs lead to block discontinuity in data classification
Future work
Motif finding (locate previously unknown, frequently occurring patterns)
Time-varying multivariate data
Acknowledgements
U.S. National Science Foundation
Summary