General problem

General problem. Retrieval of time-series similar to a given pattern . Example: Stock charts. Database of time-series. Example: Stock charts. Database of time-series. Pattern. Example: Stock charts. Database of time-series. Pattern. Retrieval results. Example: Stock charts.

Presentation Transcript


  1. General problem Retrieval of time-series similar to a given pattern.

  2. Example: Stock charts Database of time-series

  3. Example: Stock charts Database of time-series Pattern

  4. Example: Stock charts Database of time-series Pattern Retrieval results

  5. Example: Stock charts Database of time-series Pattern Retrieval results (similarity .92, .87, .86, .84)

  6. Example: Electrocardiogram Database of time-series

  7. Example: Electrocardiogram Database of time-series Pattern

  8. Example: Electrocardiogram Database of time-series Pattern Retrieval results (similarity .91, .87, .98, 1.0)

  9. Outline • Previous work • Important points • Indexing and retrieval • Empirical results • Conclusions

  10. Outline • Previous work • Important points • Indexing and retrieval • Empirical results • Conclusions } Contributions

  11. Criteria for retrieval methods Gunopulos [2000]: • Work for erratic time-series • Accept any pattern • Find inexact matches • Work when some points are missing • Work on streaming data

  12. Outline • Previous work • Important points • Indexing and retrieval • Empirical results • Conclusions

  13. Previous work • Feature choice • Similarity metrics • Indexing and retrieval

  14. Previous work: Feature choice • Discrete Fourier transforms • Alphabets • Statistical features • Subsets of points

  15. Previous work: Similarity metrics • Euclidean distance • Bounding rectangles • Envelope count • Aggregate similarity

  16. Previous work: Indexing and retrieval • Advanced techniques: • B-trees • R-trees • KD-trees • VP-trees • Grids • Applied techniques: • Linear search with compression

  17. Outline • Previous work • Important points • Indexing and retrieval • Empirical results • Conclusions

  18. Important points Choose “important” maxima and minima, and discard the other points.

  19. Important points Choose “important” maxima and minima, and discard the other points. Example: Original series

  21. Important points Choose “important” maxima and minima, and discard the other points. Example: Compressed series Original series

  22. Definition of important points Important minimum

  23. Definition of important points Important minimum • $a_m$ is the minimum among $a_i, \dots, a_j$

  24. Definition of important points Important minimum • $a_m$ is the minimum among $a_i, \dots, a_j$ • $a_i / a_m \ge R$ and $a_j / a_m \ge R$

  25. Definition of important points Important minimum • $a_m$ is the minimum among $a_i, \dots, a_j$ • $a_i / a_m \ge R$ and $a_j / a_m \ge R$ • $R$ is a knob that determines the compression rate

  26. Definition of important points Important maximum • $a_m$ is the maximum among $a_i, \dots, a_j$ • $a_m / a_i \ge R$ and $a_m / a_j \ge R$ • $R$ is a knob that determines the compression rate
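The definition can be checked directly. The sketch below is a brute-force illustration, assuming positive values and R > 1. The function names are hypothetical. It scans outward from a_m on each side and stops as soon as a value breaks the minimum (or maximum) condition or reaches the ratio R. Its quadratic worst case mirrors the definition, not the linear-time compression described on slide 31.

```python
from typing import List, Sequence


def _side_ok(a: Sequence[float], m: int, r: float, step: int, want_min: bool) -> bool:
    """Walk from index m in direction `step` (-1 left, +1 right) looking for a valid endpoint."""
    k = m + step
    while 0 <= k < len(a):
        if want_min:
            if a[k] < a[m]:          # a[m] would stop being the minimum of the segment
                return False
            if a[k] / a[m] >= r:     # endpoint found: a_k / a_m >= R
                return True
        else:
            if a[k] > a[m]:          # a[m] would stop being the maximum of the segment
                return False
            if a[m] / a[k] >= r:     # endpoint found: a_m / a_k >= R
                return True
        k += step
    return False


def is_important(a: Sequence[float], m: int, r: float) -> bool:
    """True if a[m] is an important minimum or maximum (assumes a > 0 and r > 1)."""
    return any(
        _side_ok(a, m, r, -1, want_min) and _side_ok(a, m, r, +1, want_min)
        for want_min in (True, False)
    )


def important_indices(a: Sequence[float], r: float) -> List[int]:
    """Brute-force list of all important points; O(n^2) in the worst case."""
    return [m for m in range(len(a)) if is_important(a, m, r)]


print(important_indices([5, 6, 5.5, 12, 11, 13, 4, 4.5, 3, 9], 2))  # [5, 8]
```

With R = 2, only the peak 13 (index 5) and the trough 3 (index 8) qualify. Smaller swings such as 12 → 11 → 13 never reach a two-fold ratio on both sides.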

  27. Compression example Original series

  28. Compression example Original series Compressed series

  31. Compression algorithm • Linear time • Constant memory • Accepts streaming data • For a series with n values, compression time is 0.0133 n milliseconds (300 MHz PC, Visual Basic 6.0).
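The slides list the algorithm's properties but not its steps. Below is a minimal zigzag-style sketch that has those properties: one pass, constant memory, and it works on a stream. It is an illustration under stated assumptions (positive values, R > 1) and may not be the authors' exact procedure. The name `important_points` is hypothetical. It keeps the first and last points. An extremum is emitted only once the series has moved R-fold away from it.

```python
from typing import Iterable, Iterator, Tuple


def important_points(stream: Iterable[float], r: float) -> Iterator[Tuple[int, float]]:
    """Single-pass, constant-memory compression (zigzag-style sketch).

    Yields (index, value) for the first point, alternating important extrema,
    and the last point. Assumes positive values and r > 1.
    """
    if r <= 1:
        raise ValueError("r must be greater than 1")
    it = iter(stream)
    try:
        first = next(it)
    except StopIteration:
        return
    yield (0, first)
    last_out = 0
    lo_i, lo = 0, first     # current candidate minimum
    hi_i, hi = 0, first     # current candidate maximum
    direction = 0           # 0: undecided, 1: rising (tracking a max), -1: falling (tracking a min)
    i, x = 0, first
    for i, x in enumerate(it, start=1):
        if direction == 0:
            if x < lo:
                lo_i, lo = i, x
            if x > hi:
                hi_i, hi = i, x
            if x / lo >= r:                 # rose R-fold: the running minimum is important
                if lo_i != last_out:
                    yield (lo_i, lo)
                    last_out = lo_i
                direction, hi_i, hi = 1, i, x
            elif hi / x >= r:               # fell R-fold: the running maximum is important
                if hi_i != last_out:
                    yield (hi_i, hi)
                    last_out = hi_i
                direction, lo_i, lo = -1, i, x
        elif direction == 1:
            if x > hi:
                hi_i, hi = i, x
            elif hi / x >= r:               # confirmed important maximum
                yield (hi_i, hi)
                last_out = hi_i
                direction, lo_i, lo = -1, i, x
        else:
            if x < lo:
                lo_i, lo = i, x
            elif x / lo >= r:               # confirmed important minimum
                yield (lo_i, lo)
                last_out = lo_i
                direction, hi_i, hi = 1, i, x
    if i != last_out:
        yield (i, x)                        # always keep the final point


print(list(important_points([5, 6, 5.5, 12, 11, 13, 4, 4.5, 3, 9], 2)))
# [(0, 5), (5, 13), (8, 3), (9, 9)]
```

The state is a handful of scalars, so memory stays constant and each value is read once. A candidate extremum that is still unconfirmed when the stream ends is not emitted; only the final point is. That is one of several reasonable end-of-stream choices.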

  32. Outline • Previous work • Important points • Indexing and retrieval • Empirical results • Conclusions

  33. Retrieval • Retrieval of time-series similar to a given pattern. • Intuition: • Find a prominent feature in the pattern • Find candidate segments with a similar feature • Compare similarity of candidates to the pattern

  34. Example: Stock charts Database of time-series

  36. Example: Stock charts Database of time-series Pattern

  39. Example: Stock charts Database of time-series Pattern Retrieval results (similarity .92, .87, .86, .84)

  40. Algorithm • Identify the prominent leg in the pattern • Retrieve similar legs from the database • Identify corresponding candidate segments • For each candidate segment, compute its similarity to the pattern • Output the candidates whose similarity is above the threshold
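The five steps can be sketched as follows. This assumes each compressed series is a list of positive important-point values, so each leg is a pair of consecutive values. The prominent leg uses the end-point ratio definition from slide 41. The similarity function, the `tolerance` and `threshold` parameters, and all function names are hypothetical stand-ins, because the slides do not specify the metric. For brevity, this sketch scans legs linearly instead of using an index.

```python
import math
from typing import Dict, List, Tuple


def leg_ratios(points: List[float]) -> List[float]:
    """Right-end / left-end ratio of every leg (pair of consecutive important points)."""
    return [b / a for a, b in zip(points, points[1:])]


def similarity(pattern: List[float], candidate: List[float]) -> float:
    """HYPOTHETICAL metric: 1 / (1 + mean |log ratio difference|); 1.0 means equal leg ratios."""
    diffs = [abs(math.log(p) - math.log(c))
             for p, c in zip(leg_ratios(pattern), leg_ratios(candidate))]
    return 1.0 / (1.0 + sum(diffs) / len(diffs))


def retrieve(pattern: List[float], database: Dict[str, List[float]],
             tolerance: float = 1.25, threshold: float = 0.8) -> List[Tuple[str, int, float]]:
    """Return (series name, start point, similarity) for candidates above `threshold`."""
    ratios = leg_ratios(pattern)
    m = len(ratios)                                   # number of legs in the pattern
    k = max(range(m), key=lambda j: ratios[j])        # 1. prominent leg of the pattern
    target = ratios[k]
    results = []
    for name, series in database.items():
        for j, ratio in enumerate(leg_ratios(series)):
            if not target / tolerance <= ratio <= target * tolerance:
                continue                              # 2. keep only legs with similar prominence
            start = j - k                             # 3. align with the pattern's prominent leg
            if start < 0 or start + m >= len(series):
                continue                              # candidate would run off either end
            score = similarity(pattern, series[start:start + m + 1])  # 4. compare
            if score >= threshold:                    # 5. keep candidates above the threshold
                results.append((name, start, score))
    return sorted(results, key=lambda t: t[2], reverse=True)


db = {"A": [5, 10, 7.5, 8], "B": [4, 5, 4, 6]}
print(retrieve([10, 20, 15], db))  # [('A', 0, 1.0)]
```

Only one candidate segment is scored per matching leg. This is why the retrieval cost grows with the number of candidates rather than with the total length of the database.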

  41. Important details • Use compressed pattern and compressed sequences in the retrieval process • The prominent feature is the leg having the greatest ratio of right end to left end • All legs in the database are indexed by their prominence, using a binary search tree
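The indexing detail can be sketched with Python's standard `bisect` module. The talk specifies a binary search tree keyed on prominence but not its implementation. The sorted-list structure, the tolerance factor 1.25 in the usage lines, and the names `prominent_leg` and `LegIndex` are therefore assumptions for illustration.

```python
import bisect
from typing import List, Tuple


def prominent_leg(points: List[float]) -> int:
    """Index of the leg with the greatest right-end / left-end ratio."""
    ratios = [b / a for a, b in zip(points, points[1:])]
    return max(range(len(ratios)), key=lambda j: ratios[j])


class LegIndex:
    """Legs keyed by prominence and kept in sorted order.

    A sorted list searched with `bisect` stands in for the binary search tree:
    range queries take O(log n + answer size), but each insert shifts the list
    (O(n)), which a balanced tree would avoid.
    """

    def __init__(self) -> None:
        self._keys: List[float] = []              # prominence values, ascending
        self._legs: List[Tuple[str, int]] = []    # (series name, leg number), parallel to _keys

    def add_series(self, name: str, points: List[float]) -> None:
        for j, (a, b) in enumerate(zip(points, points[1:])):
            prominence = b / a
            pos = bisect.bisect_right(self._keys, prominence)
            self._keys.insert(pos, prominence)
            self._legs.insert(pos, (name, j))

    def query(self, low: float, high: float) -> List[Tuple[str, int]]:
        """All legs whose prominence lies in [low, high]."""
        i = bisect.bisect_left(self._keys, low)
        j = bisect.bisect_right(self._keys, high)
        return self._legs[i:j]


index = LegIndex()
index.add_series("A", [5, 10, 7.5, 8])
index.add_series("B", [4, 5, 4, 6])
p = [10, 20, 15]
k = prominent_leg(p)                               # leg 0, ratio 2.0
target = p[k + 1] / p[k]
print(index.query(target / 1.25, target * 1.25))  # [('A', 0)]
```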

  42. Alternative versions • Different prominence definitions • Different similarity metrics • The end-point ratio prominence usually gives the best empirical results.

  43. Extended legs Similar sequence

  44. Indexing on extended legs • Advantage: More accurate retrieval • Disadvantage: Larger index, more memory • If a compressed sequence has $n$ legs: • Worst case: $n^2/2$ extended legs • Average case: on the order of $n \lg n$ extended legs

  45. Outline • Previous work • Important points • Indexing and retrieval • Empirical results • Conclusions

  46. Data sets • Stock charts • Air and sea temperatures • Wind speeds • Electroencephalograms • Electrocardiograms

  47. Data sets • Stock charts: 60,000 points • Air and sea temperatures: 445,000 points • Wind speeds: 79,000 points • Electroencephalograms: 17,000 points • Electrocardiograms: 2,000 points

  48. Patterns Compressed patterns with 4 to 27 legs Examples:

  49. Retrieval time Retrieval time: $0.07 \cdot m \cdot k$ milliseconds, where $m$ is the number of legs in the pattern and $k$ is the number of candidates
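As a worked example, take hypothetical values chosen only for illustration and not taken from the experiments: a pattern with m = 10 legs (within the 4 to 27 leg range on slide 48) and k = 100 candidates. The formula gives:

$$
t = 0.07 \cdot m \cdot k \ \text{ms} = 0.07 \cdot 10 \cdot 100 \ \text{ms} = 70 \ \text{ms}
$$

The cost is linear in both factors, so doubling the number of candidates doubles the retrieval time.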