1 / 33

Using Shapes of Trends in Active Data Mining

Using Shapes of Trends in Active Data Mining. Duy Lam Norris Boothe. Shape Querying and Active Data Mining. Historical time sequences make up a large portion of data stored in computers Mining trends in histories useful

savea
Download Presentation

Using Shapes of Trends in Active Data Mining

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Using Shapes of Trends in Active Data Mining Duy Lam Norris Boothe

  2. Shape Querying and Active Data Mining • Historical time sequences make up a large portion of data stored in computers • Mining trends in histories useful • Many applications, including observing trends in stock prices, online bids, and rule mining

  3. Overview • Overview of SDL • SDL language • Applications to data mining

  4. A (Very) Simple History

  5. Shape Definition Language • SDL is a shape definition language used to query the “shapes” of histories • Small, powerful language that allows “blurry” matching • Designed to make it easy and natural to query • Easily implementable • Little non-determinism

  6. Alphabet • SDL allows you to specify an “alphabet” defining transitions • Example:

  7. Describing a shape • So with this alphabet we can describe a shape • Use such a description to query a history to produce all subsequences that match the shape (shape name(parameters)descriptor)

  8. (shape spike() (concat Up up down Down))

  9. Derived Shapes • any • allows a shape to have multiple values • concat • shapes can be concatenated together contiguously (any up Up) (concat down up down up)

  10. Multiple Occurrence Operators • Shapes made of multiple contiguous occurrences of the same shape • Resulting subsequences are such that they are neither preceded nor followed by a subsequence that matches P (exact 5 (any up Up)) (atleast 3 stable) (atmost 2 (concat disappear appear))

  11. Bounded Occurrence Operators • in • permits “blurry” matching by allowing users to state an overall shape without specific details • within the specified time period length, we can have a specified number of occurrences of a shape • can have arbitrary gaps and can have overlap (in 7 (nomore 5 up)) (precisely n P)(noless n P)(nomore n P)

  12. Bounded Occurrence Operators • inorder • specifies shapes that must appear in a specific order (inorder P1 P2 ... Pn)

  13. Shape Definition Examples (in 5 (and (noless 2 (any up Up)) (nomore 1 (any down Down))))

  14. Shape Definition Examples (in 7 (inorder (atleast 2 (any up Up)) (in 4 (noless 3 (any down Down))))))

  15. Parameterized Shapes • Can parameterize shape definitions instead of using concrete values (shape spike(upcnt dncnt) (concat (exact upcnt (any up Up)) (exact dncnt (any down Down)))) (shape doublepeak(width ht1 ht2) (in width (inorder spike(ht1 ht1) spike(ht2 ht2))))

  16. Advantages of SDL • natural and powerful language for expressing shape queries • capability of blurry matching • reduction of output clutter • efficient implementation

  17. SDL’s Expressive Power • SDL is equivalent to regular expressions for regular matching • several features enchance its effectivesness, however • greedy matching and “lookahead” capabilities help reduce output clutter

  18. SDL’s Expressive Power • “blurry” matching enables a much more natural and compact specification of certain shapes • For example, if we wanted precisely one occurrence of each ai in any order • in SDL: • regular expressions requires at least exponential size to specify! (and (precisely 1 a1) (precisely 1 a2) ... (precisely 1 an))

  19. SDL Summary • SDL is a small, powerful language for naturally and intuitively expressing shapes found in histories • Equivalent in power to regular expressions, but much more effective • Permits “blurry” matching

  20. Using SDL inActive Data Mining

  21. Static Data Mining • Discovery of rules for • Associations • Sequences • Classification • Entire data set is mined • Inherent weakness: Rules are not static

  22. Active Data Mining • Partition into time periods • Run data mining algorithm on each period • Gather rules into a ‘rulebase’ • Create triggers to discover • Trends in rules • Associations between rules

  23. Active Data Mining Process Large Data Base Period 1 Rules Period 2 Rules Period 3 Rules

  24. Active Data Mining Process (cont). Selected Rules Shape Definition Language Trigger Definition Language Active Data Mining

  25. Active Data Mining Components • Shape definitions (SDL) • (shape name(parameters) descriptor) • Ex: (shape spike(upcnt dncnt) (concat (atleast upcnt (any up Up)) (atleast dncnt (any down Down)))) • Queries • Triggers

  26. Queries • For rule selection • Syntax: • (query (shape (history-name start-time end-time))) • ‘start’ and ‘end’ specify the end points of history • Result: rules that match the desired shape • Ex:(shape ramp() (concat Up Up)) (query (ramp() (confidence start end)))

  27. Larger Query Example (shape upramp(len cnt) (in len (noless cnt (any up Up)))) (shape dnramp(len cnt) (in len (noless cnt (any down Down)))) (query (and (upramp(5 3) (support start 10)) (dnramp(5 3) (confidence start 10)))) Results: rules where support is increasing but confidence is decreasing

  28. Triggers • Datastream type functionality • ECA (Event Condition Action) model used (Chakravarthy et al. 1989) • Syntax: • (trigger trigger-name (events events-spec) (condition (shape history-spec)) (actions action-spec)) • Events: • Rule creation • History updates

  29. Wave Execution Semantics • Stratified execution of triggers – similar to Datalog Set of Events Triggers for those Events Queries for those Triggers Set of Actions/ Events

  30. Trigger Example • Identifying rules where support is increasing, but confidence is decreasing (trigger detect_up (events updatehistory) (condition (upramp 5 4) (support (- end 5) end))) (actions upward)) (trigger detect_dn (events upward) (condition (dnramp 5 4) (confidence (- end 5) end))) (actions notify))

  31. Implementation • Implemented on AIX system • Part of IBM’s Quest project • Successfully tested: • Large set (5 years) of mail order data (2.9 million records) • Large set (3 years) of POS (point-of-sale) transactions (6.8 million records)

  32. Future Work • At time of paper… • Integrate constructs into a SQL relational system • Improve incremental computations using partial results of current trigger queries • Since then… • Integrated into the Quest Data Mining System • Subsumed into IBM’s data mining products, including Intelligent Miner • Referenced for work in Active Data Mining and “blurry” pattern matching

  33. References • “Querying Shapes of Histories”, by Rakesh Agrawal, Giuseppe Psaila, Edward L. Wimmers, and Mohamed Zait of the IBM Almden Research Center, 1995 • “Active Data Mining”, by Rakesh Agrawal and Giuseppe Psaila of the IBM Almden Research Center, 1995 • “The Quest Data Mining System”, by Rakesh Agrawal, Manish Mehta, John Shafer, and Ramakrishnan Srikant of the IBM Almden Research Center in coordination with Andreas Arning and Toni Bollinger of the IBM German Software Laboratory, 1996 • IBM Almden Research Center Website: http://www.almaden.ibm.com/software/quest/

More Related