Efficient Mining of XML Query Patterns for Caching

Efficient Mining of XML Query Patterns for Caching. L.H. Yang, M.L. Lee, and W. Hsu. Proceedings of the 29th VLDB Conference, 2003. Introduction: presents an efficient algorithm, called FastXminer, for discovering frequent XML query patterns.

Presentation Transcript


  1. Efficient Mining of XML Query Patterns for Caching
     L.H. Yang, M.L. Lee, and W. Hsu
     Proceedings of the 29th VLDB Conference, 2003
     Speaker: Chao-Chen Chiu

  2. Introduction
     • Presents an efficient algorithm, called FastXminer
       • Discovers frequent XML query patterns
       • Only a small subset of the generated candidate patterns needs tree containment tests

  3. Query Pattern Tree
     • Query pattern tree
       • A rooted tree QPT<V, E>
       • Each edge e = (v1, v2) means that v1 is the parent of v2
       • Each vertex v has a label in {“*”, “//”} ∪ tagSet
     • Rooted subtree
       • A rooted subtree RST<V’, E’> of QPT satisfies
         • Root(RST) = Root(QPT)
         • V’ ⊆ V, E’ ⊆ E
       • An RST is a k-edge rooted subtree if it has k edges

  4. Tree Inclusion
     • Partial ordering of labels
       • Given two labels x and x’, if x = x’, then x ≦ x’
       • For any label x ∈ tagSet, define x ≦ * ≦ //

  5. Tree Inclusion
     • An RST is contained in a QPT if the following hold:
       • The root nodes of RST and QPT have the same label
       • If a node w ∈ RST is matched with a node v ∈ QPT, then
         (a) w.label ≦ v.label
         (b) each subtree of w is contained in some subtree of QPT
     • From XQPMiner, we know that tree containment tests are expensive!
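
The label ordering and the containment conditions on the two Tree Inclusion slides can be sketched in Python as follows. This is a simplified illustration, not the paper's algorithm: the `Node` class and the function names are hypothetical, each child subtree of w is assumed to be matched against the child subtrees of v, and the multi-level semantics of “//” is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: a label plus an ordered list of children."""
    label: str
    children: list["Node"] = field(default_factory=list)


def label_leq(x: str, y: str) -> bool:
    """Partial order on labels: x <= x, and tag <= "*" <= "//"."""
    if x == y:
        return True
    if y == "//":
        return True        # every label is below "//"
    if y == "*":
        return x != "//"   # tags are below "*", but "//" is not
    return False           # two distinct tags are incomparable


def subtree_contained(w: Node, v: Node) -> bool:
    """Assumption: every child subtree of w must fit into some child subtree of v."""
    if not label_leq(w.label, v.label):
        return False
    return all(
        any(subtree_contained(wc, vc) for vc in v.children)
        for wc in w.children
    )


def contains(rst: Node, qpt: Node) -> bool:
    """Roots must carry the same label; below the root the label order applies."""
    return rst.label == qpt.label and subtree_contained(rst, qpt)
```

For example, with `qpt = Node("book", [Node("title"), Node("*", [Node("name")])])`, the tree `Node("book", [Node("author", [Node("name")])])` is contained because `author ≦ *`. The nested `all`/`any` recursion revisits subtrees repeatedly, which gives a feel for why the slides call containment tests expensive.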

  6. Mining Query Pattern Trees
     • Find all frequent 1-edge RSTs by scanning the database once
     • FastRSTGen generates the candidate set C_{k+1} from the previously found frequent set F_k and prunes unqualified candidates
     • Contains determines whether a (k+1)-edge RST is contained in the pattern tree t
       • But this test is needed only for single-branch candidate RSTs

  7. Candidate Generation
     • Schema-guided enumeration
       • Global query pattern tree (GQPT)
       • Use a string to represent a QPT
         • “1, 2, -1, 3, -1, 8”
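
One plausible reading of the string representation on this slide is a preorder listing of GQPT node numbers in which -1 means "backtrack to the parent", with trailing backtracks omitted. The slide does not spell this out, so the sketch below rests on that assumption; `decode` and `encode` are hypothetical helper names.

```python
def decode(encoding: str) -> list[tuple[int, int]]:
    """Return the (parent, child) edges of the tree described by the string.

    Assumption: numbers are node ids in preorder and -1 means "go up one level".
    """
    edges = []
    path = []  # nodes from the root down to the current node
    for token in encoding.split(","):
        value = int(token)
        if value == -1:
            path.pop()  # backtrack to the parent
        else:
            if path:
                edges.append((path[-1], value))
            path.append(value)
    return edges


def encode(root: int, children: dict[int, list[int]]) -> str:
    """Inverse of decode: preorder walk, emitting -1 after each finished child."""
    out = []

    def visit(node: int) -> None:
        out.append(node)
        for child in children.get(node, []):
            visit(child)
            out.append(-1)

    visit(root)
    while out and out[-1] == -1:
        out.pop()  # trailing backtracks carry no information
    return ", ".join(map(str, out))
```

Under this reading, “1, 2, -1, 3, -1, 8” describes a root 1 with the three children 2, 3, and 8.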

  8. Candidate Generation
     • Rightmost branch expansion
       • Given a k-edge RST, only expand its rightmost branch
       • This yields a set of (k+1)-edge RSTs, all of which have the k-edge RST as their prefix
     • Two kinds of expansion from a k-edge RST to a (k+1)-edge RST
       • Join of two k-edge RSTs
       • Rightmost leaf node expansion

  9. Candidate Generation
     • Join of two k-edge RSTs (does not expand the rightmost leaf node)
       • They must share the same prefix of k nodes, that is, be in the same equivalence class
     • Rightmost leaf node expansion
       • i-branch (k+1)-edge RST (i > 1)
         • Equivalent to a join of two k-edge RSTs
       • Single-branch (k+1)-edge RST
         • Needs the tree containment test
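
The two expansion kinds can be illustrated with a small sketch. It assumes (these are illustration choices, not details taken from the slides) that GQPT nodes are numbered in preorder, that an RST is stored as an ascending tuple of GQPT node numbers starting at the root, and that `parent` maps each non-root GQPT node to its parent. The function names are hypothetical.

```python
def join(rst_i: tuple, rst_j: tuple):
    """Join two k-edge RSTs from the same equivalence class.

    Both must share every node except the last one; the result keeps rst_i
    and adds the last node of rst_j, so the rightmost leaf of rst_i itself
    is not expanded. Returns None when the pair cannot be joined.
    """
    if rst_i[:-1] != rst_j[:-1] or rst_i[-1] >= rst_j[-1]:
        return None
    return rst_i + (rst_j[-1],)


def leaf_expansions(rst: tuple, parent: dict) -> list:
    """Rightmost leaf node expansion: attach one GQPT child to the last node.

    With preorder numbering, the largest node number in an RST is its
    rightmost leaf, and every child has a larger number than its parent.
    """
    last = rst[-1]
    return [rst + (n,) for n in sorted(parent) if parent[n] == last]


# Hypothetical GQPT: 1 is the root; 2 and 4 are children of 1;
# 3 is a child of 2; 5 is a child of 4.
gqpt_parent = {2: 1, 3: 2, 4: 1, 5: 4}
```

With this GQPT, `join((1, 2), (1, 4))` gives the 2-edge candidate `(1, 2, 4)`, while `leaf_expansions((1, 2), gqpt_parent)` gives `[(1, 2, 3)]`, a single-branch candidate of the kind that still needs a containment test.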

  10. Candidate Generation • Example

  11. Frequency Counting
      • Avoid tree inclusion tests
        • For the join part: if RST_{ij}^{k+1} = RST_i^k ⋈ RST_j^k, then compute RST_{ij}^{k+1}.TIDlist = RST_i^k.TIDlist ∩ RST_j^k.TIDlist
        • For the rightmost leaf node expansion (rmlne) part: if RST_{ij}^{k+1} is a multi-branch RST, then it is a join of two k-edge RSTs
        • Only single-branch RSTs need the tree inclusion test!
      • Pruning strategy: if a (k+1)-edge RST is frequent, then all k-edge RSTs contained in it must be frequent
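
The TID-list shortcut for joined candidates can be sketched as below. The transaction IDs, the relative `min_support` threshold, and the function names are assumptions made for the example; the slide only states that the TID list of a joined RST is the intersection of its parents' TID lists.

```python
def join_tidlist(tids_i: set, tids_j: set) -> set:
    """TID list of a joined (k+1)-edge RST: IDs of queries containing both parents."""
    return tids_i & tids_j


def is_frequent(tidlist: set, num_transactions: int, min_support: float) -> bool:
    """Assumption: support is the fraction of query pattern trees containing the RST."""
    return len(tidlist) / num_transactions >= min_support


# Hypothetical TID lists of two 1-edge RSTs over a database of 6 query pattern trees.
tids_a = {1, 2, 3, 5}
tids_b = {2, 3, 4, 5}

tids_ab = join_tidlist(tids_a, tids_b)
```

Here `tids_ab` is `{2, 3, 5}`, so the joined candidate has support 3/6 and passes a threshold of 0.5 without any containment test being run.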

  12. Algorithm FastRSTGen Rightmost leaf node expansion

  13. Algorithm FastRSTGen Rightmost leaf node expansion

  14. Performance Study • P4 2.4GHz, 1GB RAM, Windows XP • Characteristics of Datasets

  15. Performance Study • Effect of Minimum Support

  16. Performance Study
