1 / 28

Adding Structure to Top-K: Form Items to Expansions

Adding Structure to Top-K: Form Items to Expansions. Date : 2012.5.21 Source : CIKM’ 11 Speaker : I- Chih Chiu Advisor : Dr. Jia -Ling Koh. Index. Introduction Problem Definition Basic Algorithm Semantic Optimization Experiments Conclusion. Introduction.

kory
Download Presentation

Adding Structure to Top-K: Form Items to Expansions

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Adding Structure to Top-K: Form Items to Expansions Date : 2012.5.21 Source : CIKM’ 11 Speaker : I-Chih Chiu Advisor : Dr. Jia-Ling Koh

  2. Index • Introduction • Problem Definition • Basic Algorithm • Semantic Optimization • Experiments • Conclusion

  3. Introduction • Keyword based search interfaces are extremely popular.

  4. Introduction • Google search • Query → What’s the weather today? • Results include ‘what’, ’weather’, ’today’. • Lack of semantic. • Del.icio.us • Search results → Using a faceted interface. • Expansions → A fixed set of tags.

  5. Introduction • Motivated by these drawbacks of current search result interfaces, considering a search scenario in which each item is annotated with a set of keywords. • Don’t need to assume the existence of pre-defined categorical hierarchy • Want to automatically group query result items into different expansions of the query corresponding to subsets of keywords.

  6. Index • Introduction • Problem Definition • Basic Algorithm • Semantic Optimization • Experiments • Conclusion

  7. Problem Definition • A set of items S = {t1, ..., tn} • A set of m attributes {a1, ..., am} • The overall utility of an item ti, • Given a query Q ti.aj : normalized to [0,1]

  8. Problem Definition • Group items into different expansions of Q and return high quality expansions. • A subset of keywords e ⊆ K − Q.(K : all keywords) • Subset-of relationship for K-Q={k1,k2,k3,k4}

  9. Determining Importance of An Expansion • Definition : Top-k Expansions. • Given a set S of items and a keyword query Q, find the top-k expansion set Ek = {e1, ..., ek} s.t. ∀e ∈ Ek and ∀e′ ∈ EQ − Ek , u(e) ≥ u(e′). • Only consider top-N matching items • VSe= {u(t) | t ∈ Se} • If Se1⊆ Se2, then g(Se1) ≤ g(Se2).

  10. Index • Introduction • Problem Definition • Basic Algorithm • Semantic Optimization • Experiments • Conclusion

  11. Naïve Algorithm Round-robin • TopExp-Naïvealgorithm

  12. Improved Algorithm • Drawback of the naïve algorithm • 2|Kw(t)−Q| possible expansions • Leverage the lattice structure of expansions to avoid enumerating and maintaining unnecessary expansions. • ∀k ∈ K<t, k et , we just need to maintain one single expansion et. L LK

  13. Improved Algorithm • Avoiding Unnecessary Expansions • If ∀e ∈ L , e ∩ et ∅ • If ∃e ∈ L ,e ∩ et ∅ • e et • ete • e etor ete

  14. Improved Algorithm • TopExp-Lazy algorithm

  15. Improved Algorithm • To count how many expansions correspond to the same set of items. • Use the classical inclusion-exclusion principle. • 2|e| − count − 1 • count += 2|e’|-1 • E.g. e = {k1,k2,k3} → 8 (2|e|) e’ = {k1,k2},{k3} → 4 (count) 8 – 4 – 1 = 3 • ({k1, k2, k3}, {k1, k3} and {k2, k3}).

  16. Index • Introduction • Problem Definition • Basic Algorithm • Semantic Optimization • Experiments • Conclusion

  17. Weighting Expansions • Small size (e.g., “XML”) → “general topics” • Large size (e.g., “XML, schema, conformance, automata”) → “specific topics” • Expansions are neither too large nor too small. • consider the Gaussian function • {K1,k2} → u(e)× fw(2) and (e)× fw(2) • {k1},{k2} → u(e)× fw(1) and (e)× fw(1)

  18. Path Exclusion based Algorithm • Definition (Maximum k Path-Exclusive Expansion) • Given a set S of items and a keyword query Q, find the top k-expansion set Ek= {e1, ..., ek} s.t. ∀ei, ej∈ Ek , i j, eiej, ejei, and is maximized. • The maximum k path-exclusive expansion problem is NP-hard by a direct reduction from the maximum weighted independent setproblem.

  19. Path Exclusion based Algorithm • approximation • , • w(S) is the sum of weights of a set of nodes S • NG(v) is the set of neighbors of v in G. G Assume weights are equal 1. H1 H2

  20. Path Exclusion based Algorithm • Top-PEkExp algorithm Assume

  21. Index • Introduction • Problem Definition • Basic Algorithm • Semantic Optimization • Experiments • Conclusion

  22. Experiments • Synthetic datasets • Generated 5 synthetic datasets with size from 8000 to 12000. • Efficiency • Scalability • Memory saving • Real datasets • The ACM Digital Library. • Demonstrate the quality of the expansions returned.

  23. Experiments • Fixed N=10 and k=10

  24. Experiments • Fixed number of items=10000, N = 10

  25. Experiments • Fixed number of items=10000, k = 10

  26. Experiments • Queries : • “xml” • “histogram” • “privacy” • Attributes : • The average author publication number • The citation count. • Keywords : • The title • Keywords list • Abstract

  27. Conclusion • They studied the problem of how to better present search/query results to users. • Proposed various efficient algorithms which can calculatetop-k expansions. • Not only demonstrated the performance of the proposed algorithms, also validated the quality of the expansions returned by doing a study on a real data set.

More Related