
“Artificial Intelligence” in my research

Seung-won Hwang, Department of CSE, POSTECH





Presentation Transcript


  1. “Artificial Intelligence” in my research Seung-won Hwang, Department of CSE, POSTECH

  2. Recap • Bridging the gap between under-/over-specified user queries • We went through various techniques to support intelligent querying, learned implicitly/automatically from data, prior users, the specific user, and domain knowledge • My research shares the same goal, with some AI techniques applied (e.g., search, machine learning)

  3. The Context: query top-3 houses (e.g., realtor.com): select * from houses order by [ranking function F] limit 3; Rank Formulation produces the ranking function, and Rank Processing returns the ranked results

  4. Overview: query top-3 houses (e.g., realtor.com): select * from houses order by [ranking function F] limit 3; the pipeline raises two concerns: Usability (Rank Formulation: how the user states F) and Efficiency (Processing Algorithms: how the ranked results are computed)

  5. Part I: Rank Processing • Essentially a search problem (which you studied in AI)

  6. Limitation of the naïve approach (sort step, then merge step) • Setting: F = min(new, cheap, large), k = 1 • new (search predicate): a:0.90, b:0.80, c:0.70, d:0.60, e:0.50 • cheap (expensive predicate pc): d:0.90, a:0.85, b:0.78, c:0.75, e:0.70 • large (expensive predicate pl): b:0.90, d:0.90, e:0.80, a:0.75, c:0.20 • The naïve algorithm probes every object on every expensive predicate, even though most probes (marked ✗ on the slide) are wasted on objects that cannot make the top-1 answer (b, with score 0.78) • Our goal is to schedule the order of probes to minimize the number of probes

  7. Probe scheduling as state-space search: a state records the scores known so far (the initial state knows only the search-predicate scores a:0.9, b:0.8, c:0.7, d:0.6, e:0.5); each probe under the global schedule H(pc, pl), such as pr(a,pc) = 0.85 or pr(a,pl) = 0.75, is a transition; a goal state is one where the top answer (here b) is decided; probes beyond those needed to reach a goal state are unnecessary

  8. Search Strategies? • Depth-first • Breadth-first • Depth-limited / iterative deepening (try every depth limit) • Bidirectional • Iterative improvement (greedy/hill climbing)

  9. Best First Search • Determining which node to explore next, using an evaluation function • Evaluation function: explore more on the object with the highest “upper bound score” • We could show that this evaluation function minimizes the number of evaluations, by probing only when “absolutely necessary”

  10. Necessary Probes? • A probe pr(u,p) is necessary if we cannot determine the top-k answers without probing pr(u,p), where u is an object and p a predicate • Example, under global schedule H(pc, pl): can we decide the top-1 (b, with score 0.78) without probing pr(a,pc)? No: a's upper bound from the search predicate alone is 0.90, which still exceeds 0.78, so pr(a,pc) is necessary!
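
The following is a minimal sketch of this best-first, necessary-probes-only scheduling on the running example from slides 6 through 11; it illustrates the idea only and is not the actual MPro implementation:

```python
import heapq

# Illustrative scores from the running example; F = min(new, cheap, large), k = 1.
new   = {'a': 0.90, 'b': 0.80, 'c': 0.70, 'd': 0.60, 'e': 0.50}
cheap = {'a': 0.85, 'b': 0.78, 'c': 0.75, 'd': 0.90, 'e': 0.70}  # probe pc
large = {'a': 0.75, 'b': 0.90, 'c': 0.20, 'd': 0.90, 'e': 0.80}  # probe pl
PROBES = [cheap, large]            # global schedule H(pc, pl)

def top_k(k=1):
    probes = 0
    # Max-heap keyed on the upper-bound score: with F = min and scores
    # in [0, 1], a partially probed object's upper bound is simply the
    # minimum of its known scores (unprobed predicates are bounded by 1.0).
    heap = [(-new[o], o, [new[o]]) for o in new]
    heapq.heapify(heap)
    results = []
    while heap:
        ub, obj, known = heapq.heappop(heap)
        if len(known) - 1 == len(PROBES):   # fully probed: -ub is the final score
            results.append((obj, -ub))
            if len(results) == k:
                break
        else:
            # The popped object has the highest upper bound, so this
            # probe is "absolutely necessary" for deciding the top-k.
            probes += 1
            known.append(PROBES[len(known) - 1][obj])
            heapq.heappush(heap, (-min(known), obj, known))
    return results, probes

print(top_k())   # ([('b', 0.78)], 4): 4 probes vs. 10 for the naive plan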

  11. On the running example, under global schedule H(pc, pl), the probe sequence pr(a,pc) = 0.85, pr(a,pl) = 0.75, pr(b,pc) = 0.78, pr(b,pl) = 0.90 suffices: after these four probes the top-1 answer b (score 0.78) is decided, and every further probe would be unnecessary

  12. Generalization: existing algorithms cover cells of a matrix of sorted access cost s × random access cost r • s = 1 (cheap), r = 1 (cheap): FA, TA, QuickCombine • s = 1 (cheap), r = h (expensive): CA, SR-Combine • s = 1 (cheap), r = ∞ (impossible): NRA, StreamCombine • s = ∞ (impossible): MPro [SIGMOD02/TODS] • Unified Top-k Optimization [ICDE05a/TKDE] covers the whole matrix

  13. Just for laughs (adapted from Hyountaek Yong’s presentation): physics seeks to unify the strong nuclear, electromagnetic, weak nuclear, and gravitational forces under a unified field theory...

  14. ...and, in the same spirit, FA, TA, NRA, CA, and MPro are unified under one cost-based approach

  15. Generality • Across a wide range of scenarios • One algorithm for all

  16. Adaptivity • Optimal for each specific runtime scenario

  17. Cost-based Approach • Cost-based optimization: finding the optimal algorithm Mopt for the given scenario, with minimum cost, from a space of candidate algorithms
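
In code form, the selection amounts to an argmin over the algorithm space under a cost model; the stand-in `estimate_cost` below is hypothetical, not the framework's real model:

```python
def estimate_cost(algorithm, scenario):
    # Stand-in cost model: weight the algorithm's access counts by the
    # scenario's sorted/random access prices s and r.
    return (algorithm['sorted_accesses'] * scenario['s'] +
            algorithm['random_accesses'] * scenario['r'])

def choose_m_opt(algorithm_space, scenario):
    # Mopt = argmin over the space of the estimated cost in this scenario
    return min(algorithm_space, key=lambda m: estimate_cost(m, scenario))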

  18. Evaluation: Unification and Contrast (v. TA) • Unification: for a symmetric function, e.g., avg(p1, p2), framework NC behaves similarly to TA • Contrast: for an asymmetric function, e.g., min(p1, p2), NC adapts with different behaviors and outperforms TA [cost plots: NC (N) vs. TA (T), cost as a function of probe depth into p1 and p2]

  19. Part II: Rank Formulation, the usability side of the pipeline: query top-3 houses (e.g., realtor.com): select * from houses order by [ranking function F] limit 3; Rank Formulation states F (usability), and Rank Processing executes it efficiently to return the ranked results

  20. Learning F from implicit user interactions: using a machine learning technique (that you will learn soon!) to combine the quantitative model, for efficiency, with the qualitative model, for usability • Quantitative model: the query condition is represented as a mapping F of objects into absolute numerical scores; DB-friendly, by attaching an absolute score to each object (e.g., F(house a) = 0.9, F(house b) = 0.5) • Qualitative model: the query condition is represented as a relative ordering of objects; user-friendly, by relieving the user from specifying an absolute score for each object (e.g., house a > house b)
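
A small illustration of the contrast, with made-up features and weights:

```python
# Two houses described by (normalized) feature values; all numbers invented.
a = {'price': 0.9, 'size': 0.9}
b = {'price': 0.6, 'size': 0.4}

# Quantitative: F maps each object to an absolute score (DB-friendly,
# since the database can sort on it directly).
def F(house, w={'price': 0.5, 'size': 0.5}):
    return sum(w[f] * v for f, v in house.items())

print(F(a), F(b))              # 0.9 and 0.5

# Qualitative: the user only states a relative preference (user-friendly,
# since no absolute score needs to be specified).
preference = ('a', '>', 'b')   # "house a is preferred over house b"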

  21. A Solution: RankFP (RANK Formulation and Processing) • For usability, a qualitative formulation front-end which enables rank formulation by ordering samples • For efficiency, a quantitative ranking function F which can be efficiently processed • The feedback loop: Sample Selection generates a new unordered sample S; the user orders it into a ranking R* over S; if the ranking RF induced by F agrees with R* over S, the learned F is plugged into Q (select * from houses order by F limit k) and Rank Processing returns the ranked results; otherwise Function Learning learns a new F and the loop repeats

  22. Task 1: Ranking → Classification • Challenge: unlike a conventional learning problem of classifying objects into groups, we learn a desired ordering of all objects • Solution: we transform ranking into a classification on pairwise comparisons [Herbrich00], as sketched below • Ranking view: c > b > d > e > a; classification view: each pairwise difference u − v becomes a positive example if u is ranked above v (e.g., c − d: +) and a negative example otherwise (e.g., a − b: −), and a binary classifier F is learned on these pairs
  [Herbrich00] R. Herbrich et al. Large Margin Rank Boundaries for Ordinal Regression. MIT Press, 2000.
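
A minimal sketch of this reduction, assuming made-up two-dimensional feature vectors and a plain perceptron standing in for whatever binary classifier is actually used:

```python
import numpy as np

# Illustrative feature vectors for the five objects.
X = {'a': np.array([0.1, 0.2]), 'b': np.array([0.7, 0.6]),
     'c': np.array([0.9, 0.8]), 'd': np.array([0.5, 0.4]),
     'e': np.array([0.3, 0.1])}
ranking = ['c', 'b', 'd', 'e', 'a']      # desired ordering c > b > d > e > a

# Every ordered pair (u above v) yields a positive example u - v and a
# negative example v - u.
pairs, labels = [], []
for i, u in enumerate(ranking):
    for v in ranking[i + 1:]:
        pairs += [X[u] - X[v], X[v] - X[u]]
        labels += [+1, -1]

# Train a linear binary classifier (perceptron, no bias term) on the pairs.
w = np.zeros(2)
for _ in range(100):
    for x, y in zip(pairs, labels):
        if y * w.dot(x) <= 0:            # misclassified pair: update
            w += y * x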

  23. Task 2: Classification → Ranking • Challenge: with the pairwise classification function, we still need to efficiently process ranking • Solution: developing a duality connecting F also as a global per-object ranking function • Suppose function F is linear; classification view: F(ui − uj) > 0; ranking view: F(ui) − F(uj) > 0, i.e., F(ui) > F(uj) • So we can rank directly with F(·), e.g., F(c) > F(b) > F(d) > ...
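
Continuing the sketch above (reusing its X and w), the duality means the learned pairwise classifier scores objects directly, so ranking is just a sort with no pair enumeration at query time:

```python
# Because F is linear and has no bias, w.(u - v) > 0 iff w.u > w.v,
# so w doubles as a per-object scoring function.
scores = {o: float(w.dot(x)) for o, x in X.items()}
print(sorted(scores, key=scores.get, reverse=True))   # ['c', 'b', 'd', 'e', 'a']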

  24. Task 3: Active Learning • Finding samples maximizing learning effectiveness • Selective sampling: resolving the ambiguity • Top sampling: focusing on top results • Achieving >90% accuracy in ≤3 iterations (≤10 ms)
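
A rough sketch of how the two heuristics could combine, again reusing X and w from the sketches above (the sample size is made up):

```python
# Top sampling: restrict attention to the currently top-scored objects.
objs = sorted(X, key=lambda o: float(w.dot(X[o])), reverse=True)
top = objs[:3]

def margin(u, v):
    # Selective sampling: a near-zero margin |w.(u - v)| means the
    # classifier is still ambiguous about how to order u and v.
    return abs(float(w.dot(X[u] - X[v])))

pairs_in_top = [(u, v) for i, u in enumerate(top) for v in top[i + 1:]]
next_to_ask = min(pairs_in_top, key=lambda p: margin(*p))
print(next_to_ask)   # the pair the user is asked to order next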

  25. Using Categorization for Intelligent Retrieval • Category structure created a priori (typically a manual process) • At search time: each search result is placed under its pre-assigned category • Susceptible to skew → information overload

  26. Categorization: Cost-based Optimization • Categorize results automatically/dynamically • Generate a labeled, hierarchical category structure dynamically, based on the contents of the tuples in the result set • Does not suffer from the problems of a priori categorization • Contributions: exploration/cost models to quantify the information overload a user faces during exploration; cost-driven search to find low-cost categorizations; experiments to evaluate the models/algorithms
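
As a toy illustration of what an exploration cost model can look like (all constants, probabilities, and the tree encoding below are invented, not the paper's actual model):

```python
K_LABEL, K_TUPLE = 0.1, 1.0     # cost of reading one label vs. one tuple

def exploration_cost(node):
    """Expected cost of exploring a (sub)categorization rooted at node."""
    if 'tuples' in node:                          # leaf: user reads the tuples
        return K_TUPLE * node['tuples']
    cost = K_LABEL * len(node['children'])        # user scans the child labels
    for p, child in node['children']:             # p: prob. the user drills down
        cost += p * exploration_cost(child)
    return cost

# A cost-driven search would compare candidate category trees by this cost:
flat   = {'children': [(1.0, {'tuples': 100})]}
nested = {'children': [(0.3, {'tuples': 20}), (0.7, {'tuples': 30})]}
print(exploration_cost(flat), exploration_cost(nested))  # 100.1 vs. 27.2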

  27. Thank You!
