
Autocompletion for Mashups

Autocompletion for Mashups. Ohad Greenshpan, Tova Milo, Neoklis Polyzotis. Tel-Aviv University UCSC. Talk Roadmap. Introduction on Mashups and Autocompletion Problem Definition The Algorithm Implementation & experiments Conclusions & Related Work.


Presentation Transcript


  1. Autocompletion for Mashups Ohad Greenshpan, Tova Milo, Neoklis Polyzotis Tel-Aviv University UCSC

  2. Talk Roadmap • Introduction on Mashups and Autocompletion • Problem Definition • The Algorithm • Implementation & experiments • Conclusions & Related Work

  3. Introduction - What is a mashup? A mashup is a technology for integrating data, services, and applications available on the web into a single application.

  4. Mashup Platform [Figure: application integration across the GUI, Logic, and Data layers of several applications]

  5. Mashup Development is difficult ... • Choose some relevant components • Decide which should be connected and learn their spec • Glue [Figure labels: Components Repository; 10 2 Mashup Repositories]

  6. knowledge ? knowledge

  7. Introduction - Mashup Autocompletion

  8. The Mashup Model [Figure labels: Mashlets & Mashlet-APIs; Mashlet; API; Data & logic; Glue Pattern]

  9. Inheritance [Figure: nodes labeled A and B]

  10. Mashup Autocompletion – Problem Definition • Given a database of mashlets and glue patterns (GPs) and a set of mashlets selected by the user, identify and rank GPs that link a subset of the selected mashlets. • Ranking is based on popularity and relevance to the user query. • The "ideal" GP: the most popular one that connects only the user mashlets and nothing else. Relaxations: • Less popular • Connects variants of the user mashlets • Connects a subset of the user mashlets • Connects additional mashlets

  11. Inheritance

  12. Problem Abstraction • Each glue pattern is represented as a point in a multidimensional space. • One dimension represents the GP popularity • The rest: all mashlets • 1) User mashlets • 2) Other mashlets • The goal of the algorithm is to find the top-k GPs that link the given user mashlets (the ones close to the optimal GP). [Figure: a simplified 3D illustration of a glue pattern g, with axes GP Popularity, m1, and m2]
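The abstraction on this slide can be illustrated with a small sketch. It is not the authors' scoring function: the slide only says that good GPs are "close to the optimal GP", so the Euclidean distance, the 0..1 popularity scale, and the names `to_point`, `ideal_point`, and `top_k_closest` are all assumptions made for illustration.

```python
import heapq
import math


def to_point(popularity, weights, all_mashlets):
    """Map a GP to a point: popularity first, then one coordinate per mashlet."""
    return [popularity] + [weights.get(m, 0.0) for m in all_mashlets]


def ideal_point(user_mashlets, all_mashlets):
    """Assumed ideal GP: maximal popularity, links exactly the user mashlets."""
    return [1.0] + [1.0 if m in user_mashlets else 0.0 for m in all_mashlets]


def top_k_closest(gps, user_mashlets, all_mashlets, k):
    """Return the names of the k GPs nearest to the ideal point (brute force)."""
    target = ideal_point(user_mashlets, all_mashlets)

    def distance(item):
        _, (popularity, weights) = item
        return math.dist(to_point(popularity, weights, all_mashlets), target)

    return [name for name, _ in heapq.nsmallest(k, gps.items(), key=distance)]


# Hypothetical data: g3 is popular but links m3 instead of the selected m2.
all_mashlets = ["m1", "m2", "m3"]
gps = {
    "g1": (0.9, {"m1": 1.0, "m2": 1.0}),
    "g2": (0.4, {"m1": 1.0, "m2": 1.0}),
    "g3": (0.9, {"m1": 1.0, "m3": 1.0}),
}
print(top_k_closest(gps, {"m1", "m2"}, all_mashlets, 2))
```

This brute-force version scans every GP, which is exactly the cost a top-k algorithm tries to avoid; it is only meant to make the geometry concrete.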

  13. Data Structure & Basic Top-k Algorithm [Figure: lists over the glue patterns, one for GP popularity and one per mashlet]
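The transcript does not spell out the basic top-k algorithm. The sketch below assumes a standard threshold-style algorithm over sorted lists with round-robin access, which matches the vocabulary used later in the talk (lists, monotonic function, threshold, round-robin). The function name `threshold_top_k`, the use of a plain sum as the monotonic function, and the sample scores are assumptions.

```python
import heapq


def threshold_top_k(lists, k):
    """Top-k GPs by summed score, stopping early via a threshold.

    lists: one dict per dimension, mapping GP name -> non-negative score.
    A GP missing from a dict is treated as having score 0 there.
    """
    # Sorted access: each list ordered by descending score.
    sorted_lists = [sorted(l.items(), key=lambda kv: -kv[1]) for l in lists]
    seen = {}
    max_depth = max((len(s) for s in sorted_lists), default=0)
    for depth in range(max_depth):
        last_scores = []
        for s in sorted_lists:  # round-robin over the lists
            if depth < len(s):
                gp, score = s[depth]
                last_scores.append(score)
                if gp not in seen:
                    # Random access: fetch the GP's score in every list.
                    seen[gp] = sum(l.get(gp, 0.0) for l in lists)
            else:
                last_scores.append(0.0)  # exhausted list: only zeros remain
        # No unseen GP can score higher than the sum of the last scores read.
        threshold = sum(last_scores)
        best = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        if len(best) == k and best[-1][1] >= threshold:
            break
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])


# Hypothetical lists: GP popularity plus one list per mashlet.
popularity = {"g1": 0.9, "g2": 0.8, "g3": 0.3}
m1 = {"g1": 1.0, "g3": 1.0}
m2 = {"g1": 1.0, "g2": 1.0}
print([gp for gp, _ in threshold_top_k([popularity, m1, m2], 2)])
```

With one list per mashlet in the repository, the round-robin loop touches every list on every round, which is the cost the next slide points at.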

  14. Problems with the algorithm • The number of lists the algorithm accesses is very large • Most of the mashlet lists are unrelated to the user selection (query)

  15. Data Structure [Figure labels: Mashlets; GP Popularity; User mashlets; Glue Patterns]

  16. Algorithm [Formula residue: … and p_g'[m] = 0 for n < m ≤ |M_all|]

  17. Correctness of AC* - Lemma • Theorem 4.1: Algorithm AC* returns a correct solution. • The proof is based on a lemma showing that any candidate that has not been encountered by AC* has a total score lower than the threshold. Optimality of AC* • Competing algorithms: • C – the class of deterministic algorithms that operate under the same access model as AC*. • Algorithms receive as input the lists, the monotonic function, and k. • Algorithms can use any access order (i.e., not specifically round-robin) and any thresholding scheme, and can rely on accessed elements. • Instance optimality: • AC* is instance optimal within class C if there are constants c and c0 such that for every input instance I, cost(AC*, I) ≤ c · cost(A, I) + c0 for any A ∈ C.

  18. Calculating Popularity: Glue Pattern and Mashlet Rank • PageRank-style algorithm • Takes into account the popularity of mashlets and GPs, as well as the relationships between them. [Figure: graph of GP and mashlet (M) nodes]
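A PageRank-style computation over GPs and mashlets might look like the sketch below. The talk gives no formula, so the undirected GP-to-mashlet graph, the damping factor of 0.85, the fixed iteration count, and the name `popularity_rank` are all assumptions rather than the authors' method.

```python
def popularity_rank(gp_to_mashlets, damping=0.85, iterations=50):
    """Power iteration on the graph linking each GP to the mashlets it glues.

    Nodes are keyed as ("gp", name) or ("m", name) so that a GP and a
    mashlet with the same name do not collide.
    """
    neighbors = {}
    for gp, mashlets in gp_to_mashlets.items():
        for m in mashlets:
            neighbors.setdefault(("gp", gp), set()).add(("m", m))
            neighbors.setdefault(("m", m), set()).add(("gp", gp))
    nodes = sorted(neighbors)
    if not nodes:
        return {}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Every node keeps a small base share, then receives from neighbors.
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            share = damping * rank[n] / len(neighbors[n])
            for nb in neighbors[n]:
                new_rank[nb] += share
        rank = new_rank
    return rank


# Hypothetical repository: "map" is glued by three GPs, "flights" by one.
ranks = popularity_rank({
    "g1": ["map", "hotels"],
    "g2": ["map", "flights"],
    "g3": ["map", "hotels"],
})
for node, value in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(node, round(value, 3))
```

In this sketch a mashlet gains rank from every GP that uses it and a GP gains rank from the mashlets it connects, which is one way to read "relationship between them" on the slide.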

  19. Implementation [Figure: architecture with numbered steps 1-5; labels: IBM Mashup Center, WebSphere Application Server, Knowledge base, MatchUp Algorithm]

  20. Experiments (synthetic dataset) Synthetic dataset for large-scale experiments • Generated a DB of 40k mashlets & GPs (ProgrammableWeb has 4k) • Based on ProgrammableWeb characteristics. Experiments on the synthetic dataset • Varying # of total mashlets and GPs • Varying k • Varying # of user mashlets • Varying GP complexity

  21. Results (synthetic dataset) GP Complexity = 5, varying k

  22. Results (synthetic dataset) GP Complexity = 10, varying k

  23. Results (synthetic dataset) Varying # of user mashlets

  24. Experiments (real dataset) Real dataset • Used real-life mashlets from ProgrammableWeb and IBM Mashup Center • Scenario: development of a travel-related mashup Experiments for quality assessment • IBM Mashup Center as the mashup platform • Users placed mashlets • MatchUp offered top-10 GPs for their mashlets • Users searched for alternatives Results • User satisfaction was high • High correlation between suggestions and users' lists • Browsing for additional results was generally unsuccessful • The gluing process was significantly expedited

  25. Related Work • Autocompletion in many other domains • Phrase prediction (Nandi & Jagadish, VLDB 2007) • File locations (Myers, CHI 2000) • Web service composition • Model for WS composition (Berardi et al., VLDB 2005) • Optimized and customized algorithm (McIlraith and Son, KR 2002) • Mashup assembly tools • MashMaker (Ennals & Garofalakis, SIGMOD 2007): data -> widgets • MashupAdvisor (Elmeleegy et al., ICWS 2008): mashup -> output recomm. -> assembly to achieve this output

  26. Conclusions • A novel autocompletion mechanism for rapid development of mashups • Uses the collective wisdom of other users on the web • A dedicated threshold-based top-k algorithm that reduces the search space • PageRank-style calculation of mashlet and glue pattern popularity Future Work • Infer semantic inheritance automatically • Distributed environment • Incorporating context and user preferences
