
Evaluating Top-k Queries Over Web-Accessible Databases


Presentation Transcript


  1. Evaluating Top-k Queries Over Web-Accessible Databases Amelie Marian, Nicolas Bruno, Luis Gravano Presented by: Archana and Muhammed

  2. Overview • Goal: process top-k queries efficiently • The attributes users specify may be handled by external, autonomous sources with a variety of access interfaces • The paper presents sequential and parallel query processing techniques

  3. Introduction • A web search engine takes a list of keywords and responds with the top-k matching pages • Users do not expect exact answers • The answer is a ranked list of the objects that best match the query • Ranking is determined by a scoring function

  4. Example • Problem: find the best available restaurants given a user address, combining rating, price, and distance • Rating => Zagat-Review website • Price => New York Times’s NYT-Review website • Address/distance => MapQuest website

  5. Difference Between Multimedia Systems and Web Sources • Web sources might support only random access • Attribute access is faster in centralized multimedia systems • Multimedia systems require local processing, whereas web sources allow probes to be issued concurrently

  6. Data and Query Models • The ordering is based on how closely a tuple matches the given query • Different attributes can be assigned different weights • Source types: • S-Source: provides a list of objects sorted by score, e.g. the rating provider Zagat-Review • R-Source: provides the score of a given object on demand (random access), e.g. MapQuest for distance • SR-Source: provides both kinds of access • U(t): score upper bound for object t • Uunseen: score upper bound for any object not yet retrieved • E(t): expected score for object t
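As a non-authoritative illustration of this query model, the sketch below combines per-attribute scores with user weights via a monotonic weighted sum (the paper's running example uses such a combination); the attribute names, weight values, and the choice of 0.5 as an "expected" filler are assumptions made for the example.

```python
# Minimal sketch of a monotonic scoring function (weighted sum).
# Weights and attribute values below are illustrative, not from the slides.
def score_comb(scores, weights):
    """Combine per-attribute scores (each in [0, 1]) using fixed weights."""
    return sum(w * s for s, w in zip(scores, weights))

# U(t): replace missing attribute scores with their maximum possible value (1).
def upper_bound(partial_scores, weights):
    return score_comb([1.0 if s is None else s for s in partial_scores], weights)

# E(t): replace missing attribute scores with an assumed expected value (0.5).
def expected_score(partial_scores, weights):
    return score_comb([0.5 if s is None else s for s in partial_scores], weights)

# Example: distance score known, rating and price not yet probed.
print(upper_bound([0.8, None, None], [0.5, 0.3, 0.2]))    # 0.9
print(expected_score([0.8, None, None], [0.5, 0.3, 0.2]))  # 0.65
```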

  7. Sequential Query Processing • A sequential strategy issues at most one probe (sorted or random) at a time • At each step it can retrieve a new, unseen object via sorted access from some source • Or it can pick an already seen object and a source that still needs to be probed via random access to obtain the corresponding score

  8. TA Strategy: the TAz Algorithm • For each SR-Source, the algorithm • retrieves the next “best” object via sorted access • probes the unknown attribute scores for this object via random access • computes the final score for the object • At any given time it keeps track of the top-k objects and their scores • Threshold value: Uunseen = ScoreComb(sl(1), …, sl(nsr), 1, …, 1), where sl(i) is the last score seen under sorted access from SR-Source Di and 1 is the maximum possible score for each R-Source • Termination condition: • k objects have been found, and • Uunseen is no larger than the scores of the k top objects
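The following is a hedged sketch of the TAz loop as described on this slide. The sorted_access and random_access methods are assumed source interfaces (not from the slides), weights is a per-source weight map, and bookkeeping details such as bounded buffers are omitted.

```python
import heapq

def taz(sr_sources, r_sources, weights, k):
    """Hedged sketch of TAz: sorted access on SR-Sources, random access elsewhere."""
    top = []                                       # min-heap of (score, object id)
    last_seen = {s: 1.0 for s in sr_sources}       # last sorted score per SR-Source
    seen = set()
    while True:
        for src in sr_sources:
            obj, score = src.sorted_access()       # next best object from this source
            last_seen[src] = score
            if obj not in seen:
                seen.add(obj)
                scores = {src: score}
                for other in sr_sources + r_sources:
                    if other is not src:           # probe all remaining attributes
                        scores[other] = other.random_access(obj)
                final = sum(weights[s] * scores[s] for s in scores)
                heapq.heappush(top, (final, obj))
                if len(top) > k:
                    heapq.heappop(top)
        # Threshold: last sorted scores for SR-Sources, maximum score 1 for R-Sources.
        u_unseen = (sum(weights[s] * last_seen[s] for s in sr_sources)
                    + sum(weights[s] * 1.0 for s in r_sources))
        if len(top) == k and u_unseen <= top[0][0]:
            return sorted(top, reverse=True)
```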

  9. TAz Algorithm

  10. Improvements Over TA: the TAz-EP Algorithm • The bounded-buffer assumption is removed, and no object is discarded until the algorithm returns • Reason: the same object might be retrieved again by sorted access from a different SR-Source

  11. Improvements Over TA (Contd..) • Two optimizations • Save random-access probes when an object cannot be part of the top-k answer (i.e., when its score upper bound is lower than the scores of the current top-k objects) • Process selection queries of the form p1 ^ … ^ pn, where each predicate pi can be expensive to evaluate • Key idea: order the predicate evaluation to minimize expected execution time • The order is determined by Rank(pi) = (1 - selectivity(pi)) / cost-per-object(pi); predicates with higher rank are evaluated first
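To make the rank metric concrete, the short sketch below orders hypothetical predicates by Rank(pi) = (1 - selectivity(pi)) / cost-per-object(pi); the selectivity and cost numbers are invented purely for illustration.

```python
# Order expensive predicates by Rank(p) = (1 - selectivity(p)) / cost_per_object(p).
# Selectivities and costs below are illustrative values, not from the slides.
predicates = [
    {"name": "p1", "selectivity": 0.9, "cost": 1.0},  # cheap, but rarely filters
    {"name": "p2", "selectivity": 0.1, "cost": 5.0},  # expensive, but very selective
    {"name": "p3", "selectivity": 0.5, "cost": 2.0},
]

def rank(p):
    return (1 - p["selectivity"]) / p["cost"]

# Evaluate predicates with the highest rank first to minimize expected cost.
for p in sorted(predicates, key=rank, reverse=True):
    print(p["name"], round(rank(p), 3))   # p3 (0.25), p2 (0.18), p1 (0.1)
```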

  12. Improvements Over TA (Contd..)

  13. Upper Strategy • Upper allows more flexible probe schedules in which sorted and random accesses can be interleaved even when some objects have only been partially probed • When a probe completes, Upper decides whether • to perform a sorted-access probe on a source to get new objects, or • to perform the “most promising” random-access probe on some object

  14. Upper Strategy (Contd..)

  15. Upper Strategy (Contd..) • The selection of further probes again depends on the weight of each source and the ranking function
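Below is a hedged sketch of the probe-selection rule that Upper is described as using: examine the object with the highest score upper bound; if an unseen object could still beat it, do a sorted access; if it is fully probed, return it; otherwise probe its most promising missing source. The choose_best_source helper stands in for the per-source ranking and is an assumption here, not the paper's exact procedure.

```python
def next_probe(candidates, u_unseen, choose_best_source):
    """Hedged sketch: decide the next probe in the Upper strategy.

    candidates: partially probed objects, each with .upper_bound and .missing_sources
    u_unseen:   score upper bound of any object not yet retrieved
    """
    best = max(candidates, key=lambda t: t.upper_bound, default=None)
    if best is None or u_unseen > best.upper_bound:
        return ("sorted_access", None)       # an unseen object could still be the best
    if not best.missing_sources:
        return ("return_object", best)       # fully probed and on top: output it
    # Otherwise, probe the most promising missing source for the top candidate.
    return ("random_access", (best, choose_best_source(best)))
```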

  16. Parallel Query Processing • Sequential query processing is bound to take a long time • Web databases exhibit high and variable latency • Goal: maximize source-access parallelism to minimize query processing time • Source-access constraints • Sources differ in access restrictions, load variance, and network capacity • The number of parallel probes to each source Di can therefore be bounded
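As an illustrative sketch of enforcing per-source access constraints, the code below bounds the number of concurrent probes to each source with a semaphore; the limit values and the do_random_access callback are placeholders, not part of the paper.

```python
import threading

# Illustrative per-source limits on concurrent random accesses (assumed values).
max_parallel = {"Zagat-Review": 2, "NYT-Review": 2, "MapQuest": 4}
slots = {src: threading.Semaphore(n) for src, n in max_parallel.items()}

def probe(source, obj, do_random_access):
    """Issue a random-access probe while respecting the source's parallelism limit."""
    with slots[source]:
        return do_random_access(source, obj)
```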

  17. Adapting the TA Strategy • pTA probes objects in parallel, in the order they are retrieved, while respecting per-source access constraints • Each object retrieved by sorted access is placed in a queue of discovered objects • When a source Di becomes available, pTA chooses which object to probe on that source by selecting the first object in the queue not yet probed on Di • Optimization: do not probe objects whose final score cannot exceed the scores of the top-k objects already seen • Such an object is put on a “discarded” objects list
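A hedged sketch of how pTA might pick the next object to probe when a source frees up, following the description above; the data structures (discovered list, probed map, discarded set) are assumed names for the slide's bookkeeping.

```python
def next_object_for_source(source, discovered, probed, discarded, top_k_min_score):
    """Hedged sketch: pTA's choice of the next probe for a newly available source.

    discovered: objects in the order they were retrieved by sorted access
    probed:     maps each object to the set of sources already probed for it
    """
    for obj in discovered:
        if obj in discarded or source in probed.get(obj, set()):
            continue
        # Optimization: skip objects that can no longer make the top-k answer.
        if obj.upper_bound <= top_k_min_score:
            discarded.add(obj)
            continue
        return obj
    return None   # nothing to probe on this source right now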

  18. pUpper Strategy • Uses a SelectBestSubset function to pick a minimal set of sources that need to be probed for a given object, instead of a single source • These multiple probes can proceed in parallel to speed up query execution • When a random-access source Di becomes underutilized, the object t with the highest score upper bound such that Di Є SelectBestSubset(t) is identified

  19. pUpper Strategy (Contd..) • pUpper associates a queue with each source for random-access scheduling • Queues are regularly refreshed by calls to the GenerateQueues function • When source Di is available, pUpper checks Queue(Di) • If Queue(Di) is empty, all random-access queues are regenerated • Otherwise, the first object of Queue(Di) is probed • At most one sorted-access request is outstanding per SR-Source Di
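The following is a hedged sketch of the queue handling just described: when a source becomes available, pop the head of its queue, and regenerate all queues (via a GenerateQueues step that would use SelectBestSubset per object) when the queue is empty. The helper names are placeholders for the paper's functions.

```python
def on_source_available(source, queues, generate_queues, issue_probe):
    """Hedged sketch of pUpper's per-source scheduling step.

    queues:          dict mapping each source to its list of objects to probe
    generate_queues: recomputes all queues, e.g. using SelectBestSubset per object
    issue_probe:     sends a random-access probe for (source, object)
    """
    if not queues[source]:
        generate_queues(queues)        # empty queue: refresh all probe queues
    if queues[source]:
        obj = queues[source].pop(0)    # probe the first object queued for this source
        issue_probe(source, obj)
```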

  20. pUpper Strategy (Contd..)

  21. Evaluation Setting • Local sources: Uniform, Gaussian, Zipfian, Correlated, Mixed, Cover • Real web-accessible sources • Mix of SR- and R-Sources

  22. Evaluation Results • Sequential Algorithms – Local Database

  23. Evaluation Results (Contd..) • Sequential Algorithms – Web Database

  24. Evaluation Results (Contd..) • Parallel algorithms - Local Database

  25. Evaluation Results (Contd..) • Parallel algorithms - Web Database • pUpper is faster than pTA • pUpper carefully selects the probes for each object • It considers probing time and source congestion to make probing choices at the object level • This results in better use of parallelism and faster query processing • Parallel probing significantly decreases query processing time

  26. Conclusions • Probe interleaving greatly reduces query execution time • Object-level scheduling, as in Upper, is desirable when sources exhibit moderate to high random-access time • pUpper minimizes query response time while taking source-access constraints into account • pUpper, the fastest query processing technique, highlights • the importance of parallelism in a web setting • the advantages of object-level probe scheduling in adapting to source congestion • The approach in this paper exploits the access constraints of web sources well • Extending the model to capture more expressive web interfaces is possible

  27. THANK YOU
