Predictive Parallelization: Taming Tail Latencies in Web Search


Presentation Transcript


  1. Predictive Parallelization: Taming Tail Latencies in Web Search. Myeongjae Jeon, Saehoon Kim, Seung-won Hwang, Yuxiong He, Sameh Elnikety, Alan L. Cox, Scott Rixner. Microsoft Research, POSTECH, Rice University.

  2. Performance of Web Search. 1) Query response time: respond quickly to users (e.g., in 300 ms). 2) Response quality (relevance): provide highly relevant web pages; quality improves with the resources and time consumed. Focus: improving response time without compromising quality.

  3. Background: Query Processing Stages. Focus: stage 1. Pipeline: Query → doc index search (100s–1000s of good matching docs) → 2nd-phase ranking (10s of the best matching docs) → snippet generator (a few sentences for each doc) → Response. For example: 300 ms latency SLA.

  4. Goal. Speed up index search (stage 1) without compromising result quality, within the same pipeline (query → doc index search → 2nd-phase ranking → snippet generator → response, e.g., under a 300 ms latency SLA). Benefits: improved user experience, serving a larger index, and a more sophisticated 2nd phase.

  5. How Index Search Works. Partition all web pages across index servers (massively parallel); distribute query processing across the partitions (embarrassingly parallel); an aggregator collects the top-k relevant pages from each index server's local top-k. Problem: a slow server makes the entire cluster slow. A sketch of this pattern follows.
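
As a minimal sketch of this scatter-gather pattern (the partition layout and scoring callables are hypothetical stand-ins, not the paper's implementation), each partition computes a local top-k and the aggregator merges them:

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def search_partition(partition, query, k):
    """Score every document in one partition and return its local top-k.
    `partition` is a list of (doc_id, score_fn) pairs -- a stand-in for a
    real index server; the scoring functions here are hypothetical."""
    scored = ((score_fn(query), doc_id) for doc_id, score_fn in partition)
    return heapq.nlargest(k, scored)

def aggregate_top_k(partitions, query, k):
    """Fan the query out to all partitions in parallel, then merge the
    local top-k lists into a global top-k. The slowest partition gates
    the overall response time -- the tail-latency problem."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        local_results = pool.map(lambda p: search_partition(p, query, k),
                                 partitions)
        return heapq.nlargest(k, (hit for hits in local_results
                                  for hit in hits))
```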

  6. Observation. A query is processed on every index server, so the response time is determined by the slowest one. We therefore need to reduce its tail latencies. [Figure: per-server latency.]

  7. Examples. [Figure: fast response vs. slow response; in each case an aggregator waits on its index servers, and one long query (outlier) delays the whole response.] Terminating a long query in the middle of processing gives a fast response, but quality drops.

  8. Parallelism for Tail Reduction. Opportunity: tails are few, and idle cores are available. Challenge: tails are very long, and the workload is CPU-intensive. [Figures: latency distribution; latency breakdown for the 99th percentile.]

  9. Predictive Parallelism for Tail Reduction. Short queries: many, with almost no speedup from parallelization. Long queries: few, with good speedup.

  10. Predictive Parallelization Workflow. On each index server, an execution time predictor predicts the (sequential) execution time of the incoming query with high accuracy.

  11. Predictive Parallelization Workflow. Using the predicted time, a resource manager selectively parallelizes only the long queries; short queries run sequentially (see the routing sketch below).
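
A minimal sketch of the routing step, assuming an 80 ms cutoff (the long-query boundary from slide 13) and hypothetical predictor/executor callables:

```python
PARALLELIZE_THRESHOLD_MS = 80.0  # assumed cutoff; the slides treat >80 ms as long

def process_query(query, predictor, run_sequential, run_parallel):
    """Route a query based on its predicted sequential execution time.
    `predictor` returns a latency estimate in milliseconds; the two
    run_* callables stand in for the index server's execution paths."""
    predicted_ms = predictor(query)
    if predicted_ms > PARALLELIZE_THRESHOLD_MS:
        return run_parallel(query)   # few long queries: worth extra cores
    return run_sequential(query)     # many short queries: no speedup to gain
```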

  12. Predictive Parallelization. Focus of today's talk: the predictor, which identifies long queries through machine learning, and the parallelization, which executes long queries with high efficiency.

  13. Brief Overview of the Predictor. In our workload, 4% of queries run longer than 80 ms. At least 3% (of that 4%, i.e., 75% recall) must be identified, with a prediction overhead of 0.75 ms or less and high precision. Existing approaches offer lower accuracy at higher cost.

  14. Accuracy: Predicting Early Termination. Documents in the inverted index are sorted by static rank, lowest to highest. Only a limited portion of the posting list contributes to the top-k relevant results, and that portion depends on the keyword (or, more exactly, on its score distribution). [Figure: inverted index for "SIGIR" over web documents Doc 1 ... Doc N; an initial portion is processed, the rest is not evaluated.] A sketch follows.
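
A sketch of early termination over a posting list sorted by static rank; the fixed `cutoff` parameter is an illustrative stand-in for the engine's keyword-dependent stopping rule, which is not specified here:

```python
import heapq

def top_k_with_early_termination(postings, score, k, cutoff):
    """Scan postings in static-rank order, keeping a top-k heap, and stop
    after `cutoff` documents. How large `cutoff` must be to preserve the
    exact top-k depends on the term's score distribution, which is why
    execution time is keyword-dependent."""
    heap = []  # min-heap of (score, doc_id)
    for i, doc_id in enumerate(postings):
        if i >= cutoff:
            break  # remaining low-static-rank docs are not evaluated
        s = score(doc_id)
        if len(heap) < k:
            heapq.heappush(heap, (s, doc_id))
        elif s > heap[0][0]:
            heapq.heapreplace(heap, (s, doc_id))
    return sorted(heap, reverse=True)
```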

  15. Space of Features. Term features [Macdonald et al., SIGIR 12]: IDF, NumPostings, and score aggregates (arithmetic, geometric, and harmonic means, max, variance, gradient). Query features: NumTerms (before and after rewriting), Relaxed, Language.

  16. New Features: Query. Queries in modern search engines carry rich clues. Fields related to the query execution plan: rank=BM25F, enablefresh=1, partialmatch=1, language=en, location=us, .... Fields related to the search keywords: SIGIR (Queensland or QLD).

  17. Space of Features (recap). Term features [Macdonald et al., SIGIR 12]: IDF, NumPostings, and score aggregates (arithmetic, geometric, and harmonic means, max, variance, gradient). Query features: NumTerms (before and after rewriting), Relaxed, Language. A sketch of computing these features follows.
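
The features above might be computed per query roughly as follows; only the feature names come from the slide, and the exact formulas are assumptions:

```python
import math
from statistics import mean, pvariance

def term_features(postings_len, num_docs, scores):
    """Per-term features in the spirit of [Macdonald et al., SIGIR 12]:
    IDF, posting-list length, and aggregates over a sample of term scores."""
    geo_mean = math.exp(mean(math.log(s) for s in scores))   # assumes scores > 0
    harm_mean = len(scores) / sum(1.0 / s for s in scores)
    return {
        "idf": math.log(num_docs / (1 + postings_len)),
        "num_postings": postings_len,
        "score_amean": mean(scores),
        "score_gmean": geo_mean,
        "score_hmean": harm_mean,
        "score_max": max(scores),
        "score_var": pvariance(scores),
    }

def query_features(raw_terms, rewritten_terms, relaxed, language):
    """Query-level features: term counts before/after rewriting, plus
    execution-plan fields such as relaxation and language."""
    return {
        "num_terms_before": len(raw_terms),
        "num_terms_after": len(rewritten_terms),
        "relaxed": int(relaxed),
        "language": language,
    }
```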

  18. Space of Features. All features are cached in memory to ensure responsiveness (avoiding disk access). Term features require a 4.47 GB memory footprint for 100M terms, i.e., roughly 45 bytes per term.

  19. Feature Analysis and Selection. Measuring each feature's accuracy gain in a boosted regression tree suggests a cheaper feature subset (see the sketch below).
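
This kind of analysis can be sketched with a gradient-boosted regression tree and its per-feature importances, here using scikit-learn as a stand-in (the paper's actual toolchain and hyperparameters are not given):

```python
from sklearn.ensemble import GradientBoostingRegressor

def rank_features(X, y, feature_names):
    """Fit a boosted regression tree on (feature matrix, observed latency)
    and rank features by their contribution to accuracy, suggesting a
    cheaper subset to keep online."""
    model = GradientBoostingRegressor(n_estimators=200, max_depth=4)
    model.fit(X, y)
    ranked = sorted(zip(feature_names, model.feature_importances_),
                    key=lambda pair: pair[1], reverse=True)
    return model, ranked
```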

  20. Prediction Performance. Query features are important, and using cheap features is advantageous: IDF from the keyword features plus the query features gives much smaller overhead (90+% less) with accuracy similar to using all features. (A = actual long queries, P = predicted long queries.)
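
With A and P defined as above, precision is |A ∩ P| / |P| and recall is |A ∩ P| / |A|. A minimal check, reusing the 80 ms long-query cutoff:

```python
LONG_MS = 80.0  # "long" cutoff used throughout the slides

def long_query_accuracy(actual_ms, predicted_ms):
    """Precision/recall of the long-query predictor over a set of queries,
    given dicts mapping query -> true and predicted sequential latency."""
    A = {q for q, t in actual_ms.items() if t > LONG_MS}
    P = {q for q, t in predicted_ms.items() if t > LONG_MS}
    hit = len(A & P)
    precision = hit / len(P) if P else 1.0
    recall = hit / len(A) if A else 1.0
    return precision, recall
```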

  21. Algorithms. Classification vs. regression: comparable accuracy, but regression is more flexible, since the long/short threshold can be chosen after training. Algorithms evaluated: linear regression, Gaussian process regression, and boosted regression tree.
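
Regression's flexibility is that one trained latency model supports any long/short cutoff chosen afterwards; a hypothetical helper illustrating the idea:

```python
def make_long_query_classifier(regressor, threshold_ms):
    """Turn a latency regressor into a long-query classifier. Unlike a
    fixed classifier, the same model can be re-thresholded as the SLA or
    parallelization budget changes."""
    return lambda features: regressor.predict([features])[0] > threshold_ms
```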

  22. Accuracy of Algorithms. Summary: 80% of long queries (> 80 ms) are identified, only 0.6% of short queries are mispredicted as long, and prediction takes 0.55 ms with low memory overhead.

  23. Predictive Parallelism. Key idea: parallelize only long queries, using a threshold on the predicted execution time. Evaluation: compare Predictive against three baselines: Sequential, Fixed, and Adaptive (sketched below).
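
The four policies can be sketched as functions from a query (and system load) to a degree of parallelism; the parameter values are placeholders, not the paper's settings:

```python
def sequential_policy(query, load):
    return 1                       # never parallelize

def fixed_policy(query, load, degree=4):
    return degree                  # parallelize every query the same way

def adaptive_policy(query, load, max_degree=4):
    # parallelize all queries, backing off only under high system load
    return max(1, max_degree - load)

def predictive_policy(query, load, predictor, threshold_ms=80.0, degree=4):
    # parallelize only the queries predicted to run long
    return degree if predictor(query) > threshold_ms else 1
```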

  24. 99th-Percentile Response Time. Predictive parallelization outperforms "parallelize all" and sustains a 50% throughput increase.

  25. Related Work. Search query parallelism: fixed parallelization [Frachtenberg, WWWJ 09]; adaptive parallelization using system load only [Raman et al., PLDI 11] → high overhead due to parallelizing all queries. Execution time prediction: keyword-specific features only [Macdonald et al., SIGIR 12] → lower accuracy and high memory overhead for our target problem.

  26. Thank You! Your query to Bing is now parallelized if it is predicted to be long. [Diagram: query → execution time predictor → resource manager → long/short execution paths.]
