
Flash: An Efficient and Portable Web Server




Presentation Transcript


  1. 18-845 Internet Services Flash: An Efficient and Portable Web Server Authors • Vivek S. Pai • Peter Druschel • Willy Zwaenepoel Presenter • Anuwat Jongpairat

  2. Outline • Background • Server Architectures • Performance Characteristics • Cost/Benefits of Optimizations and Features • Flash Implementation • Performance Evaluation • Conclusion

  3. Background • Basic HTTP request processing steps: Start → Accept Conn → Read Request → Find File → Send Header → Read File → Send Data → End • The slide's diagram marks individual steps as potentially blocking on either disk ("Disk Blocking") or the network ("Network Blocking")
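The steps above can be sketched as a single connection handler. This is a minimal, illustrative Python sketch (the paper's implementation is in C); the `files` dict is a stand-in for the filesystem:

```python
def handle_connection(conn, files):
    """One pass through the basic steps for a single connection.
    `files` maps request paths to file contents (simulated filesystem)."""
    request = conn.recv(1024).decode()                 # Read Request
    path = request.split()[1]                          # e.g. "GET /x HTTP/1.0"
    body = files.get(path)                             # Find File
    if body is None:
        conn.sendall(b"HTTP/1.0 404 Not Found\r\n\r\n")
        return
    header = f"HTTP/1.0 200 OK\r\nContent-Length: {len(body)}\r\n\r\n"
    conn.sendall(header.encode())                      # Send Header
    conn.sendall(body)                                 # Read File + Send Data
```

Any of these steps may block (the recv/sendall calls on the network, the file lookup and read on disk); the server architectures differ in how they hide that blocking.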

  4. Server Architectures • Iterative • Multi-process (MP) • Multi-thread (MT) • Single-process event-driven (SPED) • Asymmetric multi-process event-driven (AMPED)

  5. Iterative Architecture • Serve one request at a time • Very simple to implement • Inefficient • Does not interleave request processing steps

  6. Multi-process (MP) Architecture • One process per client • Disk accesses, CPU processing, and network communication overlap naturally. • Cache optimization on global information is difficult. • Many concurrent requests → heavy context switching

  7. Multi-thread (MT) Architecture • One thread per client • Threads share the same address space. • Cache optimization on shared information is easy. • Requires synchronization of shared global variables • Less context switching than MP • Requires kernel thread support
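A thread-per-connection handler might look like this (illustrative Python sketch; `files` plays the role of the shared in-memory cache, and the connections are assumed to be already accepted):

```python
import threading

def serve_mt(conns, files):
    """Spawn one thread per connection. All threads share `files`;
    read-only access is free, but any mutation of shared state
    would need a lock."""
    def worker(conn):
        request = conn.recv(1024).decode()
        path = request.split()[1]
        conn.sendall(b"HTTP/1.0 200 OK\r\n\r\n" + files.get(path, b""))
    threads = [threading.Thread(target=worker, args=(c,)) for c in conns]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

A thread that blocks on disk I/O sleeps without stopping the others, which is exactly the property the slide describes.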

  8. Single-process Event-Driven (SPED) Architecture • A single event-driven process performs all client processing plus all disk and network activity. • Single address space → no synchronization needed, low resource usage • Network I/O is non-blocking, but disk reads still block the whole process. [Slide diagram: an Event Dispatcher drives the steps Accept Conn, Read Request, Find File, Send Header, Read File, Send Data]
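A SPED-style loop can be sketched with Python's `selectors` module (illustrative only; a real server would also handle partial reads and writes and register for write readiness):

```python
import selectors

def run_sped(conns, files):
    """Single-process event-driven loop: one selector multiplexes all
    connections, so the process never blocks waiting on any one socket."""
    sel = selectors.DefaultSelector()
    for conn in conns:
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ)
    remaining = len(conns)
    while remaining:
        for key, _ in sel.select():            # the event dispatcher
            conn = key.fileobj
            request = conn.recv(1024).decode()
            path = request.split()[1]
            conn.sendall(b"HTTP/1.0 200 OK\r\n\r\n" + files.get(path, b""))
            sel.unregister(conn)
            remaining -= 1
    sel.close()
```

The weakness the slide points out: if `files.get` were a real read() from disk, this one process would stall until the read completed, stalling every connection.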

  9. Asymmetric Multi-process Event-driven (AMPED) [Slide diagram: the SPED structure (Event Dispatcher plus the steps Accept Conn, Read Request, Find File, Send Header, Read File, Send Data) with helper processes Helper 1 … Helper k]

  10. Asymmetric Multi-process Event-driven (AMPED) [2] • Multiple helper processes (or threads) • The main process performs request processing while... • Helpers perform the potentially blocking (synchronous) disk I/O operations, using primitives available on all Unix systems.
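The helper idea can be sketched with fork() and a pipe. This is a simplified Python sketch: in Flash the main process would select() on the pipe alongside its sockets rather than read it synchronously as done here.

```python
import os

def helper_read(path):
    """Fork a helper to perform the potentially blocking disk read;
    the result comes back over a pipe the main process could select() on."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                         # helper process
        os.close(r)
        with open(path, "rb") as f:      # blocking disk I/O happens here
            os.write(w, f.read())
        os._exit(0)
    os.close(w)                          # main process
    chunks = []
    while chunk := os.read(r, 4096):     # EOF when the helper exits
        chunks.append(chunk)
    os.close(r)
    os.waitpid(pid, 0)
    return b"".join(chunks)
```

Because only the helper sleeps in the read, the main event loop stays responsive, which is the property that distinguishes AMPED from SPED.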

  11. Performance Characteristics • Disk Operations • Memory Effects • Disk Utilization

  12. Disk Operations • In MP and MT, only the process or thread doing disk I/O sleeps; the others keep running. • SPED blocks entirely on disk I/O operations. • AMPED's main process keeps running while helpers perform disk I/O.

  13. Memory Effects • Memory used by server processes reduces the space available for the filesystem cache • MT and MP memory consumption grows with the number of clients • SPED uses a small amount of memory • AMPED's helpers add some overhead but use little memory.

  14. Disk Utilization • Concurrent disk requests can benefit from multiple disks and disk-head scheduling. • MP, MT, and AMPED can issue concurrent disk requests, gaining those benefits. • SPED issues at most one disk request at a time.

  15. Cost and Benefits of Optimizations and Features • Information Gathering • Application-level Caching • Long-lived Connections

  16. Information Gathering • Gathering information about requests, for accounting purposes or to improve performance. • The MP model must use interprocess communication to combine per-process data. • The MT model needs synchronization. • SPED and AMPED need neither IPC nor synchronization.

  17. Application-level Caching • Can cache response headers and memory-mapped file contents at the application level • Per-process caches in the MP architecture waste memory. • In the MT architecture, a single shared cache needs synchronization. • SPED and AMPED can use a single cache without synchronization.

  18. Long-lived Connections • Caused by slow links, persistent connections (HTTP/1.1), or WAN latency. • A large number of simultaneous connections can occupy server resources for extended periods. • In SPED and AMPED, each connection costs only a file descriptor and some kernel state. • In MP and MT, each client additionally costs a process or thread and its memory.

  19. Implementation: Flash • Based on AMPED architecture • Optimizations • Aggressive Caching • Pathname translation caching • Response header caching • Memory-mapped files • “Gather writes” (writev) • Dynamic Content Generation • Memory Residency Test

  20. Pathname Translation Caching • Maintains mappings from requested URIs to actual paths, e.g., /~beavis :: /home/users/beavis/public_html/index.html • Reduces the number of calls to helper processes for pathname translation • Fewer helper processes needed → saves memory
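The idea reduces to memoizing the translation routine. A hypothetical Python sketch, where `translate` stands in for the expensive work a helper process would otherwise perform:

```python
def make_pathname_cache(translate):
    """Return a lookup function that caches URI -> path translations,
    so only cache misses pay for the (helper-performed) translation."""
    cache = {}
    def lookup(uri):
        if uri not in cache:
            cache[uri] = translate(uri)   # miss: do the expensive work
        return cache[uri]
    return lookup
```

Every hit avoids a round trip to a helper process, which is where the memory saving (fewer helpers needed) comes from.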

  21. Response Header Caching • The response header, which describes the file, is sent along with the file content • Flash caches the response headers of frequently requested files

  22. Memory-mapped File Caching • Reduces the number of unnecessary map/unmap operations on frequently requested files • Use LRU algorithm to unmap inactive “chunks” of files • mincore() is used to test memory residency
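An LRU cache of mappings might look like this. A simplified Python sketch: it maps whole files and bounds the entry count, whereas Flash caches fixed-size chunks of files and bounds the total mapped bytes.

```python
from collections import OrderedDict
import mmap
import os

class MappedFileCache:
    """LRU cache of memory-mapped files: the least-recently-used mapping
    is unmapped once the cache holds more than `capacity` entries."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.maps = OrderedDict()

    def get(self, path):
        if path in self.maps:
            self.maps.move_to_end(path)        # mark most recently used
            return self.maps[path]
        fd = os.open(path, os.O_RDONLY)
        m = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
        os.close(fd)                           # the mapping outlives the fd
        self.maps[path] = m
        if len(self.maps) > self.capacity:
            _, oldest = self.maps.popitem(last=False)
            oldest.close()                     # unmap the LRU entry
        return m
```

Repeated requests for a hot file then reuse one mapping instead of paying a map/unmap pair per request.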

  23. Caching in Flash [Slide diagram: the request-processing steps from Accept Conn through Send Data, annotated with the Pathname Translation Cache, Response Header Cache, and Mapped File Caching, plus the helper processes]

  24. Cache Optimization Contribution • Pathname translation caching contributes the most, followed by memory-mapped file caching, then response header caching

  25. Gather Writes • The writev() system call writes data to a file or socket from discontiguous buffers in one operation. • Flash uses it to send the response header and file content in a single call
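Python exposes the same syscall as `os.writev()` (Unix only); a minimal sketch:

```python
import os

def send_response(fd, header: bytes, body: bytes) -> int:
    """Gather write: header and body stay in separate buffers (e.g. the
    header cache and the mapped file) but reach the kernel in a single
    writev() call. Returns the total number of bytes written."""
    return os.writev(fd, [header, body])
```

Compared with two write() calls, this saves a syscall; compared with concatenating first, it avoids copying the file content into a combined buffer.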

  26. Dynamic Content Generation • The server forks CGI processes as necessary • CGI processes may be persistent, reducing the cost of re-forking the same application for multiple requests
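The persistence idea can be sketched with a long-lived child process that answers many requests over its pipes. Everything here is hypothetical scaffolding (the inline child script stands in for a real CGI application); the point is one fork amortized over many requests:

```python
import subprocess
import sys

# Stand-in for a persistent CGI application: reads one request line at a
# time from stdin and writes one reply line to stdout, until EOF.
CHILD_SCRIPT = (
    "import sys\n"
    "while True:\n"
    "    line = sys.stdin.readline()\n"
    "    if not line:\n"
    "        break\n"
    "    sys.stdout.write('handled:' + line)\n"
    "    sys.stdout.flush()\n"
)

def start_persistent_cgi():
    """Fork the CGI application once; keep its pipes open for reuse."""
    return subprocess.Popen([sys.executable, "-c", CHILD_SCRIPT],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            text=True)

def ask(proc, request):
    """Send one request to the already-running CGI process."""
    proc.stdin.write(request + "\n")
    proc.stdin.flush()
    return proc.stdout.readline().strip()
```

Each subsequent request costs only a pipe round trip instead of a fork+exec.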

  27. Memory Residency Test • On most modern UNIX systems, mincore() can check whether a mapped file's pages are resident in memory • If mincore() is not available, mlock() is used to lock the pages in memory.

  28. Performance Evaluation • Compare Flash (AMPED) to • Apache 1.3.1 (MP) • Zeus 1.30 (SPED) • Flash-MP • Flash-MT • Flash-SPED

  29. Synthetic Workload • Clients request the same file repeatedly. • Servers should perform at their best, since the file content is always cached. • For a cached workload, the choice of architecture has little impact on performance

  30. Trace-based Experiments • A more realistic workload • Traces from Rice University's CS web server and from the Owlnet server, which hosts personal web pages • Flash achieves the highest throughput

  31. Trace-based Experiments [2]

  32. ECE Trace Experiment • Evaluates server performance as dataset size ranges from 15 to 150 MB • Uses truncated traces from the ECE Dept. web server • Clients replay the truncated traces in a loop to generate requests at a given dataset size

  33. ECE Trace Experiment [2] • Throughput decreases as dataset size increases • A significant drop occurs when the working set exceeds the effective memory cache size, due to disk I/O • Flash performs well on both cached and disk-bound workloads

  34. ECE Trace Experiment [3] • The results confirm that the SPED architecture performs well on cached workloads but poorly on disk-bound workloads. • On disk-bound workloads, Flash has the highest throughput since it incurs less context switching and is more memory-efficient than MP.

  35. Performance under WAN conditions • Persistent connections simulate long-lived connections over a WAN • MP performs poorly due to per-process overhead • Throughput for Flash, SPED, and MT grows initially • MT then declines due to per-thread context switching and memory overhead

  36. Conclusion • Concurrent server architectures: • SPED performs well on cached workloads • MT and MP perform well on disk-bound workloads • AMPED matches the performance of SPED, MT, and MP on both types of workload. • Flash: an AMPED implementation with aggressive caching and optimizations. • Flash exceeds Zeus by 30% and Apache by 50%.
