Assessment of Data Path Implementations for Download and Streaming

Presentation Transcript


  1. Assessment of Data Path Implementations for Download and Streaming Pål Halvorsen

  2. Overview • RELAY overview??? • Existing mechanisms in Linux • Tested enhancements • Ongoing • Summary and Conclusions

  3. RELAY: Resource Utilization in Large-Scale Time-Dependent Systems

  4. Picture Today [Figure: VoD, WWW, live events and P2P delivered over networks; annotation: performance??]

  5. RELAY • System support for improved resource utilization & QoS • Multimedia (game and video) servers • … Some current areas • protocols for interactive applications • multicast group maintenance • latency hiding • resource availability adaptation • hybrid P2P streaming / streaming to mobile devices • asymmetric multiprocessor scheduling • … [Figure: protocol stack (Application, Transport, Network, Link, Phys) alongside system layers (User Space, Kernel, Drivers, Hardware)]

  6. Linux Data Path Implementations

  7. Delivery Systems [Figure: network and bus(es)]

  8. Delivery Systems [Figure: application in user space; file system and communication system in kernel space; bus(es)]

  9. Intel Hub Architecture • several in-memory data movements and context switches [Figure: Pentium 4 processor (registers, cache(s)), memory controller hub with RDRAM, I/O controller hub with PCI slots, disk and network card; labels along the data path: disk, file system, application, communication system, network card]

  10. Cost of Data Transfers • Data copy operations are expensive • they consume CPU, memory, hub, bus and interface resources (proportional to size) • profiling shows that ~40% of CPU time is consumed by copying data in a disk-network scenario • the speed gap between memory and CPU is increasing • different banks have different access times • System calls cause many switches between user and kernel space • ~450 ns on a 933 MHz Pentium III • ~920 ns on a 1.7 GHz Pentium 4

  11. Observation and Question • A lot of research has been performed in this area, e.g. IO-Lite, splice, MMBUF, stream, sendfile, … • BUT, what is the status today of commodity OSes?

  12. Content Download [Figure: application in user space; file system and communication system in kernel space; bus(es)]

  13. Content Download: read / send • 2n copy operations • 2n system calls [Figure: read and send calls; data path over page cache, application buffer and socket buffer with two copies and two DMA transfers]
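
A minimal sketch of the read / send path on this slide, assuming a connected stream socket. The function name `download_read_send` and the 4096-byte chunk size are hypothetical and not taken from the presentation. Each loop iteration issues one read and one send, which corresponds to the 2n system calls and 2n copies counted above (a partial send adds further calls).

```c
#include <assert.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy a file to a connected socket with read/send.
 * read copies page cache -> buf, send copies buf -> socket buffer.
 * Returns the number of bytes sent, or -1 on error.
 * Name and chunk size are illustrative assumptions. */
ssize_t download_read_send(int file_fd, int sock_fd)
{
    char buf[4096];
    ssize_t total = 0;
    ssize_t n;

    assert(file_fd >= 0 && sock_fd >= 0);

    while ((n = read(file_fd, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        /* send may accept fewer bytes than requested, so loop. */
        while (off < n) {
            ssize_t s = send(sock_fd, buf + off, (size_t)(n - off), 0);
            if (s < 0) {
                perror("send");
                return -1;
            }
            off += s;
        }
        total += n;
    }
    if (n < 0) {
        perror("read");
        return -1;
    }
    return total;
}
```

A caller would open the file, connect the socket, and call `download_read_send(file_fd, sock_fd)` once per download.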

  14. Content Download: mmap / send • n copy operations • 1 + n system calls [Figure: mmap and send calls; data path over page cache and socket buffer with one copy and two DMA transfers]
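
A minimal sketch of the mmap / send path, assuming a connected stream socket and a file of known length. The name `download_mmap_send` and the `chunk` parameter are hypothetical. One mmap call is followed by one send per chunk, matching the 1 + n system calls on the slide (the final munmap is not counted there); the only CPU copy per chunk is from the mapped page cache into the socket buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Map the file once, then send it chunk by chunk.
 * Returns the number of bytes sent, or -1 on error.
 * Name and parameters are illustrative assumptions. */
ssize_t download_mmap_send(int file_fd, size_t len, int sock_fd, size_t chunk)
{
    assert(len > 0 && chunk > 0);

    char *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, file_fd, 0);
    if (map == MAP_FAILED) {
        perror("mmap");
        return -1;
    }

    size_t off = 0;
    while (off < len) {
        size_t want = (len - off < chunk) ? len - off : chunk;
        /* Single copy: mapped page cache -> socket buffer. */
        ssize_t s = send(sock_fd, map + off, want, 0);
        if (s < 0) {
            perror("send");
            munmap(map, len);
            return -1;
        }
        off += (size_t)s;
    }

    munmap(map, len);
    return (ssize_t)off;
}
```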

  15. Content Download: sendfile • 0 copy operations • 1 system call [Figure: sendfile call; a descriptor is appended to the socket buffer and a gather DMA transfer reads from the page cache; DMA transfer into the page cache]
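
A minimal sketch of the sendfile path using the Linux `sendfile(2)` call. The wrapper name `download_sendfile` is hypothetical. In the ideal case the whole file goes out in a single call, as on the slide; the loop only covers the case where the kernel transfers fewer bytes than requested.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Send len bytes of file_fd to sock_fd without passing the data
 * through user space. Returns bytes sent, or -1 on error.
 * The wrapper name is an illustrative assumption. */
ssize_t download_sendfile(int file_fd, size_t len, int sock_fd)
{
    off_t offset = 0;   /* sendfile advances this, not the file position */
    size_t left = len;

    assert(len > 0);

    while (left > 0) {
        ssize_t s = sendfile(sock_fd, file_fd, &offset, left);
        if (s < 0) {
            perror("sendfile");
            return -1;
        }
        if (s == 0)     /* end of file reached early */
            break;
        left -= (size_t)s;
    }
    return (ssize_t)(len - left);
}
```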

  16. Content Download: Results • Tested transfer of 1 GB file on Linux 2.6 • Both UDP (with enhancements) and TCP [Figure: result plots for UDP and TCP]

  17. Streaming [Figure: application in user space; file system and communication system in kernel space; bus(es)]

  18. Streaming: read / send • 2n (3n) copy operations • 2n system calls [Figure: read and send calls; data path over page cache, application buffer and socket buffer with two copies and two DMA transfers]
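
A minimal sketch of streaming with read / send, assuming a connected datagram socket and a 12-byte RTP fixed header. Reading the payload directly behind the header space keeps the cost at two copies per packet; this is one possible reading of the "2n" case on the slide and is not confirmed by it. The name `stream_read_send`, the payload type 96, the SSRC value and the byte-count timestamp are all hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define RTP_HDR_LEN 12
#define MAX_PAYLOAD 1400

/* Send the file as RTP-style packets: one read and one send per packet.
 * Returns the number of packets sent, or -1 on error.
 * Header field values are illustrative assumptions. */
long stream_read_send(int file_fd, int sock_fd, size_t payload)
{
    uint8_t pkt[RTP_HDR_LEN + MAX_PAYLOAD];
    uint32_t ts = 0;
    long packets = 0;
    ssize_t n;

    assert(payload > 0 && payload <= MAX_PAYLOAD);

    /* Copy 1: page cache -> pkt, placed right after the header space. */
    while ((n = read(file_fd, pkt + RTP_HDR_LEN, payload)) > 0) {
        uint16_t nseq = htons((uint16_t)packets);
        uint32_t nts = htonl(ts);
        uint32_t nssrc = htonl(0x12345678u);   /* hypothetical SSRC */

        pkt[0] = 0x80;  /* version 2, no padding, extension or CSRCs */
        pkt[1] = 96;    /* marker 0, dynamic payload type (assumed) */
        memcpy(pkt + 2, &nseq, 2);
        memcpy(pkt + 4, &nts, 4);
        memcpy(pkt + 8, &nssrc, 4);

        /* Copy 2: pkt -> socket buffer, header and payload together. */
        if (send(sock_fd, pkt, RTP_HDR_LEN + (size_t)n, 0) < 0) {
            perror("send");
            return -1;
        }
        ts += (uint32_t)n;
        packets++;
    }
    if (n < 0) {
        perror("read");
        return -1;
    }
    return packets;
}
```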

  19. Streaming: read / writev • 3n copy operations • 2n system calls [Figure: read and writev calls; data path over page cache, application buffer and socket buffer with three copies and two DMA transfers]

  20. Streaming: mmap / send • 2n copy operations • 1 + 4n system calls [Figure: mmap, cork, send, send and uncork calls; data path over application buffer, page cache and socket buffer with two copies and two DMA transfers]

  21. Streaming: mmap / writev • 2n copy operations • 1 + n system calls [Figure: mmap and writev calls; data path over application buffer, page cache and socket buffer with two copies and two DMA transfers]
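
A minimal sketch of the mmap / writev path, assuming a connected datagram socket so that each writev call produces one packet gathered from a header in the application buffer and a payload slice of the mapped file. That gives one mmap plus one writev per packet (1 + n system calls) and two copies per packet (header and payload into the socket buffer). The names `rtp_fill_header` and `stream_mmap_writev`, the payload type 96, the SSRC value and the offset-based timestamp are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

#define RTP_HDR_LEN 12

/* Fill a 12-byte RTP fixed header; field values are assumptions. */
static void rtp_fill_header(uint8_t hdr[RTP_HDR_LEN], uint16_t seq,
                            uint32_t ts, uint32_t ssrc)
{
    uint16_t nseq = htons(seq);
    uint32_t nts = htonl(ts);
    uint32_t nssrc = htonl(ssrc);

    hdr[0] = 0x80;  /* version 2, no padding, extension or CSRCs */
    hdr[1] = 96;    /* marker 0, dynamic payload type (assumed) */
    memcpy(hdr + 2, &nseq, 2);
    memcpy(hdr + 4, &nts, 4);
    memcpy(hdr + 8, &nssrc, 4);
}

/* Map the file once, then send header + payload with one writev per packet.
 * Returns the number of packets sent, or -1 on error. */
long stream_mmap_writev(int file_fd, size_t len, int sock_fd, size_t payload)
{
    assert(len > 0 && payload > 0);

    uint8_t *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, file_fd, 0);
    if (map == MAP_FAILED) {
        perror("mmap");
        return -1;
    }

    long packets = 0;
    for (size_t off = 0; off < len; off += payload, packets++) {
        uint8_t hdr[RTP_HDR_LEN];
        size_t n = (len - off < payload) ? len - off : payload;

        rtp_fill_header(hdr, (uint16_t)packets, (uint32_t)off, 0x12345678u);

        /* Gather the header and the mapped payload into one datagram. */
        struct iovec iov[2] = {
            { .iov_base = hdr,       .iov_len = sizeof hdr },
            { .iov_base = map + off, .iov_len = n },
        };
        if (writev(sock_fd, iov, 2) < 0) {
            perror("writev");
            munmap(map, len);
            return -1;
        }
    }

    munmap(map, len);
    return packets;
}
```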

  22. Streaming: sendfile • n copy operations • 4n system calls [Figure: cork, send, sendfile and uncork calls; one copy from the application buffer; descriptor appended to the socket buffer; gather DMA transfer from the page cache]

  23. Streaming: Results • Tested streaming of 1 GB file on Linux 2.6 • RTP over UDP • Compared to not sending an RTP header over UDP, we get an increase of 29% (additional send call) • More copy operations and system calls required → potential for improvements [Figure: result plot; label: TCP sendfile (content download)]

  24. Enhanced Streaming Data Paths

  25. Enhanced Streaming: mmap / msend • msend allows sending data from an mmap'ed file without a copy • n copy operations • 1 + 4n system calls [Figure: mmap, cork, send, msend and uncork calls; one copy from the application buffer; descriptor appended to the socket buffer; gather DMA transfer from the page cache]

  26. Enhanced Streaming: mmap / rtpmsend • RTP header copy integrated into the msend system call • n copy operations • 1 + n system calls [Figure: mmap and rtpmsend calls, shown against cork, send, msend and uncork; one copy from the application buffer; descriptor appended to the socket buffer; gather DMA transfer from the page cache]

  27. Enhanced Streaming: mmap / krtpmsend • An RTP engine in the kernel adds RTP headers • 0 copy operations • 1 system call [Figure: krtpmsend call, shown against rtpmsend; RTP engine in the kernel; descriptor appended to the socket buffer; gather DMA transfer from the page cache]

  28. Enhanced Streaming: rtpsendfile • RTP header copy integrated into the sendfile system call • n copy operations • n system calls [Figure: rtpsendfile call, shown against cork, send, sendfile and uncork; one copy from the application buffer; descriptor appended to the socket buffer; gather DMA transfer from the page cache]

  29. Enhanced Streaming: krtpsendfile • An RTP engine in the kernel adds RTP headers • 0 copy operations • 1 system call [Figure: krtpsendfile call, shown against rtpsendfile; RTP engine in the kernel; descriptor appended to the socket buffer; gather DMA transfer from the page cache]

  30. Enhanced Streaming: Results • Tested streaming of 1 GB file on Linux 2.6 • RTP over UDP [Figure: result plot for mmap-based and sendfile-based mechanisms compared with the existing mechanism (streaming) and TCP sendfile (content download); annotations: ~25% improvement, ~27% improvement]

  31. Ongoing Work

  32. Enhanced Streaming: rtpsendfile • n copy operations • n system calls → Calls like writev, sendfilev, … exist [Figure: rtpsendfile call; one copy from the application buffer; descriptor appended to the socket buffer; gather DMA transfer from the page cache]

  33. Enhanced Streaming: sendfilew • Batched system call enabling an arbitrary interleaving of blocks from files and user-space buffers to be sent as one or more packets [Figure: sendfilew call with arguments len, off, src_fd, flags; one copy from the application buffer; descriptor appended to the socket buffer; gather DMA transfer from the page cache]

  34. Conclusions • sendfile works nicely for download scenarios • Current commodity operating systems still pay a high price for streaming services • However, small changes in the system call layer might be sufficient to remove most of the overhead • In conclusion, commodity operating systems still have potential for improvement with respect to streaming support • What can we hope to be supported?

  35. Questions??
