Summer 2002 at SLAC


  1. Summer 2002 at SLAC Ajay Tirumala

  2. Main Projects
     • Measuring disk throughputs on remote hosts, considering parameters like
       • File System
       • Read[write]-block size
       • Sequential/random reads[writes]
       • Committing sequence for writes
       • File sizes
     • Iperf QUICK mode
       • A new algorithm which reduces the time for measuring end-to-end bandwidth
       • And thus also the network traffic generated

  3. Disk Throughputs
     • File Systems
       • NFS uses the client's main memory as cache.
         • Data can be lost during reads/writes, so perform small reads and commit often.
       • AFS uses session semantics
         • Local disk is the cache
       • UFS – default file system for Solaris
         • fwrites write to the disk buffer, which is committed to disk on fsync, when the buffer is full, or when disk caching is disabled
       • EXT – most popular file system for Linux
         • Layer below the VFS
         • Has the concept of pre-allocation (allotting up to 8 adjacent file blocks when a block is requested).
         • Mount option available for greater write speeds (with less consistency).

  4. Disk Reads
     • First read will necessitate a disk read in most cases
       • A first read served from memory would indicate
         • minimal memory activity
         • a very large memory, since the tests are performed at intervals of days.
     • Second read (performed immediately after the first read)
       • will generally be served from memory
       • unless disk caching is disabled
     • Since there is a good probability that even the first read can come from memory, we consider disk writes as the primary metric for disk speeds.
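The first-read versus second-read comparison on this slide can be sketched as below. This is a minimal illustration, not the test program used at SLAC: the function names and the 64 KB default block size are assumptions, and whether the first read really hits the disk depends on the state of the OS cache, exactly as the slide warns.

```python
import time


def read_throughput(path, block_size=64 * 1024):
    """Read the whole file in block_size chunks and return MB/s."""
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids Python's own buffer; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total / elapsed / 1e6


def first_and_second_read(path, block_size=64 * 1024):
    """Time two back-to-back reads of the same file.

    The second read is usually served from memory unless caching is
    disabled, so a large gap between the two suggests the first read
    really came from disk.
    """
    first = read_throughput(path, block_size)
    second = read_throughput(path, block_size)
    return first, second
```

If both numbers come out similar and very high, the first read was probably served from cache too, which is the reason the slide prefers writes as the primary metric.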

  5. Disk writes
     • Commit modes – used fsync to commit files to disk
       • Plain (no commit)
       • Commit each write
       • Commit at end – most indicative of the disk bandwidth achievable
     • Block sizes
       • For local disks, use large block sizes (1-2 MB)
       • For remote writes, 64 KB/128 KB will suffice
     • File sizes
       • Using a large file size (2 GB) increased the throughput in some cases. Default was 64 MB.
     • Caution: NFS may not return an error during fwrites; it may return an error only on an fsync
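The three commit modes can be sketched as follows. This is a hedged illustration rather than the original test code (which, judging by the mention of fwrite and fsync, was probably C): the function name `write_throughput` and the mode labels `"plain"`, `"each"` and `"end"` are made up for this example.

```python
import os
import time


def write_throughput(path, file_size, block_size, commit="end"):
    """Write file_size bytes in block_size chunks and return MB/s.

    commit: "plain" (never fsync), "each" (fsync after every write),
            or "end" (one fsync after the last write).
    """
    if commit not in ("plain", "each", "end"):
        raise ValueError("commit must be 'plain', 'each' or 'end'")
    block = b"\0" * block_size
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < file_size:
            chunk = block[: file_size - written]
            f.write(chunk)
            written += len(chunk)
            if commit == "each":
                f.flush()
                # On NFS, a failed write may only surface here as an OSError.
                os.fsync(f.fileno())
        if commit == "end":
            f.flush()
            os.fsync(f.fileno())
    # The timed region includes the commits and the close.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return written / elapsed / 1e6
```

A call such as `write_throughput("/tmp/test.dat", 64 * 2**20, 128 * 1024, commit="end")` would mirror the 64 MB default file size and a remote-style block size from the slide; the path is only a placeholder.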

  6. Possible areas to investigate
     • Could consider different disk subsystems like RAID
     • Analysis of parallel disk transfers using BBCP.
       • Initial tests have indicated that in cases where disk is the limiting factor, using a single thread is the best option.
     • Algorithm to estimate disk speeds without using large writes*.
       • Manufacturers' specs lose meaning with network file systems and even for local file systems with multiple disks.

  7. Iperf QUICK Mode
     • Problem
       • Current TCP apps cannot detect when they are out of slow-start
       • Bandwidth measurement apps have to run for a considerable time to counter the effects of slow-start.
     • Solution
       • Use Web100 to detect the end of slow-start
       • Measure bandwidth for a small period after slow-start (say 1 s).
       • This should save about 90% of estimation time and traffic generated.
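The "about 90%" figure can be checked with simple arithmetic. The sketch below assumes a conventional run of 20 s (the comparison length quoted in the results slide) and treats the slow-start duration as an input; the function name is hypothetical. It only covers time saved. Traffic saved should follow roughly the same trend, but that is not computed here.

```python
def quick_mode_saving(slow_start_s, measure_s=1.0, full_run_s=20.0):
    """Fraction of test time saved by stopping measure_s after slow-start.

    full_run_s is an assumed length for a conventional fixed-duration run.
    """
    quick_s = slow_start_s + measure_s
    return 1.0 - quick_s / full_run_s
```

With a 1 s slow-start and a 1 s measurement, the quick test takes 2 s instead of 20 s, a saving of 90%. A 0.2 s slow-start gives 94%, and a 5 s slow-start gives 70%.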

  8. Detecting end of Slow-start
     • Outline
       • Determine a sampling period for the congestion window
       • Detect the absence of exponential increase every RTT
       • Handle pathological cases
         • Connection may not get out of slow-start
         • Multiple slow-starts
         • Connection may have a very small bandwidth-delay product.
           • E.g. localhost transfers, with latency in nanoseconds.
     • At present, it handles Reno and Vegas
       • It should handle Net100/Floyd stacks with minor modifications.
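One way to read "detect the absence of exponential increase every RTT" is sketched below. It is an assumption-laden illustration, not the detector shipped with Iperf: the congestion window samples are taken to be already collected (for instance from Web100) at a fixed number of samples per RTT, and the `growth` threshold of 1.5 and the two quiet RTTs required before declaring the end of slow-start are arbitrary choices.

```python
def slow_start_end(cwnd, samples_per_rtt, growth=1.5, quiet_rtts=2):
    """Return the sample index at which slow-start is judged to be over.

    cwnd: congestion window samples taken at a fixed interval.
    A sample counts as "growing" if it is at least growth times the
    sample one RTT earlier (slow-start roughly doubles cwnd per RTT).
    Slow-start is declared over after quiet_rtts RTTs without growth.
    Returns None if that never happens in the trace.
    """
    if samples_per_rtt < 1:
        raise ValueError("samples_per_rtt must be at least 1")
    needed = quiet_rtts * samples_per_rtt
    quiet = 0
    for i in range(samples_per_rtt, len(cwnd)):
        prev = cwnd[i - samples_per_rtt]
        grew = prev > 0 and cwnd[i] >= growth * prev
        # A new burst of growth (e.g. a second slow-start) resets the count.
        quiet = 0 if grew else quiet + 1
        if quiet >= needed:
            return i
    return None
```

Returning `None` covers a connection that never leaves slow-start, and resetting the counter covers repeated slow-starts. A connection with a very small bandwidth-delay product shows up as a window that is flat from the first samples, so the end is reported almost immediately.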

  9. The Quick mode Algorithm
     • Initialize Iperf sockets and initialize the Web100 connection for the Iperf socket.
     • Start Web100 data collection thread
       • This will indicate when the connection is definitely out of slow-start
     • Detect the end of slow-start in the data transfer thread
       • If the congestion window does not stabilize, do NOT report QUICK mode results
     • Measure bandwidth for 1 s (or user-specified time) after slow-start
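The steps above can be sketched as a single-threaded replay over a recorded trace, instead of the two live threads the slide describes. Everything here is illustrative: the trace format `(time_s, cwnd_bytes, total_bytes_sent)`, the function name, and the `growth` and `quiet_rtts` thresholds are assumptions, and no real Web100 or Iperf call is used.

```python
def quick_mode_bandwidth(samples, rtt_s, measure_s=1.0, growth=1.5, quiet_rtts=2):
    """Estimate bandwidth (bits/s) from the period just after slow-start.

    samples: list of (time_s, cwnd_bytes, total_bytes_sent), in time order.
    Returns None if cwnd never stabilizes or the trace ends too early,
    i.e. no QUICK mode result should be reported.
    """
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    end_idx = None
    quiet_since = None
    j = 0  # latest sample that is at least one RTT older than sample i
    for i, (t, cwnd, _sent) in enumerate(samples):
        while j + 1 < i and samples[j + 1][0] <= t - rtt_s:
            j += 1
        if samples[j][0] > t - rtt_s:
            continue  # no sample is a full RTT old yet
        if cwnd >= growth * samples[j][1]:
            quiet_since = None  # still growing exponentially
            continue
        if quiet_since is None:
            quiet_since = t
        if t - quiet_since >= quiet_rtts * rtt_s:
            end_idx = i  # cwnd has been stable long enough
            break
    if end_idx is None:
        return None
    t0, _cwnd0, sent0 = samples[end_idx]
    # Measure over the first window of at least measure_s after slow-start.
    for t1, _cwnd1, sent1 in samples[end_idx + 1:]:
        if t1 - t0 >= measure_s:
            return 8.0 * (sent1 - sent0) / (t1 - t0)
    return None
```

In a live tool the detection loop would run while data is being sent, and the transfer would stop as soon as the measurement window closes. The replay form is used here only because it is easy to run and check.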

  10. Salient results
      • Slow-starts can last
        • From 0.2 seconds for low-latency networks
        • Up to 5 seconds for long-haul, high-bandwidth networks.
          • Maximum gains here by using Iperf in QUICK mode.
      • Unless we use Iperf in QUICK mode, we can never be sure that the connection is out of slow-start
      • QUICK mode throughput differs from that of a 20 s Iperf run by less than 10%
      • Even performed some tests on dialup links (as receiver) with good results.

  11. Web100 experiences
      • A must-use tool (I'm a fan)
      • User APIs can be improved
      • Behaves well for a sampling time of 20 ms.

  12. Possible areas to investigate
      • Integrate with BW tests.
      • Perform tests with slow senders.
      • Empirical estimates immediately after slow-start:
        • Using RTT and rate of increase of congestion window.

  13. Links
      • Disk:
        • http://www-iepm.slac.stanford.edu/bw/disk_res.html
      • Iperf Quick mode:
        • http://www-iepm.slac.stanford.edu/bw/iperf_res.html
      • Documentation and results of tests with all IEPM-BW managed nodes are available from these links.

  14. Other stuff…
      • Miniperf is a small Iperf-like program written to
        • Monitor user-specified Web100 variable(s)
        • Allow setting window sizes and test times
        • Can include parallel thread functionality
        • Generate graphs (rate based, sum based)
        • Generate HTML
      • Created a single Iperf version to run on IPv4/v6 (Web100)/(no Web100).

  15. Thank you!!!
