Network-aware OS

ESCC Miami

February 5, 2003

Tom Dunigan [email protected]

Matt Mathis [email protected]

Brian Tierney [email protected]

Roadmap

www.net100.org

  • Motivation
  • Net100 project overview
    • Web100
    • network probes & sensors
    • protocol analysis and tuning
  • Year 1 Results
    • A TCP tuning daemon
    • Tuning experiments
  • Year 2
    • ongoing research, Web100 update (Mathis)
Net100 project overview

  • DOE-funded project (Office of Science)
  • $1M/yr, 3 yrs beginning 9/01
  • LBL, ORNL, PSC, NCAR
  • Net100 project objectives (network-aware operating systems):
    • measure, understand, and improve end-to-end network/application performance
    • tune network protocols and applications (grid and bulk transfer)
    • first year emphasis: TCP bulk transfer over high delay/bandwidth nets
Motivation
  • Poor network application performance
    • High-bandwidth paths, but applications are slow
    • Is it application? OS? network? … Yes
    • Often need a network “wizard”
  • Changing: bandwidths
    • 9.6 Kb/s … 1.5 Mb/s … 45 … 100 … 1000 Mb/s … ? Gb/s
  • Unchanging: TCP
    • speed of light (RTT)
    • MTU (still 1500 bytes)
    • TCP congestion avoidance
  • TCP is lossy by design!
    • 2x overshoot at startup, sawtooth
    • recovery after a loss can be very slow on today’s high delay/bandwidth links
    • Non-congestive loss limits bandwidth to C·MSS/(RTT·√p)
    • Recovery rate proportional to MSS/RTT²

[Figure: ORNL to NERSC ftp over GigE/OC12, 80 ms RTT. Instantaneous and average bandwidth (8 Mb/s scale); losses after early startup, then linear recovery at 0.5 Mb/s.]
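A back-of-the-envelope check of the two relations above, in Python, assuming the usual value C ≈ 1.2 and a 1460-byte MSS (neither is given on the slide):

```python
# Check of the loss and recovery relations above (assumed C ~= 1.2, MSS = 1460 B).
MSS = 1460 * 8   # segment size in bits
RTT = 0.080      # 80 ms round trip
C = 1.2          # Mathis constant (assumption)

# Loss rate p that still sustains a target bandwidth: BW <= C*MSS/(RTT*sqrt(p))
target_bw = 500e6                              # 500 Mb/s
p = (C * MSS / (RTT * target_bw)) ** 2
print(f"tolerable loss rate at 500 Mb/s: {p:.1e}")          # ~1e-7

# Linear recovery: cwnd grows 1 MSS per RTT, i.e., bandwidth grows MSS/RTT^2
rate = MSS / RTT ** 2                          # b/s regained per second
print(f"recover 500 Mb/s: {target_bw / rate / 60:.1f} min")             # ~4.6
print(f"... with delayed ACKs: {target_bw / (rate / 2) / 60:.1f} min")  # ~9.1
```

The roughly 9-minute figure with delayed ACKs is where the "10 minutes to recover" in the ns plots later in the deck comes from.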

TCP tuning
  • “enable” high speed
    • need buffer = bandwidth × RTT (autotune); ORNL/NERSC (80 ms, OC12) needs ~6 MB (see the sketch after this list)
    • faster slow-start
  • avoid losses
    • modified slow-start
    • reduce bursts
    • anticipate loss (ECN, Vegas?)
    • reorder threshold
  • speed recovery
    • bigger MTU or “virtual MSS”
    • modified AIMD (0.5,1)
    • delayed ACKs, initial window, slow-start increment
  • avoid congestion collapse
  • be fair (?) … intranets, QoS
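The buffer figure in the first bullet is just the bandwidth-delay product; a quick check, assuming OC12 ≈ 622 Mb/s of raw rate:

```python
# Bandwidth-delay product for the ORNL/NERSC path (OC12 rate is approximate).
OC12 = 622e6                 # bits/s
RTT = 0.080                  # seconds
bdp = OC12 * RTT / 8         # bytes
print(f"buffer needed: {bdp / 1e6:.1f} MB")   # ~6.2 MB, the "6 MB" above
```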

[ns simulation: 500 Mb/s link, 80 ms RTT. Packet loss early in slow-start; standard TCP with delayed ACKs takes 10 minutes to recover.]

Net100 components for tuning
  • TCP protocol analysis
    • simulation/emulation
    • kernel tuning extensions
  • Web100 Linux kernel (NSF), www.web100.org
    • instrumented TCP stack (IETF MIB draft)
    • 100+ variables per flow (/proc/web100)
    • socket open/close event notification
    • API and tools for tracing and tuning, e.g., bw tester: http://firebird.ccs.ornl.gov:7123
  • Path characterization
    • Network Tuning and Analysis Framework (NTAF)
    • both active and passive measurement
      • iperf, pipechar
      • Web100 data augments probe data
    • schedule probes and distribute/archive results
    • database of measurements
    • NTAF/Net100 hosts at PSC, NCAR, LBL, ORNL, NERSC, CERN, UT, SLAC
  • TCP tuning daemon
TCP Tuning Daemon

WAD config file:

    [bob]
    src_addr: 0.0.0.0
    src_port: 0
    dst_addr: 10.5.128.74
    dst_port: 0
    mode: 1
    sndbuf: 2000000
    rcvbuf: 100000
    wadai: 6
    wadmd: 0.3
    maxssth: 100
    divide: 1
    reorder: 9
    sendstall: 0
    delack: 0
    floyd: 1

  • Work-around Daemon (WAD)
    • tune unknowing sender/receiver at startup and/or during flow
    • Web100 kernel extensions
      • pre-set windowscale to allow dynamic tuning
      • uses netlink to alert daemon of socket open/close (or poll)
      • besides existing Web100 buffer tuning, new tuning options using WAD_* variables
      • knobs to disable Linux 2.4 caching, burst mgt., and sendstall
    • config file with static tuning data
      • mode specifies dynamic tuning (Floyd AIMD, NTAF buffer size, concurrent streams)
    • daemon periodically polls NTAF for fresh tuning data
    • written in C (also python version)
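A minimal Python sketch of the WAD's static-tuning path, for flavor only: it assumes a hypothetical /proc/web100 layout in which each connection is a directory of writable variable files, and it polls rather than listening on netlink as the real C daemon does.

```python
# Sketch of a WAD-style tuner (hypothetical /proc/web100 layout; the real
# daemon is in C, is netlink-driven, and also handles the WAD_* variables).
import configparser, glob, os, time

CONFIG = "wad.conf"          # rules in the [bob] format shown above
WEB100 = "/proc/web100"      # assumed: one directory per instrumented flow

def load_rules(path):
    cp = configparser.ConfigParser()
    cp.read(path)
    return [dict(cp[s]) for s in cp.sections()]

def tune(conn_dir, rule):
    # Assumed: each Web100 variable is a writable file under the connection dir.
    for var in ("sndbuf", "rcvbuf"):
        with open(os.path.join(conn_dir, var), "w") as f:
            f.write(rule[var])

def poll_once(rules):
    for conn_dir in glob.glob(os.path.join(WEB100, "*")):
        dst = open(os.path.join(conn_dir, "dst_addr")).read().strip()
        for rule in rules:
            # dst_addr 0.0.0.0 acts as a wildcard, as in the config above.
            if rule.get("dst_addr") in (dst, "0.0.0.0"):
                tune(conn_dir, rule)

if __name__ == "__main__":
    rules = load_rules(CONFIG)
    while os.path.isdir(WEB100):  # poll mode; netlink events avoid this loop
        poll_once(rules)
        time.sleep(1)
```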
Experimental results (year 1)
  • Evaluating the tuning daemon in the wild
    • emphasis: bulk transfers over high delay/bandwidth nets (Internet2, ESnet)
    • tests over: 10GigE, OC48, OC12, OC3, ATM/VBR, GigE, FDDI, 100/10T, cable, ISDN, wireless (802.11b), dialup
    • tests over NIST Net 100T testbed
  • Various TCP tuning options
    • buffer tuning
    • AIMD mods (including Floyd, both in-kernel and in WAD)
    • slow-start mods
    • parallel streams vs single tuned
  • Results are anecdotal
    • more systematic testing is on-going
    • Your mileage may vary ….

Network professionals on a closed course.

Do not attempt this at home.

WAD tuning results
  • Classic buffer tuning
    • ORNL to PSC, OC12, 80ms RTT
    • network-challenged app gets 10 Mb/s
    • same app with a WAD/NTAF-tuned buffer gets 143 Mb/s
  • Virtual MSS
    • tune TCP’s additive increase (WAD_AI)
    • add k segments per RTT during recovery
    • k=6 like GigE jumbo frame, but:
      • interrupt rate not reduced
      • doesn’t do k segments for initial window
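A rough look at what k segments per RTT buys during recovery, under the same 1460-byte MSS and 80 ms RTT assumptions as earlier:

```python
# Recovery speedup from a "virtual MSS" of k segments per RTT (WAD_AI).
MSS, RTT = 1460 * 8, 0.080            # bits, seconds
for k in (1, 6):                      # k = 6 is the jumbo-frame-like case above
    rate = k * MSS / RTT ** 2         # bandwidth regained, b/s per second
    print(f"k={k}: 100 Mb/s back in {100e6 / rate:.0f} s")   # 55 s vs. 9 s
```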
Tuning around Linux (2.4) TCP

Amsterdam-Chicago GigE via 10GigE, 100 ms RTT

  • Tunable ssthresh caching
  • Tunable “sendstall” (txqueuelen)

Floyd AIMD: as cwnd grows, increase AI and decrease MD; do the reverse when cwnd shrinks. Added to the Net100 kernel and to the WAD (WAD tunable).

[Figure: throughput around 600 Mb/s; Floyd AIMD vs. standard AIMD traces, with sendstalls and a UDP event marked.]
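A schematic of the Floyd AIMD rule in Python. The threshold matches HighSpeed TCP's 38-segment cutoff, but the scaling below is illustrative only, not Floyd's actual tabulated response function:

```python
# Schematic Floyd-style AIMD: larger cwnd => larger AI, smaller MD.
import math

LOW_WINDOW = 38  # segments; behave like standard TCP below this

def floyd_params(cwnd):
    if cwnd <= LOW_WINDOW:
        return 1.0, 0.5                       # standard TCP: AI 1, MD 0.5
    scale = math.log(cwnd / LOW_WINDOW)       # illustrative growth law
    ai = 1.0 + 5.0 * scale                    # segments added per RTT
    md = max(0.1, 0.5 / (1.0 + scale))        # fraction cut on loss
    return ai, md

for w in (30, 100, 1000, 10000):
    ai, md = floyd_params(w)
    print(f"cwnd={w:>5} segs: AI={ai:4.1f} segs/RTT, MD={md:.2f}")
```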

WAD tuning
  • Modified slow-start and AI
    • ORNL to NERSC, OC12, 80 ms RTT
    • often losses in slow-start
    • WAD tuned Floyd slow-start and fixed AI (6)
  • WAD-tuned AIMD and slow-start
    • ORNL to CERN, OC12, 150ms RTT
    • k parallel streams behave like one flow with AIMD (1/(2k), k)
    • WAD-tuned single stream (0.125, 4), the k = 4 case (see the sketch below)
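The mapping implied by those two bullets, as a one-liner; (0.125, 4) is just the k = 4 case:

```python
# Single-stream AIMD "equivalent" to k parallel streams: in aggregate, k
# streams add k segments per RTT and lose ~1/(2k) of total window per loss.
def equivalent_aimd(k):
    return 1.0 / (2 * k), float(k)    # (MD, AI)

print(equivalent_aimd(4))             # (0.125, 4.0), the ORNL-CERN tuning above
```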
GridFTP tuning

Can tuned single stream compete with parallel streams?

Mostly not with “equivalence” tuning, but sometimes….

Parallel streams have slow-start advantage.

WAD can divide buffer among concurrent flows—fairer/faster?

Tests inconclusive so far….

Testing on real Internet is problematic.

Is there a “congestion metric”? Per unit of time?

Flow       Mb/s   congestion   re-xmits
untuned     28         4           30
tuned       74         5          295
parallel    52        30          401
untuned     25         7           25
tuned       67         2          420
parallel    88        17          440

Buffers: 64 KB I/O, 4 MB TCP (untuned 64 KB TCP: 8 Mb/s, 200 s)

Data/plots from Web100 tracer

Ongoing/Planned Net100 research (year 2)
  • analyze effectiveness/fairness of current tuning options
    • simulation
    • emulation
    • on the net (systematic tests)
  • NTAF probes -- characterizing a path to tune a flow
    • router data (passive)
    • monitoring applications with Web100
    • latest probe tools
  • additional tuning algorithms
    • Vegas
    • slow-start increment, reorder resilience, delayed ACKs
    • non-TCP (SABUL, FOBS, TSUNAMI, ?)
    • identify non-congestive loss, ECN?
  • parallel/multipath selection/tuning
  • WAD-to-WAD tuning
  • jumbo frames experiments… the quest for bigger and bigger MTUs
  • more user-friendly, usable accelerants
  • port to Cray X1 network front-end
  • port to other OSes
Future TCP tuning
  • Reorder threshold
    • seeing more out-of-order packets
    • WAD tune a bigger reorder threshold for path
      • 40x improvement!
    • Linux 2.4 does a good job already
      • adjusts and caches reorder threshold
      • “undo” congestion avoidance

LBL to ORNL (using our TCP-over-UDP): the dup3 case had 289 retransmits, but all were unneeded!
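A toy model of the dup-ACK counting involved, showing why a per-path threshold helps; the reorder: 9 value in the WAD config shown earlier would ride out this example:

```python
# Duplicate-ACK counting vs. a tunable reorder threshold (toy model).
def fast_retransmit_fires(acks, dupthresh=3):
    dups, last = 0, None
    for ack in acks:                 # cumulative ACK numbers, in arrival order
        dups = dups + 1 if ack == last else 0
        last = ack
        if dups >= dupthresh:        # enough dup ACKs: fast retransmit
            return True
    return False

# One segment displaced by three positions yields three duplicate ACKs:
acks = [1, 2, 3, 3, 3, 3, 7]
print(fast_retransmit_fires(acks, 3))   # True: spurious (unneeded) retransmit
print(fast_retransmit_fires(acks, 9))   # False: a larger threshold waits it out
```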

  • Delayed ACKs
    • WAD could turn off delayed ACKs: 2x improvement in recovery rate and slow-start (see the sketch below)
    • Linux 2.4 already turns off delayed ACKs for initial slow-start
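Rough arithmetic behind the 2x: with an ACK per segment, slow-start doubles cwnd each RTT; ACKing every other segment grows it by only about 1.5x per RTT (and halves the linear-recovery rate):

```python
# Slow-start RTTs to reach the path's bandwidth-delay product, with and
# without delayed ACKs (~3425 segments for 500 Mb/s x 80 ms at 1460 B/seg).
import math

def rtts_to_reach(segments, growth_per_rtt):
    return math.ceil(math.log(segments, growth_per_rtt))

BDP = 3425
print(rtts_to_reach(BDP, 2.0))   # 12 RTTs, per-segment ACKs
print(rtts_to_reach(BDP, 1.5))   # 21 RTTs, delayed ACKs
```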

[ns simulation: 500 Mb/s link, 80 ms RTT. Packet loss early in slow-start; standard TCP with delayed ACKs takes 10 minutes to recover. Note the aggressive static AIMD (Floyd pre-tune).]

Summary

www.net100.org

  • Novel approaches
    • non-invasive dynamic tuning of legacy applications
    • using TCP to tune TCP (Web100)
    • tuning on a per-flow, per-path basis
  • Effective evaluation framework
    • protocol analysis and tuning + net/app/OS debugging
    • out-of-kernel tuning
  • Beneficial interactions
    • TCP protocols (Floyd, Wu Feng (DRS), Web100, parallel/non-TCP)
    • Path characterization research (SciDAC, CAIDA, PingER, pathrate, SCNM)
    • Scientific application and Data grids (SciDAC, CERN)
  • Performance improvements