
Transmission Rate Controlled TCP in Data Reservoir - Software control approach - Mary Inaba


Presentation Transcript


  1. Transmission Rate Controlled TCP in Data Reservoir - Software control approach - Mary Inaba, University of Tokyo; Fujitsu Laboratories; Fujitsu Computer Technologies

  2. Data intensive scientific computation through global networks
  [Diagram: Data Reservoir sites connected by a very high-speed network, providing distributed shared files and local accesses. Data sources include the X-ray astronomy satellite ASUKA, nuclear experiments, Nobeyama Radio Observatory (VLBI), Belle experiments, the Digital Sky Survey, the SUBARU Telescope, and GRAPE-6; data analysis at the University of Tokyo.]

  3. Research Projects with Data Reservoir

  4. Dream Computing System for real Scientists
  • Fast CPU, huge memory and disks, good graphics
  • Cluster technology, DSM technology, graphics processors
  • Grid technology
  • Very fast remote file accesses
  • Global file system, data-parallel file systems, replication facilities
  • Transparency to local computation
  • No complex middleware, no or only small modifications to existing software
  • Real scientists are not computer scientists
  • Computer scientists are not a workforce for real scientists

  5. Objectives of Data Reservoir
  • Sharing scientific data between distant research institutes
  • Physics, astronomy, earth science, simulation data
  • Very high-speed single-file transfer on a Long Fat Network (LFN)
  • > 10 Gbps, > 20,000 km (12,500 miles), > 400 ms RTT
  • High utilization of available bandwidth
  • Transferred file data rate > 90% of available bandwidth
  • Including header overheads and initial negotiation overheads (see the overhead sketch below)
  • OS and file system transparency
  • Storage-level data sharing (high-speed iSCSI protocol on stock TCP)
  • Fast single-file transfer
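
The > 90% target counts protocol headers against the raw link rate. A minimal back-of-the-envelope sketch of the goodput ceiling at MTU 1500, assuming standard IPv4/TCP/Ethernet header sizes (not taken from the slides; iSCSI framing and negotiation overheads are not modelled):

```python
# Rough goodput ceiling for TCP over Ethernet at MTU 1500.
# Header sizes are the usual IPv4/TCP/Ethernet values (an assumption).

MTU = 1500                        # IP packet size, bytes
IP_HDR = 20                       # IPv4 header without options
TCP_HDR = 20 + 12                 # TCP header + timestamp option
ETH_OVERHEAD = 14 + 4 + 8 + 12    # Ethernet header + FCS + preamble + inter-frame gap

payload = MTU - IP_HDR - TCP_HDR  # TCP payload bytes per packet (1448)
wire = MTU + ETH_OVERHEAD         # bytes the packet occupies on the wire (1538)

efficiency = payload / wire
print(f"wire efficiency at MTU 1500: {efficiency:.1%}")            # ~94%
print(f"payload ceiling on 10 Gbps : {10 * efficiency:.2f} Gbps")  # ~9.4 Gbps
```

So a > 90% file-data rate leaves only a few percent of headroom for retransmissions, iSCSI framing, and connection setup.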

  6. Basic Architecture
  [Diagram: two Data Reservoir sites, each with cache disks and local file accesses, connected by a high-latency, very-high-bandwidth network; data is moved by disk-block level, parallel, multi-stream transfer and shared as distributed data (DSM-like architecture).]

  7. Data Reservoir Features
  • Data sharing in a low-level protocol
  • Use of the iSCSI protocol
  • Efficient data transfer (optimization of disk head movements)
  • File system transparency
  • Single file image
  • Multi-level striping for performance scalability
  • Local file accesses through the LAN
  • Global disk transfers through the WAN
  • Both unified by the iSCSI protocol

  8. File accesses on Data Reservoir
  [Diagram: scientific detectors and user programs access four file servers (1st-level striping); the file servers access four disk servers through IP switches by iSCSI (2nd-level striping); servers are IBM x345 (2 x 2.6 GHz). A striping sketch follows below.]
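
A minimal sketch of such a two-level block map, assuming simple round-robin striping and stripe widths of four at each level (the actual Data Reservoir layout, block size, and placement policy are not given in the slides):

```python
# Illustrative two-level striping map (stripe widths and round-robin layout
# are assumptions for illustration only). Level 1 stripes the logical volume
# across file servers; level 2 stripes each file server's share across
# disk servers.

N_FILE_SERVERS = 4    # 1st-level stripe width (as drawn on the slide)
N_DISK_SERVERS = 4    # 2nd-level stripe width (as drawn on the slide)

def locate_block(logical_block: int) -> tuple[int, int, int]:
    """Return (file_server, disk_server, local_block) for a logical block."""
    file_server = logical_block % N_FILE_SERVERS
    fs_block = logical_block // N_FILE_SERVERS      # block index within that file server
    disk_server = fs_block % N_DISK_SERVERS
    local_block = fs_block // N_DISK_SERVERS        # block index on that disk server
    return file_server, disk_server, local_block

# Consecutive logical blocks fan out over all server/disk pairs,
# which is what lets many iSCSI streams run in parallel.
for lb in range(8):
    print(lb, locate_block(lb))
```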

  9. Global Data Transfer
  [Diagram: for wide-area transfer, the disk servers at each site perform iSCSI bulk transfer across the global network through the IP switches; file servers and user programs sit above them at each site.]

  10. BW behavior
  [Graphs: bandwidth (Mbps) vs. time (sec) for a Data Reservoir transfer and for a transfer through a file system.]

  11. Problems of BWC2002 experiments
  • Low TCP bandwidth due to packet losses
  • TCP congestion window size control
  • Very slow recovery from the fast recovery phase (> 20 min)
  • Imbalance among parallel iSCSI streams
  • Packet scheduling by switches and routers
  • Users and other network users care only about the total behavior of the parallel TCP streams

  12. Fast Ethernet vs. GbE
  • iperf for 30 seconds
  • Min/Avg: Fast Ethernet > GbE
  [Graphs: measured bandwidth for Fast Ethernet (FE) and GbE]

  13. Packet Transmission Rate
  • Bursty behavior
  • Transmission in 20 ms against an RTT of 200 ms
  • Idle for the remaining 180 ms
  [Graph annotation: packet loss occurred. A worked timing example follows below.]
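
The burstiness arises because an unpaced sender drains its whole congestion window at NIC line rate and then waits for ACKs one RTT later. A minimal sketch of the timing, assuming a GbE sender and a 2.5 MB window chosen here only to reproduce the 20 ms / 180 ms figures on the slide:

```python
# Why an unpaced sender is bursty: all cwnd bytes go out back-to-back at the
# NIC line rate, then the link is idle until ACKs return one RTT later.
# The 2.5 MB window is an assumption chosen to match the slide's figures.

LINE_RATE = 1e9          # GbE sender, bits per second
RTT = 0.200              # seconds
cwnd_bytes = 2.5e6       # assumed congestion window, bytes

burst = cwnd_bytes * 8 / LINE_RATE        # time to drain the window at line rate
average = cwnd_bytes * 8 / RTT            # achieved rate averaged over one RTT

print(f"burst length : {burst * 1e3:.0f} ms")          # ~20 ms
print(f"idle per RTT : {(RTT - burst) * 1e3:.0f} ms")  # ~180 ms
print(f"average rate : {average / 1e6:.0f} Mbps")      # ~100 Mbps
```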

  14. Packet Spacing
  • Ideal story: transmit one packet every RTT/cwnd
  • 24 μs interval for 500 Mbps (MTU 1500 B)
  • High load for a software-only implementation
  • Low overhead in practice, because pacing is used only in the slow start phase
  [Diagram: packets spaced RTT/cwnd apart across the RTT instead of in one burst. A pacing sketch follows below.]
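
A user-space sketch of the idea, with UDP datagrams standing in for TCP segments and a busy-wait producing the ~24 μs gaps (the slides describe a software change to TCP itself; the destination address and packet count here are placeholders):

```python
# User-space sketch of packet pacing: send one MTU-sized payload every
# RTT/cwnd seconds instead of in a single burst.

import socket
import time

MTU_PAYLOAD = 1448                 # TCP payload per 1500 B packet
RTT = 0.200                        # seconds
cwnd_packets = 8333                # ~500 Mbps: RTT * 500e6 / (1500 * 8)

gap = RTT / cwnd_packets           # inter-packet gap, ~24 us at 500 Mbps
print(f"inter-packet gap: {gap * 1e6:.1f} us")

def paced_send(sock, dest, payload, n_packets, gap):
    """Send n_packets, spacing them gap seconds apart (busy-wait timing)."""
    next_deadline = time.perf_counter()
    for _ in range(n_packets):
        sock.sendto(payload, dest)
        next_deadline += gap
        while time.perf_counter() < next_deadline:   # 24 us is too fine for sleep()
            pass

if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    paced_send(s, ("192.0.2.1", 9000), b"x" * MTU_PAYLOAD, 1000, gap)  # placeholder dest
```

In the real system the spacing is applied inside the TCP stack and only during slow start, which is why the software overhead stays acceptable.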

  15. Example Case of 8 IPG (inter-packet gap)
  • Successful Fast Retransmit
  • Smooth transition to Congestion Avoidance
  • Congestion Avoidance takes 28 minutes to recover to 550 Mbps

  16. Best Case of 1023 B IPG
  • Behaves like the Fast Ethernet case
  • Proper transmission rate
  • Spurious retransmits due to reordering

  17. Performance Divergence on LFN
  • Parallel streams
  • Differences among streams grow worse over time
  • The slowest stream determines total performance

  18. Imbalance within parallel TCP streams
  • Imbalance among parallel iSCSI streams
  • Packet scheduling by switches and routers
  • Meaningless unfairness among parallel streams
  • Users and other network users care only about the total behavior of the parallel TCP streams
  • Our approach (a balancing sketch follows below):
  • Keep Σ cwnd_i constant, for fair TCP network usage with respect to other users
  • Balance the individual cwnd_i by communicating between the parallel TCP streams
  [Graphs: per-stream bandwidth vs. time, before and after balancing]
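
A minimal sketch of the balancing rule, assuming each stream's cwnd can be read and adjusted and that the redistribution is a simple move-toward-the-mean step (the slides state only that Σ cwnd_i is held constant while the cwnd_i are balanced; the actual in-kernel mechanism is not detailed):

```python
# Sketch of cwnd balancing across parallel streams: keep sum(cwnd_i)
# constant (so the aggregate stays fair to other traffic) while nudging
# each stream's cwnd toward the common average. The move-toward-the-mean
# rule is an assumption for illustration.

def balance_cwnds(cwnds: list[float], alpha: float = 0.25) -> list[float]:
    """Move each cwnd a fraction alpha of the way toward the mean,
    preserving the total exactly."""
    total = sum(cwnds)
    mean = total / len(cwnds)
    balanced = [c + alpha * (mean - c) for c in cwnds]
    # The update is a convex combination, so the sum is preserved:
    assert abs(sum(balanced) - total) < 1e-6
    return balanced

# A slow stream (e.g. one that just halved its window after a loss) is
# topped up from the faster ones without changing the aggregate rate.
streams = [120.0, 115.0, 30.0, 135.0]     # cwnd in packets, per stream
print(sum(streams), balance_cwnds(streams))
```

Because the update is a convex combination, the aggregate window (and hence the load offered to the shared path) is unchanged; only the split between streams moves toward equality.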

  19. BWC2003 US-Japan experiments
  • 24,000 km (15,000 miles) distance (~400 ms RTT)
  • Route: Phoenix → Tokyo → Portland → Tokyo over OC-48 x 3, OC-192, OC-192, GbE x 1
  • Transfer of a ~1 TB file
  • 32 servers, 128 iSCSI disks
  [Network map: Data Reservoirs at Phoenix and Tokyo; path via L.A., Seattle, Chicago, N.Y., and Portland; links include 10G Ethernet x 2, 10G Ethernet, GbE x 4, OC-48 x 2, OC-48, OC-192; networks: Abilene, IEEAF/WIDE, NTT Com, APAN, SUPER-SINET.]

  20. [Network map: the 24,000 km (15,000 miles) path, with segments of 15,680 km (9,800 miles) and 8,320 km (5,200 miles); links OC-48 x 3, GbE x 4, OC-192; Juniper T320 router.]

  21. SC2002 BWC2002: 560 Mbps (200 ms RTT), 95% utilization of available bandwidth, U. of Tokyo ⇔ SCinet (Maryland, USA) ⇒ Data Reservoir should be able to saturate a 10 Gbps network once one becomes available for the US-Japan connection

  22. Results
  • BWC2002
    • Tokyo → Baltimore, 10,800 km (6,700 miles)
    • Peak bandwidth (on network): 600 Mbps
    • Average file transfer bandwidth: 560 Mbps
    • Bandwidth-distance product: 6,048 terabit-meters/second
  • BWC2003 results (pre-test)
    • Phoenix → Tokyo → Portland → Tokyo, 24,000 km (15,000 miles)
    • Peak bandwidth (on network): > 8 Gbps
    • Average file transfer bandwidth: > 7 Gbps
    • Bandwidth-distance product: > 168 petabit-meters/second
  • More than 25 times improvement over the BWC2002 performance (bandwidth-distance product); a quick arithmetic check follows below
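
A quick check of the bandwidth-distance products from the bandwidths and distances quoted above (units here are bit-meters per second):

```python
# Bandwidth-distance products from the figures on this slide.

def bw_distance(bps: float, km: float) -> float:
    """Bandwidth-distance product in bit-meters per second."""
    return bps * km * 1e3

bwc2002 = bw_distance(560e6, 10_800)   # 560 Mbps over 10,800 km
bwc2003 = bw_distance(7e9, 24_000)     # 7 Gbps over 24,000 km

print(f"BWC2002: {bwc2002 / 1e12:,.0f} terabit-meters/s")   # ~6,048
print(f"BWC2003: {bwc2003 / 1e15:,.0f} petabit-meters/s")   # ~168
print(f"improvement: {bwc2003 / bwc2002:.0f}x")             # ~28x
```

The ~28x ratio is consistent with the "more than 25 times improvement" stated above.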

  23. Bad News
  • Network cut on 11/8
  • The US-Japan north-route connection has been completely out of service
  • 2-3 weeks are needed to repair the undersea fibers
  • Planned BW = 11.2 Gbps (OC-48 x 3 + GbE x 4); actual maximum BW ≈ 8.2 Gbps (OC-48 x 3 + GbE x 1)

  24. How your science benefits from high-performance, high-bandwidth networking
  • Easy and transparent access to remote scientific data
  • Without special programming (normal NFS-style accesses)
  • Purely software approach with IA servers
  • Utilization of the high-bandwidth network for the scientist's data
  • 17 minutes for a 1 TB file transfer from the opposite side of the earth
  • High utilization factor (> 90%)
  • Good for both scientists and network agencies
  • Scientists can concentrate on their research topics
  • Good for both scientists and computer scientists

  25. Summary
  • The most distant data transfer at BWC2003 (24,000 km)
  • Software techniques for improving efficiency and stability
  • Transfer rate control on TCP
  • cwnd balancing across parallel TCP streams
  • Based on the stock TCP algorithm
  • Possibly the highest bandwidth-distance product for file transfer between two points
  • Still high utilization of available bandwidth

  26. BWC 2003 Experiment is supported by NTT / VERIO
