Communications in ISTORE


Presentation Transcript


  1. Communications in ISTORE Dan Hettena

  2. Communication Goals • Goals: • Fault tolerance through redundancy • Tolerate any single hardware failure • High bandwidth with commodity hardware • Use redundancy for extra bandwidth • Lower latency alternative • For latency-sensitive apps, such as Titanium • Provide Active Messages interface

  3. Outline • ISTORE Network Hardware • IP Communication • Active Messages Communication

  4. ISTORE Network Hardware • Components • 64 ISTORE Bricks, each with: • Pentium II 266MHz • IBM 10kRPM SCSI disk • Can sometimes be read at more than 30MB/s • 4 100Mbps ethernet interfaces • Intel EtherExpress Pro/100 (82557/8) • Total bandwidth = 4 × 100Mbps ≈ 40MB/s usable (≈10MB/s per link)

  5. ISTORE Networking Hardware • Components (continued) • 14 “Little” Routing Switches • Packet Engines/Alcatel PowerRail 1000 • 20 100Mbps interfaces (copper) • 2 1Gbps interfaces (fiber) • 2 “Big” Routing Switches • Packet Engines/Alcatel PowerRail 5200 • More-than-enough 1Gbps interfaces (fiber)

  6. ISTORE Networking Hardware • Routes between bricks

  7. ISTORE Networking Hardware • Routes between bricks (continued) • Short routes • Only if connected to the same “little” switches • No need to go through a “big” switch • 2 hops

  8. ISTORE Networking Hardware • Routes between bricks (continued) • Long routes • Must choose a big switch • 4 hops

  9. ISTORE Networking Hardware • Performance observations • Switches are store-and-forward • Ethernet packets are all at least 60 bytes • 0-padded by sender if necessary • Time per 100Mbps copper hop is 15µs + (10ns/bit)(size – 60 bytes)
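
As a worked illustration, a minimal C sketch of this hop-time model (the 15µs fixed cost and 10ns/bit rate come from the slide; the function and constant names are mine):

    #include <stdio.h>

    /* Hop-time model from the slide: each 100Mbps store-and-forward
     * hop costs a fixed 15µs plus 10ns per bit beyond the 60-byte
     * minimum frame (short frames are 0-padded to 60 bytes). */
    static double hop_latency_us(int frame_bytes)
    {
        if (frame_bytes < 60)
            frame_bytes = 60;                  /* sender pads to 60 bytes */
        return 15.0 + 10.0 * 8.0 * (frame_bytes - 60) / 1000.0;
    }

    /* One-way latency of a route: 2 hops (short) or 4 hops (long). */
    static double route_latency_us(int frame_bytes, int hops)
    {
        return hops * hop_latency_us(frame_bytes);
    }

    int main(void)
    {
        printf("60B frame, short route:  %5.1f µs\n", route_latency_us(60, 2));
        printf("1500B frame, long route: %5.1f µs\n", route_latency_us(1500, 4));
        return 0;
    }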

  10. ISTORE Networking Hardware • Future work • Plug in the wires

  11. IP Communication • Goals • Stripe packets across all 4 interfaces • 4x increase in available TCP/UDP bandwidth • Automatically handle link and router faults • Transparent to TCP/UDP applications • Transparent backward-compatibility with hosts that do not support striping

  12. IP Communication • Nested outline • Previous work • Kernel driver overview • Providing fault tolerance • Providing backward compatibility • Making sure it scales

  13. IP Communication • Previous work • Linux bonding driver (drivers/net/bonding.c) • Generic driver to “bond” links • Ignores faults • Does not prevent packet reordering • Only supports one remote host • “An Architecture for Packet-Striping Protocols” (Adiseshu, Varghese, Parulkar)

  14. IP Communication • Kernel striping driver • Cooperates with ethernet driver • Use special MAC addresses • 49:53:54:4F:<00NNNNNN>:<000000II> (the prefix is ASCII “ISTO”) • Easy to determine if host supports striping • Store striping information in headers • Link status • Reordering data
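
A hypothetical sketch of how such an address could be built and tested in C. The “ISTO” prefix and byte pattern are from the slide; reading <00NNNNNN> as the brick number (64 bricks fit in 6 bits) and <000000II> as the interface index (4 NICs fit in 2 bits) is my assumption:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* 49:53:54:4F is ASCII "ISTO". Assumed layout: byte 4 carries the
     * brick number (00NNNNNN) and byte 5 the interface index (000000II). */
    static const uint8_t ISTORE_PREFIX[4] = { 0x49, 0x53, 0x54, 0x4F };

    static void istore_make_mac(uint8_t mac[6], unsigned node, unsigned iface)
    {
        memcpy(mac, ISTORE_PREFIX, 4);
        mac[4] = node  & 0x3F;                 /* 00NNNNNN: brick 0..63 */
        mac[5] = iface & 0x03;                 /* 000000II: link 0..3   */
    }

    /* Cheap test: does the peer's MAC say it speaks the striping protocol? */
    static int istore_mac_supports_striping(const uint8_t mac[6])
    {
        return memcmp(mac, ISTORE_PREFIX, 4) == 0;
    }

    int main(void)
    {
        uint8_t mac[6];
        istore_make_mac(mac, 37, 2);
        printf("%02X:%02X:%02X:%02X:%02X:%02X  striping=%d\n",
               mac[0], mac[1], mac[2], mac[3], mac[4], mac[5],
               istore_mac_supports_striping(mac));
        return 0;
    }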

  15. IP Communication • Fault tolerance • User process periodically tests links • Notifies striping driver • Striping driver will not use broken links • Need to detect performance faults, too • Backward compatibility • Same IP address for both modes
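
A rough C sketch of the fault-masking idea, assuming an interface where the user-level prober reports link state and the striping driver round-robins over whichever links remain usable; all names are illustrative, not the actual driver's API:

    #include <stdint.h>
    #include <stdio.h>

    #define NUM_LINKS 4

    static uint8_t usable_links = 0x0F;        /* bitmask: all 4 links up */

    /* Called when the user-level prober reports a link up or down. */
    static void striping_set_link_state(int link, int up)
    {
        if (up)
            usable_links |= (uint8_t)(1u << link);
        else
            usable_links &= (uint8_t)~(1u << link);
    }

    /* Pick the next usable link, skipping any marked broken. */
    static int striping_pick_link(void)
    {
        static int last = 0;
        for (int i = 1; i <= NUM_LINKS; i++) {
            int link = (last + i) % NUM_LINKS;
            if (usable_links & (1u << link)) {
                last = link;
                return link;
            }
        }
        return -1;                             /* no links up: hold the packet */
    }

    int main(void)
    {
        striping_set_link_state(2, 0);         /* prober says link 2 is broken */
        for (int i = 0; i < 6; i++)
            printf("tx on link %d\n", striping_pick_link());
        return 0;
    }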

  16. IP Communication • Scales automatically, unless packets arrive out of order • Reordering is possible with multiple routes (e.g. striping) • TCP relies on a consistent round-trip time • Packet reordering confuses TCP • The result is unnecessary retransmissions • This will be an issue in ISTORE

  17. IP Communication • Scaling (continued) • Need to reorder packets before handing them up to IP • Solution (almost implemented) • Clever use of queuing algorithms on sender and receiver • Makes it unlikely that the receiver will dequeue packets out of order • This is previous work (Adiseshu et al.)
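
For illustration, a tiny resequencing buffer in C that holds early packets until the next expected one arrives. The published Adiseshu et al. scheme gets the same effect from sender/receiver queuing discipline rather than explicit per-packet sequence numbers; this sketch only shows the goal, not their algorithm:

    #include <stdio.h>

    #define WINDOW 8

    struct pkt { unsigned seq; const char *data; };

    static struct pkt slot[WINDOW];            /* slot[seq % WINDOW]     */
    static unsigned next_seq;                  /* next seq to hand to IP */

    static void ip_receive(const char *data)   /* stand-in for the IP layer */
    {
        printf("to IP: %s\n", data);
    }

    static void resequence_rx(struct pkt p)
    {
        if (p.seq - next_seq >= WINDOW)        /* too old or too far ahead */
            return;
        slot[p.seq % WINDOW] = p;              /* park it in its slot */
        while (slot[next_seq % WINDOW].data) { /* release any in-order run */
            ip_receive(slot[next_seq % WINDOW].data);
            slot[next_seq % WINDOW].data = NULL;
            next_seq++;
        }
    }

    int main(void)
    {
        resequence_rx((struct pkt){ 1, "pkt1" });  /* arrives early: held    */
        resequence_rx((struct pkt){ 0, "pkt0" });  /* releases pkt0 and pkt1 */
        return 0;
    }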

  18. IP Communication • Future Work • Complete reordering support • Automatic detection of node ID • Automatic detection of incorrect wiring • Automatic configuration of the switches • By the Diagnostic Processors

  19. Active Messages Support • Goals • Support latency-sensitive apps • Titanium, for example • This is not a primary goal for ISTORE • As reflected in the networking hardware • Non-goals • Transparency • Support for malicious users

  20. Active Messages Support • Problem: kernel ruins latency • Protocol stacks (UDP, TCP) are slow • User-kernel interaction is slow • Solution: remove kernel from critical path • Previous work: U-Net • User level ethernet driver • Cooperates with kernel driver • Only accessible to trusted users

  21. Active Messages Support • Custom AM implementation • By Dan Bonachea (not me) • Based on HPAM • But also supports big bulk transfers • Supports ISTORE user-level networking • Automatically fails over to a new network card (if a link dies) • Also supports UDP
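
A generic Active Messages sketch in C, not the actual HPAM-derived API: a message names a handler index on the destination node, and the handler runs when the message is pulled off the network. Here the “network” is a one-slot in-process queue so the example runs standalone:

    #include <stdio.h>

    typedef void (*am_handler_t)(int src_node, int arg);

    static am_handler_t handler_table[8];
    static struct { int valid, src, handler, arg; } net_queue;

    static void am_request(int src, int dst, int handler, int arg)
    {
        (void)dst;                             /* loopback: all nodes local */
        net_queue.valid = 1;
        net_queue.src = src;
        net_queue.handler = handler;
        net_queue.arg = arg;
    }

    static void am_poll(void)                  /* drain queue, run handlers */
    {
        while (net_queue.valid) {
            int src = net_queue.src, h = net_queue.handler, a = net_queue.arg;
            net_queue.valid = 0;
            handler_table[h](src, a);
        }
    }

    static void pong_handler(int src, int arg) { printf("pong %d from node %d\n", arg, src); }
    static void ping_handler(int src, int arg) { am_request(1, src, 1, arg); }  /* reply */

    int main(void)
    {
        handler_table[0] = ping_handler;       /* index 0: ping request */
        handler_table[1] = pong_handler;       /* index 1: pong reply   */
        am_request(0, 1, 0, 42);               /* node 0 pings node 1   */
        am_poll();                             /* ping runs, then pong  */
        return 0;
    }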

  22. Active Messages Support • Performance comparison • Using a ping program (client of AM) • Using a short route (2 hops each way, 4 hops round trip) • Ethernet latency = 4 × (17µs) = 68µs • UDP mean round-trip time is 160µs • User-level ethernet mean is 80µs • Includes check-summing and AM overhead
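
These figures line up with the slide-9 hop model if the ping frames were roughly 85 bytes, which is my assumption; a quick check in C:

    #include <stdio.h>

    /* With the slide-9 model, 17µs/hop corresponds to ~85-byte frames,
     * and a 2-hop route crossed in both directions gives the quoted
     * 68µs raw ethernet round trip. */
    int main(void)
    {
        double per_hop_us    = 15.0 + 10.0 * 8.0 * (85 - 60) / 1000.0;  /* 17.0 */
        double round_trip_us = 4.0 * per_hop_us;                        /* 68.0 */
        printf("per hop: %.1f µs, raw round trip: %.1f µs\n",
               per_hop_us, round_trip_us);
        return 0;
    }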

  23. Active Messages Support • Future work • Compile Titanium for ISTORE • and see what happens.

  24. Conclusions • Kernel hacking is fun • And my talking is done.
