This document outlines the communication goals and hardware setup of the ISTORE network. Key objectives include fault tolerance through redundancy, high bandwidth from commodity hardware, use of that redundancy for extra bandwidth, and a lower-latency alternative for latency-sensitive applications such as Titanium. The architecture combines Pentium II bricks, SCSI disks, and commodity Ethernet interfaces, and the talk covers packet striping, kernel-level support, and future work to improve the robustness and efficiency of communication in ISTORE.
Communications in ISTORE Dan Hettena
Communication Goals • Goals: • Fault tolerance through redundancy • Tolerate any single hardware failure • High bandwidth with commodity hardware • Use redundancy for extra bandwidth • Lower latency alternative • For latency-sensitive apps, such as Titanium • Provide Active Messages interface
Outline • ISTORE Network Hardware • IP Communication • Active Messages Communication
ISTORE Network Hardware • Components • 64 ISTORE Bricks, each with: • Pentium II 266MHz • IBM 10kRPM SCSI disk • Can sometimes be read at more than 30MB/s • 4 100Mbps ethernet interfaces • Intel EtherExpress Pro/100 (82557/8) • Total bandwidth = 4*100Mbps = 400Mbps (50MB/s)
ISTORE Networking Hardware • Components (continued) • 14 “Little” Routing Switches • Packet Engines/Alcatel PowerRail 1000 • 20 100Mbps interfaces (copper) • 2 1Gbps interfaces (fiber) • 2 “Big” Routing Switches • Packet Engines/Alcatel PowerRail 5200 • More-than-enough 1Gbps interfaces (fiber)
ISTORE Networking Hardware • Routes between bricks
ISTORE Networking Hardware • Routes between bricks (continued) • Short routes • Only if connected to the same “little” switches • No need to go through a “big” switch • 2 hops
ISTORE Networking Hardware • Routes between bricks (continued) • Long routes • Must choose a big switch • 4 hops
ISTORE Networking Hardware • Performance observations • Switches are store-and-forward • Ethernet packets are all at least 60 bytes • 0-padded by sender if necessary • Time per 100Mbps copper hop is 15µs + (10ns/bit) × (size − 60 bytes)
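As a worked example of that hop-time formula, here is a minimal C sketch assuming the slide's constants (15µs fixed cost per store-and-forward hop, 10ns/bit line rate, 60-byte padded minimum frame):

    #include <stdio.h>

    /* Per-hop latency model from the slide above. */
    static double hop_time_us(int frame_bytes)
    {
        if (frame_bytes < 60)
            frame_bytes = 60;              /* sender pads short frames */
        int extra_bits = (frame_bytes - 60) * 8;
        return 15.0 + extra_bits * 0.010;  /* 10 ns/bit = 0.010 us/bit */
    }

    int main(void)
    {
        printf("60B frame:   %.1f us/hop\n", hop_time_us(60));    /* 15.0 */
        printf("1514B frame: %.1f us/hop\n", hop_time_us(1514));  /* ~131.3 */
        return 0;
    }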
ISTORE Networking Hardware • Future work • Plug in the wires
IP Communication • Goals • Stripe packets across all 4 interfaces • 4x increase in available TCP/UDP bandwidth • Automatically handle link and router faults • Transparent to TCP/UDP applications • Transparent backward-compatibility with hosts that do not support striping
IP Communication • Nested outline • Previous work • Kernel driver overview • Providing fault tolerance • Providing backward compatibility • Making sure it scales
IP Communication • Previous work • Linux bonding driver (drivers/net/bonding.c) • Generic driver to “bond” links • Ignores faults • Does not prevent packet reordering • Only supports one remote host • “An Architecture for Packet-Striping Protocols” (Adiseshu, Varghese, and Parulkar) • Round-robin striping idea sketched below
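For concreteness, a minimal round-robin sender sketch in the spirit of that striping architecture; the types and names here are illustrative stand-ins, not the actual ISTORE driver interface:

    #define NLINKS 4

    struct stripe_state {
        int next;            /* next link in round-robin order */
        int up[NLINKS];      /* 1 if link passed its last health check */
        unsigned int seq;    /* striping sequence number (see reordering) */
    };

    /* Pick the next working link, skipping any marked down. */
    static int pick_link(struct stripe_state *s)
    {
        for (int i = 0; i < NLINKS; i++) {
            int link = (s->next + i) % NLINKS;
            if (s->up[link]) {
                s->next = (link + 1) % NLINKS;
                return link;
            }
        }
        return -1;           /* all links down */
    }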
IP Communication • Kernel striping driver • Cooperates with ethernet driver • Use special MAC addresses • 49:53:54:4F:<00NNNNNN>:<000000II> • Easy to determine if host supports striping • Store striping information in headers • Link status • Reordering data
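A small sketch of how those addresses can be built and recognized (the helper names are mine; the 49:53:54:4F prefix is ASCII "ISTO", with a 6-bit node number for the 64 bricks and a 2-bit interface ID for the 4 NICs):

    #include <stdint.h>
    #include <string.h>

    /* Build the striping MAC address for a (node, interface) pair. */
    static void istore_mac(uint8_t mac[6], unsigned node, unsigned iface)
    {
        memcpy(mac, "ISTO", 4);     /* 0x49 0x53 0x54 0x4F */
        mac[4] = node & 0x3F;       /* 00NNNNNN */
        mac[5] = iface & 0x03;      /* 000000II */
    }

    /* A peer supports striping iff its MAC carries the "ISTO" prefix. */
    static int supports_striping(const uint8_t mac[6])
    {
        return memcmp(mac, "ISTO", 4) == 0;
    }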
IP Communication • Fault tolerance • User process periodically tests links • Notifies striping driver • Striping driver will not use broken links • Need to detect performance faults, too • Backward compatibility • Same IP address for both modes
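A sketch of what that user-level link monitor could look like; the ioctl request code and probe_link are hypothetical placeholders, not the real driver interface:

    #include <sys/ioctl.h>

    /* Hypothetical request code -- the real interface differs. */
    #define STRIPE_SET_LINK_STATUS _IOW('S', 1, int[2])

    extern int probe_link(int link);    /* e.g. send/receive a test frame */

    /* Probe each link once and report its health to the striping driver. */
    void monitor_links(int stripe_fd, int nlinks)
    {
        for (int link = 0; link < nlinks; link++) {
            /* status[0] = link index, status[1] = 1 if healthy */
            int status[2] = { link, probe_link(link) };
            ioctl(stripe_fd, STRIPE_SET_LINK_STATUS, status);
        }
    }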
IP Communication • Scales automatically, unless packets arrive out of order • Possible with multiple routes (e.g. striping) • Reordered packets generate duplicate ACKs, which TCP treats as signs of loss • Result is unnecessary retransmissions • This will be an issue in ISTORE
IP Communication • Scaling (continued) • Need to reorder packets before handing them up to IP • Solution (almost implemented) • Clever use of queuing algorithms on sender and receiver • Makes it unlikely that the receiver will dequeue packets out of order • This is previous work (Adiseshu et al.), sketched below
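To make the receive side concrete, a simplified sketch: sequence numbers come from the striping header; the fixed window and flush policy are illustrative only (see Adiseshu et al. for the real queueing scheme, and note a real implementation also needs a timeout so a lost packet cannot stall the queue):

    #include <stddef.h>

    #define WINDOW 8

    struct reorder_q {               /* zero-initialize before use */
        unsigned int next_seq;       /* next sequence number owed to IP */
        void *slot[WINDOW];          /* parked out-of-order packets */
    };

    extern void deliver_to_ip(void *pkt);

    void rx_packet(struct reorder_q *q, unsigned int seq, void *pkt)
    {
        unsigned int ahead = seq - q->next_seq;  /* unsigned wraparound */

        if (ahead >= WINDOW) {       /* late duplicate or huge gap: */
            deliver_to_ip(pkt);      /* give up on ordering, pass it up */
            return;
        }
        if (ahead > 0) {
            q->slot[seq % WINDOW] = pkt;   /* early: park until its turn */
            return;
        }
        deliver_to_ip(pkt);          /* exactly the packet IP is owed */
        q->next_seq++;
        while (q->slot[q->next_seq % WINDOW]) {  /* drain parked packets */
            deliver_to_ip(q->slot[q->next_seq % WINDOW]);
            q->slot[q->next_seq % WINDOW] = NULL;
            q->next_seq++;
        }
    }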
IP Communication • Future Work • Complete reordering support • Automatic detection of node ID • Automatic detection of incorrect wiring • Automatic configuration of the switches • By the Diagnostic Processors
Active Messages Support • Goals • Support latency-sensitive apps • Titanium, for example • This is not a primary goal for ISTORE • As reflected in the networking hardware • Non-goals • Transparency • Support for malicious users
Active Messages Support • Problem: kernel ruins latency • Protocol stacks (UDP, TCP) are slow • User-kernel interaction is slow • Solution: remove kernel from critical path • Previous work: U-Net • User level ethernet driver • Cooperates with kernel driver • Only accessible to trusted users
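Conceptually, the U-Net-style fast path amounts to polling a NIC receive ring mapped into the trusted process, with no system call or protocol stack in the way; the descriptor layout below is invented for illustration:

    #include <stdint.h>

    struct rx_desc {
        volatile uint32_t ready;    /* set by the NIC when the slot fills */
        uint32_t len;
        uint8_t  data[1514];
    };

    /* Spin until the next descriptor fills, then hand it to the caller. */
    struct rx_desc *poll_rx(struct rx_desc *ring, unsigned *idx, unsigned n)
    {
        struct rx_desc *d = &ring[*idx % n];
        while (!d->ready)
            ;                       /* busy-wait: trade CPU for latency */
        *idx += 1;
        return d;
    }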
Active Messages Support • Custom AM implementation • By Dan Bonachea (not me) • Based on HPAM • But also supports big bulk transfers • Supports ISTORE user-level networking • Automatically fails over to another network card if a link dies • Also supports UDP
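To show the programming model behind the ping numbers on the next slide: the sender names a handler that runs on the receiver as soon as the message arrives. These declarations follow the general AM request/reply/poll idiom; they are not the exact API of this implementation:

    typedef void *am_token_t;

    extern void am_request(int dest_node, int handler, int arg);
    extern void am_reply(am_token_t token, int handler, int arg);
    extern void am_poll(void);          /* drain network, run handlers */

    enum { PING_HANDLER = 1, PONG_HANDLER = 2 };

    static volatile int got_pong;

    /* Runs on the server brick when the request arrives. */
    void ping_handler(am_token_t token, int arg)
    {
        am_reply(token, PONG_HANDLER, arg);
    }

    /* Runs on the client brick when the reply arrives. */
    void pong_handler(am_token_t token, int arg)
    {
        (void)token; (void)arg;
        got_pong = 1;
    }

    void ping(int server_node)
    {
        got_pong = 0;
        am_request(server_node, PING_HANDLER, 42);
        while (!got_pong)
            am_poll();              /* poll rather than sleep: latency */
    }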
Active Messages Support • Performance comparison • Using a ping program (client of AM) • Using a long route (four hops) • Ethernet latency = 4 × (17µs) = 68µs • UDP mean round-trip time is 160µs • User-level ethernet mean is 80µs • Includes checksumming and AM overhead
Active Messages Support • Future work • Compile Titanium for ISTORE • and see what happens.
Conclusions • Kernel hacking is fun • And my talking is done.