Masking Failures from Application Performance in Data Center Networks with Shareable Backup
Dingming Wu+, Yiting Xia+*, Xiaoye Steven Sun+, Xin Sunny Huang+, Simbarashe Dzinamarira+, T. S. Eugene Ng+
+Rice University, *Facebook, Inc.
Data Center Networks Should Be Reliable
Network Failures Are Disruptive
• Median case of failures: 10% less traffic delivered
• Worst 20% of failures: 40% less traffic delivered
(Gill et al., SIGCOMM 2011)
Today’s Failure Handling: Rerouting
• Fast local rerouting: inflated path length
• Globally optimal rerouting: high latency of route updates
• Rerouting also impacts flows that do not travel through the failure location
Impact on Coflow Completion Time (CCT)
• Facebook coflow trace
• k = 16 fat-tree network
• Globally optimal rerouting
Do We Have Other Options?
• Restore network capacity immediately after a failure
• Be cost-efficient
  -- small pool of backup switches
• How do we achieve that?
Circuit Switches
• Physical-layer device
• Circuits controlled by software
• Examples
  -- optical 2D-MEMS switch: 40 µs switching time, $10 per-port cost
  -- electrical cross-point switch: 70 ns switching time, $3 per-port cost
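A circuit switch can be modeled as a software-controlled, one-to-one mapping between its ports: it does not look at packets, it only holds physical circuits that software can rewire. The sketch below is a toy model under that assumption; the class name, the port names A to D, and the tear-down-on-reconnect behavior are illustrative choices, not a real device API.

```python
class CircuitSwitch:
    """Toy model: a circuit switch as a software-set one-to-one port mapping."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.circuit = {}  # port -> peer port, stored in both directions

    def connect(self, a, b):
        if a not in self.ports or b not in self.ports or a == b:
            raise ValueError(f"invalid port pair ({a}, {b})")
        # A port carries at most one circuit: tear down existing circuits on a and b.
        for p in (a, b):
            peer = self.circuit.pop(p, None)
            if peer is not None:
                self.circuit.pop(peer, None)
        self.circuit[a] = b
        self.circuit[b] = a

    def peer(self, port):
        """Port currently connected to `port`, or None if it has no circuit."""
        return self.circuit.get(port)


sw = CircuitSwitch("ABCD")  # hypothetical ports A, B, C, D
sw.connect("C", "A")
sw.connect("B", "D")
sw.connect("C", "B")  # reconfigure: C-A and B-D are torn down
print(sw.peer("C"), sw.peer("A"), sw.peer("D"))  # B None None
```

Reconnecting C to B leaves A and D without a circuit; on real hardware such a rewiring takes on the order of the switching times listed for each technology.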
Ideal Architecture
[Figure: servers, regular switches, a circuit switch, and one backup switch]
• Entire network shares one backup switch
• Unreasonably high port count of the circuit switch
• Replace any failed switch when necessary
• Single point of failure
How to Make It Practical
• Feasibility
  -- small port-count circuit switches
• Scalability
  -- partition the network into failure groups
  -- distribute circuit switches across the network
• Low cost
  -- small backup pool
  -- share backup switches within each failure group
ShareBackup Architecture
[Figure: an original fat-tree with k = 6, showing the core, aggregation, and edge layers]
• Partition the switches into failure groups, each with k/2 switches
• Add backup switches per failure group
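The partitioning rule can be sketched for a standard k-ary fat-tree (k pods, each with k/2 edge and k/2 aggregation switches, plus (k/2)² core switches). How core switches are grouped is an assumption here: core switches that attach to the same aggregation-switch position in every pod share a group. The function name and the switch labels are hypothetical.

```python
def fat_tree_failure_groups(k):
    """Partition a k-ary fat-tree's switches into failure groups of k/2 switches.

    Assumed grouping: per pod, the edge switches form one group and the
    aggregation switches another; core switches are grouped by the
    aggregation-switch position they attach to in every pod.
    """
    if k % 2:
        raise ValueError("k must be even")
    half = k // 2
    groups = []
    for pod in range(k):
        groups.append([("edge", pod, i) for i in range(half)])
        groups.append([("agg", pod, i) for i in range(half)])
    # Core switch c attaches to aggregation switch c // half in each pod.
    for a in range(half):
        groups.append([("core", c) for c in range(a * half, (a + 1) * half)])
    return groups


k = 6
groups = fat_tree_failure_groups(k)
regular = sum(len(g) for g in groups)  # regular (non-backup) switches
backups = len(groups)                  # one backup switch per failure group
print(len(groups), regular, backups)   # 15 45 15
```

For k = 6 this yields 15 failure groups covering 45 regular switches, so one backup per group adds 15 backup switches.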
Edge Layer
[Figure: edge switches 0, 1, 2 and a backup switch, connected to the servers through circuit switches]
Aggregation Layer
[Figure: aggregation switches 0, 1, 2 with a backup switch, and edge switches 0, 1, 2 with a backup switch, connected through circuit switches]
Core Layer
[Figure: core switches 0 to 8, circuit switches, aggregation switches, and backup switches]
Recover First, Diagnose Later
• Failure recovery
  -- a failed switch is replaced by a backup via circuit reconfiguration
  -- on a link failure, the switches on both sides are replaced
• Automatic failure diagnosis is performed offline
  -- details in the paper
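The recover-first step can be sketched from a controller's point of view by abstracting each circuit reconfiguration as handing a logical role (the position a switch occupies in the topology) to a different physical switch. This is a hypothetical model, not the paper's controller: `FailureGroup`, `replace`, `repaired`, and `on_link_failure` are illustrative names, and the actual rewiring of circuit switches is elided.

```python
class FailureGroup:
    """Hypothetical controller view of one failure group and its backup pool."""

    def __init__(self, members, backups):
        # physical switch -> logical role (topology position) it currently serves
        self.role_of = {m: m for m in members}
        self.backups = list(backups)  # idle backups shared by the group
        self.failed = set()

    def serving(self, role):
        """Return the physical switch currently serving a logical role."""
        return next(sw for sw, r in self.role_of.items() if r == role)

    def replace(self, switch):
        """Recover first: move the failed switch's role onto a backup.

        In hardware this is a reconfiguration of the adjacent circuit
        switches; here it is only a table update.
        """
        if not self.backups:
            raise RuntimeError("no backup left in this group")
        backup = self.backups.pop()
        self.role_of[backup] = self.role_of.pop(switch)
        self.failed.add(switch)
        return backup

    def repaired(self, switch):
        """Diagnose later: a switch found healthy (or fixed) becomes a backup."""
        self.failed.discard(switch)
        self.backups.append(switch)


def on_link_failure(group_a, switch_a, group_b, switch_b):
    # The faulty end of a link is unknown at first, so both ends are replaced.
    return group_a.replace(switch_a), group_b.replace(switch_b)


edge = FailureGroup(["e0", "e1", "e2"], ["eb"])
agg = FailureGroup(["a0", "a1", "a2"], ["ab"])
replacements = on_link_failure(edge, "e1", agg, "a0")
print(replacements)        # ('eb', 'ab')
print(edge.serving("e1"))  # eb
edge.repaired("e1")        # offline diagnosis finds e1 healthy
print(edge.backups)        # ['e1']
```

Returning `e1` to the pool mirrors the idea that switches recovered from failures become backups themselves, so the pool does not shrink permanently after a misattributed link failure.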
Live Impersonation of Failed Switch
[Figure, shown in three build steps: edge switches 0, 1, 2 and a backup switch above the servers; every edge switch holds routing tables 0, 1, 2, associated with VLANs 0, 1, 2 respectively]
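Live impersonation works because routing state is tied to a VLAN id rather than to a physical box. A minimal sketch, assuming every switch in the group (backup included) stores all k/2 routing tables and that each packet arrives tagged with the VLAN id of the logical switch it is meant for; the prefixes, port names, and `GroupSwitch` class are hypothetical.

```python
import ipaddress

# Hypothetical routing tables for a failure group of k/2 = 3 edge switches.
# Every switch in the group stores all of them; table i serves VLAN id i.
ROUTING_TABLES = {
    0: [("10.0.0.0/24", "down-0"), ("0.0.0.0/0", "up")],
    1: [("10.0.1.0/24", "down-1"), ("0.0.0.0/0", "up")],
    2: [("10.0.2.0/24", "down-2"), ("0.0.0.0/0", "up")],
}


class GroupSwitch:
    def __init__(self, name, tables):
        self.name = name
        self.tables = tables  # identical on every switch in the group

    def forward(self, vlan, dst):
        """Longest-prefix match in the table selected by the packet's VLAN id."""
        addr = ipaddress.ip_address(dst)
        table = self.tables[vlan]
        matches = [(ipaddress.ip_network(p), port) for p, port in table]
        matches = [(net, port) for net, port in matches if addr in net]
        _, port = max(matches, key=lambda m: m[0].prefixlen)
        return port


edge1 = GroupSwitch("edge-1", ROUTING_TABLES)
backup = GroupSwitch("backup", ROUTING_TABLES)
# After the circuit switches move edge-1's links onto the backup, VLAN 1
# traffic reaches the backup, which looks up the same table 1.
print(edge1.forward(1, "10.0.1.7"), backup.forward(1, "10.0.1.7"))  # down-1 down-1
print(backup.forward(1, "10.0.5.9"))  # up
```

Because the backup already holds table 1, taking over for edge switch 1 needs no routing update anywhere else in the network.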
What Does the Control System Do?
• Collects keep-alive messages and link status reports from switches
• Reconfigures circuit switches under failures
• Performs offline failure diagnosis
• Implications
  -- needs to talk to many circuit switches and packet switches
  -- keeps a large amount of circuit/switch/link status state
Distributed Control System
• One controller per failure group of k/2 switches
  -- configures the circuit switches adjacent to the switches in the group
• Maintains only the local circuit configurations of its group
  -- does not share state with other controllers
• Talks to circuit switches over an out-of-band control network
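A per-group controller only needs the liveness and circuit state of its own k/2 switches. The sketch below tracks keep-alive messages and flags switches whose keep-alives have stopped; the timeout value, the injectable clock, and the `GroupController` name are assumptions made for illustration.

```python
import time


class GroupController:
    """Hypothetical controller for one failure group of k/2 switches."""

    def __init__(self, switches, timeout_s=0.05, clock=time.monotonic):
        self.timeout_s = timeout_s  # assumed keep-alive timeout
        self.clock = clock
        self.last_seen = {sw: clock() for sw in switches}
        # Circuit configuration for this group only; never shared with other controllers.
        self.circuits = {}

    def keep_alive(self, switch):
        self.last_seen[switch] = self.clock()

    def suspected_failures(self):
        """Switches whose last keep-alive is older than the timeout."""
        now = self.clock()
        return sorted(sw for sw, t in self.last_seen.items() if now - t > self.timeout_s)


# Deterministic demo with a fake clock (seconds).
fake_now = [0.0]
ctl = GroupController(["e0", "e1", "e2"], clock=lambda: fake_now[0])
fake_now[0] = 0.04
ctl.keep_alive("e0")
ctl.keep_alive("e2")
fake_now[0] = 0.08
print(ctl.suspected_failures())  # ['e1']
```

Keeping state per group means the work each controller does grows with k/2 rather than with the size of the whole network.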
Summary
• Fast failure recovery
  -- as fast as the underlying circuit switching technology
• Live impersonation
  -- traffic is redirected to the backups at the physical layer
  -- switches in a failure group have the same routing tables and use the VLAN id for differentiation
  -- regular switches recovered from failures become backup switches themselves
Fast failure recovery, no path dilation, no routing disturbance
Evaluation
• Bandwidth advantage
  -- iPerf throughput on testbed
• Application performance
  -- MapReduce job completion time
Bandwidth Advantage
• 4 racks, 8 servers, 12 switches
• 8 iPerf flows saturate the network core
ShareBackup restores the network to full capacity regardless of failure location.
Application Performance
[Figure: MapReduce Sort with 100 GB input data; annotated 1.2X and 4.2X]
ShareBackup preserves application performance under failures!
Extra Cost
• Small port-count circuit switches: very inexpensive
  -- e.g., $3 per-port cost for cross-point switches
• Small backup switch pool
  -- 1 backup per failure group is usually enough
  -- k = 48 fat-tree with 27648 servers: ~6.7% extra network cost
• Partial deployment
  -- failures are more destructive at the edge layer
  -- employ backups only for ToR failures
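The server count on this slide follows from standard fat-tree arithmetic. The sketch below, assuming one backup switch per failure group of k/2 switches, counts switches only; it does not model circuit-switch or cabling cost, so it does not attempt to reproduce the ~6.7% cost estimate.

```python
def fat_tree_counts(k):
    """Servers, regular switches, and failure groups in a k-ary fat-tree."""
    half = k // 2
    servers = k ** 3 // 4
    regular_switches = k * half * 2 + half * half  # edge + agg in k pods, plus core
    failure_groups = regular_switches // half      # k/2 switches per group
    return servers, regular_switches, failure_groups


k = 48
servers, switches, groups = fat_tree_counts(k)
print(servers, switches, groups)   # 27648 2880 120
# One backup per failure group, relative to the regular switches.
print(f"{groups / switches:.2%}")  # 4.17%
# Partial deployment: backups only for the k edge-layer (ToR) groups.
print(f"{k / switches:.2%}")       # 1.67%
```

The backup-switch overhead is 2/k of the regular switch count, so it shrinks as the fat-tree grows; restricting backups to the edge layer cuts it further to 1/(k/2 + k/4), i.e. k backups over 5k²/4 switches.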
Conclusion
• ShareBackup: an architectural solution for failure recovery in DCNs
  -- uses circuit switching for fast failover
  -- is an economical approach to using backups in networks
  -- preserves application performance under failures
• Key takeaways
  -- rerouting is not the only approach to failure recovery
  -- fast, transparent failure recovery is possible through careful backup placement and fast circuit switching
Backup Slide: Control System Failures
• Circuit switch software failure / control channel failure
  -- circuit switches become unresponsive
  -- they keep their existing circuit configurations, so the data plane is not impacted
  -- fall back to rerouting
• Hardware/power failure
  -- the controller will receive many failure reports in a short time
  -- call for human intervention
• Controller failure
  -- state replication on shadow controllers
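The triage rules on this slide can be sketched as a small decision function over incoming failure reports. The burst threshold, the window length, and the `FailureTriage` name are hypothetical; controller failover to shadow controllers is not modeled.

```python
from collections import deque


class FailureTriage:
    """Hypothetical triage of failure reports arriving at a group controller."""

    def __init__(self, burst_threshold=5, window_s=1.0):
        self.burst_threshold = burst_threshold  # assumed: reports that signal a burst
        self.window_s = window_s                # assumed: sliding window length
        self.recent = deque()                   # timestamps of recent reports

    def on_report(self, t, circuit_switch_responsive=True):
        self.recent.append(t)
        # Drop reports that fell out of the sliding window.
        while self.recent and t - self.recent[0] > self.window_s:
            self.recent.popleft()
        if len(self.recent) >= self.burst_threshold:
            # Many reports at once: likely hardware/power failure, ask a human.
            return "escalate-to-human"
        if not circuit_switch_responsive:
            # Circuits stay as configured, so the data plane is untouched;
            # recovery for this failure falls back to rerouting.
            return "fallback-to-rerouting"
        return "replace-with-backup"


triage = FailureTriage(burst_threshold=3, window_s=1.0)
decisions = [
    triage.on_report(0.0),
    triage.on_report(0.2, circuit_switch_responsive=False),
    triage.on_report(0.4),
]
print(decisions)  # ['replace-with-backup', 'fallback-to-rerouting', 'escalate-to-human']
```

Checking for a burst before anything else keeps a correlated hardware or power event from draining the backup pool one report at a time.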
Backup Slide: Offline Failure Diagnosis
[Figure, shown in two build steps: aggregation switches and edge switches connected through circuit switches; some switches are marked "?"]
• Recycle the healthy switch
  -- only one switch has failed
  -- back to normal after reboot
• Chain up circuit switches using side ports