Optical Network Resilience Resilience in GMPLS Networks

Luca Valcarenghi, Scuola Superiore Sant'Anna, valcarenghi@sssup.it

Resilience • A network that provides some ability to recover ongoing connections disrupted by the catastrophic failure of a network component, such as a line interruption or a node failure, is said to be [F00Network] • Resilient → resilience (resiliency) • Reliable → reliability • Survivable → survivability

Outline • Part I • Dynamic resilience schemes overview • Algorithms • Comparison with static resilience • Part II • Advanced topics • Multi-layer provisioning and resilience • Differentiated resilience • Optical Ethernet resilience • Resilience in Global Grid Computing

Resilient Scheme Classification
• Resilience
  • Dynamic: network spare resources are not statically reserved
    • Protected Connection Dynamic Provisioning: upon connection setup, both primary and backup paths are computed and spare resources are reserved, based on the chosen protection scheme, for all the considered failure scenarios
    • Restoration: backup paths are computed and spare resources are reserved upon failure occurrence, for the specific failure
  • Static: network spare resources are computed and reserved upon connection setup
    • Protection: a single backup path is computed and spare resources are statically reserved for all the considered failure scenarios

Dynamic Resilient Schemes
• Resilience
  • Restoration: network spare resources are found and reserved upon failure occurrence
    • Dynamic: backup routes are computed and spare resources are reserved upon failure occurrence
    • Pre-planned: backup routes are pre-computed but spare resources are reserved upon failure occurrence
  • Protection: network spare resources are computed and reserved upon connection setup
    • Shared: connections that are not simultaneously involved in the failure event share spare resources
    • Dedicated: each connection is assigned dedicated spare resources

Dynamic Resilient Scheme QoS Parameters • Restoration Blocking Probability (Pb) • Ratio between the number of failed connections that cannot be recovered and the number of failed connections • Recovery Time (RT) • Time elapsed between failure notification and transmission restart • Restoration Blocking Probability → inter-nodal communication bandwidth and inter-nodal connectivity • Recovery Time → inter-nodal communication delay

Static vs. Dynamic Resilient Schemes • Static • Network design phase • Fixed working and spare resources • Pb • Deterministic → 100% recovery from all the considered failure scenarios • Probabilistic → expected availability based on network availability data • RT • Function of the considered scheme • Dynamic • Network running phase • Dynamic working and spare resources with fixed overall network resources • Pb is probabilistic and a function of • considered scheme • overall network load (resource contention) • Dynamic protected connection provisioning → upon connection arrival • Restoration → upon failure occurrence • RT is a function of • Dynamic protected connection provisioning → considered scheme • Restoration → overall network load (resource contention) upon failure occurrence

Static Mesh Protection Scheme Overview
• Protection
  • Line: backup path for all the failed connections around the failed line
    • Dedicated: Dedicated Line Protection (DLP), 1+1 or 1:1
    • Shared: Shared Line Protection, 1:N
  • Path: backup path between the connection end nodes
    • Dedicated: Dedicated Path Protection (DPP), 1+1 or 1:1
    • Shared: Shared Path Protection (SPP), 1:N

Protection/Restoration Examples [Figure: four panels on a six-node network (nodes 0 to 5), labeled DEDICATED, SHARED, LINK, PATH]

Protected Connection Dynamic Provisioning • Protection schemes • Dedicated Path Protection (DPP) • Shared Path Protection (SPP) • Necessary information distributed by means of • Routing protocol extension for traffic engineering (e.g., OSPF-TE) • Allows network nodes to have an updated and synchronized view of network resources (i.e., working, protection, and available capacity) • Allows working and protection paths to be computed • Signaling protocol with traffic engineering support (e.g., RSVP-TE) • Allows network nodes to reserve working and protection resources along the chosen paths

Protected Connection Dynamic Provisioning Philosophy • Model the network through a graph • Choose an algorithm to find primary and backup paths for incoming connections • Assign different weights to the graph edges for finding the primary and the backup path • Weights are a function of the network state at the connection request arrival • Main drawbacks • Information inconsistencies

Protected Connection Dynamic Provisioning Philosophy (2) • Involved variables • G(V,E) • G graph modeling the network, with V the set of N vertices and E the set of M edges • V={v1, v2, …, vi, …, vN} • E={e1, e2, …, ej, …, eM} • we = weight of edge e • ce = total capacity of edge e • ae = capacity reserved on edge e for working paths • be = spare capacity reserved on edge e for protection paths • Objective • Minimize connection blocking probability if edges have limited ce • Minimize required spare resources if edges have unlimited ce

Dijkstra’s Algorithm • Dijkstra’s (D) algorithm requires that all the arc lengths are NONNEGATIVE (the case in most data network applications) • Worst-case computational requirements are less than those of the Bellman-Ford (BF) algorithm • General idea • Based on Lemma (1) (SUBPATHS OF SHORTEST PATHS ARE SHORTEST PATHS) • Find the shortest paths in order of increasing path length • The shortest of the shortest paths to vertex 1 must be the single-arc path from the closest neighbor of vertex 1 • The next shortest of the shortest paths must either be the single-arc path from the next closest neighbor of 1 or the shortest two-arc path through the previously chosen vertex, and so on • To formalize the procedure in an algorithm, each vertex i is labeled with an estimate Di of the shortest path length to vertex 1 • When the estimate becomes certain, the vertex is PERMANENTLY LABELED and added to the set P of permanently labeled vertices • The vertex added to P at each step is the closest to vertex 1 among those that are not yet in P
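The labeling procedure above can be sketched in Python as follows. This is a minimal illustration, not the slide author's code: the function name `dijkstra`, the adjacency-dictionary layout, and the four-vertex example graph are assumptions made here, and a binary heap is used to pick the closest vertex not yet in P. On an undirected graph, the distances from vertex 1 equal the distances to vertex 1 used in the slide's formulation.

```python
import heapq


def dijkstra(graph, source):
    """Return (dist, prev) for shortest paths from `source`.

    graph: dict mapping vertex -> {neighbor: nonnegative arc length}
    (hypothetical layout chosen for this sketch).
    """
    dist = {source: 0}      # current estimates D_i
    prev = {}               # predecessor on the best path found so far
    permanent = set()       # the set P of permanently labeled vertices
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in permanent:
            continue        # stale heap entry, label already certain
        permanent.add(u)    # closest vertex not yet in P
        for v, length in graph[u].items():
            if length < 0:
                raise ValueError("arc lengths must be nonnegative")
            nd = d + length
            if v not in permanent and nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev


# Hypothetical 4-vertex example (not the graph shown on the next slide).
graph = {
    1: {2: 1, 3: 4},
    2: {1: 1, 3: 2, 4: 5},
    3: {1: 4, 2: 2, 4: 1},
    4: {2: 5, 3: 1},
}
dist, prev = dijkstra(graph, 1)
print(dist)  # {1: 0, 2: 1, 3: 3, 4: 4}
```

Vertices enter `permanent` in order of increasing distance (1, 2, 3, 4 here), which mirrors how P grows step by step in the slide's description.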

Dijkstra’s Algorithm Example [Figure: step-by-step run on a six-vertex graph; the permanently labeled set grows as P={1, 2}, P={1, 2, 5}, P={1, 2, 5, 3, 4}, P={1, 2, 5, 3, 4, 6}]

Protected Connection Dynamic Provisioning with Dedicated Path Protection against Single Link Failure (1) • Philosophy • Compute the working path upon connection arrival by means of Dijkstra’s algorithm • Assign infinite weight to the edges spanned by the working path • Compute the secondary path by applying Dijkstra’s algorithm on the modified graph • Reserve primary and backup resources • The connection is blocked unless both the primary and the secondary path can be accommodated
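A minimal Python sketch of these steps is given below. It is an illustration under stated assumptions rather than the scheme's reference implementation: the names `shortest_path` and `provision_dpp` are hypothetical, the graph is undirected and stored as a symmetric dictionary, edges without free capacity are assumed to already carry infinite weight, and the reservation step itself (RSVP-TE signaling) is not modeled.

```python
import heapq
import math


def shortest_path(weights, src, dst):
    """Dijkstra on weights[u][v]; edges with infinite weight are skipped."""
    dist, prev, done = {src: 0.0}, {}, set()
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            break
        for v, w in weights[u].items():
            if math.isinf(w) or v in done:
                continue
            if d + w < dist.get(v, math.inf):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if dst not in done:
        return None                      # destination unreachable
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def provision_dpp(weights, src, dst):
    """Return (working, backup) paths, or None if the request is blocked."""
    working = shortest_path(weights, src, dst)
    if working is None:
        return None
    # Copy the graph and give infinite weight to the working-path edges.
    modified = {u: dict(nbrs) for u, nbrs in weights.items()}
    for u, v in zip(working, working[1:]):
        modified[u][v] = math.inf
        modified[v][u] = math.inf
    backup = shortest_path(modified, src, dst)
    if backup is None:
        return None                      # no link-disjoint backup: blocked
    return working, backup


# Hypothetical 4-node ring with asymmetric costs on its two halves.
net = {0: {1: 1, 3: 2}, 1: {0: 1, 2: 1}, 2: {1: 1, 3: 2}, 3: {0: 2, 2: 2}}
print(provision_dpp(net, 0, 2))  # ([0, 1, 2], [0, 3, 2])
```

Because the backup is computed only after the working path is fixed, this two-step approach can block a request even when a link-disjoint pair exists; that limitation is inherent to the procedure as described, not specific to this sketch.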

Protected Connection Dynamic Provisioning with Shared Path Protection against Single Link Failure (1) • Philosophy • Compute the working path upon connection arrival by means of Dijkstra’s algorithm • Assign infinite weight to the edges spanned by the working path • Assign a small weight to the edges along which spare resources can be shared • Compute the secondary path by applying Dijkstra’s algorithm on the modified graph • Reserve primary and backup resources by taking into account spare resource sharing • The connection is blocked unless both the primary and the secondary path can be accommodated
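The shared variant can be sketched in Python as shown below. Several details are assumptions introduced for illustration, since the slide does not specify them: the names `provision_spp`, `edge`, and `EPSILON`, the value chosen for the small weight, and the bookkeeping `spare[e][f]` (backup units that edge e must carry if edge f fails). Under this bookkeeping the spare capacity reserved on e is the maximum of `spare[e][f]` over all f, and e is treated as sharable when adding the new backup would not raise that maximum.

```python
import heapq
import math

EPSILON = 1e-3  # small weight for sharable edges (value is an assumption)


def edge(u, v):
    """Canonical identifier of the undirected edge between u and v."""
    return (u, v) if u <= v else (v, u)


def shortest_path(weights, src, dst):
    """Dijkstra on weights[u][v]; edges with infinite weight are skipped."""
    dist, prev, done = {src: 0.0}, {}, set()
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in weights[u].items():
            if not math.isinf(w) and v not in done and d + w < dist.get(v, math.inf):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if dst not in done:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def provision_spp(weights, spare, src, dst):
    """Return (working, backup) or None; updates `spare` on success."""
    working = shortest_path(weights, src, dst)
    if working is None:
        return None
    w_edges = {edge(u, v) for u, v in zip(working, working[1:])}
    modified = {}
    for u, nbrs in weights.items():
        modified[u] = {}
        for v, w in nbrs.items():
            e = edge(u, v)
            per_failure = spare.get(e, {})
            reserved = max(per_failure.values(), default=0)  # spare held on e
            if e in w_edges or math.isinf(w):
                modified[u][v] = math.inf
            elif all(per_failure.get(f, 0) < reserved for f in w_edges):
                modified[u][v] = EPSILON   # fits into already reserved spare
            else:
                modified[u][v] = w
    backup = shortest_path(modified, src, dst)
    if backup is None:
        return None
    for u, v in zip(backup, backup[1:]):   # record the new sharing state
        per_failure = spare.setdefault(edge(u, v), {})
        for f in w_edges:
            per_failure[f] = per_failure.get(f, 0) + 1
    return working, backup


# Hypothetical 6-node ladder topology with unit weights.
links = [(0, 1), (2, 3), (4, 5), (0, 2), (1, 3), (2, 4), (3, 5)]
weights = {n: {} for n in range(6)}
for u, v in links:
    weights[u][v] = weights[v][u] = 1
spare = {}
print(provision_spp(weights, spare, 0, 1))  # ([0, 1], [0, 2, 3, 1])
print(provision_spp(weights, spare, 4, 5))  # ([4, 5], [4, 2, 3, 5])
print(max(spare[(2, 3)].values()))          # 1
```

In the example the two working paths are link-disjoint, so both backups cross edge (2, 3) while only one spare unit needs to be reserved there.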

Protected Connection Dynamic Provisioning with Shared Path Protection against Single Link Failure (2) [Figure: example on a six-node network, nodes 0 to 5]

Restoration Schemes • Advantages • Adaptable to network (traffic and topology) changes • Small spare bandwidth required (< 50%) • Drawbacks • Usually slow (recovery time > 50ms) • Coordination required upon failure

Restoration Scheme Description • Centralized Real-Time Restoration • Restoration paths are computed and spare resources reserved upon failure occurrence • Central controller with global knowledge of the network state • Centralized Pre-planned Restoration • A set of restoration paths is pre-computed before failure occurrence, while spare resources are reserved upon failure occurrence • Central controller chooses the path for the failed connections based on global knowledge of the network state and on the specific failure • Distributed Real-Time Restoration • Restoration paths are computed and spare resources reserved upon failure occurrence • Each node to which connections involved in the failure belong acts independently • Distributed Pre-planned Restoration • A set of restoration paths is pre-computed at each node before failure occurrence, while spare resources are reserved upon failure occurrence • Each node to which connections involved in the failure belong chooses the path based on its most up-to-date network state information

Restoration Scheme Characteristics • Centralized • Simplicity of a central controller + possibly optimal solution • Need for a reliable controller + a reliable controller communication network • Distributed • High restorability + capacity efficiency • Difficult protocol implementation + high degree of message contention • Real-time • High restorability because of up-to-date information • Slow recovery time + high resource contention • Preplanned • Fast recovery time • Low restorability because of out-of-date information

Distributed Real-Time Restoration • Philosophy • Upon failure notification each node finds out whether any of its outgoing connections failed • Based on the information provided by OSPF-TE and RSVP-TE, modify the network graph G by assigning infinite weight to the failed edges and to the fully occupied edges • The restoration path is computed by running a shortest-path algorithm (e.g., Dijkstra’s) on the graph G • Failed connections are rerouted • Drawbacks • Information inconsistency • Contention due to absence of coordination
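The following Python sketch illustrates the procedure and the contention drawback. It is a simplified model under assumptions made here: the names `restore`, `edge`, and `shortest_path` are hypothetical, `free[e]` stands for the spare units on edge e as last advertised, every source node computes its path on the same snapshot, and reservations are then attempted one after another, so a later request can find the resources already taken.

```python
import heapq
import math


def edge(u, v):
    """Canonical identifier of the undirected edge between u and v."""
    return (u, v) if u <= v else (v, u)


def shortest_path(weights, src, dst):
    """Dijkstra on weights[u][v]; edges with infinite weight are skipped."""
    dist, prev, done = {src: 0.0}, {}, set()
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in weights[u].items():
            if not math.isinf(w) and v not in done and d + w < dist.get(v, math.inf):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if dst not in done:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def restore(weights, free, failed, connections):
    """Reroute connections crossing a failed edge.

    connections: list of (src, dst, path). Returns {index: new path or None}.
    """
    # Snapshot of the graph: failed and fully occupied edges get infinite weight.
    view = {
        u: {v: (math.inf if edge(u, v) in failed or free.get(edge(u, v), 0) == 0 else w)
            for v, w in nbrs.items()}
        for u, nbrs in weights.items()
    }
    # Each source node independently computes a path on the same snapshot.
    candidates = {}
    for i, (src, dst, path) in enumerate(connections):
        if any(edge(u, v) in failed for u, v in zip(path, path[1:])):
            candidates[i] = shortest_path(view, src, dst)
    # Reservations are attempted in sequence; later ones may lose the contention.
    result = {}
    for i, path in candidates.items():
        edges = [] if path is None else [edge(u, v) for u, v in zip(path, path[1:])]
        if path is None or any(free.get(e, 0) == 0 for e in edges):
            result[i] = None
            continue
        for e in edges:
            free[e] -= 1
        result[i] = path
    return result


# Hypothetical 6-node ladder, one spare unit per link, link (0, 1) fails.
links = [(0, 1), (2, 3), (4, 5), (0, 2), (1, 3), (2, 4), (3, 5)]
weights = {n: {} for n in range(6)}
for u, v in links:
    weights[u][v] = weights[v][u] = 1
free = {edge(u, v): 1 for u, v in links}
connections = [(0, 1, [0, 1]), (1, 0, [1, 0])]
print(restore(weights, free, {edge(0, 1)}, connections))
# {0: [0, 2, 3, 1], 1: None}
```

Both source nodes pick the same detour because they see the same snapshot; the second reservation fails, which is the contention effect listed among the drawbacks.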

Distributed Pre-planned Restoration • Set of pre-planned backup paths for each (s,d) pair • No backup resource reservation • Upon failure occurrence backup path choice based on network state information

Stochastic Preplanned Restoration (SPR) Two-phase restoration scheme • Off-line phase • Preplanning of multiple restoration paths • Broadcast of network status information • On-line phase • Assignment of a choice probability to each restoration path • Probabilistic choice of restoration paths • Activation of the restoration lightpaths
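The on-line phase can be illustrated with the Python sketch below. The slide does not state how the choice probabilities are assigned, so the rule used here (probability proportional to the bottleneck free capacity of each preplanned path, zero for paths crossing the failed link) is purely an assumption for illustration, as are the function names.

```python
import random


def edge(u, v):
    """Canonical identifier of the undirected edge between u and v."""
    return (u, v) if u <= v else (v, u)


def choice_probabilities(paths, free, failed):
    """Assumed rule: probability proportional to bottleneck free capacity."""
    scores = []
    for path in paths:
        edges = [edge(u, v) for u, v in zip(path, path[1:])]
        if any(e in failed for e in edges):
            scores.append(0)                   # path unusable for this failure
        else:
            scores.append(min(free.get(e, 0) for e in edges))
    total = sum(scores)
    return [s / total for s in scores] if total > 0 else None


def choose_restoration_path(paths, free, failed, rng=random):
    """Pick one preplanned path at random, or None if none is usable."""
    probs = choice_probabilities(paths, free, failed)
    if probs is None:
        return None
    return rng.choices(paths, weights=probs, k=1)[0]


# Hypothetical preplanned paths for the node pair (0, 2).
preplanned = [[0, 1, 2], [0, 3, 2]]
free = {(0, 1): 2, (1, 2): 4, (0, 3): 6, (2, 3): 6}
print(choice_probabilities(preplanned, free, failed=set()))  # [0.25, 0.75]
print(choose_restoration_path(preplanned, free, failed={(1, 2)}))  # [0, 3, 2]
```

Randomizing the choice lets nodes that act without coordination spread their requests over different paths, which is how the scheme aims to reduce contention.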

SPR Example • Test network (average nodal degree 3) [Figure: six-node test network, nodes 0 to 5]

SPR Scheme Characteristics • Preplanned distributed restoration scheme • Restoration → few spare wavelengths required • Multiple available restoration paths → high restorability (low blocking probability) • Preplanned restoration paths + probabilistic choice of restoration lightpaths → • Low contention • No node coordination upon failure • Fast recovery time • Adaptable to network changes • Compatible with GMPLS control plane

Node Database • Preplanned paths p_i^{s,d} • total link capacity • utilized working link capacity • potential number of wavelengths utilized along one link upon failure of another link

OSPF-TE and E-CR-LDP

Dynamic Provisioning vs. Restoration • Dynamic provisioning of protected connections • Able to guarantee 100% recovery from all the considered failure scenarios • Sub-optimal because it is a function of the connection arrival order → outdated information • Restoration • Optimal spare capacity utilization → ideally the same as static spare path protection because it is not a function of the connection arrival order • Unable to guarantee 100% recovery of all failed connections

Failure Recovery Phases • Failure detection time • Network failure is detected • Time elapsed between failure occurrence (tf) and failure detection (td) • Td = td - tf • Failure notification time • Failure occurrence is notified to the nodes in charge of failure recovery • Time elapsed between failure detection (td) and failure notification (tn) • Tn = tn - td • Failure recovery time • Failure is recovered and data transmission restarts • Time elapsed between failure notification (tn) and transmission restart (tr) • Tr = tr - tn [Figure: time axis t marking tf, td, tn, tr and the intervals Td, Tn, Tr]
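Adding the three phases gives the total interruption experienced by the traffic. The symbol T_out below is introduced here only for illustration and does not appear in the slides:

```latex
\begin{align}
T_d &= t_d - t_f, \qquad T_n = t_n - t_d, \qquad T_r = t_r - t_n \\
T_{\mathrm{out}} &= T_d + T_n + T_r = t_r - t_f
\end{align}
```

The intermediate instants cancel, so the outage depends only on when the failure occurred and when transmission restarted. With the definition of Recovery Time (RT) given earlier (failure notification to transmission restart), RT corresponds to T_r alone.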

Flooding-based Failure Notification • Based on OSPF Link State Updates (LSU) • LSU exchange based on flooding • Origin node sends the packet to its neighbors • Neighbors relay the packet to their neighbors and so on • Stopping conditions • A node does not relay a packet back to the node from which it received it • A node transmits the packet to its neighbors at most once • The origin node ID number and a sequence number (incremented with each new packet issued by the origin node) are inserted in the packet • Each node stores the highest sequence number received for each origin node • Nodes do not relay packets with sequence numbers that are less than or equal to the one stored [Figure: six-node network (nodes 0 to 5) flooding a packet with sequence number SN1]
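The stopping conditions can be checked with a small simulation. The Python sketch below is illustrative only: the function name `flood`, the `highest_seen` table layout, and the six-node ring topology are assumptions (the links of the network in the slide's figure are not recoverable from the text), and message delays are ignored by processing packets in first-in, first-out order.

```python
from collections import deque


def flood(adjacency, origin, seq, highest_seen):
    """Flood packet (origin, seq); return (nodes reached, transmissions).

    highest_seen[node][origin] is the highest sequence number that `node`
    has stored for packets issued by `origin`.
    """
    highest_seen.setdefault(origin, {})[origin] = seq
    queue = deque((origin, nbr) for nbr in adjacency[origin])
    reached = {origin}
    transmissions = 0
    while queue:
        sender, node = queue.popleft()
        transmissions += 1
        seen = highest_seen.setdefault(node, {})
        if seq <= seen.get(origin, -1):
            continue                  # old or duplicate packet: not relayed
        seen[origin] = seq
        reached.add(node)
        for nbr in adjacency[node]:
            if nbr != sender:         # never relay back to the sender
                queue.append((node, nbr))
    return reached, transmissions


# Hypothetical six-node ring 0-1-2-3-4-5-0.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
state = {}
print(flood(ring, origin=1, seq=1, highest_seen=state))
# ({0, 1, 2, 3, 4, 5}, 7)
```

On the ring the packet reaches all six nodes with seven transmissions; the two extra transmissions are the duplicates that meet on the far side of the ring and are dropped by the sequence-number check.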

Signaling-based Failure Notification • Just the end-nodes of the connections disrupted by the failure are notified • Failure notification along the connection path • E.g., RSVP NOTIFY message [RFC3473][RFC2205] [Figure: six-node network (nodes 0 to 5) with NTF messages sent along the connection path]

GMPLS Enabled Intelligent Optical Network Architecture • IP over WDM • IP/GMPLS control plane • Routing → IGP (e.g., OSPF, OSPF-TE) • Signaling → GMPLS (e.g., CR-LDP, RSVP-TE) • WDM data plane • Highly dynamic IP traffic characteristics → high flexibility in WDM

Resilient Schemes in IP over WDM Networks • In the two-layer IP over WDM architecture each layer can provide its own independent resilient scheme • Restoration schemes are commonly available at higher layers (e.g., the IP layer) • Protection schemes are commonly used at the physical transport layer (e.g., WDM) • The resilience concept is naturally embedded in the IP layer • the actual path used to route a packet from source to destination is dynamically found and maintained by the routers • Protection and restoration techniques at the optical layer are just emerging, driven by the need for coarse and fast resilient schemes

IP Layer Resilient Schemes • Resilient schemes available at the network layer (e.g., IP/MPLS) have the capability to recover faults and operate at fine traffic granularity • Granularity is determined by the protocol traffic unit at that particular layer • Drawbacks • generally slow • require online processing upon failure occurrence • Network layer resilient schemes • IP dynamic routing • MPLS protection switching

IP Dynamic Routing • With IP dynamic routing, reachable active routers are found dynamically, thus adapting IP routing to possible network faults • The task is accomplished by exchanging, between adjacent routers, control messages used to update the routers’ routing tables (e.g., LSUs) • IP packets are therefore dynamically rerouted around link and node failures • IP dynamic rerouting guarantees network-wide survivability, independent of the underlying physical network

IP Dynamic Routing Fault Detection • Faults can be detected by the routers either explicitly or implicitly • Explicit fault detection • faults are detected at local level and signaled to neighboring routers through regular exchange of routing protocol control messages (ICMP) • Implicit fault detection • based on expiration of timers such as KEEPALIVE (TCP) and HELLO (IP) messages

IP Dynamic Routing Fault Recovery • Once a router detects a line fault it recalculates the affected routes and updates its routing tables • The resulting changes are propagated through update messages such as OSPF LSAs and Border Gateway Protocol-4 (BGP-4) UPDATE messages

IP Dynamic Routing Advantages and Drawbacks • Advantages • efficient use of network spare resources • flexible to topological changes • Drawbacks • usually slow (from tens of seconds to minutes) • unpredictable behavior

IP Dynamic Routing Enhancements • Equal Cost Multipath Forwarding (ECMF) • the router relies on more than one path for transmitting packets sharing a common destination • in case of failure, a fraction of the packets is guaranteed to flow to the destination until the router routing table is updated with the recalculated routes • Partitioning the network into multiple areas as defined in hierarchical link state routing protocols • updates are confined to the affected area, minimizing the network reconfiguration convergence time • Increasing the frequency of HELLO messages or implementing rapid rate pinging through ICMP ECHO requests • this decreases the failure detection time

MPLS Protection Switching • MPLS protection switching is an alternative approach to circumvent the latency drawback of dynamic rerouting • MPLS protection switching is enabled through a hierarchy of Label Switched Paths (LSPs) • Protection entities can be set up either dynamically or in a pre-negotiated way

Dynamic MPLS Protection • Protection entities are set up dynamically and restore traffic based on • failure information • bandwidth allocation • optimized reroute assignment • LSPs crossing a failed line or Label Switch Router (LSR) are reestablished using reservation signaling

MPLS Protection Granularity • Both MPLS protection switching schemes can be performed • on a line basis → link rerouting • only the portion of the LSPs around the failed line is rerouted • on a path basis → edge-to-edge rerouting • the entire failed LSPs are independently rerouted

MPLS Protection Switching Scheme Comparison • Dynamic protection vs. pre-established protection • increases resource utilization • requires longer restoration time • Link rerouting vs. end-to-end rerouting • faster • because in end-to-end rerouting the failure notification must reach the head-end of all the LSPs • link rerouting not well suited for handling node failure

Ethernet Advanced Features During Ethernet evolution some advanced features were introduced: • Automatic learning of MAC address • to allow plug and play • 802.1d Spanning Tree (ST) • To avoid loops and provide a slow fault tolerance • 802.1q Virtual LAN (VLAN) • To separate one physical network in many logical networks • 802.1s Multiple Spanning Tree (MST) • To allow separate spanning tree for each VLAN • 802.1p Priority Tagged Frame • To provide CoS features • 802.3ad Link aggregation • To increase bandwidth (multiple physical links joined into one logical link)

ITU and Optical Layer • International Telecommunication Union (ITU): agency of the United Nations devoted to standardizing international communications • Optical Layer (OL) defined by ITU inside the ISO-OSI Data Link layer (Rec. G.805, G.872) • OL provides lightpaths to higher layers • Lightpath: point-to-point all-optical connection between physically non-adjacent nodes

Optical Layer (OL) Consists of: • Optical Channel (OCh) sub-layer or lightpath layer → end-to-end route of the lightpaths • Optical Multiplex Section (OMS) sub-layer → point-to-point link along the route of a lightpath • Optical Transmission Section (OTS) sub-layer → link segment between two optical amplifier stages

Optical Sub-Layers

WDM (Optical) Layer Resilient Schemes • Both OCh and OMS sublayers feature • dynamic restoration • preplanned protection • The main difference between OCh and OMS resilient schemes is the granularity at which they operate • OCh schemes protect individual lightpaths • this allows selective recovery from optical line terminal (OLT) failures • OMS resilient schemes work at the aggregate signal level • all the lightpaths present on the failed line are concurrently recovered
