
Henry Starzynski Network Operations Support Global Network Mgmt Centre Bell Canada

Henry Starzynski Network Operations Support Global Network Mgmt Centre Bell Canada. January 2014. Henry Starzynski – Manager, Global Network Management Centre Graduated from the University of Waterloo in 1982 with Bachelor of Mathematics (Computer Science)


Presentation Transcript


  1. Henry Starzynski Network Operations Support Global Network Mgmt Centre Bell Canada January 2014

  2. Henry Starzynski – Manager, Global Network Management Centre • Graduated from the University of Waterloo in 1982 with a Bachelor of Mathematics (Computer Science) • Post graduation, worked for a computer time sharing company called Datacrown, which became Canada Systems Group, then SHL-Systemhouse • I’ve been with Bell 29.5 years! (yes there have been LOTS of changes since I started!) • Started out working on network design tools for services called Datapac and Megastream • Moved to our network management centre taking care of Datapac, managing the 7/24 console then Frame Relay (Hyperstream) support • Today, I continue with legacy network support, bring in new business for our centre, support our computers and handle international escalations • I have a life outside of Bell too! I’m involved in the local community with Scouts Canada – so, when you are free of University life, don’t forget to be involved in your community as well! You have lots of energy and knowledge that can help make local communities, wherever you end up, much better! • Don’t forget, when you leave Carleton, learning never ever stops! Keep your brains active; technology is continually changing

  3. Bell Canada’s GNMC GNMC = Global Network Management Centre One of the world’s first Data Network Management Centres Operating locally in Ottawa, serving Bell Canada customers globally

  4. Bell Canada GNMC A bit about who we are … • Involved in managing data networks in Canada since 1974, globally since 1992 • Originally - the National Data Network Control (NDNC) for domestic (Canada only) core data networks: Dataroute, Datapac (packet switching), Megastream (Pt-Pt T1), Hyperstream (frame relay), Canadian ATM Gateway networks • Expanded to include private networks (Loto-Québec) and VPN clouds • Started internationally with Financial Networks Associates (FNA – consortium of 8 countries) network in 1991 (Alcatel based network) • Evolved into Global Network Management (GNMC) at the individual customer circuit level • Today, we serve as International Help Desk/SPOC (single point of contact) for international data circuit troubles • Provide proactive fault management, provisioning, change and performance management

  5. Bell Canada GNMC Main Focus Areas: • Single Point of Contact (SPOC) for international customer data circuits • VPN Managed Services (MPLS) and support of private or virtual private network clouds and routers (LAN) • Core Network Management (WAN) of legacy data networks (Datapac=Packet Switching, Frame Relay) • Technical Support and Maintenance Engineering on existing legacy networks GNMC is involved in major processes of Network Management: • Fault Management • Configuration Management (Provisioning) • Performance Management & Change Management • Security Management

  6. Network Management • Like any industry, we toss around lots of BUZZ WORDS • What do all those terms mean?? • WANs • Clouds • OSSs • Network Management • SPOC • Why do we do network management & customer management? • Why is it important? • What the heck is a network – anyway? WELL let’s start … WHAT IS A NETWORK?

  7. What is a Network ?? A Network means something different to everyone For example, a ‘network’ can be .. • LAN (Local), WAN (Wide), MAN (Metropolitan), CAN (Campus) Area networks • Point to Point network - connecting two sites regardless of distance • The ‘CLOUD’ - the service provider’s network – the infrastructure, sometimes termed the Public Network • The ‘NET - the ubiquitous network • The PSTN – Public Switched Telephone Network • Wireless network • Home Network • A VPN – a Virtual Private Network • A ‘social’ network! • A NETWORK MEANS DIFFERENT THINGS TO DIFFERENT PEOPLE • BUT whatever your definition, all networks do the same thing!

  8. What is a Network ? • A standard definition of a ‘network’ we will use is the following: • A set of elements or NODES linked together to provide paths to transmit information (data, voice, video) from one location to another. • A critical tool which allows businesses & people to operate and communicate • When it is all boiled down, all information is ‘data’, and it travels over a network. • Successful networks are managed

  9. Examples of Data Networks • Transport Networks (Sonet, DS3, DS1, Fibre, IP core) – the BIG infrastructures • Circuit Switched (Public Switched Telephone Network) • Dedicated (Point to point) • Packet/Frame/Cell (legacy services) • IP (Internet/ Intranet) • Local Area Networks, in the home, office, or around the campus. • Private (TV, Radio, Financial, Lottery) or Virtual Private Networks (VPNs) • Wireless

  10. Network Characteristics • The common characteristic of all networks is the transmission of DATA (information, etc.) • Some type of information (i.e. - data) is being transmitted from one person/computer/location to another, for business, pleasure, research, etc. • In today’s world, we take data communications over networks for granted - it is there, reliable, fault tolerant, and it NEVER fails. • We use it every day, it is part of our daily routines, part of our ‘life’! We expect connectivity!

  11. What then - is Network Management and why is it important ? • All types of networks transmit data in some form • Network management has 5 main processes: • Fault Management • Configuration Management (Provisioning) • Accounting Management • Performance Management (including Change Management) • Security Management. Bruce Deachman, The Ottawa Citizen, Sunday, March 20, 2005: In 1994, Nicholas Negroponte, founder of MIT's Media Lab, predicted one billion people would be using the Internet by the year 2000. What he failed to point out, was that most of them would be trying to get U2 tickets. At least that's how it must have felt for countless fans who were unable to snag tickets to the Bono-led, Irish rock band's Nov. 25 Corel Centre show yesterday morning, as technology failed to keep pace with overwhelming demand, leaving old-fashioned overnight campers the happiest of all

  12. Question! What is the latest current estimate of the number of internet users in the world?

  13. http://www.internetworldstats.com/stats.htm

  14. Blasts from the Past!! ROOT CAUSES OF BLACKOUTS AND THEIR REMEDY: The electric power transmission system of the United States is seriously deficient. Experts generally agree that fixing this system to an adequate level would take many years and cost tens of billions of dollars. But the root causes of the recent “Blackout of 2003” can be solved in a relatively short time and at a much more reasonable cost. The root causes of the present problems are: • A totally outdated reliability philosophy; and • Inadequate real time monitoring of the transmission grid. Isn’t the power grid a network too? Of course! Electricity is just a form of ‘data’!

  15. http://www.speedmatters.org/blog/archive/fcc-verizon-at-fault-for-network-failures-of-2012-derecho/#.UPgdWh1lGQG In June 2012, large parts of the Midwest and Middle Atlantic were, without warning, hit by a destructive rain and windstorm called a derecho. It left in its wake 22 dead, hundreds of injuries and millions of people without power or communications. Today, the FCC released a lengthy report prepared by its Public Safety and Homeland Security Bureau that looks at the communications outages that followed from the derecho, and made recommendations to avoid or reduce future failures. FCC Commissioner issued a statement reinforcing the findings and recommendations, and commenting on the service breakdowns: "Tragically, many of these were avoidable interruptions involving a lack of back-up power to central offices or failures of the service providers' monitoring systems... Carriers should test their networks and ensure that plans are in place in case of an emergency. It is time for an honest accounting of the resiliency of our nation's network infrastructure in the wireless and digital age." In computer networking: “Resiliency is the ability to provide and maintain an acceptable level of service in the face of faults and challenges to normal operation.”[1] Network resilience touches a very wide range of topics. In order to increase the resilience of a given communication network, the probable challenges and risks have to be identified and appropriate resilience metrics have to be defined for the service to be protected

  16. Why ‘Network Management’? From a network provider’s viewpoint … • Manage network resources equitably to ensure users can establish communications quickly & reliably • Ensure information is transferred securely, with its original quality and integrity • Operate a high performance, reliable, cost effective network that meets customer/ business/organizational needs and requirements • Plan and implement measures to prevent or mitigate interruptions or degradation of service • Make $$$$$ for the network provider and its shareholders • Gain market share for the network provider • At Bell Canada, networks are the building blocks of our own business – they are why we exist!

  17. Why ‘Network Management’? From the customer’s viewpoint … • Ensure information is transferred securely, with its original quality and integrity • Obtain service at best cost/service/value combination • Ensure a customer’s business operates with minimum downtime, in order to meet the requirements of its customers • Meet regulatory, legal, safety requirements • For a customer, networks are critical • For businesses, for their operations. • For the general public, so we can communicate, get money, do our assignments, talk .. BE CONNECTED

  18. Network Management Poses Endless Challenges by Willie Schatz If network managers are in accord about anything, it’s that they have a lot more tasks to do than resources to handle them. The fundamental roles of a network administrator are to provide network connections for computer equipment and to ensure availability and performance of network communications. But that’s only the beginning. The administrator must set up and manage hardware and software solutions, enabling servers, clients, printers and other peripherals to communicate. He or she also is responsible for providing users the highest quality server functionality, which means uninterrupted, optimum network availability and performance. This same individual also must plan so any changes required in the network conform with changes in the larger enterprise system. “People really think network management is easier than it really is”.

  19. Network Management Processes There are five processes involved in network management Configuration Management == Provisioning • Programming network elements to communicate with each other and user equipment • User datafill to make their service functional • Copying critical (non default) network provisioning parameters to offline storage in databases • Ensuring billable parameters/features are updated in related billing systems • Providing ‘dumps’, downloads, or application program interfaces (APIs) to other downstream systems Why is Configuration/Provisioning management important? • Users want their service when it is ordered (on due date) • Users want to get the options they pay for • The network provider needs to ensure their service is billed
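
The step of copying critical (non default) provisioning parameters to offline storage could be sketched roughly as follows. This is only an illustration: the parameter names, default values, and element identifier are hypothetical and do not come from any Bell Canada system.

```python
import json

# Hypothetical factory defaults for a port on a network element (illustrative values only).
DEFAULTS = {"mtu": 1500, "speed_kbps": 64, "window_size": 2, "billing_class": "standard"}


def non_default_parameters(provisioned, defaults=DEFAULTS):
    """Return only the parameters whose values differ from the defaults."""
    return {name: value for name, value in provisioned.items() if defaults.get(name) != value}


def backup_record(element_id, provisioned):
    """Serialize the non-default parameters as a JSON record for an offline database."""
    record = {"element": element_id, "overrides": non_default_parameters(provisioned)}
    return json.dumps(record, sort_keys=True)


# Example: a port provisioned with a faster speed and a larger window than the defaults.
port = {"mtu": 1500, "speed_kbps": 256, "window_size": 7, "billing_class": "standard"}
record = backup_record("node-01/port-3", port)
print(record)
```

Storing only the overrides keeps the offline record small and makes it clear which settings would be lost if an element were reset to its defaults.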

  20. Network Management Processes Fault Management == Service Assurance • Surveillance - proactive - alarms/traps from the network that indicate major problems • Isolating problems - reactive - when users have troubles • Having clearly defined escalation procedures - how to prioritize troubles • Providing customers with timely and honest status on problems - when will it be fixed? • Performing analysis on failures for trends, root cause Service Assurance is .. REAL-TIME surveillance, control, and analysis of a network, with the objective of ensuring maximum use of network resources, particularly when it is under stress due to traffic overload or failure conditions.
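
One way to picture "how to prioritize troubles" is a queue that always hands the most severe outstanding alarm to the operator first. The sketch below is a minimal illustration under assumed severity names (critical, major, minor, warning) and made-up node names; real surveillance systems are far more elaborate.

```python
import heapq
from dataclasses import dataclass, field

# Assumed severity levels; a lower rank is handled first.
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2, "warning": 3}


@dataclass(order=True)
class Alarm:
    rank: int                          # severity rank, compared first
    sequence: int                      # arrival order, breaks ties within a severity
    source: str = field(compare=False)
    text: str = field(compare=False)


class AlarmQueue:
    """Hands out alarms by severity, then by arrival order."""

    def __init__(self):
        self._heap = []
        self._count = 0

    def raise_alarm(self, severity, source, text):
        alarm = Alarm(SEVERITY_RANK[severity], self._count, source, text)
        heapq.heappush(self._heap, alarm)
        self._count += 1

    def next_alarm(self):
        """Return the most urgent outstanding alarm, or None if the queue is empty."""
        return heapq.heappop(self._heap) if self._heap else None


queue = AlarmQueue()
queue.raise_alarm("minor", "node-7", "high error rate on port 4")
queue.raise_alarm("critical", "node-2", "trunk down")
queue.raise_alarm("major", "node-5", "power supply on backup")
first = queue.next_alarm()
print(first.source, first.text)
```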

  21. Network Management Processes Performance Management • Performance measures can be internal (for the provider), regulated (CRTC), or to assist the customer (how is my network performing) • Network performance measures (Mean time to repair, Network availability) are standard metrics used in the industry, and are often the basis for ‘service level agreements’ • Customers may require information on their traffic patterns - are they paying for bandwidth they don’t require, or is their network overloaded? • Many customers want guarantees of performance – a Service Level Agreement (SLA) in order to ensure they are getting the performance they pay for. • An SLA may include the following • Network Availability • Frame/Cell/Packet delivery • Mean time to Repair • Penalty clauses for non-performance • Delay metrics
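
A rough sketch of checking a reporting period against two of the SLA items listed above, network availability and mean time to repair. The targets and outage durations are invented for illustration, and the sketch assumes that total downtime equals the sum of the repair times, which will not hold for every contract.

```python
from dataclasses import dataclass


@dataclass
class SlaTargets:
    min_availability_pct: float   # e.g. 99.5 means 99.5 percent
    max_mttr_hours: float         # largest acceptable mean time to repair


def availability_pct(total_hours, downtime_hours):
    """Availability as a percentage of the reporting period."""
    return 100.0 * (total_hours - downtime_hours) / total_hours


def mean_time_to_repair(repair_hours):
    """Average repair time; zero when there were no failures."""
    return sum(repair_hours) / len(repair_hours) if repair_hours else 0.0


def sla_breaches(targets, total_hours, repair_hours):
    """List which SLA items were missed in the period."""
    breaches = []
    if availability_pct(total_hours, sum(repair_hours)) < targets.min_availability_pct:
        breaches.append("availability")
    if mean_time_to_repair(repair_hours) > targets.max_mttr_hours:
        breaches.append("mttr")
    return breaches


# Hypothetical 30-day month (720 hours) with three outages.
targets = SlaTargets(min_availability_pct=99.5, max_mttr_hours=4.0)
outages = [1.5, 0.5, 4.0]
print(sla_breaches(targets, 720.0, outages))
```

In this made-up month the repairs were quick enough on average, but the six hours of total downtime still push availability below the assumed target.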

  22. Network Management Processes Change Management • Scheduling downtime / maintenance activities (new software, network upgrades) with users (notification, release or emergency) • Ensuring software levels are compatible with all network components • Keeping the customer informed of planned service interruptions is critical Networks are in need of periodic maintenance for software or hardware upgrades, etc. In a 7x24 world, unscheduled downtime can mean • loss of revenue • legal liability • threats to public safety.
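
Two of the checks implied by scheduling maintenance with users can be sketched as below: does a planned window collide with another one, and was the customer notified early enough? The five-day notice period and the dates are assumptions chosen for illustration, not a stated policy.

```python
from datetime import datetime, timedelta, timezone


def windows_overlap(start_a, end_a, start_b, end_b):
    """True when two maintenance windows share any time (touching ends do not count)."""
    return start_a < end_b and start_b < end_a


def meets_notice_period(notified_at, window_start, min_notice=timedelta(days=5)):
    """True when the customer was told at least min_notice before the work starts."""
    return window_start - notified_at >= min_notice


def utc(year, month, day, hour):
    """Small helper for building UTC timestamps."""
    return datetime(year, month, day, hour, tzinfo=timezone.utc)


# Hypothetical planned work and a second request that falls inside it.
backbone_start, backbone_end = utc(2014, 2, 16, 15), utc(2014, 2, 24, 3)
software_start, software_end = utc(2014, 2, 20, 0), utc(2014, 2, 20, 4)

print(windows_overlap(backbone_start, backbone_end, software_start, software_end))
print(meets_notice_period(utc(2014, 2, 10, 15), backbone_start))
print((backbone_end - backbone_start).total_seconds() / 3600, "hours planned")
```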

  23. FROM: CHANGE MANAGEMENT PLANNED OUTAGE Foreign-Tel COMMUNICATIONS Dept.: GNMC Phone: 1-555-868-7883 Fax: 1-555-868-7822 Please respond to the following Email: tcsccip@foreigntelcommunications.com ForeignTel Communications would like to inform you that the Change Management activity will be performed as indicated below: _____________________________________________________________________ Outage #: POM041793 / POT356369 Your ref. #: Description: DISREGARD OUTAGE NOTICE//THIS IS NOT SERVICE AFFECTING//WE ARE ADDING BACKBONE CAPACITY: PORTLAND-SANTA CLARA DURING THIS PERIOD, NETWORK WILL BE IN HAZARDOUS CONDITION. WALL NOC WILL CLOSELY MONITOR THE NETWORK AND ANY ALARMS ON IT Scheduled Planned Start Date (UTC): February 16, 2014 15:00:00 Scheduled Planned End Date (UTC): February 24, 2014 03:00:00

  24. Related Network Management Activities • Co-ordination with other Carriers and Agencies. No one carrier can route traffic everywhere on the planet. Strategic alliances and co-operation amongst carriers are essential. • Dynamic Controls. Can traffic be rerouted around failures or congestion? Is this automatic or manual? • Disaster recovery planning. Could it happen to you? What would you do in the event of a ‘disaster’? • Security. Who has access to the network infrastructure? Can it be ‘hacked’? Ensuring one customer’s data does not go to another customer.
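
The "Dynamic Controls" question, whether traffic can be rerouted around a failure, can be illustrated with a small path search over a set of nodes and links. The topology below is invented for the example and does not represent any real carrier network; real networks use routing protocols rather than a single search like this.

```python
from collections import deque


def shortest_path(links, source, target, failed=frozenset()):
    """Breadth-first search for the fewest-hop path that avoids failed links."""
    adjacency = {}
    for a, b in links:
        if (a, b) in failed or (b, a) in failed:
            continue  # skip links that are currently out of service
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:   # walk back to the source
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in adjacency.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None  # no route available


# Hypothetical five-node topology.
LINKS = [("Ottawa", "Toronto"), ("Toronto", "Vancouver"), ("Ottawa", "Montreal"),
         ("Montreal", "Winnipeg"), ("Winnipeg", "Vancouver")]

print(shortest_path(LINKS, "Ottawa", "Vancouver"))
print(shortest_path(LINKS, "Ottawa", "Vancouver", failed={("Toronto", "Vancouver")}))
```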

  25. Security Management • The goal of security management is to control access to network resources according to local guidelines so that the network cannot be sabotaged (intentionally or unintentionally) and sensitive information cannot be accessed by those without appropriate authorization. • Security management subsystems work by partitioning network resources into authorized and unauthorized areas. • They identify sensitive network resources (including systems, files, and other entities) and determine mappings between sensitive network resources and user sets. • They also monitor access points to sensitive network resources and log inappropriate access to sensitive network resources.
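
The mapping between sensitive resources and user sets, together with logging of inappropriate access, might be sketched as follows. The resource and user names are hypothetical, and a real security management subsystem would add authentication, timestamps, and tamper-resistant logs.

```python
class AccessMonitor:
    """Maps sensitive resources to authorized user sets and logs denied attempts."""

    def __init__(self, authorized):
        # resource name -> set of users allowed to reach it
        self._authorized = {resource: set(users) for resource, users in authorized.items()}
        self.denied_log = []

    def request(self, user, resource):
        """Return True if access is allowed; otherwise record the attempt."""
        allowed = user in self._authorized.get(resource, set())
        if not allowed:
            self.denied_log.append((user, resource))
        return allowed


# Hypothetical resources and users.
monitor = AccessMonitor({
    "billing-db": {"alice"},
    "switch-config": {"alice", "bob"},
})
print(monitor.request("bob", "switch-config"))
print(monitor.request("bob", "billing-db"))
print(monitor.denied_log)
```

Resources that are not listed at all are treated as unauthorized for everyone, which is the safer default.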

  26. AT&T Customer Info Hacked. By TSC Staff, 8/29/2006 9:05 PM EDT. AT&T late Tuesday said that hackers broke into a computer system and accessed personal data, including credit card information, from thousands of customers who had purchased DSL equipment from the company's Web store. Kaspersky says Web hack 'should not have happened', 02/09/2009. It's the worst thing that can happen to a computer security vendor: This weekend, Moscow's Kaspersky Lab was hacked. A hacker, who identified himself only as Unu, said that he was able to break into a section of the company's brand-new U.S. support Web site by taking advantage of a flaw in the site's programming. http://www.csoonline.com/article/706400/10-hacks-that-made-headlines

  27. Network Management Centre Functions • 7 x 24 operation - it’s more than a buzzword. • Operations Support Systems for provisioning, change management, surveillance, trouble tracking, customer records • Subject experts/access to engineering support personnel or labs • Multiple & diverse communications channels • Situation (War) room • Secure and Independent Power Supply • Access to Information Databases • Contact information for support resources (level 1, 2, 3 support, vendor support) • Secure location • Fully redundant backup location

  28. When Disaster strikes! • If something can go wrong .. it will .. • Ice Storms (1998 & 2013)/Hurricane Katrina/Sandy & other natural disasters • Toronto Simcoe Central Office fire July 1999 • Power plant failures • Hackers and viruses (SQL Worm) • September 11/terrorist attacks • All of these test the plans of a network provider. • Are contingency plans in place? Have they been tested, or have they gathered dust for 5 years? • Is there an escalation chain of command? • Are there agreements with other suppliers/vendors/competitors? • What contingencies are in place to get critical services restored as quickly as possible? • When service is lost, the prime objective, after immediate human safety, is the restoration of service

  29. From July 1999 … TORONTO - Phones stopped ringing in several major cities in Canada on Friday after an explosion caused a major system failure at a Bell Canada building in Toronto. The failure knocked out phone lines, most cell phones, internet services and bank machines in downtown Toronto. Cantel and digital cell phones appear to be working. Police report 911 emergency systems are working, but the police are urging people to use these systems only for real emergencies. The failure was caused by an explosion on the fourth floor at the downtown Bell centre at around 8:00 am. One person was reportedly injured. Immediately after the explosion, battery powered backup systems kicked in. But they ran out of power a few hours later. The Toronto Stock Exchange is back up and running after it suspended trading briefly but brokerages are having trouble communicating. Phone systems in Ottawa and Montreal and as far away as Halifax and Vancouver have also been affected as calls that are normally routed through Toronto are rerouted through other cities. Bell Canada says it hopes to have services restored by midafternoon. The Globe and Mail, published Thursday, Oct. 10 2013, 11:18 AM EDT: Rogers Communications Inc. said a software glitch created a big spike in “signalling traffic” that caused one of the worst wireless network outages in the company’s history. Canada’s largest wireless carrier determined that root cause on Thursday roughly 18 hours after implementing a fix that restored voice and text services for customers across the country. DISASTERS CAN HAPPEN! How will your network provider handle the trouble?

  30. Another aspect of Network Management is Planning • A carrier will have a plan for a disaster situation, as well as anticipating potential issues • Examples of planning for potential issues include • Y2K • more recently, the change in dates for Daylight Saving Time • Other various clock rollover issues • A carrier may also do periodic disaster simulations to test the response of various groups as well as procedures
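
As one concrete illustration of a clock rollover issue (an example chosen here, not one named in the slides), a time counter stored as a signed 32-bit number of seconds since 1 January 1970 UTC runs out of range in January 2038. A short sketch of the arithmetic:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MAX_INT32 = 2**31 - 1  # largest value a signed 32-bit counter can hold


def last_representable_instant(max_seconds=MAX_INT32):
    """Latest moment a signed 32-bit seconds-since-1970 counter can express."""
    return EPOCH + timedelta(seconds=max_seconds)


def wrap_int32(value):
    """Simulate what a signed 32-bit counter holds after overflowing."""
    return (value + 2**31) % 2**32 - 2**31


print(last_representable_instant())   # the final second before rollover
print(wrap_int32(MAX_INT32 + 1))      # one tick later the counter goes negative
```

Planning for such dates means finding every system that stores time this way well before the rollover, much as carriers did for Y2K.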

  31. SPOC Function What is a SPOC? • In Bell Canada, the GNMC is the Single Point of Contact (SPOC) for all Fault Management and Change Management between Canadian Help Desks and Test Centres and all the global carriers that Bell uses to provide international reach for our customer circuits • SPOC for all other carriers to get their issues fixed within Canada • One door for all trouble management into or out of Canada • Avoids having many different groups learn the processes for dealing with each of the carriers, or the carriers having to learn about all the various ops centers within Canada • Provides flexibility to move quickly and customize for customer reasons, with centralized expertise • As a SPOC, we get to compare service levels provided by different global carriers and use this info to get better performance

  32. Operational Support Systems • Successful network management uses standardized protocols or vendor-specific mechanisms to transmit alarms and commands (e.g. Simple Network Management Protocol) • Operational control data can be transmitted over conventional data networks, over the same network (inband), or over another network (out of band). • The systems that receive alarms and allow for network configuration, troubleshooting, and control are commonly called Operational Support Systems (OSS). • OSS may be more than 10 times the cost of the network infrastructure! • OSSs may consist of Workstations, Databases, network elements, scripts, provisioning systems, security systems, offline databases and billing systems. • Without a good OSS structure, a great network infrastructure will fail. The network objectives cannot be met without this.

  33. Operational Support Systems • No one OSS does it all - in fact, many OSSs are required, and these must interact with each other. This is typically via Application Program Interfaces (API) or some standard format for information exchange. • The interaction can be simple - or complex. Often, simple format changes in one OSS will impact many other ‘downstream’ OSSs. • Remember where the money is spent - Not on the network infrastructure, but on the systems that make the network run. • The following diagram shows a SAMPLE interaction between various systems.

  34. Sample Operational Support Systems [Diagram not reproduced. Labelled components: order entry/assignment system (customer orders); network provisioning system (customer gets service); network elements (SNMP traps); customer and assignment dumps (feed other OSSs); fault management/troubleshooting OSSs (Test Centres, NDNC); alert collection system and alert display (Surveillance Centres); trouble ticket system; customer statistics/data management OSS; call detail/usage OSS; billing OSS, billing files and billing system (customer receives bill for service/usage); Telco local assignment system; Change Mgmt.]

  35. Metrics – Key Performance Indicators • Each network needs some means of measuring its success, and to see where improvement can be made. Public networks may be regulated. Metrics may be stipulated in Service level agreements (SLAs) between provider and customer • To the end user/customer, the most critical metrics are the following: • Mean time to repair (MTTR) • Network Availability = (total available time - total downtime) / (total available time) • Quality of Service (QoS) • round trip delay • Network congestion/blocking • frame/packet/cell loss • repeat failures • To the network provider, the following are important metrics: • Network Availability • EBITDA (Earnings Before Interest Taxes Depreciation & Amortization) • Cost / Revenue (return on investment) • Market Share • Network capacity
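
The Network Availability metric above can be written out and worked through with round numbers. The figures are hypothetical: a 30-day month (43,200 minutes) with 43.2 minutes of total downtime.

```latex
A = \frac{T_{\text{available}} - T_{\text{down}}}{T_{\text{available}}}
\qquad\text{for example}\qquad
A = \frac{43\,200 - 43.2}{43\,200} = 0.999 = 99.9\%
```

Put the other way around, a 99.9% availability target leaves roughly 43 minutes of downtime in such a month.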

  36. Metrics • To the shareholder the following are important: • Dividend • share price • Return on Investment

  37. Summary • Networks can be simple, or extremely complex and mission critical • Network quality, reliability, diversity, and low cost are essential • The operation of a high quality, reliable, cost effective network requires effective Network Management Centre(s), along with skilled people and good support tools (operational support systems) • As networks continue to evolve, customers will manage more and more of their own networks. • Challenges for the future include global coverage, scaling for growth, new technologies, telco mergers, acquisitions, failures - an industry always in flux.
