Enhancing Security and Availability in Modern Distributed Systems

Modern Distributed Systems Design – Security and High Availability • Measuring Availability • Highly Available Data Management • Redundant System Design

Measuring Availability • How are resiliency and high availability interconnected? • What is downtime, and what causes it? • How can availability be measured?

Measuring Availability

Define Downtime • Downtime can be defined as follows: “If a user cannot get his job done on time, the system is down.”

What causes downtime? • Planned – the easiest to reduce; includes scheduled system maintenance, hot-swappable hard drive replacements, cluster upgrades, and even failovers. Usually 30% of all downtime; • People (the human factor) – careless mistakes, plus the growing complexity of IT equipment, software, and protocols, which demands greater knowledge from engineers. Usually 15% of all downtime; • Software failures – caused by software bugs and viruses. Usually 40% of all downtime.

How to measure availability? Availability is the fraction of time the system is up:

$$\text{Availability} = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}$$

where MTBF is the “mean time between failures” and MTTR is the “mean time to repair”.
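To make the formula concrete, here is a minimal sketch that evaluates it and converts the result into expected downtime per year. The function names and the sample figures (a failure every 1000 hours, one hour to repair) are hypothetical and chosen only for illustration; the year is taken as 8760 hours, ignoring leap years.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; leap years ignored


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability MTBF / (MTBF + MTTR), as a fraction in [0, 1]."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must be non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(avail: float) -> float:
    """Expected hours per year during which the system is unavailable."""
    return (1.0 - avail) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical figures: one failure every 1000 hours, 1 hour to repair.
    a = availability(1000.0, 1.0)
    print(f"availability: {a:.4%}")
    print(f"downtime:     {annual_downtime_hours(a):.2f} hours/year")
```

Because the formula depends only on the ratio MTTR/MTBF, halving the repair time improves availability exactly as much as doubling the time between failures, which is why fast failover matters as much as reliable hardware.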

What can go wrong? • Hardware • Environmental and Physical Failures • Network Failures • Database System Failures • Web Server Failures • File and Print Server Failures

The Cost of Downtime.

Levels of Availability: • Regular Availability • Increased Availability • High Availability • Disaster Recovery • Fault-Tolerant System

Highly Available Data Management • Data management is the most sensitive area of modern distributed systems. • Quick overview of existing data topologies

Redundant System Design • Redundant storage (RAID, multi-hosting, multipathing, disk arrays, JBOD, etc.) • Failover Configurations and Management • Introduction to SAN and the Fibre Channel protocol • Security aspects of data management in Storage Area Networks

Redundant storage

Redundant Storage (RAID 5)
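RAID 5 stripes data across several disks and stores one parity block per stripe, where the parity is the byte-wise XOR of that stripe's data blocks. If any single disk is lost, each of its blocks can be rebuilt by XOR-ing the surviving blocks of the same stripe. The sketch below models a single stripe only; it ignores the rotation of parity across disks and all I/O, and its function names are hypothetical.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    if not blocks or len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of the same size")
    # zip(*blocks) walks the blocks column by column (one byte from each block).
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def compute_parity(data_blocks: Sequence[bytes]) -> bytes:
    """Parity block for one stripe: the XOR of all its data blocks."""
    return xor_blocks(data_blocks)


def rebuild_lost_block(surviving_blocks: Sequence[bytes]) -> bytes:
    """Recover the one missing block (data or parity) from all the others."""
    return xor_blocks(surviving_blocks)


if __name__ == "__main__":
    stripe = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
    parity = compute_parity(stripe)
    # Simulate losing the disk that held stripe[1].
    recovered = rebuild_lost_block([stripe[0], stripe[2], parity])
    assert recovered == stripe[1]
    print("rebuilt block:", recovered.hex())
```

The same XOR identity explains RAID 5's limit: with two blocks of a stripe missing, one equation cannot recover two unknowns.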

Failover Configurations and Management Failover must meet the following requirements: • Transparent to the client; • Quick (no more than 5 min, ideally 0-2 min); • Minimal manual intervention, with guaranteed data access.

Failover components: • Two servers, one primary and one takeover; • Two network connections (a third is highly recommended); • All disks on a failover pair should have some form of redundancy; • Application portability; • No single point of failure.
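The takeover server usually decides that the primary has failed by watching for periodic heartbeats. The following is a minimal sketch of that decision only, assuming a simple fixed timeout; the class and its fields are hypothetical, and times are passed in as plain seconds so the logic can be tested without clocks or networks. The timeout has to be short compared with the failover budget above, yet long enough that a brief delay does not trigger a false takeover; redundant network connections also let heartbeats travel over more than one path, so a single failed link is less likely to be mistaken for a failed server.

```python
from dataclasses import dataclass


@dataclass
class TakeoverMonitor:
    """Heartbeat watcher running on the takeover server (hypothetical sketch).

    Times are plain seconds supplied by the caller, so the decision logic can be
    exercised without real clocks or network links.
    """

    timeout_s: float            # silence longer than this => treat primary as failed
    last_heartbeat: float = 0.0
    active: bool = False        # True once this server has taken over

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat from the primary received at time `now`."""
        self.last_heartbeat = now

    def check(self, now: float) -> bool:
        """Take over if the primary has been silent too long; report whether active."""
        if not self.active and now - self.last_heartbeat > self.timeout_s:
            # A real cluster would now import the shared disks, move the
            # service IP address, and start the applications here.
            self.active = True
        return self.active


if __name__ == "__main__":
    monitor = TakeoverMonitor(timeout_s=30.0)
    monitor.heartbeat(0.0)
    monitor.heartbeat(10.0)
    print(monitor.check(35.0))  # 25 s of silence: still standby, prints False
    print(monitor.check(41.0))  # 31 s of silence: takes over, prints True
```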

Symmetric Failover

Asymmetric Failover
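The two failover slides carried only diagrams. Assuming the conventional meanings (in asymmetric failover the takeover server is an idle standby; in symmetric failover both servers run their own services and each is the takeover for the other), the following sketch shows the main sizing trade-off: after a failure in a symmetric pair, the survivor must carry both workloads. Server names, service names, loads, and capacities are all hypothetical.

```python
from typing import Dict, List

Placement = Dict[str, List[str]]  # server name -> services it runs


def fail_over(placement: Placement, failed: str, takeover: str) -> Placement:
    """Return a new placement with the failed server's services moved to its partner."""
    result = {srv: list(svcs) for srv, svcs in placement.items() if srv != failed}
    result.setdefault(takeover, []).extend(placement.get(failed, []))
    return result


def fits(placement: Placement, load: Dict[str, int], capacity: Dict[str, int]) -> bool:
    """True if no server's total service load exceeds its capacity."""
    return all(sum(load[s] for s in svcs) <= capacity[srv] for srv, svcs in placement.items())


if __name__ == "__main__":
    load = {"db": 60, "web": 50}             # hypothetical load units
    capacity = {"A": 100, "B": 100}

    asymmetric = {"A": ["db"], "B": []}      # B is an idle standby
    symmetric = {"A": ["db"], "B": ["web"]}  # both servers do useful work

    for name, placement in (("asymmetric", asymmetric), ("symmetric", symmetric)):
        after = fail_over(placement, failed="A", takeover="B")
        print(name, after, "fits" if fits(after, load, capacity) else "overloaded")
```

The asymmetric pair wastes the standby's capacity during normal operation but never overloads on takeover; the symmetric pair uses both servers but must either be sized for the combined load or accept degraded service after a failover.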

Fibre Channel, SAN, IP Storage

Security in IP Storage Networks • Security in Fibre Channel SANs • Security Options for IP Storage Networks

Fibre Channel SAN Security • Port or hard zoning • WWN Zoning • LUN Masking
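Zoning and LUN masking act at different layers: zoning is enforced by the fabric switches and decides which ports may talk at all, while LUN masking is applied at the storage side and decides which logical units a permitted initiator may see. The sketch below models WWN zoning followed by masking; the WWNs are made-up values in the common 8-byte notation, and all names are hypothetical. Port (hard) zoning would work the same way with switch port identifiers as zone members instead of WWNs. Real arrays typically configure masking per initiator-target pair; this sketch simplifies it to per initiator.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical WWNs (illustrative values only).
HOST_DB = "10:00:00:00:00:00:00:01"
HOST_WEB = "10:00:00:00:00:00:00:02"
ARRAY_PORT_1 = "50:00:00:00:00:00:00:01"
ARRAY_PORT_2 = "50:00:00:00:00:00:00:02"

# WWN zoning: the fabric lets two ports talk only if they share a zone.
ZONES: Dict[str, FrozenSet[str]] = {
    "zone_db": frozenset({HOST_DB, ARRAY_PORT_1}),
    "zone_web": frozenset({HOST_WEB, ARRAY_PORT_2}),
}

# LUN masking: the array exposes only these LUNs to each initiator WWN.
LUN_MASKS: Dict[str, Set[int]] = {
    HOST_DB: {0, 1},
    HOST_WEB: {2},
}


def share_zone(a: str, b: str, zones: Dict[str, FrozenSet[str]]) -> bool:
    """True if some zone contains both WWNs."""
    return any(a in members and b in members for members in zones.values())


def visible_luns(initiator: str, target: str,
                 zones: Dict[str, FrozenSet[str]] = ZONES,
                 masks: Dict[str, Set[int]] = LUN_MASKS) -> Set[int]:
    """LUNs an initiator can see through a target port after zoning and masking."""
    if not share_zone(initiator, target, zones):
        return set()  # zoning blocks the path before masking is consulted
    return set(masks.get(initiator, set()))


if __name__ == "__main__":
    print(visible_luns(HOST_DB, ARRAY_PORT_1))   # {0, 1}
    print(visible_luns(HOST_WEB, ARRAY_PORT_1))  # set(): different zone
```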

Security Options for IP Storage Networks • iSNS (Internet Storage Name Service) • LUN masking, as in Fibre Channel, and VLAN tagging • IP Security (IPsec) • Access control lists (ACLs)
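An ACL on an IP storage target can combine the initiator's name with the network it connects from. The following is a minimal sketch of such a check, assuming rules of the form "this initiator, from this subnet, may use these LUNs"; the rule format, initiator name, and subnet are hypothetical (the subnet is from a documentation address range). An initiator name is only a claimed identity, so a check like this complements, rather than replaces, VLAN isolation and IPsec.

```python
import ipaddress
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class AclEntry:
    """One rule: this initiator, connecting from this subnet, may use these LUNs."""
    initiator_iqn: str
    subnet: str             # CIDR notation, e.g. "192.0.2.0/24"
    luns: FrozenSet[int]


def permitted_luns(acl: List[AclEntry], initiator_iqn: str, source_ip: str) -> Set[int]:
    """Union of LUNs granted to this initiator name when it connects from this address."""
    addr = ipaddress.ip_address(source_ip)
    granted: Set[int] = set()
    for entry in acl:
        # Both the claimed initiator name and the source address must match.
        if entry.initiator_iqn == initiator_iqn and addr in ipaddress.ip_network(entry.subnet):
            granted |= entry.luns
    return granted


if __name__ == "__main__":
    acl = [
        # Hypothetical initiator name and documentation-range subnet.
        AclEntry("iqn.2001-04.com.example:db-host", "192.0.2.0/24", frozenset({0, 1})),
    ]
    print(permitted_luns(acl, "iqn.2001-04.com.example:db-host", "192.0.2.15"))    # {0, 1}
    print(permitted_luns(acl, "iqn.2001-04.com.example:db-host", "198.51.100.7"))  # set()
```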
