Fault Tolerance and Security

Geraint Price, Information Security Group, Royal Holloway

Outline • Introduction • Background • Security • Fault Tolerance • Major Contributions • A Personal Perspective • Future Challenges • Conclusions Security and Protection of Information 2005

Introduction • Computer Security and Fault Tolerance share a subset of goals • The ability to tolerate or mitigate failure in a computer system • The assumptions that underpin traditional solutions make their merger non-trivial • Security: Remove any replication and tighten control • Fault Tolerance: Replicate and compare results

Introduction – II • Recent cross-over research began with Reiter’s work on Rampart (mid 90s) • Spawned a new interest in the application of fault tolerant mechanisms in security: • Tacoma: Provision of replication for mobile agents • MAFTIA: A large-scale project to study survivability in Internet applications • We concentrate on two avenues of research: • Development of the fault model • Progression of the replication mechanisms

Background – Security • Why the relatively late interaction? • In our opinion, it has much to do with the history of computer security: • Trusted Computing Base • Research was weighted towards confidentiality and integrity – not availability • Others had noted this gap in the computer security literature [Needham, ’94]

Background – Security – II • Very little in the open literature dealt with Denial of Service (the absence of availability) • A notable exception [Gligor, ’86]: • An increase in Maximum Waiting Time (MWT) • Legitimate and other forms of denial of service – system returns before MWT • Interesting exception [Turn and Habibi, ’86]: • A security function is fault tolerant if, given the presence of a fault, the system’s security policy remains intact

Background – Fault Tolerance • Fault Modelling: • Fault → Error → Failure • Fault: Adjudged or hypothesized cause of an error • Error: The part of the system state that may lead to failure • Failure: Service deviates from specification • Four techniques within the dependability paradigm: • Fault prevention, fault tolerance, fault removal, fault forecasting

Background – Fault Tolerance – II • Replication Mechanisms: • Underlying group communication mechanisms • Early work conducted at Cornell University: • Isis toolkit: CBCAST (Causal broadcast), ABCAST (Atomic broadcast) • Group Structures: • State Machine Approach: Active replication, which masks the failure of a proportion of the servers • Primary Backup Approach: Passive replication, if the primary fails, then a backup takes over
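The two group structures on this slide can be illustrated with a minimal sketch. It assumes deterministic replicas and at most f faulty ones. The function names `vote` and `primary_backup` are hypothetical and are not taken from Isis or any other toolkit.

```python
from collections import Counter


def vote(replies, f):
    """Active replication: accept a result only if at least f + 1 replicas agree.

    With at most f faulty replicas, f + 1 matching replies must include
    one from a correct replica, so the faulty results are masked.
    """
    if not replies:
        return None
    value, count = Counter(replies).most_common(1)[0]
    return value if count >= f + 1 else None


def primary_backup(servers, request):
    """Passive replication: use the primary, fail over to backups in order.

    `servers` is a list of callables, primary first. A ConnectionError
    stands in for a detected crash (an assumption of this sketch).
    """
    for server in servers:
        try:
            return server(request)
        except ConnectionError:
            continue  # this replica has failed; try the next backup
    raise RuntimeError("all replicas failed")


# Three replicas, one returning a corrupted result (f = 1).
masked = vote(["balance=100", "balance=100", "balance=999"], f=1)
```

Voting masks a wrong answer without having to detect which replica failed, whereas the failover loop only helps when the failure is detectable, for example a crash.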

Major Contributions • Rampart • Castro and Liskov • Quorum Systems • MAFTIA • Tacoma • Other Projects

Rampart • Group communication implemented by Reiter [Reiter, ’94 & ’96] • First system to implement replicated service based on Byzantine agreement protocols • Main communication structure derived from the earlier work on Isis at Cornell • Extension over the Isis work through its ability to tolerate the malicious failure of a proportion of the servers within the group

Rampart – II • Choices over communication primitives within Rampart: • State machine approach to replication • Digital signatures to provide message authentication in group communication primitive • Lack of efficiency and scalability • Although it has its drawbacks, it inspired the majority of the remaining work • The main research agenda as a result was the search for more efficient protocols

Castro & Liskov • A new replication mechanism to overcome efficiency concerns [Castro & Liskov, ’99] • Two main differences to Rampart: • Primary backup model • Pair-wise symmetric key Message Authentication Codes • A test implementation over NFS was only 3% slower than Digital Unix NFS • Efficiency gains are due to optimistic protocols under normal operation
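The pair-wise MAC idea can be sketched as follows: rather than one digital signature, the sender attaches one MAC per receiver, each computed with the key it shares with that receiver. This is an illustration only. HMAC-SHA-256, the key values and the function names are assumptions of the sketch, not details of the Castro and Liskov implementation.

```python
import hashlib
import hmac


def make_authenticator(message, pairwise_keys):
    """Build a vector of MACs, one entry per receiving replica.

    `pairwise_keys` maps a replica id to the symmetric key the sender
    shares with that replica (hypothetical key distribution).
    """
    return {
        replica_id: hmac.new(key, message, hashlib.sha256).digest()
        for replica_id, key in pairwise_keys.items()
    }


def verify_entry(message, authenticator, replica_id, key):
    """A replica checks only its own entry, using its shared key."""
    tag = authenticator.get(replica_id)
    if tag is None:
        return False
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)


keys = {"r1": b"key-shared-with-r1", "r2": b"key-shared-with-r2"}
auth = make_authenticator(b"write x=1", keys)
ok = verify_entry(b"write x=1", auth, "r1", keys["r1"])
```

Symmetric MACs are much cheaper to compute than signatures, which is one source of the efficiency gain. The trade-off is that a MAC convinces only the replica holding the matching key, so it cannot be passed on as proof to a third party.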

Quorum Systems • Data replication in a group of servers [Malkhi & Reiter, ’97] • Move away from the state machine approach • Increase scalability by removing the server-to-server communication for a read operation • However, their work does require server-to-server communication for state updates, and hence for write operations
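A read that needs no server-to-server communication can be sketched as below. The sketch assumes a threshold masking quorum construction: n ≥ 4f + 1 servers, with quorums large enough that any two overlap in at least 2f + 1 servers. The function names and the (timestamp, value) reply format are illustrative assumptions.

```python
from collections import Counter


def quorum_size(n, f):
    """Smallest quorum such that any two quorums share at least 2f + 1 servers."""
    if n < 4 * f + 1:
        raise ValueError("this construction needs n >= 4f + 1")
    return (n + 2 * f + 2) // 2  # integer form of ceil((n + 2f + 1) / 2)


def read(replies, f):
    """Client-side read over replies gathered from one quorum.

    `replies` is a list of (timestamp, value) pairs. A pair counts only
    if at least f + 1 servers report it, so at least one correct server
    vouches for it. Among those pairs the highest timestamp wins.
    """
    counts = Counter(replies)
    vouched = [pair for pair, count in counts.items() if count >= f + 1]
    return max(vouched, key=lambda pair: pair[0]) if vouched else None


# n = 5, f = 1: a quorum has 4 servers; one of them reports a forged value.
replies = [(2, "new"), (2, "new"), (1, "old"), (9, "forged")]
latest = read(replies, f=1)
```

The client contacts the quorum in parallel and resolves the replies itself, which is why the servers do not have to talk to each other on a read.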

MAFTIA • Malicious and Accidental Fault Tolerance for Internet Applications • Large EU funded project: • 6 partners • Expertise in fault tolerance, distributed computing, cryptography, formal verification and intrusion detection • 3 main areas of work: conceptual framework and architecture; mechanisms and protocols; formal verification and assessment

MAFTIA – Conceptual Model • Extension of the Fault → Error → Failure model • Re-defining a Fault as an Intrusion: • Intrusion: A malicious, externally-induced fault resulting from an attack that has been successful in exploiting a vulnerability • Attack: A malicious interaction fault, through which an attacker aims to deliberately violate one or more security properties • Vulnerability: A fault created during development of the system, or during operation, that could be exploited to create an intrusion

MAFTIA – Conceptual Model – II • In breaking down an Intrusion, they highlight the possibility of targeting the removal or prevention of both Attacks and Vulnerabilities • Although MAFTIA’s main focus was Intrusion Tolerance, they classify a whole range of security mechanisms according to the fault prevention, tolerance, removal and forecasting paradigms mentioned earlier

MAFTIA – Hybrid Failure Model • Composite fault model with a hybrid failure assumption • The presence and severity of vulnerabilities, attacks and intrusions vary from component to component • Assumptions present in their architectural design: • Built on top of trustworthy components: • Java Card • Trusted Timely Computing Base (TTCB) • Trusted Middleware component

MAFTIA – Hybrid Failure Model – II • The key element of the MAFTIA architecture is the TTCB: • Provision of time based services through the use of a Control Channel • Dedicated and heavily protected security kernel – fail silent rather than arbitrary failure • Implementation of a reliable broadcast protocol that can tolerate up to f failures among f+2 processes [Correia et al., ’02]
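To put the f+2 figure in context, the sketch below compares it with the classic requirement of 3f + 1 processes for Byzantine agreement when no trusted component is available. The 3f + 1 bound is a standard result rather than something stated on the slide, and the function names are hypothetical.

```python
def min_processes_byzantine_agreement(f):
    """Classic bound: agreement with f arbitrary faults needs 3f + 1 processes."""
    return 3 * f + 1


def min_processes_ttcb_broadcast(f):
    """Figure quoted on the slide for the TTCB-based reliable broadcast."""
    return f + 2


# Group sizes needed to tolerate f = 1, 2, 3 faulty processes.
comparison = [
    (f, min_processes_byzantine_agreement(f), min_processes_ttcb_broadcast(f))
    for f in (1, 2, 3)
]
```

The gap widens as f grows, which illustrates why a small fail-silent trusted component can make the protocols built on top of it cheaper.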

Tacoma • Tromsø And COrnell Moving Agents project • Provision of security and fault tolerance were two key elements • Resilience for the agent on a potentially malicious host: • Replicated agents, with voting mechanisms • Fault tolerance for mobile agents: • Extension of the primary backup approach • “… preserving the necessary consistency between replicas can be done efficiently only within a local-area network”

Other Projects • COCA: • Replication of a CA to provide availability • Byzantine quorum systems • Proactive recovery • OASIS (Organically Assured and Survivable Information Systems) • Umbrella project which sponsors separate work items in the field of resilient security

A Personal Perspective • Control of Execution: • Adapting fault tolerant principles for a secure environment can come down to a principle of control • In the Fault → Error → Failure model, breaking the chain requires retaining control • Whose security policy are we protecting? • Proposed mechanisms for allowing a client to share that control [Price, ’99]

A Personal Perspective – II • Use of Other Mechanisms: • Some of our previous work identified the possibility of using timing checks [Price, ’01] • Remove the attacker’s ability to delay or replay messages with impunity • Some variants of replay attacks rely on this • With hindsight, there is an interesting comparison with MAFTIA’s use of a Control Channel
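One generic form of timing check is a freshness window combined with a record of recently seen message identifiers. The sketch below is not the mechanism from [Price, ’01]. It assumes loosely synchronised clocks and that the timestamp and nonce are already integrity protected (for example by a MAC). The class and parameter names are hypothetical.

```python
import time


class FreshnessChecker:
    """Reject messages that are too old (delayed) or already seen (replayed)."""

    def __init__(self, max_age_seconds, clock=time.time):
        self.max_age = max_age_seconds
        self.clock = clock  # injectable so the check can be tested
        self.seen = {}      # nonce -> timestamp carried by the message

    def accept(self, nonce, sent_at):
        now = self.clock()
        # Forget nonces whose messages would fail the age check anyway.
        self.seen = {
            n: t for n, t in self.seen.items() if now - t <= self.max_age
        }
        if abs(now - sent_at) > self.max_age:
            return False  # outside the window: delayed or badly skewed
        if nonce in self.seen:
            return False  # same nonce inside the window: replay
        self.seen[nonce] = sent_at
        return True


# Usage with a controllable clock (times in seconds).
now = [100.0]
checker = FreshnessChecker(max_age_seconds=5, clock=lambda: now[0])
first = checker.accept("n1", sent_at=99.0)
replayed = checker.accept("n1", sent_at=99.0)
```

The window bounds how long an attacker can hold a message back, and it also bounds how many nonces the receiver has to remember.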

Future Challenges • Relaxation of assumptions: • Fully Byzantine failure models are difficult to protect against – and hence solutions are inefficient • Most of the work since Rampart has concentrated on feasible means of relaxing these failure assumptions: can we do better? • Further use of hardware: • MAFTIA’s use of trusted hardware allows for more efficient protocols – can the principle be generalised? • Mixed failure environments [Siu et al., ’98] • Trusted Computing Group

Future Challenges – II • Other dependability models: • Fault tolerance is only part of a very mature dependability literature • Disjoint v Inclusive error recovery? • MAFTIA defined a whole classification within their model • Security service classification: • Quorum based systems use the parallelism of a read operation to increase efficiency • Can we class different services according to their communication requirements?

Conclusions • Until 10 years ago, the work in this field was sparse and sporadic • Now there is a large body of work in this area • Practical efficiency is still a key research topic • Broaden our search for other applicable mechanisms • Availability and survivability on the Internet are only going to become more important
