
Fault Tolerance and Security

Presentation Transcript


  1. Fault Tolerance and Security Geraint Price Information Security Group Royal Holloway

  2. Outline • Introduction • Background • Security • Fault Tolerance • Major Contributions • A Personal Perspective • Future Challenges • Conclusions Security and Protection of Information 2005

  3. Introduction • Computer Security and Fault Tolerance share a subset of goals • The ability to tolerate or mitigate failure in a computer system • The assumptions that underpin traditional solutions make their merger non-trivial • Security: Remove any replication and tighten control • Fault Tolerance: Replicate and compare results

  4. Introduction – II • Recent cross-over research began with Reiter’s work on Rampart (mid-1990s) • Spawned a new interest in the application of fault-tolerant mechanisms in security: • Tacoma: Provision of replication for mobile agents • MAFTIA: A large-scale project to study survivability in Internet applications • We concentrate on two avenues of research: • Development of the fault model • Progression of the replication mechanisms

  5. Background – Security • Why the relatively late interaction? • In our opinion, it has much to do with the history of computer security: • Trusted Computing Base • Research was weighted towards confidentiality and integrity – not availability • Others had noted this gap in the computer security literature [Needham, ’94]

  6. Background – Security – II • Very little in the open literature dealt with Denial of Service (the absence of availability) • A notable exception [Gligor, ’86]: • An increase in Maximum Waiting Time (MWT) • Legitimate and other forms of denial of service – system returns before MWT • Interesting exception [Turn and Habibi, ’86]: • A security function is fault tolerant if, in the presence of a fault, the system’s security policy remains intact

  7. Background – Fault Tolerance • Fault Modelling: • Fault → Error → Failure • Fault: Adjudged or hypothesized cause of an error • Error: The part of the system state that may lead to failure • Failure: Service deviates from specification • Four techniques within the dependability paradigm: • Fault prevention, fault tolerance, fault removal, fault forecasting

  8. Background – Fault Tolerance – II • Replication Mechanisms: • Underlying group communication mechanisms • Early work conducted at Cornell University: • Isis toolkit: CBCAST (Causal broadcast), ABCAST (Atomic broadcast) • Group Structures: • State Machine Approach: Active replication, which masks the failure of a proportion of the servers • Primary Backup Approach: Passive replication: if the primary fails, a backup takes over
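As an illustration of the state machine approach on slide 8, the following is a minimal sketch, not taken from the presentation. It assumes every replica receives the same commands in the same order (the job of a primitive such as ABCAST) and that at most `f` of `2f + 1` replicas return wrong results. The names `CounterReplica`, `masked_reply` and `run_commands` are hypothetical.

```python
from collections import Counter


class CounterReplica:
    """A deterministic state machine: a running total (hypothetical example)."""

    def __init__(self, faulty=False):
        self.total = 0
        self.faulty = faulty

    def apply(self, amount):
        # Every replica applies the same command to its own copy of the state.
        self.total += amount
        # A faulty replica is modelled as returning a wrong answer.
        return -1 if self.faulty else self.total


def masked_reply(replies, f):
    """Return the reply given by at least f + 1 replicas, or None if there is none."""
    value, count = Counter(replies).most_common(1)[0]
    return value if count >= f + 1 else None


def run_commands(replicas, commands, f):
    """Send each command to every replica and vote on the replies."""
    return [masked_reply([r.apply(c) for r in replicas], f) for c in commands]


if __name__ == "__main__":
    group = [CounterReplica(), CounterReplica(), CounterReplica(faulty=True)]
    print(run_commands(group, [5, 7], f=1))
```

The vote masks the faulty replica without identifying or removing it, which is the sense in which active replication "masks" failure. A primary backup design would instead send commands to one replica and fail over on detection.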

  9. Major Contributions • Rampart • Castro and Liskov • Quorum Systems • MAFTIA • Tacoma • Other Projects

  10. Rampart • Group communication implemented by Reiter [Reiter, ’94 & ’96] • First system to implement replicated service based on Byzantine agreement protocols • Main communication structure derived from the earlier work on Isis at Cornell • Extension over the Isis work through its ability to tolerate the malicious failure of a proportion of the servers within the group

  11. Rampart – II • Choices over communication primitives within Rampart: • State machine approach to replication • Digital signatures to provide message authentication in group communication primitive • Lack of efficiency and scalability • Although it has its drawbacks, it inspired the majority of the remaining work • The main research agenda as a result was the search for more efficient protocols

  12. Castro & Liskov • A new replication mechanism to overcome efficiency concerns [Castro & Liskov, ’99] • Two main differences to Rampart: • Primary backup model • Pair-wise symmetric key Message Authentication Codes • A test implementation over NFS was only 3% slower than Digital Unix NFS • Efficiency gains are due to optimistic protocols under normal operation
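The pair-wise MAC idea on slide 12 can be sketched as follows. This is an illustrative sketch only: the sender shares one symmetric key with each replica and attaches one MAC per replica, so each replica checks only its own tag instead of verifying a digital signature. The function names, the key values and the choice of HMAC-SHA-256 are assumptions for the example, not details from the presentation or from Castro and Liskov's implementation.

```python
import hashlib
import hmac


def make_authenticator(message: bytes, pair_keys: dict) -> dict:
    """Compute one MAC per receiving replica, keyed with the pair-wise key."""
    return {
        replica_id: hmac.new(key, message, hashlib.sha256).digest()
        for replica_id, key in pair_keys.items()
    }


def verify(message: bytes, authenticator: dict, replica_id: str, key: bytes) -> bool:
    """A replica checks only the tag addressed to it, using its own shared key."""
    tag = authenticator.get(replica_id)
    if tag is None:
        return False
    expected = hmac.new(key, message, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(tag, expected)


if __name__ == "__main__":
    keys = {"r1": b"key-shared-with-r1", "r2": b"key-shared-with-r2"}
    auth = make_authenticator(b"write x=1", keys)
    print(verify(b"write x=1", auth, "r1", keys["r1"]))
```

Symmetric MACs are far cheaper to compute than signatures, but a tag convinces only the replica that holds the matching key. It cannot be forwarded as proof to a third party, which is part of what the surrounding protocol has to compensate for.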

  13. Quorum Systems • Data replication in a group of servers [Malkhi & Reiter, ’97] • Move away from the state machine approach • Increase scalability by removing the server-to-server communication for a read operation • However, their work does require server-to-server communication for state update, and hence a write operation
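To make the read path on slide 13 concrete, here is a small sketch of a client-side read against a threshold masking quorum. It assumes `n >= 4f + 1` servers, a quorum size of `ceil((n + 2f + 1) / 2)`, and replies given as `(timestamp, value)` pairs. The client accepts only pairs reported by at least `f + 1` servers and returns the one with the highest timestamp, so no server-to-server traffic is needed. The function names and data layout are assumptions made for illustration.

```python
import math
from collections import Counter


def quorum_size(n: int, f: int) -> int:
    """Size of a threshold masking quorum for n servers, f of them Byzantine."""
    if n < 4 * f + 1:
        raise ValueError("masking quorums need n >= 4f + 1")
    return math.ceil((n + 2 * f + 1) / 2)


def read(replies, f):
    """Pick the freshest (timestamp, value) pair vouched for by f + 1 servers."""
    counts = Counter(replies)
    vouched = [pair for pair, count in counts.items() if count >= f + 1]
    if not vouched:
        return None
    # The highest timestamp among vouched pairs is the latest completed write.
    return max(vouched, key=lambda pair: pair[0])[1]


if __name__ == "__main__":
    # n = 5, f = 1: the client contacts 4 servers; one of them lies.
    replies = [(2, "new"), (2, "new"), (1, "old"), (9, "forged")]
    print(quorum_size(5, 1), read(replies, f=1))
```

The forged pair carries the highest timestamp but is reported by only one server, so it is discarded. Any two quorums of this size overlap in enough correct servers for the latest write to reach the `f + 1` threshold.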

  14. MAFTIA • Malicious and Accidental Fault Tolerance for Internet Applications • Large EU-funded project: • 6 partners • Expertise in fault tolerance, distributed computing, cryptography, formal verification and intrusion detection • 3 main areas of work: conceptual framework and architecture; mechanisms and protocols; formal verification and assessment

  15. MAFTIA – Conceptual Model • Extension of the Fault → Error → Failure model • Re-defining a Fault as an Intrusion: • Intrusion: A malicious, externally-induced fault resulting from an attack that has been successful in exploiting a vulnerability • Attack: A malicious interaction fault, through which an attacker aims to deliberately violate one or more security properties • Vulnerability: A fault created during development of the system, or during operation, that could be exploited to create an intrusion

  16. MAFTIA – Conceptual Model – II • In breaking down an Intrusion, they highlight the possibility of targeting both Attacks and Vulnerabilities for removal or prevention • Although MAFTIA’s main focus was Intrusion Tolerance, they classify a whole range of security mechanisms according to the fault prevention, tolerance, removal and forecasting paradigms mentioned earlier

  17. MAFTIA – Hybrid Failure Model • Composite fault model with a hybrid failure assumption • The presence and severity of vulnerabilities, attacks and intrusions vary from component to component • Assumptions present in their architectural design: • Built on top of trustworthy components: • Java Card • Trusted Timely Computing Base (TTCB) • Trusted Middleware component

  18. MAFTIA – Hybrid Failure Model – II • The key element of the MAFTIA architecture is the TTCB: • Provision of time-based services through the use of a Control Channel • Dedicated and heavily protected security kernel – fail-silent rather than arbitrary failure • Implementation of a reliable broadcast protocol that can tolerate up to f failures among f+2 processes [Correia et al., ’02]

  19. Tacoma • Tromso And COrnell Moving Agents project • Provision of security and fault tolerance were two key elements • Resilience for the agent on a potentially malicious host: • Replicated agents, with voting mechanisms • Fault tolerance for mobile agents: • Extension of the primary backup approach • “… preserving the necessary consistency between replicas can be done efficiently only within a local-area network”

  20. Other Projects • COCA: • Replication of a CA to provide availability • Byzantine quorum systems • Proactive recovery • OASIS (Organically Assured and Survivable Information Systems) • Umbrella project which sponsors separate work items in the field of resilient security

  21. A Personal Perspective • Control of Execution: • Adapting fault tolerant principles for a secure environment can come down to a principle of control • In the Fault → Error → Failure model, breaking the chain requires retaining control • Whose security policy are we protecting? • Proposed mechanisms for allowing a client to share that control [Price, ’99]

  22. A Personal Perspective – II • Use of Other Mechanisms: • Some of our previous work identified the possibility of using timing checks [Price, ’01] • Remove the attacker’s ability to delay or replay messages with impunity • Some variants of replay attacks rely on this • With hindsight, there is an interesting comparison with MAFTIA’s use of a Control Channel
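The general idea of a timing check from slide 22 can be illustrated with a generic freshness filter. This is not the mechanism from [Price, ’01], whose details are not given in the slides. It is a common textbook-style sketch that assumes loosely synchronised clocks, a unique identifier on every message, and a hypothetical `max_age` bound on acceptable delay.

```python
class FreshnessChecker:
    """Reject messages that arrive too late or that have been seen before."""

    def __init__(self, max_age: float):
        self.max_age = max_age
        self.seen = {}  # message id -> claimed send time

    def accept(self, msg_id: str, sent_at: float, now: float) -> bool:
        # Forget identifiers that are too old to pass the age check anyway.
        self.seen = {m: t for m, t in self.seen.items() if now - t <= self.max_age}
        age = now - sent_at
        if age < 0 or age > self.max_age:
            return False  # delayed beyond the bound, or timestamped in the future
        if msg_id in self.seen:
            return False  # replay inside the freshness window
        self.seen[msg_id] = sent_at
        return True


if __name__ == "__main__":
    checker = FreshnessChecker(max_age=5.0)
    print(checker.accept("m1", sent_at=100.0, now=101.0))  # fresh, first sighting
    print(checker.accept("m1", sent_at=100.0, now=102.0))  # replayed copy
    print(checker.accept("m2", sent_at=100.0, now=110.0))  # held back too long
```

Bounding the acceptable age keeps the replay cache small, because anything older than `max_age` is rejected on time alone. The price is a dependence on clock synchronisation, which is where a trusted timing service becomes relevant.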

  23. Future Challenges • Relaxation of assumptions: • Fully Byzantine failure models are difficult to protect against – and hence solutions are inefficient • Most of the work since Rampart has concentrated on feasible means of relaxing these failure assumptions: can we do better? • Further use of hardware: • MAFTIA’s use of trusted hardware allows for more efficient protocols – can the principle be generalised? • Mixed failure environments [Siu et al., ’98] • Trusted Computing Group

  24. Future Challenges – II • Other dependability models: • Fault tolerance is only part of a very mature dependability literature • Disjoint vs. Inclusive error recovery? • MAFTIA defined a whole classification within their model • Security service classification: • Quorum based systems use the parallelism of a read operation to increase efficiency • Can we class different services according to their communication requirements?

  25. Conclusions • Until 10 years ago, the work in this field was sparse and sporadic • Now there is a large body of work in this area • Practical efficiency is still a key research topic • Broaden our search for other applicable mechanisms • Availability and survivability on the Internet are only going to become more important