
Dependability and Security

Failures: a basic issue in dependability and security – their definition, modelling & analysis. Brian Randell.

Presentation Transcript


  1. Failures: a basic issue in dependability and security – their definition, modelling & analysis. Brian Randell

  2. Dependability and Security
  • There are severe terminological and conceptual confusions in the dependability & security field(s). These come into prominence when one takes an adequately general view of dependability & security problems – by avoiding the (naive) assumptions that systems always have well-established boundaries and fully-adequate specifications.
  • But the confusion also concerns, for example, the relationship of dependability and security – so let’s dispose of that first:
  • Until recently the IFIP WG 10.4 dependability community’s definitions regarded dependability as subsuming security.
  • The latest version of the dependability concepts and terminology saga has (pragmatically) suggested somewhat of a differentiation, which in effect associates dependability with an emphasis on accidental faults and security with an emphasis on malicious faults. (In fact what is almost always needed is their combination!)

  3. Dependability “versus” Security
  [Diagram relating the attributes – availability, reliability, safety, confidentiality, integrity, maintainability – to dependability and security.]
  Basic Concepts and Taxonomy of Dependable and Secure Computing, Avizienis, A., Laprie, J.-C., Randell, B. and Landwehr, C., IEEE Transactions on Dependable and Secure Computing, Vol. 1, No. 1, pp. 11-33, 2004.

  4. On Failures
  • To me, failures are the central issue, the most basic concept - regardless of the exact relationship that is deemed to exist between security and dependability – a topic I will ignore.
  • Particular types of failures (e.g. producing wrong results, ceasing to operate, revealing secret information, causing loss of life, etc.) relate to what can be regarded as different attributes of dependability/security: reliability, availability, confidentiality, safety, etc.
  • Complex real systems, made up of and by other systems (e.g. of hardware, software and people), do actually fail from time to time (!), and reducing the frequency and severity of their failures is the major challenge - common to both the dependability and the security communities.
  • My preferred definition: a dependable/secure system is one whose (dependability/security) failures are not unacceptably frequent or severe (from some given viewpoint).
  • So what is a failure?

  5. Three Basic Concepts
  • From Avizienis et al:
  • A system failure occurs when the delivered service deviates from fulfilling the system function, the latter being what the system is aimed at. (I’ll return to this definition.)
  • An error is that part of the system state which is liable to lead to subsequent failure: an error affecting the service is an indication that a failure occurs or has occurred. The adjudged or hypothesised cause of an error is a fault.
  • (Note: errors do not necessarily lead to failures – this may be avoided by chance or design; component failures do not necessarily constitute faults to the surrounding system – this depends on how the surrounding system is relying on the component.)
  • These three concepts (an event, a state, and a cause) must be distinguished, whatever names you choose to use for them.
  • Identifying failures and errors, as well as faults, involves judgement.
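
As an informal illustration of the distinction (not part of the original slides; the class names and example strings below are my own), the three concepts can be sketched as three different kinds of record – a cause, a fragment of state, and an event:

```python
# A minimal illustrative sketch (hypothetical names): fault = adjudged cause,
# error = part of the system state, failure = event at the service interface.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Fault:
    """The adjudged or hypothesised cause of an error."""
    description: str

@dataclass
class Error:
    """Part of the system state liable to lead to subsequent failure."""
    state_fragment: str
    caused_by: Optional[Fault] = None

@dataclass
class Failure:
    """Event: the delivered service deviates from the system function."""
    event: str
    due_to: Optional[Error] = None   # an error need not ever lead to a failure

# Example: a programmer's mistake (fault) leaves a corrupted index (error),
# which later surfaces as the service ceasing to operate (failure).
bug = Fault("off-by-one left by the programmer")
bad_state = Error("corrupted buffer index", caused_by=bug)
crash = Failure("service stops responding", due_to=bad_state)
```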

  6. The Failure/Fault/Error “Chain”
  • A failure occurs when an error “passes through” the system-user interface and affects the service delivered by the system – a system of course being composed of components which are themselves systems. This failure may be significant, and thus constitute a fault, to the enclosing system. Thus the manifestation of failures, faults and errors follows a “fundamental chain”:
  . . . failure → fault → error → failure → fault . . .
  i.e. . . . event → cause → state → event → cause . . .
  • This chain can flow from one system to:
  • another system that it is interacting with.
  • the system which it is part of.
  • a system which it creates or sustains.
  • Typically, a failure will be judged to be due to multiple co-incident faults, e.g. the activity of a hacker exploiting a bug left by a programmer.
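
A purely hypothetical sketch of one hop of this chain as it crosses a component boundary (the function and its strings are my own illustrations, not from the slides): a component's failure is judged to be a fault of the enclosing system, which yields an error there, which may in turn surface as a failure of that system.

```python
# Purely illustrative: one hop of the fundamental chain across a system boundary.
def one_hop(component_failure: str, enclosing_system: str) -> list[tuple[str, str]]:
    """Return (concept, description) pairs: failure -> fault -> error -> failure."""
    fault = f"fault of {enclosing_system}: {component_failure}"
    error = f"erroneous state of {enclosing_system} resulting from that fault"
    failure = f"failure of {enclosing_system} once the error reaches its service interface"
    return [("failure", component_failure),
            ("fault", fault),
            ("error", error),
            ("failure", failure)]

# A hacker exploiting a bug left by a programmer -- two coincident faults --
# could be recorded as two such chains converging on the same failure.
for concept, description in one_hop("disk controller returns a stale block",
                                    "database server"):
    print(f"{concept:>7}: {description}")
```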

  7. System Failures
  • Identifying failures (and hence errors and faults), even understanding the concepts, is difficult when:
  • there can be uncertainties about system boundaries.
  • the very complexity of the systems (and of any specifications) is often a major difficulty.
  • the determination of possible causes or consequences of failure can be a very subtle, and iterative, process.
  • any provisions for preventing faults from causing failures may themselves be fallible.
  • Attempting to enumerate a system’s possible failures beforehand is normally impracticable.
  • Instead, one can appeal to the notion of a “judgemental system”.

  8. Systems Come in Threes!
  • The “environment” of a system is the wider system that it affects (by its correct functioning, and by its failures), and is affected by.
  • What constitutes correct (failure-free) functioning might be implied by a system specification – assuming that this exists, and is complete, accurate and agreed. (Often the specification is part of the problem!)
  • However, in principle a third system, a judgemental system, is involved in determining whether any particular activity (or inactivity) of a system in a given environment constitutes or would constitute – from its viewpoint – a failure.
  • The judgemental system and the environmental system might be one and the same, and the judgement might be instant or delayed.
  • The judgemental system might itself fail – as judged by some yet higher system – and different judges, or the same judge at different times, might come to different judgements.

  9. Judgemental Systems
  • This term is deliberately broad – it covers everything from on-line failure detector circuits, via someone equipped with a system specification, to the retrospective activities of a court of enquiry (just as the term “system” is meant to range from simple hardware devices to complex computer-based systems, composed of h/w, s/w & people).
  • Thus the judging activity may be clear-cut and automatic, or essentially subjective – though even in the latter case a degree of predictability is essential, otherwise the system designers’ task would be impossible.
  • The judgement is an action by a system, and so can in principle fail – either positively or negatively.
  • This possibility is allowed for in the legal system, hence the concept of a hierarchy of crown courts, appeal courts, supreme courts, etc.
  • As appropriate, judgemental systems should use evidence concerning the alleged failure, any prior contractual agreements and system specifications, certification records, government guidelines, advice from regulators, prior practice, common sense, etc., etc.

  10. A Role for (Formal) Modelling?
  • If one could express even some of these ideas in a formal notation, this might facilitate:
  • the analysis of system failures.
  • the analysis and design of (fault-tolerant) systems themselves.
  • The notation I’ve been experimenting with is that of Occurrence Nets (aka Causal Nets, Occurrence Graphs, etc.).
  • ONs represent what (allegedly) happened, or might happen, and why, in a system – they model system behaviour, not actual systems.
  • Simple nets can be shown pictorially.
  • They can be expressed algebraically, and have a formal semantics.
  • Tools exist for their analysis and manipulation – and even for synthesizing systems from them (in simple cases).
  • My thought experiments have concerned what might be called “Structured Occurrence Nets”.
  • Their structure results from notions like the fundamental F-E-F chain.
  • This structure provides significant complexity reduction, and so could facilitate (automated) failure analyses, or possibly even system synthesis.
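
As a rough indication of what such a notation amounts to as a data structure (a sketch under my own assumptions and naming, not the formal semantics referred to above), an occurrence net can be held as two node sets – places/conditions and events – connected by an acyclic flow relation:

```python
# Sketch only: an occurrence net as places + events + an acyclic flow relation.
from collections import defaultdict

class OccurrenceNet:
    def __init__(self) -> None:
        self.places: set[str] = set()           # conditions / local states
        self.events: set[str] = set()           # occurrences of actions
        self.flow: list[tuple[str, str]] = []   # directed arcs (cause, effect)

    def add_place(self, p: str) -> None:
        self.places.add(p)

    def add_event(self, e: str) -> None:
        self.events.add(e)

    def add_arc(self, src: str, dst: str) -> None:
        self.flow.append((src, dst))

    def is_acyclic(self) -> bool:
        """Occurrence nets model causality, so the flow relation must be acyclic."""
        succ = defaultdict(list)
        for src, dst in self.flow:
            succ[src].append(dst)
        colour: dict[str, str] = {}              # node -> "grey" (on path) / "black" (done)
        def dfs(node: str) -> bool:
            colour[node] = "grey"
            for nxt in succ[node]:
                if colour.get(nxt) == "grey":
                    return False                  # back edge: a causal cycle
                if nxt not in colour and not dfs(nxt):
                    return False
            colour[node] = "black"
            return True
        return all(dfs(n) for n in (self.places | self.events) if n not in colour)

# Usage: condition c0 enables event e1, which produces condition c1.
net = OccurrenceNet()
net.add_place("c0"); net.add_place("c1"); net.add_event("e1")
net.add_arc("c0", "e1"); net.add_arc("e1", "c1")
assert net.is_acyclic()
```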

  11. (Structured) Occurrence Net Notation
  The (simple and perhaps new) idea is to introduce various types of (formal) relations between Occurrence Nets (which are an old idea), and treat a set of such related ONs as a “Structured ON”.
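
A corresponding sketch of the structuring idea (again my own naming, building on the OccurrenceNet sketch above): a structured ON is here little more than a set of named ONs plus typed relations between elements of different ONs – e.g. an F-E-F link, a composition link, or an infrastructure "provides means for" link.

```python
# Sketch only: related occurrence nets treated together as one "structured" ON.
from dataclasses import dataclass, field

@dataclass
class StructuredON:
    nets: dict = field(default_factory=dict)        # name -> OccurrenceNet (see sketch above)
    relations: list = field(default_factory=list)   # (kind, (net, element), (net, element))

    def relate(self, kind: str, a_net: str, a_elem: str, b_net: str, b_elem: str) -> None:
        """Record a typed relation between an element of one ON and an element of another."""
        self.relations.append((kind, (a_net, a_elem), (b_net, b_elem)))

# e.g. recording that a failure event in a component's ON is judged to be a
# fault in the enclosing system's ON:
son = StructuredON()
son.relate("failure-to-fault", "component", "e_fail", "enclosing", "c_fault")
```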

  12. An Aside – Concept Minimization
  • The Avizienis et al dependability and security definitions aim to identify and define a minimum set of basic concepts (nouns), such as “fault”, “error” and “failure” – and then elaborate on these using adjectives, such as “transient”, “internal”, “catastrophic”, etc.
  • One major problem in such definitional efforts is to avoid circular definitions, and to minimise the base of pre-existing definitions used.
  • (We used to use the word “reliance” in a sole definition of “dependability” – a regrettable near-circularity.)
  • This we achieved using as a basic starting point just three conventional dictionary definitions – for “system”, “judgement” and “state”.
  • My recent work on Occurrence Nets was sparked by the belated realisation that the concepts of “system” and “state” were not separate, but just a question of abstraction.
  • In fact Occurrence Nets can represent both systems and states using the same symbol – a “place”.

  13. Systems & their Behaviour
  The markings in the places in the lower ON are in effect “colourings” which identify the system concerned – here they thus relate states to systems.

  14. System Updating
  The relations between the upper and lower nets show which state sequences are associated with the systems before they were modified, and which with the modified systems. (This is off-line system modification.)

  15. On-Line System Evolution
  This shows the history of an on-line modification of some systems, in which the modified systems continue on from the states that had been reached by the original systems.

  16. System Creation & Evolution
  This shows some of the earlier history of the two systems, i.e. that system 1 created system 2, and that both then went through some independent further evolution.

  17. System Composition
  This shows the behaviour of a system and of its three component systems, and how the system’s behaviour is related to that of its components. It portrays what is in effect a spatial abstraction – one can also have a temporal abstraction, via the “abbreviation” relation.

  18. Abbreviation
  “Abbreviating” parts of an ON in effect defines atomic actions, i.e. actions that appear to be instantaneous to their environment. The rules that enable one to make such abbreviations are non-trivial when there are concurrent activities.

  19. The Atomicity Problem
  • Occurrence Nets represent causality, so must be acyclic.
  • (a) shows a valid collapsing – i.e. a part of the ON that can be regarded as atomic and hence replaceable by a single event box.
  • (b) shows that if a similar collapsing is also applied to the other part of the ON a cycle is introduced.
  • Thus it is not possible to treat both parts of this ON as being simultaneously atomic.
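
The acyclicity condition behind this can be checked mechanically. The sketch below is my own formulation of such a check (not the non-trivial abbreviation rules the previous slide alludes to): it contracts chosen groups of nodes into single "atomic" events and tests whether the contracted graph is still acyclic. In the small example, collapsing one part succeeds, while collapsing both parts at once introduces a cycle, echoing cases (a) and (b).

```python
# Sketch: is a proposed "abbreviation" (contraction of node groups) still acyclic?
def collapse_is_valid(arcs: set[tuple[str, str]], groups: dict[str, str]) -> bool:
    """arcs: directed edges of an acyclic net; groups: node -> label of its atomic block."""
    contract = lambda n: groups.get(n, n)
    edges = {(contract(a), contract(b)) for a, b in arcs if contract(a) != contract(b)}
    succ: dict[str, list[str]] = {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)
    state: dict[str, str] = {}
    def dfs(node: str) -> bool:              # standard DFS cycle detection
        state[node] = "grey"
        for nxt in succ.get(node, []):
            if state.get(nxt) == "grey":
                return False                 # a cycle: the collapsing is invalid
            if nxt not in state and not dfs(nxt):
                return False
        state[node] = "black"
        return True
    nodes = {n for edge in edges for n in edge}
    return all(dfs(n) for n in nodes if n not in state)

# Two interleaved activities e1,e2 and f1,f2 with cross-causality e1->f2 and f1->e2:
arcs = {("e1", "e2"), ("f1", "f2"), ("e1", "f2"), ("f1", "e2")}
print(collapse_is_valid(arcs, {"e1": "A", "e2": "A"}))                        # True  (case a)
print(collapse_is_valid(arcs, {"e1": "A", "e2": "A", "f1": "B", "f2": "B"}))  # False (case b)
```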

  20. Hardware and Software
  The relationship between hardware and the software processes running on it, and indeed between the electricity source that powers the hardware and these software processes, seems to be the same – each could be characterized as “provides means for”, or “allows to exist/occur”.

  21. System Infrastructure
  This shows the relation between the computer and electrical systems and the behaviour of the software. Such “infrastructural” relations tend to be ignored, at least until something goes wrong. For example, the computer designer cannot design any mechanisms that will function in the absence of electricity, nor the software designer any for a computer that cannot obey instruction sequences.

  22. State Retention, e.g. for Fault Tolerance
  To allow for the possibility of failure a system might for example make use of “recovery points”. Such recovery points can be recorded in places that take no further (direct) part in the system’s ongoing behaviour, as portrayed above.
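
A tiny hypothetical sketch of the recovery-point idea in conventional code terms (not derived from the figure; the function and names are my own): the retained copy of the state plays the role of the extra place that the normal forward behaviour never consumes.

```python
# Sketch only: retain a recovery point, fall back to it on a detected error.
from typing import Callable

def with_recovery(state: dict, step: Callable[[dict], dict]) -> dict:
    recovery_point = dict(state)       # recorded; takes no further part in normal progress
    try:
        return step(dict(state))       # normal forward behaviour
    except Exception:
        return recovery_point          # roll back to the retained state

# e.g. with_recovery({"balance": 100}, risky_update) returns the updated state,
# or the original state if risky_update raises.  (risky_update is hypothetical.)
```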

  23. (Post Hoc) Judgement
  Here a place in the judgemental system’s ON is shown as holding a representation of an ON of the system being judged.

  24. Failure Analysis
  • Structured ONs could be used to represent actual or assumed past behaviour, or possible future behaviour, and to record F-E-F chains between systems.
  • They could be generated and recorded (semi?)automatically – alternatively they might need to be generated retrospectively, from whatever evidence and testimony is available.
  • Analysis of a Structured ON typically involves following (possibly in both directions) causal arrows within ONs, and relations between ONs.
  • Such analysis is of course limited by the accuracy and the completeness of the Structured ON – and might be interspersed with efforts at validating and enhancing the Structured ON.
  • The envisaged forms of structuring have various potential benefits:
  • They allow fairly direct representation of what happens in various complex situations, such as dynamic system evolution, infrastructure failures, etc.
  • They provide “divide-and-conquer”-style complexity reduction, compared to the use of (one-level) occurrence nets.
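
As a sketch of the kind of analysis meant here (hypothetical names, not an existing tool): starting from an event judged to be a failure, one can traverse causal arcs within ONs, and any cross-ON relations of the structured ON, backwards, collecting the recorded events and states that are candidates for blame or correction.

```python
# Sketch: backwards traversal from a failure over causal arcs and cross-ON relations.
from collections import deque

def trace_back(failure: str,
               causal_arcs: set[tuple[str, str]],
               cross_relations: set[tuple[str, str]]) -> set[str]:
    """Both argument sets hold directed (cause, effect) pairs; returns everything upstream."""
    preds: dict[str, set[str]] = {}
    for cause, effect in causal_arcs | cross_relations:
        preds.setdefault(effect, set()).add(cause)
    seen: set[str] = set()
    queue = deque([failure])
    while queue:
        node = queue.popleft()
        for cause in preds.get(node, ()):
            if cause not in seen:
                seen.add(cause)
                queue.append(cause)
    return seen

# The judgemental system would then decide which of the returned events/states
# to treat as the blamed fault(s) and which as errors to correct or compensate for.
```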

  25. Concluding Remarks
  • Possibilities for taking advantage of the complexity-reduction provided by a SON’s structuring would seem to include:
  • A judgemental system, having identified some system event as a failure, could analyze records forming a Structured ON in an attempt to identify (i) the fault(s) that should be blamed for the failure, and/or (ii) the erroneous states that could and should be corrected or compensated for. (This could be viewed as a way of describing (semi)formally what is often currently done by expert investigators in the aftermath of a major system failure.)
  • Structured ONs might be usable for modelling complex system behaviour prior to system deployment, so as to facilitate the use of some form of automated model-checking in order to verify at least some aspects of the design of the system(s).
  • In principle (again largely using existing tools) one could even synthesize a system from such a checked Structured ON.
  • Next steps (with/by Maciej Koutny): completion of formal definitions of these SON concepts, and (I hope) exploitation of his work on model checking of occurrence-net-based specifications.

  26. Some References
  • Best, E. and Randell, B. A Formal Model of Atomicity in Asynchronous Systems. Acta Informatica, Vol. 16, pp. 93-124, Springer-Verlag, 1981. http://www.cs.ncl.ac.uk/research/pubs/articles/papers/397.pdf
  • Grahlmann, B. and Best, E. PEP – More than a Petri Net Tool. Proc. of TACAS'96, LNCS 1055, 1996, pp. 397-401. [PEP]
  • Holt, A.W., Shapiro, R.M., Saint, H. and Marshall, S. Information System Theory Project. Applied Data Research ADR 6606 (US Air Force, Rome Air Development Center RADC-TR-68-305), 1968.
  • Khomenko, V. and Koutny, M. Branching Processes of High-Level Petri Nets. Proc. of TACAS'03, LNCS 2619, 2003, pp. 458-472. [PUNF] http://www.cs.ncl.ac.uk/research/pubs/articles/papers/425.pdf
  • Merlin, P.M. and Randell, B. State Restoration in Distributed Systems. Proc. FTCS-8, Toulouse, France, 21-23 June 1978, pp. 129-134. IEEE Computer Society Press, 1978. http://www.cs.ncl.ac.uk/research/pubs/articles/papers/347.pdf
