Fitzgerald John

I’m a research academic based at Newcastle University http://www.csr.ncl.ac.uk. Our group has a history in fault tolerance and in formal methods (my own background is in proof-based verification and the industrial application of formal methods). I’m here because of my involvement in the ReSIST (Resilience for Survivability in IST) project, a new EU FP6 network http://www.resist-noe.org, where I'm working on design support for fault tolerance. I have a history of working in and with the aerospace industry and the embedded processor market (dynamic binary translators).

These are just observations as we go along. I don’t pretend that they’re coherent or deep :-) Comments and discussion are very welcome: email john.fitzgerald at ncl.ac.uk.

---

There hasn’t been a lot of discussion of error propagation and containment.

We need glue operators (in Sifakis’ terminology) that contain errors in predictable ways, and we need to know the error propagation characteristics of the glue operators that are currently available. Potential operators involve redundancy in the architectural domain (diversity, NMR, wrapper technology, etc.) and in the temporal domain (transaction-based structures).
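As a rough illustration of what "containing errors in predictable ways" can mean for an architectural-redundancy operator, here is a minimal sketch of an N-modular-redundancy (NMR) voter used as glue around replicated components. The names `nmr_vote` and `VoteFailure` are hypothetical, and the sketch assumes replicas are plain functions returning hashable values; it is not a description of any particular ReSIST mechanism.

```python
from collections import Counter
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class VoteFailure(Exception):
    """Raised when no strict majority exists, so the error is signalled rather than passed on silently."""


def nmr_vote(replicas: Sequence[Callable[[T], R]], x: T) -> R:
    """Hypothetical NMR glue operator: run every replica on the same input
    and return the value produced by a strict majority of the replicas.
    Assumes replica results are hashable (required by Counter)."""
    results = []
    for replica in replicas:
        try:
            results.append(replica(x))
        except Exception:
            # A crashed replica contributes no vote; its exception is contained here.
            continue
    if results:
        value, count = Counter(results).most_common(1)[0]
        # Strict majority of *all* replicas, not just of those that returned.
        if count > len(replicas) // 2:
            return value
    raise VoteFailure(f"no majority among {len(replicas)} replicas")
```

The propagation characteristics of this operator are easy to state: with N replicas it masks up to (N - 1) // 2 faulty ones, and beyond that it either signals `VoteFailure` or, if the faulty replicas happen to agree on the same wrong value (a common-mode failure), propagates the error undetected. That last case is why diversity matters alongside replication.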

Some of my recent experience suggests that there’s a shortage of useful theory to help govern the selection of a resilience mechanism on the basis of a given set of requirements.

---

Hermann Kopetz pointed out that the temporal predictability of his architectures is due in part to treating timing information as first-class information. A challenge is to treat other (resilience-related) information in the same way. In ReSIST we’re using the term “resilience-related metadata” to cover this kind of information. Such metadata could range from component failure rates (and probability distributions) to values of enumerated types such as integrity levels. If resilience-related metadata is traded in the running system, especially a relatively open one, we enable reasoning in support of reconfiguration that provides a level of resilience, maintaining system-level QoS in the face of degraded component behaviour.
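To make the idea of trading resilience-related metadata at run time a little more concrete, here is a minimal sketch in which each component publishes a failure rate and an integrity level, and a reconfiguration step picks a provider that meets a required integrity level. Everything here is an assumption for illustration: the `IntegrityLevel` values (modelled loosely on SIL-style levels), the `ResilienceMetadata` record, the constant-failure-rate model, and the `select_provider` policy are hypothetical, not ReSIST definitions.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional, Sequence


class IntegrityLevel(IntEnum):
    # Hypothetical enumerated integrity levels, ordered so that higher means stronger.
    SIL1 = 1
    SIL2 = 2
    SIL3 = 3
    SIL4 = 4


@dataclass(frozen=True)
class ResilienceMetadata:
    # Hypothetical record a component publishes about itself at run time.
    component: str
    failure_rate_per_hour: float  # assumes a constant-rate (exponential) failure model
    integrity: IntegrityLevel


def select_provider(offers: Sequence[ResilienceMetadata],
                    required: IntegrityLevel) -> Optional[ResilienceMetadata]:
    """Return the lowest-failure-rate offer that meets the required integrity level,
    or None if nothing qualifies (the caller must then degrade the service)."""
    eligible = [m for m in offers if m.integrity >= required]
    return min(eligible, key=lambda m: m.failure_rate_per_hour, default=None)
```

The interesting part is what happens when a component degrades: it republishes its metadata (say, with a higher failure rate), and simply re-running `select_provider` over the current offers may switch the service to a different provider, which is a small instance of metadata-driven reconfiguration.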

It’s really interesting to hear of the use of semantic web technologies in this context – we’ve been looking at setting up semantic web representations for the metadata, so that you can treat it consistently (my “availability” = your “availability”).

---

Three cheers for Keijo Manninen's comment about systems and software engineers not sharing a common understanding (the "Why do we need real time?" question). I can echo his experience from the aerospace sector. To what extent is this a technical issue as well as a management one? I guess that, to manage evolution, we need to be able to close the gap between control-law-based specifications and software implementations?

---

Proof that we are writing in real time... Stefan Kowalewski's memorable analysis identifies the same problem and suggests at least some methodological approaches. I agree strongly with the need for plant/environment modelling in the requirements stages.

---

There seemed to be surprisingly little discussion of incremental certification, re-use of dependability cases, evidence, etc. Incremental certification was mentioned quite a lot in the other sessions, but the certification talks were rather light on it.

---

I understand the point of view that a state-based formalism may seem inappropriate for applications with strong temporal properties, but surely many aspects of security do require a state-based analysis: things like resource access control modelling, in which one wishes to define combinators on collections of access control rules.
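For a feel of what "combinators on collections of access control rules" might look like, here is a minimal sketch in which the policy state is just a collection of rules and combinators build new rules from old ones. The combining strategies (deny-overrides, first-applicable) are borrowed from the XACML style of rule combining; the function names, the `(subject, resource, action)` request shape and the `"*"` wildcard convention are hypothetical choices for illustration, not taken from any particular formal model.

```python
from enum import Enum
from typing import Callable, Sequence, Tuple


class Decision(Enum):
    PERMIT = "permit"
    DENY = "deny"
    NOT_APPLICABLE = "not_applicable"


Request = Tuple[str, str, str]        # (subject, resource, action)
Rule = Callable[[Request], Decision]  # a rule maps a request to a decision


def rule(subject: str, resource: str, action: str, effect: Decision) -> Rule:
    """Atomic rule; '*' in a field matches any value (illustrative convention)."""
    pattern = (subject, resource, action)

    def evaluate(req: Request) -> Decision:
        if all(pat in ("*", val) for pat, val in zip(pattern, req)):
            return effect
        return Decision.NOT_APPLICABLE
    return evaluate


def deny_overrides(rules: Sequence[Rule]) -> Rule:
    """Any applicable DENY wins; otherwise any PERMIT; otherwise not applicable."""
    def evaluate(req: Request) -> Decision:
        decisions = [r(req) for r in rules]
        if Decision.DENY in decisions:
            return Decision.DENY
        if Decision.PERMIT in decisions:
            return Decision.PERMIT
        return Decision.NOT_APPLICABLE
    return evaluate


def first_applicable(rules: Sequence[Rule]) -> Rule:
    """The first rule that gives an applicable decision determines the outcome."""
    def evaluate(req: Request) -> Decision:
        for r in rules:
            d = r(req)
            if d is not Decision.NOT_APPLICABLE:
                return d
        return Decision.NOT_APPLICABLE
    return evaluate
```

Because each combinator returns a `Rule`, combinators nest freely, and the two shown already disagree on a request that matches both a permit and a deny rule. That kind of property over the rule collection (for example, "no combination ever permits access to resource X") is exactly what a state-based model with invariants is good at stating and proving.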

Can't these aspects coexist? Surely process calculi can cohabit with state-based models? There's work on CSP+B, for example.

---