Digital Modifications and Configuration Control of Digital Systems. John Connelly, Exelon Generation, Engineering Manager – Capital Projects
OPEX – Digital Challenges • Implementation of digital modifications is an industry-wide issue: • IER 11-02 identifies an adverse trend in scrams between 2005 and 2010 • 43 scrams (35%) were the result of flawed implementation of design changes involving digital technology • INPO 10-008 examined events from 2003 to 2007 • 17 scrams from software malfunctions resulted in a loss of 1.6 million MWh • 24 scrams from hardware malfunctions resulted in a loss of 3.1 million MWh • Significant operational and safety challenges • At a modest $50 / MWh, these losses yield an industry-wide cost of ~$200M
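As a rough check on the cost estimate, the two loss figures from INPO 10-008 can be combined directly. This is only a sketch: it assumes the software and hardware losses are additive and uses the $50 / MWh price from the slide. The variable names are illustrative.

```python
# Rough check of the industry-wide cost figure, using only numbers from the slide.
software_loss_mwh = 1.6e6   # lost generation from 17 software-related scrams
hardware_loss_mwh = 3.1e6   # lost generation from 24 hardware-related scrams
price_per_mwh = 50.0        # the "modest" price assumed on the slide, in USD

total_loss_mwh = software_loss_mwh + hardware_loss_mwh
total_cost_usd = total_loss_mwh * price_per_mwh

print(f"Lost generation: {total_loss_mwh / 1e6:.1f} million MWh")
print(f"Estimated cost:  ${total_cost_usd / 1e6:.0f}M")
```

Under these assumptions the product is about $235M, which is the same order as the ~$200M quoted on the slide.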
Common Threads Irrespective of utility, most events share two common themes: • Flaws in the processes by which digital modifications are implemented • Inadequate knowledge of the complex technologies and techniques common to nearly all digital modifications
Changes To INPO Evaluation Process - CM • Performance Objectives for the Design Change Process (CM.3) are under revision • Future INPO evaluations will include a review of the processes by which you manage the unique characteristics of digital technology. This includes: • Development and control of procurement specifications • Software • Vendor interfaces • Testing • Validation • Failure Modes and Effects Analysis
Changes To INPO Evaluation Process - Knowledge • Application of digital technology requires very different and specialized skills to implement correctly • INPO ACAD 98-04, Rev 2 introduces the entity of “Digital Engineer” • Engineers assigned to work independently on digital projects must be qualified to ACAD 98-04, Rev 2 by March 2013 • Training evaluations conducted after March of 2013 will be in accordance with the requirements of ACAD 98-04, Rev 2
Knowledge and Process Inventory • Digital technology, while superior in nearly every dimension to analog technology, requires very different competencies and processes: • Software engineering • Hardware design • Exception / Fault / Error Handling / Recovery • Networking • Cyber Security • Human Factors Engineering • Advanced analysis techniques (FMEA / SHA / CDR) • EMI / RFI • Interfacing systems knowledge • Plant Operations • Testing / Dynamic response analysis • Life-Cycle Management
Key Takeaway… • Digital Is Different! • Engineering processes for “conventional” modifications do not, by themselves, provide an adequate defense against errors and events • Requires very different skills to implement correctly • Your design processes will be evaluated against this reality
Exelon Internal OPEX • A series of events beginning in 2005 made it clear that improvement opportunities existed • The Quad Cities Reactor Recirculation Adjustable Speed Drives (ASD) provide a representative example of the challenges • Approximately 150 Issue Reports • Manual scram, power reductions and operational challenges • Principal findings from CCA: • Latent design flaws in vendor products • FMEA did not detect design issues • Excessive reliance on vendors • Testing failed to uncover issues • Similar experiences with other modifications
Redesigning the process at Exelon • Formed Corporate Capital Projects Group to oversee large, multi-site digital modifications (RRASD, DEH, MPT, TDFWP, BOP 7300…) • Staffed with subject matter experts on digital technology • CPG works closely with implementing engineers at the sites who manage the EC development process • Advanced training provided to site and corporate digital I&C engineers to jump-start performance • Procedures and processes revised to capture best practices – process improvements will continue indefinitely as practices mature
Exelon Digital Modification Process • The existing Configuration Control process is now supplemented with procedures that address the unique attributes of digital technology • Management Of Digital Modifications • Digital Design Considerations • Design Attributes For Digital Systems • Software Development • Digital Procurement Process • Factory Acceptance Testing • Cyber Security • The process continues to evolve as Cyber Security requirements are implemented and additional best practices are identified
Procurement Specifications • The act of fully defining detailed vendor requirements commensurate with project safety significance, operational risk and project scope. • Specifically identifying documentation and performance requirements for a given project including (but not limited to): • Verification and Validation (V&V) requirements • Software Quality Assurance measures • Hardware design requirements (including Single Point Vulnerabilities) • Failure Modes and Effects Analysis (FMEA) requirements • Software testing and validation requirements • Cyber Security requirements • Life Cycle Management (LCM) requirements • Time invested in the development of a detailed procurement specification improves project execution by avoiding unbudgeted scope changes
Perfect Software Does Not Exist • No system will ever be perfect, no matter how rigorous the development process or how much money is spent to develop and maintain it – humans develop software and humans will always make mistakes • Highly automated systems effectively move the point of error from the user (Operations and Maintenance) to the programmer, but human error still exists • The Space Shuttle flight control system was arguably the most rigorously developed and tested control system ever conceived • 400,000 words (very small footprint compared to a modern DCS) • $100,000,000 per year in maintenance • Over the 25-year shuttle program, 16 Severity Level 1 software issues were identified – SL1 issues are those that would result in the loss of the orbiter under the right conditions
How do software-driven systems malfunction? • Software malfunctions are systemic, not random • In the absence of a hardware-induced fault, instructions will execute exactly as written, unerringly and without exception • Software malfunctions require the simultaneous existence of two conditions: • An error must be present (often undetected) • An initiating event must occur • Unless both conditions are satisfied, no malfunction will occur
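The two-condition model can be illustrated with a minimal sketch. The scaling function and its values are hypothetical and are not taken from any system discussed in this presentation. The function carries a latent error (no guard against a zero-width span) and runs correctly for every input until an initiating event exposes it.

```python
def percent_of_span(raw: float, low: float, high: float) -> float:
    """Scale a raw reading to percent of the calibrated span.

    Latent error (condition 1): nothing guards against low == high.
    """
    return 100.0 * (raw - low) / (high - low)


def try_scale(raw: float, low: float, high: float) -> str:
    """Run the calculation and report whether it malfunctioned."""
    try:
        return f"{percent_of_span(raw, low, high):.1f}%"
    except ZeroDivisionError:
        return "MALFUNCTION"


# Condition 1 only: the error is present, but no input triggers it.
normal = [try_scale(raw, 0.0, 200.0) for raw in (0.0, 50.0, 200.0)]

# Conditions 1 and 2 together: a zero-width span is the initiating event.
triggered = try_scale(50.0, 100.0, 100.0)

print(normal)     # ['0.0%', '25.0%', '100.0%']
print(triggered)  # MALFUNCTION
```

The same inputs always produce the same result, which is the sense in which the malfunction is systemic rather than random: repeating the normal cases any number of times will never reveal the error.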
A Representative Example From Aerospace • The Event: • A completed commercial airliner is about to be delivered to the customer • A Factory Acceptance Test is being conducted by factory and customer personnel in which the parking brakes are applied and all four engines are taken to maximum continuous thrust • At this power setting and altitude (zero feet) the flight control system automatically selects “takeoff” mode as designed • The flight control system correctly recognizes that the wing surfaces are incorrectly configured for a takeoff and continuously sounds the Ground Proximity Warning (GPW) alarm as designed – this alarm is critical and cannot be silenced • A technician, irritated by the alarm and unable to silence it, trips the feed breaker for the GPW system knowing that this will de-energize the alarm • The ground proximity radar loses power and clears the zero altitude interlock • With the interlock cleared, the control system now concludes the plane is in the air and releases the brakes – this is a programmed behavior to prevent landing the aircraft with the brakes set • The plane immediately accelerates (no passengers or luggage and little fuel) and strikes the jet blast barrier at full power
The results (two slides of images, not reproduced)
A Representative Example From Aerospace • Software malfunction requires two conditions: • The error must be present and undetected: • This application software had been in service for years and “ground run-up” tests are somewhat routine • The initiating event must occur: • The loss of supply voltage to GPW interlock caused the brakes to release exactly as they were programmed to do • The software development team never envisioned this combination of events
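The interlock behavior in this example can be modeled in a few lines. This is a deliberately simplified, hypothetical model built only from the description above. It is not the aircraft's actual logic, and every name in it is an assumption made for illustration. The key assumption is that a de-energized radar is indistinguishable from "not at zero altitude".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AircraftState:
    radar_altitude_ft: Optional[float]  # None models the radar losing power
    parking_brake_set: bool


def zero_altitude_interlock(state: AircraftState) -> bool:
    """True only while the radar positively reports zero altitude.

    Assumption: loss of the signal clears the interlock, as in the event.
    """
    return state.radar_altitude_ft is not None and state.radar_altitude_ft <= 0.0


def brakes_held(state: AircraftState) -> bool:
    """Brakes stay applied only while they are set and the interlock is made."""
    return state.parking_brake_set and zero_altitude_interlock(state)


run_up = AircraftState(radar_altitude_ft=0.0, parking_brake_set=True)
breaker_tripped = AircraftState(radar_altitude_ft=None, parking_brake_set=True)

print(brakes_held(run_up))           # True: normal ground run-up
print(brakes_held(breaker_tripped))  # False: brakes release on loss of signal
```

Both functions behave exactly as written in every case. The malfunction appears only when the latent design choice meets an input combination that nobody tested.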
Software evolves during the development process • Changes can invalidate previous testing or introduce new errors
Integration With Cyber Security Requirements • The Cyber Security Rule (10 CFR 73.54) is a license condition that applies to any digital component that is: • Safety Related • Important To Safety (defined as reactivity impact) • Physical Security • Emergency Preparedness • Systems that support any of the above • Systems with pathways of connectivity to any of the above • Significant synergies exist between the Digital I&C process and Cyber Security • Consider the extent to which these processes are interconnected and aware of each other
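The "pathways of connectivity" criterion is essentially a reachability question, which can be sketched as a graph search. The example below is illustrative only and is not a regulatory scoping method. The asset names are hypothetical, and it assumes every link is a two-way pathway, which is a simplification.

```python
from collections import deque


def assets_in_scope(links, designated):
    """Return the designated assets plus every asset connected to them.

    links: iterable of (asset_a, asset_b) pairs, treated as two-way pathways.
    designated: assets already in scope (e.g. safety related).
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    in_scope = set(designated)
    queue = deque(designated)
    while queue:  # breadth-first search outward from the designated assets
        node = queue.popleft()
        for peer in neighbors.get(node, ()):
            if peer not in in_scope:
                in_scope.add(peer)
                queue.append(peer)
    return in_scope


links = [
    ("feedwater_controller", "engineering_workstation"),
    ("engineering_workstation", "plant_historian"),
    ("office_printer", "office_pc"),
]
scope = assets_in_scope(links, ["feedwater_controller"])
print(sorted(scope))
```

In this sketch the historian ends up in scope even though it has no direct link to the controller, which is why a digital modification that adds a single network connection can widen the set of components that must be assessed.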
Factory Acceptance Testing • Many test plans focus on “positive testing”, which confirms expected responses for a given set of inputs or stimulus conditions – informative but only to a point • Negative testing focuses on verifying that you don’t get an unexpected response when you combine unusual stimuli or do something outside of normal operation – effectively it’s an attempt to trigger a malfunction, which can be very informative • It’s nearly inevitable that over the life of a system, it will be operated in a way the designers never anticipated. Take advantage of unstructured testing opportunities (e.g. pre-FAT) to attempt to “break” the system early in the development cycle while there is ample opportunity to take corrective action for issues identified • Process needs to involve Operators, System Engineers and SMEs
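The difference between positive and negative testing can be shown with a small sketch. The function under test is hypothetical (a demand signal clamped to 0-100 %) and stands in for whatever behavior the procurement specification actually requires.

```python
import math


def valve_position(demand_percent):
    """Hypothetical function under test: clamp a demand signal to 0-100 %.

    Invalid input (non-numeric or NaN) raises ValueError instead of being
    passed through to the actuator.
    """
    if isinstance(demand_percent, bool) or not isinstance(demand_percent, (int, float)):
        raise ValueError("demand must be a number")
    if math.isnan(demand_percent):
        raise ValueError("demand is NaN")
    return min(100.0, max(0.0, float(demand_percent)))


def rejects(value) -> bool:
    """True if the function refuses the input rather than acting on it."""
    try:
        valve_position(value)
    except ValueError:
        return True
    return False


# Positive tests: expected inputs give expected outputs.
positive_results = [valve_position(d) for d in (0, 50, 100)]

# Negative tests: out-of-range and malformed inputs must not cause surprises.
out_of_range_results = [valve_position(d) for d in (-10, 150, float("inf"))]
rejected = [rejects(v) for v in (float("nan"), None, "50")]

print(positive_results)      # [0.0, 50.0, 100.0]
print(out_of_range_results)  # [0.0, 100.0, 100.0]
print(rejected)              # [True, True, True]
```

A test plan containing only the first group would pass even if the function had no input checking at all. The second and third groups are the ones that attempt to "break" it.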
Modification Acceptance Testing • Most modification issues are not with the systems themselves but rather with interfaces to installed plant hardware (power / hydraulics / supporting systems / actuators / protective devices / EMI / RFI…) • The Mod Acceptance Test (MAT) is the very first time the system will be tested in the plant environment. In some cases it will be the first time that the system is connected to any physical components and therefore represents the first opportunity to identify and correct interface issues – care should be taken to exercise every interface to the extent possible and as early as possible • All models are wrong – this includes your plant simulator and vendor simulation models. In-plant testing is therefore critical and your most robust line of defense
Ongoing Configuration Control • One of the advantages of digital systems is that they are easily modifiable – this also constitutes a vulnerability if not taken into consideration by the process • Processes need to exist to detect any inadvertent changes to a system’s configuration • “Baseline / Compare” utilities can be used to compare system states with a known and approved baseline configuration • Periodic audits of log, system and event files • Surveillance testing • Defined protocols for testing of authorized modifications (i.e. regression testing) • Not all changes are modifications • Changes to calibration constants controlled in accordance with maintenance procedures • Pre-evaluated adjustments (tuning within defined boundaries) • Specific changes for Cyber Security incident response in accordance with CS procedures • Reference EPRI Topical Report 1022991 – “Guideline On Configuration Management For Digital Instrumentation And Control Equipment And Systems”
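A "Baseline / Compare" check can be sketched with file hashes. This is a minimal illustration, not a description of any particular vendor utility: the item names and contents are hypothetical, and a real tool would read them from the installed system rather than from in-memory dictionaries.

```python
import hashlib


def fingerprint(items: dict) -> dict:
    """Map each configuration item name to the SHA-256 hash of its content."""
    return {name: hashlib.sha256(content).hexdigest() for name, content in items.items()}


def compare_to_baseline(baseline: dict, current: dict) -> dict:
    """Report items added, removed, or changed relative to the approved baseline."""
    shared = set(baseline) & set(current)
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "changed": sorted(name for name in shared if baseline[name] != current[name]),
    }


# Hypothetical approved baseline and as-found system state.
approved = {"logic.bin": b"rev 7", "setpoints.csv": b"trip=105.0"}
installed = {"logic.bin": b"rev 7", "setpoints.csv": b"trip=107.5", "debug.cfg": b"on"}

report = compare_to_baseline(fingerprint(approved), fingerprint(installed))
print(report)
```

An empty report indicates the as-found state matches the baseline. Any entry is a prompt to determine whether the difference is an authorized modification, a controlled change such as a calibration constant, or an inadvertent change.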
Questions?