Cyber-physical security of smart grid: Threats, countermeasures and risk assessment Yee Wei Law (罗裔纬) ARC Research Network on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP) The University of Melbourne
Related talks • A. Ginter, “Smart Grid Security Guest Lecture: CPSC 529 - Information and Network Security,” University of Calgary, 2011. • Sense of Security, “Securing the Smart Grid,” Smart Electricity World Conference, Melbourne, 2011. • P. Will, “IT and the Smart Grid,” USC Information Sciences Institute, 2010.
Corporate network Industrial control system (SCADA) Cyber processes Physical processes Critical infrastructures Electricity grid Gas distribution network Sewage system Dams Telecommunications Hospitals Lighthouses Rail roads Cyber-physical security: study of the impact of cyber attacks on the physical processes of a control system, and the prevention or mitigation of these attacks
Security headlines, 2008 to 2012 (timeline figure)
Security trends 2011 CyberSecurity Watch Survey by CERT (Aug 2009 – Jul 2010) 21% threats from insiders (employees or contractors) 58% threats from outsiders (e.g., hackers) 55% say SCADA / operational control systems targeted most often Fact 1: Insider attacks render cryptographic protection inadequate Fact 2: Control systems are prime targets
Cyber warfare programs • United States Cyber Command (USCYBERCOM) embraces the philosophy of “active defense” • North Korea’s No. 91 Office • South Korea teamed up with Korea University to establish a cyber-defense school • From Wikipedia: the world’s largest hackers’ school is the military school of FAPSI (Federal Agency of Government Communications and Information) in Voronezh, i.e., Voronezh Military Aviation Engineering University, Russia • China’s Blue Army FAPSI’s coat of arms
Agenda • From power grid to smart grid • Smart grid vs sensor network: conceptual similarities and differences • Smart grid standards and guidelines • Risk assessment • Wide-area measurement system • Threats and countermeasures • State estimation • Threats and countermeasures • Automatic Generation Control (AGC) • Threats and countermeasures
From power grid to smart grid Australian Standard: AS 60038-2000 “Standard voltages”: Transmission EHV: 275kV, 330kV, 500 kV HV: 220kV MV: 66kV Distribution LV: 11kV, 22kV Smart Grid: The integration of power, communications, and information technologies for an improved electric power infrastructure serving loads while providing for an ongoing evolution of end-use applications.
Why smart (electricity) grid? • Two main drivers: (1) sustainable generation, (2) sustainable return on investment in infrastructure Motivates demand response Resilient to failures, disasters, attacks Resource-efficient Quality-focused Accommodates distributed generation Steve Jetson, “Smart meters – helping industry save money by using energy efficiently,” Sustainability and Technology Forum, 2011.
Backup generation Grid 2 Vehicle / Vehicle to Grid Grid appliances Distributed storage SCADA T&D Automation Load limiting Fault prediction Micro-grid Load control Energy mgmt systems Solar monitoring & dispatch Distributed generation Outage management Interval billing Prepay In home displays Power quality management Generation / supply T&D Usage / demand Servers Data storage Web presentment Transactions Modeling Smart agents Intelligence Fiber/MPL RF Mesh Home Area Network (HAN) Broadband WWAN 3G Cellular Cap banks Reclosers Switches Sensors Transformers Meters Storage Substation Wires Customers Smart grid: layered view 4 Business applications – “Smart Energy Web” Security 3 Computing / information technology 2 Communications infrastructure Energy information network 1 Energy infrastructure Source: P. Will, “IT and the Smart Grid,” USC Information Sciences Institute, 2010.
Smart grid technologies Key technologies • Communications • Sensing • Intelligence Same pillars of wireless sensor networks Wang et al., “A survey on the communication architectures in smart grid,” Computer Networks, vol. 55, pp. 3604-3629, 2011.
Smart Grid vs Wireless Sensor Network Similarities: • Large number of “nodes” is both a boon (resilience) and a bane (every node is open to attacks) • Data-centricity means false data often have adverse consequences Differences: Smart grid • “Control center” is a fleet of interconnected components, only a few of which are assumed secure Wireless sensor networks • “Control center” is a single base station assumed to be secure
Is a control centre secure? • A complex computer system • Virus outbreak in Integral Energy’s IT network http://bit.ly/16wskS • Microsoft’s shortcut bug exploited to attack grid control centres http://bbc.in/d9usyE Inter-control centre comm. Communication I/O controllers Open-access same-time information system A. P. Sakis Meliopoulos, “Power System Modeling, Analysis and Control,” lecture notes, Georgia Institute of Technology
Smart grid standards and guidelines (1/2) • NERC CIP • Identify critical assets • Perimeter protection (firewalls, logging, remote access) • Host hardening, anti-virus, patching, etc. • IEEE 1686: Substation IEDs Cyber Security Capabilities • Passwords, alerts, audit logs • IEC 62351: Security of IEC communications protocols • Encryption, authentication, spoofing resistance, intrusion detection • DHS Cyber Security Procurement Language for Control Systems • DHS Catalog of Control System Security: • Recommendations for Standards Developers • ISA SP-99 Industrial Automation and Control Systems Security • ISA SP-100 Wireless Systems for Industrial Automation • American Gas Association (AGA) Report No. 12 • Cryptographic protection of SCADA communications
Smart grid standards and guidelines (2/2) • UCA International Users Group (ABB, Alstom, Cisco, etc.): • Security Profile for Wide-Area Monitoring, Protection, and Control • AMI System Security Requirements • NIST 800-82: Guide to Industrial Control System Security • NIST IR 7628: Guidelines for Smart Grid Cyber Security • “The differences between information technology (IT), industrial, and Smart Grid security need to be accentuated...” See also reports • United States Government Accountability Office: “Electricity Grid Modernization: Progress Being Made on Cybersecurity Guidelines, but Key Challenges Remain to be Addressed” • Idaho National Laboratory: “NSTB Assessments Summary Report: Common Industrial Control System Cyber Security Weaknesses” Resilient control system: A system that maintains state awareness and an accepted level of operational normalcy in response to disturbances, including threats of an unexpected or malicious nature. – Rieger et al., Idaho National Laboratory
Resilience (1/2) • EU project CRUTIAL: Security and resilience of SCADA systems • Enforces policies in a distributed manner • Access control • Intrusion tolerance • Self-healing
Resilience (2/2) • EU project VIKING: To enhance data integrity, reliability and resilience of SCADA systems, through the development and application of cyber-physical models (hybrid system models) for the interaction between the (cyber-) IT systems and the (physical) power transmission and distribution systems • The Australian Government established the Trusted Information Sharing Network (TISN) for Critical Infrastructure Resilience to let business and government share vital information on security issues relevant to the protection of national critical infrastructure
Risk assessment • Simply speaking, risk = the probability and magnitude of an undesirable event • Risk assessment/analysis = the process of identifying the risks to system security and determining the likelihood of occurrence, the resulting impact, and the additional safeguards that mitigate this impact • The ultimate objective is to reduce total risk for a given expected return/utility • Most standards and guidelines stress the importance of risk assessment • The Australian Government advocates the use of AS/NZS ISO 31000:2009 by owners and operators of critical infrastructure
Risk assessment methodologies (1/2) • Leitch says ISO 31000:2009 • is unclear • leads to illogical conclusions if followed • is impossible to comply with • is not mathematically based, having little to say about probability, data, and models • Risk map References: • D. W. Hubbard, “The Failure of Risk Management: Why It’s Broken and How to Fix It,” Wiley, 2009. • M. Leitch, “ISO 31000:2009—The New International Standard on Risk Management,” Risk Analysis, 30(6):887–892, 2010.
Risk assessment methodologies (2/2) • Multi-attribute utility theory • Risk-versus-return curve (an example of utility curve) • Analytic Hierarchy Process (extension: Analytic Network Process) • Does not satisfy some statistical axioms including transitivity • Problem with its mathematical foundation References: • C. A. Bana e Costa and J.-C. Vansnick, “A critical analysis of the eigenvalue method used to derive priorities in AHP,” European Journal of Operational Research, vol. 187, pp. 1422–1428, 2008.
Power system is complex Logical Reference Model (NIST IR 7628): 47 actors, 137 inter-actor interfaces
Focus: energy management system • Energy management system • The “central nervous system” of a transmission grid • A suite of software tools for monitoring, controlling and optimizing generation and transmission operations Transmission network Distribution network
Energy Management System (EMS) + Wide-area Measurement System Source: A. P. Sakis Meliopoulos, “Power System Modeling, Analysis and Control,” lecture notes, Georgia Institute of Technology
Cyber attacks through WAMS • Attacker model/assumptions: • Core components (state estimator, automatic generation control) cannot be compromised but their I/O can • All other components can be compromised • False data injection can lead to wrong estimated states, cascading failures, or widespread blackouts • General multilayered defence: • Cryptography (auth + optionally enc) against outsider attackers • Redundancy + heterogeneity + intrusion detection
Wide-area measurement system (WAMS) Examples of phasor measurement units: GPS Macrodyne’s 1690 • Oscillation control • Voltage control • Frequency control • Line temperature monitoring • NIST IR 7628: • Authentication • Availability MiCOM P847 ABB’s RES521
Cautionary note regarding GPS • Synchrophasors rely on GPS • GPS is vulnerable to jamming (weak signal) and spoofing (see Nighswander et al. 2012) • T. Nighswander et al., “GPS Software Attacks,” CCS’12. • Short-term solution: Enhanced Long Range Navigation (eLORAN) • Long-term solution: atomic clocks A portable GPS and mobile jammer A LORAN transmitter
Multicast authentication and why it is a problem • PMU -> PDCs • PDC -> PDCs • Further example: System Integrity Protection Scheme (SIPS) • IEC 61850-90-5 governs the IEC 61850-compliant transmission of IEEE C37.118-formatted WAMS data • Specifies GDOI (RFC 6407) for securing the distribution of group keys • Specifies IPsec (RFC 4301) for securing IP multicast using group keys
Constructing multicast authentication • (1) Based on conventional digital signatures • signature amortization • (2) Multiple-time signature schemes (incl. one-time) • +packet individually verifiable • +resilient to packet loss • +small code • +lower computational cost (?) • +lower memory cost (?) • -long signatures References: J. Pieprzyk, H. Wang, and C. Xing, “Multiple-time signature schemes against adaptive chosen message attacks,” in Selected Areas in Cryptography, ser. LNCS. Springer Berlin / Heidelberg, 2004, vol. 3006, pp. 88–100. • Well-known MTS schemes • Lamport • Perrig: BiBa, TESLA, μTESLA • Reyzin & Reyzin: HORS
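To make the trade-offs listed above concrete, here is a minimal sketch of the Lamport one-time signature, the simplest scheme in the list. It is an illustration only, assuming SHA-256 as the hash function; the function names (keygen, sign, verify) are hypothetical and not taken from any of the cited schemes. Each packet is individually verifiable and only hashing is needed, but the signature is long and each key pair may sign only one message.

```python
import hashlib
import secrets

HASH_BITS = 256  # assumption: SHA-256, so 256 digest bits are signed


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _digest_bits(message: bytes):
    """Return the message digest as a list of 256 bits (most significant first)."""
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(HASH_BITS)]


def keygen():
    """Private key: two random secrets per digest bit. Public key: their hashes."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32))
               for _ in range(HASH_BITS)]
    public = [(_h(s0), _h(s1)) for s0, s1 in private]
    return private, public


def sign(private, message: bytes):
    """Reveal, for each digest bit, the secret that corresponds to that bit value."""
    return [private[i][bit] for i, bit in enumerate(_digest_bits(message))]


def verify(public, message: bytes, signature) -> bool:
    """Hash each revealed secret and compare with the matching public-key entry."""
    if len(signature) != HASH_BITS:
        return False
    bits = _digest_bits(message)
    return all(_h(sig_i) == public[i][bit]
               for i, (sig_i, bit) in enumerate(zip(signature, bits)))


private_key, public_key = keygen()
packet = b"PMU frame 42"  # hypothetical payload
signature = sign(private_key, packet)
signature_bytes = sum(len(part) for part in signature)  # 256 * 32 bytes
```

With these parameters the signature is 8192 bytes, which illustrates the "long signatures" drawback; schemes such as BiBa and HORS reveal only a few secrets per message to shorten it.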
Comparison of BiBa, TV-HORS, SCU+, TSV+ • Comparison metrics: • SigLen/SecLvl • SigComp/SecLvl • VerComp/SecLvl • Comparison constraints: • security bits • signature length bytes • hash length = 80 bits • number of private key elements = 1024 • BiBa1: SigComp10VerComp • TSV+1: SigCompVerComp and VerCompSigComp Note: Not all schemes support more than 2 signatures per epoch under the comparison constraint. Figure: SigLen/SecLvl vs. number of signatures per epoch. Lower is better.
Comparison of BiBa, TV-HORS, SCU+, TSV+ • BiBa0: best performer in signature length but has far poorer efficiency in signing than the others. • BiBa1: slightly longer signatures but has significantly better signing efficiency than BiBa0. • SCU+: efficient in signing and verification but requires far longer signatures than the others for the same security level. • TSV+ is more efficient than TV-HORS in signature length for r = 1. • TSV+ is several orders of magnitude slower than TV-HORS in signing and verification. • Despite its algorithmic simplicity, TV-HORS is a good performer in all categories. • SCU and TSV do not offer clear advantages over BiBa. Figures: SigComp/SecLvl vs. number of signatures per epoch (lower is better); VerComp/SecLvl vs. number of signatures per epoch (lower is better). Yee Wei Law et al., “WAKE: Key Management Scheme for Wide-Area Measurement Systems in Smart Grid,” IEEE Communications Magazine, accepted 10 Oct 2012, to appear.
Energy Management System (EMS) + Wide-area Measurement System Source: A. P. Sakis Meliopoulos, “Power System Modeling, Analysis and Control,” lecture notes, Georgia Institute of Technology
State estimation Possible insider attack: inject bad data to foil detection Measurements State estimator Network topology processor Bad data detection Y. Liu et al., “False data injection attacks against state estimation in electric power grids,” Proc. 16th ACM Computer and Communications Security, 2009.
Liu et al.’s results • Attack scenario: given k compromised meters (RTUs/IEDs/ PMUs), find a vector of k false values that bypass detection Larger networks IEEE test systems
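The idea behind the attack can be sketched on a toy linear (DC) model z = Hx + e. Residual-based bad data detection looks at ||z - H x_hat||; if the injected vector has the form a = Hc, the estimate shifts by c but the residual is unchanged, so the detector sees nothing. The sketch below assumes unit measurement weights, a hypothetical 4-measurement, 2-state matrix H, and an attacker able to alter every meter; Liu et al. study the harder case where only k meters are compromised.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def inv2(A):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def estimate(H, z):
    """Least-squares state estimate x_hat = (H^T H)^-1 H^T z."""
    Ht = transpose(H)
    return matvec(inv2(matmul(Ht, H)), matvec(Ht, z))


def residual_norm(H, z):
    """Euclidean norm of the measurement residual z - H x_hat."""
    fitted = matvec(H, estimate(H, z))
    return sum((zi - fi) ** 2 for zi, fi in zip(z, fitted)) ** 0.5


# Hypothetical measurement model: 4 meters observing 2 states.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0], [1.0, 1.0]]
x_true = [0.10, -0.05]
noise = [0.002, -0.001, 0.001, -0.002]
z = [m + e for m, e in zip(matvec(H, x_true), noise)]

# Structured attack: a = H c shifts the estimate by c, residual unchanged.
c = [0.05, 0.02]
a = matvec(H, c)
z_attacked = [zi + ai for zi, ai in zip(z, a)]

# Naive tampering of a single meter, for contrast: the residual grows.
z_naive = list(z)
z_naive[0] += 0.05

r_clean = residual_norm(H, z)
r_attacked = residual_norm(H, z_attacked)
r_naive = residual_norm(H, z_naive)
```

Comparing r_clean, r_attacked and r_naive shows why a threshold on the residual catches the naive tampering but not the structured attack.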
Application of Liu et al.’s attacks The attacker earns $2/MWh here The attacker loses $1/MWh here The attacker earns $1/MWh net Actually congested, faked not congested Actually congested, faked not congested L. Xie, Y. Mo, and B. Sinopoli, “False data injection attacks in electricity markets,” in Proc. 1st International Conference on Smart Grid Communications, 2010. IEEE 14-bus test system
Against outsider attacks: solved A multilayered architecture with a perimeter network Firewall + VPN • It is impractical to tamper-proof a whole PMU, for maintenance reasons, etc. • Even if tamper-proofing all PMUs is achievable, impractical for all RTUs and IEDs • Using redundant PMUs could reduce the risk, but also costly • Most (academic) research so far designed attacks under different constraints • We are investigating anomaly detection methods to detect false data Stewart et al., “Synchrophasor Security Practices,” white paper Against insider attacks: unsolved
Research opportunities • Attacker exploits assumptions about ‘bad data’ • χ² test assumes bad data cause errors to not be Gaussian • Largest normalized residual test assumes bad data cause measurement residuals to not be Gaussian distributed • Among latest solutions • Bobba et al.’s solution determines and makes critical PMUs tamper-resistant • Vuković et al.’s solution assumes a core subset of substations are beyond attacks, and espouses multipath routing • State estimation: secure centralized estimation problem • Multi-area state estimation: secure distributed estimation problem • linear time-invariant average-consensus (linear consensus) Selected references: R. B. Bobba, K. M. Rogers, Q. Wang, H. Khurana, K. Nahrstedt, and T. J. Overbye, “Detecting False Data Injection Attacks on DC State Estimation,” in First Workshop on Secure Control Systems, ser. SCS, 2010. O. Vukovic, K. C. Sou, G. Dan, and H. Sandberg, “Network-layer protection schemes against stealth attacks on state estimators in power systems,” 2011 IEEE International Conference on Smart Grid Communications (SmartGridComm), pp. 184-189, 2011. S. Zheng, et al., “Robust State Estimation Under False Data Injection in Distributed Sensor Networks,” in IEEE GLOBECOM 2010.
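For readers unfamiliar with linear average-consensus, the following sketch shows the basic iteration that distributed (multi-area) estimation builds on: each node repeatedly moves its value towards those of its neighbours, and all nodes converge to the network-wide average. The topology, initial values and step size are hypothetical; the step size is assumed to be smaller than 1/(maximum node degree), which is a sufficient condition for convergence on a connected undirected graph.

```python
def average_consensus(values, edges, step=0.3, iterations=200):
    """Run x_i <- x_i + step * sum_j (x_j - x_i) over neighbours j of i."""
    x = list(values)
    neighbours = {i: [] for i in range(len(x))}
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    for _ in range(iterations):
        # Synchronous update: every node uses its neighbours' previous values.
        x = [xi + step * sum(x[j] - xi for j in neighbours[i])
             for i, xi in enumerate(x)]
    return x


# Hypothetical example: four areas on a ring, each holding a local estimate.
local_estimates = [1.00, 1.02, 0.98, 1.04]
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
agreed = average_consensus(local_estimates, ring)
```

Because every node's value influences the final average, a single node reporting false data biases the result for all nodes, which is why the secure version of this problem is of interest.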
Energy Management System (EMS) Rotor angle + Wide-area Measurement System Voltage Frequency Source: A. P. Sakis Meliopoulos, “Power System Modeling, Analysis and Control,” lecture notes, Georgia Institute of Technology
Automatic generation control (AGC) • Most standards tolerate a deviation from nominal of only 1% • Three levels of control: • Simple electro-mechanical proportional feedback control leaves a steady-state error residue • Load-frequency control • Manual control (Figure: control areas 1, 2 and 3, each with its own load and system frequency, interconnected by tie lines.) G. Andersson, “Dynamics and control of electric power systems,” lecture notes 227-0528-00, ETH Zürich, February 2010. AGC regulates system frequency and maintains power interchanges via the tie line at scheduled values
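The difference between the first two control levels can be illustrated with a rough single-area simulation. The model below is a simplification chosen for this sketch, not the model used in the cited lecture notes: it ignores governor and turbine lags, works in per-unit quantities, and all parameter values (M, D, R, ki) are hypothetical. With proportional (droop) control alone the frequency settles at a non-zero deviation after a load step; adding the integral load-frequency loop drives the deviation back to zero.

```python
def simulate_area(ki, load_step=0.1, M=10.0, D=1.0, R=0.05,
                  dt=0.01, t_end=100.0):
    """Return the frequency deviation (p.u.) at t_end after a load step.

    M: inertia constant, D: load damping, R: droop,
    ki: integral gain of the load-frequency loop (0 disables it).
    """
    df = 0.0      # frequency deviation
    p_ref = 0.0   # AGC set-point adjustment
    for _ in range(round(t_end / dt)):
        p_mech = -df / R + p_ref                    # droop response + set-point
        df += dt * (p_mech - load_step - D * df) / M
        p_ref += dt * (-ki * df)                    # integral action on deviation
    return df


droop_only = simulate_area(ki=0.0)
with_agc = simulate_area(ki=5.0)
expected_droop_error = -0.1 / (1.0 + 1.0 / 0.05)   # -load_step / (D + 1/R)
```

In this sketch droop_only settles at expected_droop_error, while with_agc returns to (numerically) zero.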
Attacking AGC Effects of a successful attack: • AGC is one of few automatic closed loops between the IT department and the power system Oscillatory growth in phase angle diff. References: P. M. Esfahani et al., “A Robust Policy for Automatic Generation Control Cyber Attack in Two Area Power Network,” in IEEE Conference on Decision and Control, Atlanta, Dec. 2010. P. M. Esfahani et al., “Cyber Attack in a Two-Area Power System: Impact Identification using Reachability,” in American Control Conference, Baltimore, MD, USA, Jun. 2010. Y. W. Law et al., “Security games and risk minimization for automatic generation control in smart grid,” in J. Grossklags and J. Walrand, editors, Proc. 3rd Conference on Decision and Game Theory for Security (GameSec 2012), volume 7638 of LNCS, pp. 281–295. Springer Heidelberg, 2012. Oscillatory growth in interchanged power
Control area 1 Automatic generation control Tie line Control area 2 Underfrequency load shedding Area 1 1/R Centralized Distributed Area 2
Underfrequency load shedding • Frequency deviation exceeds threshold -> overfrequency/underfrequency protection relays • When the frequency deviation rises above +1.5 Hz, overfrequency relays start tripping thermal plants • When the frequency deviation drops below −0.35 Hz, underfrequency relays shed load • Goal: To model and quantify the risks posed by an attacker whose intention is to inflict revenue loss on the electricity provider by injecting false data into the automatic generation controller in the hope of triggering load shedding S. K. Mullen, “Plug-In Hybrid Electric Vehicles as a Source of Distributed Frequency Regulation,” Ph.D. thesis, University of Minnesota, 2009.
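The relay logic on this slide amounts to two thresholds on the frequency deviation. A minimal sketch follows, reading the second threshold as a deviation of -0.35 Hz (an interpretation of the slide, since a sign appears to have been lost); the function and constant names are hypothetical.

```python
OVERFREQ_TRIP_HZ = 1.5     # deviation above which thermal plants are tripped
UNDERFREQ_SHED_HZ = -0.35  # deviation below which load is shed (assumed sign)


def relay_action(deviation_hz: float) -> str:
    """Map a frequency deviation from nominal (Hz) to a protection action."""
    if deviation_hz > OVERFREQ_TRIP_HZ:
        return "trip_generation"
    if deviation_hz < UNDERFREQ_SHED_HZ:
        return "shed_load"
    return "none"


actions = [relay_action(d) for d in (0.1, -0.5, 2.0)]
```

The asymmetry of the two thresholds means that an attacker who can push the frequency slightly below nominal triggers load shedding (and hence revenue loss) much sooner than generator tripping.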
Theory: security games • Motivation: model interaction between attacker and defender to derive optimal defense strategy (optimal resource alloc) • Terminology: Matrix game Multi-agent Single-state Zero-sum: matrix games Nonzero-sum: bi-matrix games Markov decision process Single-agent Multi-state Stochastic/Markov game Multi-agent Multi-state Security game Two-agent Noncooperative Zero-sum Risk model Dynamic programming M. Bowling and M. Veloso, “Multiagent learning using a variable learning rate,” Artificial Intelligence, vol. 136, pp. 215-250, 2002.
Security game model Attacker action space: Defender action space: Transition probability determined by state transition matrix M: Attacker strategy: Attacker action Defender action A cost (from the defender’s perspective) of G_{a,d}(s(t)) is incurred by actions a and d, thus constituting the game matrix:
Infinite-horizon cost using value iteration • In the future-discounted cost model, the cost is aggregated over an infinite horizon • The discount factor ensures convergence and emphasizes immediate cost • For infinite-horizon problems like this, the value iteration algorithm from dynamic programming is the most widely used: t is incremented until |Q_{t+1} − Q_t| < ε • Bellman equations • Saddle-point strategy
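As a rough illustration of value iteration for a zero-sum stochastic game, the sketch below restricts both players to two actions per state so that each stage game can be solved in closed form instead of by linear programming. The data layout is an assumption of this sketch: G[s][a][d] is the defender's cost in state s, M[s][a][d][s2] is the probability of moving to state s2, the attacker (row player) maximises the cost and the defender (column player) minimises it. The numbers in the example are hypothetical.

```python
def matrix_game_value(Q):
    """Value of a 2x2 zero-sum game (row player maximises, column minimises)."""
    (a, b), (c, d) = Q
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:          # saddle point in pure strategies
        return maximin
    return (a * d - b * c) / (a + d - b - c)   # mixed-strategy value


def value_iteration(G, M, gamma=0.9, eps=1e-9):
    """Iterate V(s) = val[ G(s) + gamma * sum_s2 M(s2 | s, a, d) V(s2) ]."""
    n = len(G)
    V = [0.0] * n
    while True:
        V_new = []
        for s in range(n):
            Q = [[G[s][a][d] + gamma * sum(M[s][a][d][s2] * V[s2]
                                           for s2 in range(n))
                  for d in range(2)] for a in range(2)]
            V_new.append(matrix_game_value(Q))
        if max(abs(x - y) for x, y in zip(V_new, V)) < eps:
            return V_new
        V = V_new


# Hypothetical two-state example: state 0 = normal, state 1 = degraded.
# Actions: a in {0: wait, 1: attack}, d in {0: idle, 1: defend}.
G = [[[0.0, 0.0], [2.0, 1.0]],
     [[4.0, 2.0], [6.0, 3.0]]]
M = [[[[1.0, 0.0], [1.0, 0.0]], [[0.2, 0.8], [0.7, 0.3]]],
     [[[0.3, 0.7], [0.8, 0.2]], [[0.0, 1.0], [0.4, 0.6]]]]
V = value_iteration(G, M)
```

V[s] is then the discounted cost the defender can guarantee from state s when both players follow their saddle-point strategies.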
Linear programming method Solve for saddle-point strategy: Minimum upper bound for cost
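For reference, the standard linear program for the minimising player of a zero-sum matrix game can be written as follows. The notation is an assumption of this sketch: Q(s,a,d) is the cost matrix of the stage game in state s (attacker action a, defender action d), p is the defender's mixed strategy, and v is the upper bound on the cost being minimised.

```latex
\begin{aligned}
\min_{p,\,v}\quad & v \\
\text{s.t.}\quad & \sum_{d} Q(s,a,d)\, p_d \le v \quad \forall a, \\
& \sum_{d} p_d = 1, \qquad p_d \ge 0 \quad \forall d .
\end{aligned}
```

At the optimum, p is the defender's saddle-point strategy and v is the value of the stage game, that is, the smallest cost bound the defender can guarantee against any attacker action.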
Risk model • Risk is the probability and magnitude of an undesirable event • Simplistic definition: risk = probability of attack × impact of attack • Risk measures from finance being explored: value-at-risk (VaR), conditional value-at-risk (CVaR) • VaR at confidence level α: the smallest number l such that the probability of the loss exceeding l is at most 1 − α (i.e., the α-quantile of the loss distribution) • CVaR (expected shortfall): the expected loss given that the loss is at least VaR • Risk state
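Both risk measures are easy to estimate from a sample of losses. The sketch below uses simple empirical estimators (the sample quantile for VaR, and the mean of the losses at or above it for CVaR); other estimators exist and differ slightly on small samples. The function names and the loss sample are hypothetical.

```python
import math


def value_at_risk(losses, alpha):
    """Empirical alpha-quantile: smallest l with P(loss <= l) >= alpha."""
    ordered = sorted(losses)
    # Small tolerance guards against floating-point error in alpha * n.
    k = math.ceil(alpha * len(ordered) - 1e-12)
    return ordered[max(k, 1) - 1]


def conditional_value_at_risk(losses, alpha):
    """Mean of the losses that are at least as large as VaR (expected shortfall)."""
    var = value_at_risk(losses, alpha)
    tail = [l for l in losses if l >= var]
    return sum(tail) / len(tail)


# Hypothetical loss sample: 1, 2, ..., 100 (e.g. revenue loss per incident).
sample = list(range(1, 101))
var_95 = value_at_risk(sample, 0.95)                 # 95
cvar_95 = conditional_value_at_risk(sample, 0.95)    # 97.5
```

CVaR is never smaller than VaR at the same confidence level, because it averages over the tail that VaR only marks the start of.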
Assumptions • Generators • Transmission lines • Turbine governors • Energy management system • Underfrequency load shedding relays
Basic attacks Attacker AGC (integral controller) • Constant injection (): disables the integral control loop, system frequency converges to non-nominal frequency • Positive constant: below-nominal frequency, loads shed • Negative constant: above-nominal frequency, generators tripped • Bias injection (): similar to constant injection • Overcompensation (: unstable oscillations • When frequency sweeps past the overfrequency/underfrequency thresholds, generators are tripped, loads are shed false frequency = Simulation results for :
Basic attacks (cont’d) Attacker AGC (integral controller) • Negative compensation (): reverses the intended effect of the integral control loop, causing the frequency to diverge from the nominal frequency false frequency =
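The qualitative effect of two of these attacks can be reproduced with a rough single-area simulation in which the attacker replaces the frequency deviation seen by the integral controller. Everything specific here is an assumption of the sketch, not the slides' model: the simplified dynamics (no governor or turbine lag, per-unit quantities), the parameter values, and in particular the attack formulas, which are taken to be "true deviation + c" for bias injection and "-k × true deviation" for negative compensation.

```python
def simulate_attack(falsify, load_step=0.0, ki=5.0, M=10.0, D=1.0, R=0.05,
                    dt=0.01, t_end=100.0):
    """Return the true frequency deviation (p.u.) at t_end.

    falsify: function mapping the true deviation to the value fed to the AGC.
    """
    df = 0.0      # true frequency deviation
    p_ref = 0.0   # AGC set-point adjustment
    for _ in range(round(t_end / dt)):
        p_mech = -df / R + p_ref
        df += dt * (p_mech - load_step - D * df) / M
        p_ref += dt * (-ki * falsify(df))   # integrator sees the falsified value
    return df


# Bias injection (assumed form): the controller settles where df + c = 0,
# so a positive bias c leaves the true frequency below nominal.
bias = 0.01
df_bias = simulate_attack(lambda df: df + bias)

# Negative compensation (assumed form, k = 1): the integral loop now pushes
# the frequency away from nominal, so a small load step grows without bound.
df_negative = simulate_attack(lambda df: -df, load_step=0.01, t_end=30.0)

# No attack, same load step: the deviation returns to zero.
df_honest = simulate_attack(lambda df: df, load_step=0.01)
```

In this sketch df_bias converges to -bias, df_honest returns to zero, and df_negative keeps growing in magnitude until, in a real system, the protection thresholds would be crossed.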