Stephen Neville Electrical & Computer Engineering Dept. University of Victoria

Are Deeper Levels of Risk Analysis a Requirement for Enabling Optimal Tactical Responses in INFOSEC Alert Correlation Systems?

Outline • Introduction • The Tactical Defense Problem • Standard Alert Correlation Solution • Attack Model • Defender Model • Optimal Responses • Idealized Correlation Model • Issue of one-to-one mappings • Requirements for risk analysis • Relevance to Operational Networks • Conclusions

Introduction • Cyber-security has become one of the major issues facing corporations and governments. • It has grown from being considered solely an IT issue into one that senior management must address. • High levels of intrinsic risk have accompanied corporations’ increased reliance on their IT infrastructures for key business services and processes. • The nature of this risk, though, is poorly understood. • A fundamental need exists to place cyber-security within standard corporate frameworks for risk management. • But little research has been done on the formalizations required within real-world contexts.

This work looks at one aspect of the problem: • The generation of optimal tactical defensive responses. • Specifically, can current approaches meet this goal? • i.e., Alert Correlation/Security Management Systems • Why is this important? • Networks continue to grow across various dimensions: • Size • Speed • Complexity • Etc. • Growing use and sophistication of attack tools. • Currently, reliance on human-centric defenses • Likely untenable in the near term. • Tactical response generation will be one of the first areas where the lag in human response times will show.

Problem Context: • A standard large-scale corporate network • 1000’s of hosts • Various subnets • Multiple geographically diverse locations • Various access points (internet, wireless, etc.) • Best-practice security in place (firewalls, IDS’s, virus checking, etc.) • Deployed COTS-based security (inclusive of open source). • Supports multiple business-critical services and processes. • Etc.

[Figure: corporate network comprising the Internet, firewalls, a DMZ (web servers, proxy servers, etc.), the primary network (1000+ hosts), regional subnets A and B (100’s of hosts each) connected by VPN, wireless access, and VPN links to business partners, suppliers, customers, etc.]

Current security technologies are primarily point-source solutions: • Firewalls • Intrusion Detection Systems (IDS) • Intrusion Prevention Systems (IPS) • VPN’s • Virus Checking • File Check Pointing • Etc. • Each type of sensor only observes its subset of the attack space. Systems-level integration is required to maximize coverage of the attack space and to provide tactical situational awareness.

The Tactical Defense Problem • Goal: • Detect attack incidents as early as possible • Enact appropriate & timely defenses • Detection is not enough. • Attacks must be mitigated before losses are incurred. • Mitigate risks & minimize losses • Minimally impact authorized (normal) events/services • Defender Approach: • Deploy a diverse suite of point-source security sensors • Monitor their generated INFOSEC alert streams • Combine these streams into tactical assessments and generate estimates of the enacted attacks • From these attack estimates choose and enact the best response • Domain of Alert Correlation Systems • More recently marketed as “Security Management” Systems

[Figure: tactical defense problem. The alert correlation system monitors the defended network (primary network, regional subnets A and B via VPN, wireless access, firewalls, DMZ), produces a tactical assessment, and issues defensive responses.]

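[Figure: attacks ak, aj, am, ap against the defended network; INFOSEC sensors s1, s2, s3, …, sn emit alerts Alert1(ak), Alert2(ak), Alert3(ak), …, Alertn(ak) in response to attack ak; the alert correlation system collects the alerts, analyzes them, and outputs the estimated attack(s) and a tactical response.]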

Ostensibly, a real-time pattern recognition problem. But: • Malicious and intelligent opponents. • Asynchronous alert streams • Ordering of alert arrivals at the correlation system is a random process. • Possibly with: • Multiple simultaneous attackers • Coordinated attacks • Largely uncharacterized sensor sets • Exactly what does each sensor trigger on? • How does this map to the information reported in the sensor’s alerts? • Severely complicates the data fusion problem. • High false alarm rates • >90% in standard operation • Little in the way of statistical data

Current Solution: Alert Correlation System • [Vigna and Kremmer, 2004]

Input: INFOSEC Alert streams • Output: Prioritized Intrusion reports • Hierarchical process • No feedback paths. • Subsequent stages are based on the alert fusion (or cluster) stage’s results.

A Fundamental Question: • Are these clusters correct? • How can one prove correctness in operational environments? • Correlation literature has primarily focused on the data reduction task • Correctness, specifically in the presence of maliciousness, has gone unaddressed. • Requires suitable attacker and defender models.

Attacker Model • Attackers’ Goal: • Enact successful attacks with minimal effort. • Note: success is the primary goal • This is different from avoiding detection: • If one can enact a detected attack that completes before a response is enacted, then one succeeds, given the difficulties with trace back. • If one enacts an attack that is detected but misclassified, then one may also succeed. • Assumptions: • Attackers may be internal or external. • Multiple attackers may exist. • Attacks & attackers may be coordinated. • Attackers are intelligent and rational, as per game theory’s definitions.

Digression: Game Theory • Game Theory: • View security as a game between the defender and an unknown number of attackers. • Each chooses moves in response to their estimates of the environment and the others’ actions. • A game in which each side has only partial (i.e., incomplete) information • Each player has the potential to learn more about the environment and their opponents as the game progresses.

Rationality: • Attackers make moves that maximize their utility • i.e., one assumes they make the best moves they can based on their current information. • One cannot assume the attackers will choose to make “poor” moves. • Intelligence: • Strictly speaking, this means the attackers and defenders have the same knowledge regarding the game (i.e., equal players) • Within the security context this is more accurately viewed as the attacker having the potential to have the same level of knowledge as the most knowledgeable defender. • Directly implies “security-by-obscurity” is untenable • The defender needs to assume they are “playing” against the best attacker. • A note on cryptography: • Assume that, if it is properly used, breaking the key is computationally infeasible • But the attacker may be one of the corporate personnel entrusted with the key(s) • So cryptography is the solution iff it is itself provably the “weak” link • Security is then achieved since the “weak” link is computationally infeasible to circumvent.

Attacker Model (cont.) • More formally: • A set of atomic attacks a = {a1, a2, …, aN} exists. • Composite attacks are composed of sequences of atomic attacks (i.e., aJ = {ai, aj, …, ak}) • J is an index set on {1, …, N} (with non-unique entries allowable) • On each turn the attacker “plays” the next atomic attack that maximizes their perceived utility. • This choice must be based on their current degree of knowledge about the game • (i.e., all the information they possess regarding the state of the network, its defenses, and the defensive tactics being employed). • It is also based on the subset of the attack space a that is known to the given attacker(s)

Attacker Notation:

Attacker Notation (cont.): • Note: • This model can be trivially extended by allowing Gk and Ck* to be time-dependent parameters. • Thereby reflecting the attacker’s information gains (or their perceived gains). • In this manner, opportunistic changes in the attacker’s objectives can be accounted for.

Ak’s Goal: • At each decision point tm choose the aj satisfying • Subject to the constraint

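[Figure: attacker Ak repeatedly plays atomic attacks ak from a against the targeted network while Gk is not reached and Ck* is not exceeded; accompanying plot of cost versus time.]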
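The attacker’s decision loop (keep playing while Gk is not reached and Ck* is not exceeded) can be sketched as a greedy policy. This is a minimal illustration under assumptions not in the slides: each atomic attack has a hypothetical positive effort cost and a hypothetical scalar “progress” toward Gk, and the perceived utility is taken to be progress per unit effort. The names AtomicAttack, play_attacker, and progress_per_effort are illustrative only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AtomicAttack:
    name: str
    cost: float      # hypothetical effort required to play this atomic attack
    progress: float  # hypothetical contribution toward the attacker's goal G_k

    def __post_init__(self):
        if self.cost <= 0:
            raise ValueError("atomic attacks are assumed to have positive cost")

def play_attacker(known_attacks, goal, cost_budget, utility):
    """Greedy attacker A_k: at each decision point t_m, play the known atomic
    attack with the highest perceived utility whose cost keeps the running
    total within C_k*. Stops once G_k is reached or no affordable move remains.
    Returns the composite attack a_J (repeats allowed) and the total cost."""
    sequence, spent, achieved = [], 0.0, 0.0
    while achieved < goal:
        affordable = [a for a in known_attacks if spent + a.cost <= cost_budget]
        if not affordable:
            break  # any further move would exceed C_k*
        best = max(affordable, key=utility)
        sequence.append(best.name)
        spent += best.cost
        achieved += best.progress
    return sequence, spent

# Hypothetical utility: goal progress per unit of effort ("minimal effort").
def progress_per_effort(attack):
    return attack.progress / attack.cost
```

With a known subset such as scan (cost 1, progress 0.1), exploit (5, 0.6), and escalate (3, 0.5), a goal of 1.0 and a budget of 10 yield the composite attack ["escalate", "escalate"], which shows why J may contain repeated indices. Shrinking the budget to 4 forces the sequence ["escalate", "scan"] and the attacker stops short of Gk.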

Defender Model • Formally: • Set of deployed sensors S = {s1, s2, …, sN}. • An aj is detected if it triggers at least one alert, alertk(aj), from a deployed sensor sk • S describes the defender’s observability • aj’s not covered by S go undetected. • Only part of a may be observable by S. • Each sensor produces an asynchronous stream of alerts, alerts(aJ), in response to an attack • Goal: correlate these alerts to generate a tactical assessment
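The observability definition above amounts to a set computation: an atomic attack is observable iff at least one sensor in S triggers on it. The sketch below assumes a hypothetical, fully characterized sensor model (a mapping from each sensor to the set of atomic attacks it alerts on); as the earlier slides note, real sensor sets are largely uncharacterized, so this is an idealization. The function names are illustrative.

```python
def observability(attack_space, sensor_triggers):
    """Split the attack space a into the part observable by the sensor suite S
    and the part that goes undetected.

    sensor_triggers: sensor name s_k -> set of atomic attacks a_j it alerts on
    (a hypothetical, fully characterized sensor model)."""
    covered = set().union(*sensor_triggers.values())
    observable = set(attack_space) & covered
    return observable, set(attack_space) - observable

def alerting_sensors(attack, sensor_triggers):
    """Sensors s_k that would raise alert_k(a_j) for the given atomic attack."""
    return sorted(s for s, triggers in sensor_triggers.items() if attack in triggers)
```

For example, with a = {a1, a2, a3, a4} and two sensors covering {a1, a2} and {a2, a3}, the observable part is {a1, a2, a3}, a2 raises alerts from both sensors, and a4 goes undetected.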

Defender Notation:

Defender Notation (cont.):

Defender’s Goal: • At each decision point tm choose the {rj} that satisfies • Where g(.) is a defender chosen function that balances the expectation of loss against the estimated costs of enacting the chosen responses. • Obviously, the optimum will be achieved iff at each decision point

[Figure: attack space a and attack ad against the defended network; accompanying plot of cost versus loss.]
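The slides leave g(.) to the defender. One plausible choice, assumed here purely for illustration, is g = expected residual loss + total cost of the chosen responses, where the expectation is taken over the attacks that remain plausible given the current evidence. The posterior, loss, and response tables below are hypothetical inputs, and the exhaustive search over response subsets is only practical for a small candidate set.

```python
from itertools import combinations

def best_response(posterior, loss, responses):
    """Choose the response set {r_j} minimizing a hypothetical g(.):
    expected residual loss plus total response cost.

    posterior: attack -> probability given current evidence (assumed known).
    loss:      attack -> loss incurred if that attack is not mitigated.
    responses: response name -> (cost, set of attacks it mitigates).
    Returns (chosen response names, value of g)."""
    names = sorted(responses)
    best_g, best_combo = float("inf"), ()
    for k in range(len(names) + 1):
        for combo in combinations(names, k):
            cost = sum(responses[r][0] for r in combo)
            mitigated = set().union(*(responses[r][1] for r in combo))
            expected_loss = sum(p * loss[a] for a, p in posterior.items()
                                if a not in mitigated)
            g = expected_loss + cost
            if g < best_g:
                best_g, best_combo = g, combo
    return best_combo, best_g
```

Suppose a cluster is equally consistent with two attacks, aX (loss 100) and aY (loss 40), and the candidate responses are block_host (cost 10, stops aX), reset_service (cost 25, stops aY), and isolate_subnet (cost 60, stops both). Under this g, the sketch selects only block_host with g = 30: it accepts the expected residual loss from aY rather than paying for every supportable response.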

Returning to Correctness • Even with this model, general correctness cannot be assessed. • So, further simplify the problem by assuming the defender-idealized case: • No false alarms • All attacks trigger at least one alert • i.e., complete observability over the union of all attackers’ attack spaces • The correlation system and the sensor suites are themselves unassailable • Now the attackers’ only option is to manipulate their attacks to cause the defender to select a sub-optimal response • This is impossible if there is a one-to-one mapping between enacted attacks and generated clusters. • If there is a one-to-many mapping, then the defender must choose which response to perform • Risk analysis allows for such a selection. • The assumption is that enacting all supportable responses comes at a higher cost.

When can one-to-one mappings be guaranteed? • Trivial case: (guaranteed) • All attacks trigger at least one uniquely identifiable alert • Problem reduces to focusing only on these unique alerts • No alert clustering is required. • One-to-one mapping guaranteed. • No possibility of a sub-optimal response • Would require provably orthogonal alerts • Non-trivial case: (not guaranteed) • Attacks are identifiable only through sets of non-unique alerts • Denote these as the attack’s critical alerts • Can focus solely on what happens to these alerts • These alerts must be provably correctly clustered for there to be a one-to-one mapping • But this cannot be proven even in the idealized case, since the attacker can influence how critical alerts are clustered.
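Whether the trivial case applies can be checked mechanically when the alert types each attack triggers are known, as in the idealized case (no false alarms). The sketch below assumes such a hypothetical attack-to-alert-type table; it reports, for each attack, the alert types that no other attack produces. An attack with an empty result falls into the non-trivial case and can only be identified through its critical, non-unique alerts.

```python
def unique_alerts(attack_alerts):
    """attack_alerts: attack -> set of alert types it triggers (idealized case,
    hypothetical table). Returns attack -> alert types triggered by that attack
    and by no other attack."""
    unique = {}
    for attack, alerts in attack_alerts.items():
        others = set().union(*(a for other, a in attack_alerts.items()
                               if other != attack))
        unique[attack] = set(alerts) - others
    return unique

def trivial_case_holds(attack_alerts):
    """True iff every attack triggers at least one uniquely identifiable alert."""
    return all(unique_alerts(attack_alerts).values())
```

With a1 triggering {x, y} and a2 triggering {y, z}, each attack has a unique alert (x and z respectively), so the trivial case holds. If a2 instead triggers {x, y, z}, a1 has no unique alert and its identification depends on how x and y are clustered.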

[Figure: alert streams pass through the clustering stage and then the cluster merging stage before reaching higher analysis layers.] Standard Clustering Algorithm • Two-stage algorithm: • Each arriving alertk is placed into every cluster within the clustering stage that it is “close to”. • “Close” is defined by an implemented similarity metric d(alertk,aj) • A new cluster is started iff the given alertk does not match any of the existing clusters. • Once the age of a cluster has exceeded a pre-defined threshold, La, the cluster is passed on to the merging stage. • In general, this threshold would be attack-class specific. • If this cluster is “close” to one of the existing merging stage clusters, then the two clusters are merged. • Otherwise it becomes the newest merging stage cluster.
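The two-stage algorithm above can be sketched as follows. This is an illustration, not the implementation of any particular correlation system: the similarity metric is abstracted as two hypothetical caller-supplied predicates (alert-to-cluster and cluster-to-cluster closeness), alerts are plain dictionaries with a "time" field, a cluster’s age is measured from the arrival of the alert that opened it, and a single threshold La is used for all attack classes.

```python
from dataclasses import dataclass, field

@dataclass
class Cluster:
    start: float                              # arrival time of the opening alert
    alerts: list = field(default_factory=list)

class TwoStageClusterer:
    def __init__(self, close, clusters_close, max_age):
        self.close = close                    # hypothetical close(alert, cluster) -> bool
        self.clusters_close = clusters_close  # hypothetical clusters_close(c1, c2) -> bool
        self.max_age = max_age                # the age threshold L_a
        self.active = []                      # clustering-stage clusters
        self.merged = []                      # merging-stage clusters

    def _merge(self, cluster):
        # Merge into the first "close" merging-stage cluster, else append.
        for m in self.merged:
            if self.clusters_close(m, cluster):
                m.alerts.extend(cluster.alerts)
                return
        self.merged.append(cluster)

    def _age_out(self, now):
        # Pass clusters older than L_a on to the merging stage.
        still_active = []
        for c in self.active:
            if now - c.start > self.max_age:
                self._merge(c)
            else:
                still_active.append(c)
        self.active = still_active

    def add(self, alert):
        self._age_out(alert["time"])
        matches = [c for c in self.active if self.close(alert, c)]
        for c in matches:                     # an alert may join several clusters
            c.alerts.append(alert)
        if not matches:                       # new cluster iff nothing matched
            self.active.append(Cluster(start=alert["time"], alerts=[alert]))

    def flush(self):
        self._age_out(float("inf"))           # age out everything still active
        return self.merged
```

As a usage example, with closeness defined (hypothetically) as “same target host” and La = 5, alerts on h1 at t = 0 and t = 1, on h2 at t = 2, and on h1 again at t = 10 produce two merging-stage clusters: three h1 alerts and one h2 alert. The late h1 alert first opens its own clustering-stage cluster and is only reunited with the earlier ones at the merging stage.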

Attacker’s Influence • How can the attacker cause a mis-clustering of at least one of the critical alerts? • Assume the similarity metrics are ideal • The attacker can influence the cluster contents by exploiting the timing characteristics introduced by La. • Fundamentally, the first critical alert arriving from an attack must not correctly initiate its cluster. • This is guaranteed to happen if there exist pre-existing clusters that can “absorb” this alert. • i.e., if the attacker initiates such clusters beforehand. • The defender then must mis-assign at least one of the critical alerts • Therefore, at least one sub-optimal response will be made.
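The pre-seeding strategy can be demonstrated with a stripped-down clustering stage. In this sketch the closeness rule (“same target as the cluster’s opening alert”), the alert timings, and La = 5 are all hypothetical choices made for illustration. Without a decoy, the two critical alerts c1 and c2 land in one cluster. When a decoy alert d on the same target arrives first, c1 is absorbed into the decoy’s cluster, that cluster ages out before c2 arrives, and the critical alerts end up split across two clusters, one of them contaminated by the decoy.

```python
def cluster_stage(alerts, close, max_age):
    """Clustering stage only (hypothetical simplification): returns clusters as
    lists of alert ids, in the order they are passed on after exceeding the
    age threshold max_age (L_a) or at the end of the stream."""
    active, passed_on = [], []          # active entries: (start_time, [alerts])
    for alert in sorted(alerts, key=lambda a: a["time"]):
        now = alert["time"]
        passed_on += [c for s, c in active if now - s > max_age]
        active = [(s, c) for s, c in active if now - s <= max_age]
        matches = [c for _, c in active if close(alert, c[0])]
        for c in matches:
            c.append(alert)
        if not matches:
            active.append((now, [alert]))
    passed_on += [c for _, c in active]
    return [[a["id"] for a in c] for c in passed_on]

# Hypothetical closeness: alert targets the same host as the cluster's first alert.
def same_target(alert, first_alert):
    return alert["target"] == first_alert["target"]

critical = [{"id": "c1", "time": 2, "target": "h"},
            {"id": "c2", "time": 7, "target": "h"}]
decoy = [{"id": "d", "time": 0, "target": "h"}]
```

Running cluster_stage(critical, same_target, 5) gives [["c1", "c2"]], whereas cluster_stage(decoy + critical, same_target, 5) gives [["d", "c1"], ["c2"]]. Even though the closeness rule itself is never “wrong”, the timing introduced by La lets the earlier decoy determine how the critical alerts are grouped.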

Relevance to Operational Networks • Such attacks could only be exploited by knowledgeable attackers. • Outside the realm of “script-kiddies” • Principal advantage: • Provides an attack methodology that would not trigger the “arms race”. • Worthwhile for highly skilled attackers. • An intrinsic issue within the design of correlation systems • Cannot be solved by adding more sensors • Requires that the potential for one-to-many mappings be addressed. • Adding deeper levels of risk analysis would at least allow the defender to minimize their expectation of loss conditioned on their current information. • The hard real-time tactical defense constraint requires a response to be made before T(aj) expires.

Conclusions • Correctness is as important as data reduction. • Maliciousness: • Makes correctness hard to assess. • Engenders the need to prove one-to-one mappings exist in the real world • Or, to develop techniques to address one-to-many mappings. • Deeper levels of risk analysis being one such technique. • The defender has no information to allow a selection from the plausible attacks based on the observed evidence. • Hard real-time constraints mean the defender cannot wait until this information comes in. • Assuming losses accumulate as attacks progress. • Minimizing the expectation of loss directly implies a need to perform risk analysis if a one-to-one mapping cannot be proven.

Conclusions (cont.) Otherwise, • A knowledgeable attacker can gain a significant advantage • Specifically, an attack methodology that will not trigger the “arms race”. • The defender thinks they stopped the attack, so why change the defenses? • Only the attacker knows the true attack. • Such a class of attacks is outside the observability of current methods. • A “holy grail” for the attackers • An attack that is simultaneously: • Detected by the sensors • But outside the defender’s observability (if one-to-one mappings are assumed). • Potentially costly to find, but likely worth the effort for higher-end targets. • May be easier than trying to discover new attacks against hardened targets.

Questions?
