Application of Association Rules in Intrusion Detection

Xiangyang Li, Dept. of Industrial Engineering, ASU

Association rules (1)
• Objective: learn rules and associations among items.
• Transaction data, e.g.:
    Record  Items
    1       soda, milk
    2       detergent, soda, cleanser
    3       cleanser, soda
    …       …
• Correlation among items.
• 2-way, 3-way, …, k-way rules, e.g., i, j → k.
• A famous association rule: diapers → beers.

Data
• Shell command records:
    time  hostname  command  arg1  arg2
    am    pascal    mkdir    dir1
    am    pascal    cd       dir1
    am    pascal    vi       tex
• Network connection records:
    time  duration  service  src_bytes  dst_bytes  flag
    1.1   10        telnet   100        2000       SF
    2.0   2         ftp      200        300        SF
    2.3   1         smtp     250        300        SF
    3.4   60        telnet   200        12100      SF

Association rules (2)
• Why? Program executions and user activities exhibit frequent correlations among system features.
• Definition: Let A be a set of attributes and I be a set of values on A, called items. Any subset of I is called an itemset, and the number of items in an itemset is its length. Let D be a database of transactions (records). An association rule is the expression X → Y, confidence, support, where X and Y are itemsets and X ∩ Y = ∅. support is the percentage of transactions in D that contain X ∪ Y, and confidence is the percentage of transactions containing X that also contain Y, i.e., confidence = support(X ∪ Y) / support(X).
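The support and confidence definitions above can be checked on a few records. The following is a minimal sketch, assuming each record is encoded as a set of attribute=value strings; the function names and the toy records are illustrative, not taken from the original work.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(x, y, transactions):
    """support(X ∪ Y) / support(X); returns 0.0 when X never occurs."""
    support_x = support(x, transactions)
    if support_x == 0:
        return 0.0
    return support(frozenset(x) | frozenset(y), transactions) / support_x


# Made-up connection records, one set of items per record.
records = [
    frozenset({"service=telnet", "flag=SF"}),
    frozenset({"service=ftp", "flag=SF"}),
    frozenset({"service=smtp", "flag=SF"}),
    frozenset({"service=telnet", "flag=S0"}),
]

rule_support = support({"service=telnet", "flag=SF"}, records)
rule_confidence = confidence({"service=telnet"}, {"flag=SF"}, records)
```

Here the rule service=telnet → flag=SF is contained in one of four records, and half of the telnet records carry flag=SF.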

Association rules (3)
• For example, an association rule from the shell command history file of a user (a stream of commands and their arguments) is trn → rec.humor, 0.3, 0.1. It indicates that 30% of the time when the user invokes trn, he or she is reading the news in rec.humor, and that reading this newsgroup accounts for 10% of the activities recorded in the command history file.

Association rules (4)
• Control on the number of produced rules: for example, with 100 different items, the number of possible 2-way rules is 100 × 99 ≈ 10K ≈ 100². Minimum_support and minimum_confidence requirements are used to restrict the search to rules on frequent itemsets.
• Any subset of a frequent itemset must also be a frequent itemset. The algorithm starts by finding the frequent itemsets of length 1, then iteratively computes the frequent itemsets of length k+1 from those of length k.
• Rule generation: for a frequent itemset X and a non-empty proper subset S of X, if confidence = support(X) / support(S) >= minimum_confidence, then output the rule S → X − S with support = support(X).
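The level-wise search and the rule generation step can be sketched as follows. This is a simplified Apriori-style illustration under stated assumptions (transactions are Python sets, candidates are formed by joining pairs of frequent itemsets); it is not the implementation used in the work being presented, and the function names are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support}, growing length-(k+1) sets from length-k sets."""
    n = len(transactions)

    def sup(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    singles = {frozenset([i]) for t in transactions for i in t}
    level = {s: sup(s) for s in singles}
    level = {s: v for s, v in level.items() if v >= min_support}
    frequent = dict(level)
    k = 1
    while level:
        candidates = set()
        for a, b in combinations(level, 2):
            c = a | b
            # Keep c only if every length-k subset is itself frequent.
            if len(c) == k + 1 and all(
                frozenset(s) in level for s in combinations(c, k)
            ):
                candidates.add(c)
        level = {c: sup(c) for c in candidates}
        level = {c: v for c, v in level.items() if v >= min_support}
        frequent.update(level)
        k += 1
    return frequent


def generate_rules(frequent, min_confidence):
    """Emit (lhs, rhs, confidence, support) for each subset S of X: S -> X - S."""
    rules = []
    for x, sup_x in frequent.items():
        for r in range(1, len(x)):
            for subset in combinations(x, r):
                lhs = frozenset(subset)
                conf = sup_x / frequent[lhs]  # lhs is frequent because x is
                if conf >= min_confidence:
                    rules.append((lhs, x - lhs, conf, sup_x))
    return rules


# The three example records from the transaction table earlier in the deck.
baskets = [
    {"soda", "milk"},
    {"detergent", "soda", "cleanser"},
    {"cleanser", "soda"},
]
found = frequent_itemsets(baskets, min_support=0.6)
rules = generate_rules(found, min_confidence=0.9)
```

The lookup `frequent[lhs]` relies on the property stated above: every subset of a frequent itemset is also frequent, so its support has already been stored.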

Frequent episodes (1)
• Why? There is a need to study the frequent sequential patterns of events.
• The problem of finding frequent episodes is based on minimal occurrences. Briefly, given an event database D where each transaction is associated with a timestamp, an interval [t1, t2] is the sequence of transactions that starts at timestamp t1 and ends at t2. The width of the interval is defined as t2 - t1. Given an itemset A in D, an interval is a minimal occurrence of A if it contains A and none of its proper sub-intervals contains A.
• A frequent episode rule is the expression X, Y → Z, confidence, support, window, where X, Y and Z are itemsets. support is the percentage of minimal occurrences of X ∪ Y ∪ Z (that is, the ratio between the number of occurrences and the number of records in D). confidence is the percentage of minimal occurrences that contain X ∪ Y and also contain Z.

Frequent episodes (2)
• The width of each of the occurrences must be less than window.
• A serial episode rule has the additional constraint that X, Y and Z must occur in transactions in partial time order, i.e., Z follows Y and Y follows X.
• The implementation of the frequent episodes algorithm utilizes the data structures and library functions of the association rules algorithm. Instead of finding correlations across attributes, it looks for correlations across records. A temporal join function that considers minimal occurrences is used to create the interval vector of a candidate length-k itemset from the interval vectors of two length-(k-1) frequent itemsets.
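To make the notion of a minimal occurrence concrete, here is a small sketch that scans a time-ordered record list for the minimal occurrences of a serial episode. It is a direct scan written for illustration, not the temporal join over interval vectors mentioned above; the function name, the record layout (timestamp, set of items) and the sample data are assumptions.

```python
def minimal_occurrences(records, episode, window):
    """Find minimal occurrences of a serial episode.

    records: list of (timestamp, set_of_items), sorted by timestamp.
    episode: list of itemsets that must match distinct records in time order.
    Returns (t_start, t_end) pairs whose width t_end - t_start is < window.
    """
    found = []
    last_start = -1
    for end, (t_end, items) in enumerate(records):
        if not episode[-1] <= items:
            continue
        # Match the remaining itemsets backwards, as late as possible,
        # which gives the shortest interval ending at `end`.
        pos = end
        matched = True
        for itemset in reversed(episode[:-1]):
            pos -= 1
            while pos >= 0 and not itemset <= records[pos][1]:
                pos -= 1
            if pos < 0:
                matched = False
                break
        if not matched:
            continue
        # Latest starts never move backwards as `end` grows, so the interval
        # is minimal only if it starts later than the previous one found.
        if pos > last_start:
            last_start = pos
            t_start = records[pos][0]
            if t_end - t_start < window:
                found.append((t_start, t_end))
    return found


half_open = frozenset({"service=http", "flag=S0"})
trace = [
    (1.0, {"service=http", "flag=S0"}),
    (1.1, {"service=http", "flag=S0"}),
    (1.2, {"service=http", "flag=S0"}),
    (1.3, {"service=smtp", "flag=SF"}),
    (5.0, {"service=http", "flag=S0"}),
]
occurrences = minimal_occurrences(trace, [half_open] * 3, window=2)
```

In this made-up trace the interval [1.0, 1.2] is reported, while the occurrence ending at 5.0 is minimal but wider than the window and is therefore dropped.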

System knowledge (1)
• The basic algorithms do not consider any domain knowledge, and as a result they can generate many “irrelevant” rules. For example, the basic association rules algorithm may generate rules such as src_bytes=200 → flag=SF.
• Axis variables: intuitively, the axis attribute(s) is the essential attribute(s) of a record (transaction). Correlations among non-axis attributes alone are considered not interesting, so during candidate generation an itemset must contain value(s) of the axis attribute(s). Since the most important information of a connection is its service, service is used as the axis attribute. The resulting association rules then describe only the patterns related to the services of the connections.

System knowledge (1) - continued
• It is even more important to use the axis attribute(s) to constrain the item generation for frequent episodes. The basic algorithm can generate serial episode rules that contain “non-essential” attribute values, for example src_bytes=200, src_bytes=200 → dst_bytes=300, src_bytes=200. Compared with the association rules, the total number of serial rules is large, and so is the number of such useless rules.
• First find the frequent associations using the axis attribute(s), and then generate the frequent serial patterns from these associations. An example of a rule is (service=smtp, src_bytes=200, dst_bytes=300, flag=SF), (service=telnet, flag=SF) → (service=http, src_bytes=200). In effect this combines the associations (among attributes) and the sequential patterns (among the records) into a single rule. This rule formalism provides rich and useful information.
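The axis constraint amounts to a filter applied during candidate generation. A minimal sketch, assuming items are attribute=value strings and that the helper names below are purely illustrative:

```python
def attribute_of(item):
    """Return the attribute name of an 'attribute=value' item."""
    return item.split("=", 1)[0]


def keep_candidate(itemset, axis_attributes):
    """A candidate survives only if it contains a value of an axis attribute."""
    return any(attribute_of(item) in axis_attributes for item in itemset)


candidates = [
    frozenset({"src_bytes=200", "flag=SF"}),       # no axis value: dropped
    frozenset({"service=smtp", "src_bytes=200"}),
    frozenset({"service=telnet", "flag=SF"}),
]
kept = [c for c in candidates if keep_candidate(c, {"service"})]
```

With service as the axis attribute, the itemset behind the rule src_bytes=200 → flag=SF is never counted, so that rule cannot be produced.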

System knowledge (2)
• Reference variables: some essential attributes can be the references of other attributes. They act like a “subject”, and the other attributes describe the “actions” that refer to the same “subject”. When a reference attribute is used, the frequent episodes algorithm ensures that, within each episode's minimal occurrences, the records covered by its constituent itemsets have the same reference attribute value.
• Example: a “syn flood” attack, where the attacker sends a lot of “half-opened” connections (i.e., flag is “S0”) to a port (e.g., “http”) of the same victim dst_host (the reference attribute):
    (service=http, flag=S0), (service=http, flag=S0) → (service=http, flag=S0), 0.93, 0.03, 2
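One simple way to approximate the reference-attribute constraint is to partition the records by the reference value before looking for episodes, so that every occurrence is drawn from records sharing that value. This is only a sketch of the idea, not how the algorithm described above enforces it; the function name and host names are hypothetical.

```python
from collections import defaultdict


def split_by_reference(records, reference_attr):
    """Group records by reference attribute value, preserving time order."""
    groups = defaultdict(list)
    for record in records:
        groups[record[reference_attr]].append(record)
    return dict(groups)


connections = [
    {"time": 1.0, "dst_host": "victim", "service": "http", "flag": "S0"},
    {"time": 1.1, "dst_host": "other", "service": "smtp", "flag": "SF"},
    {"time": 1.2, "dst_host": "victim", "service": "http", "flag": "S0"},
    {"time": 1.3, "dst_host": "victim", "service": "http", "flag": "S0"},
]
by_host = split_by_reference(connections, "dst_host")
```

Episodes mined within by_host["victim"] then only combine connections to the same destination host.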

System knowledge (3)
• Low frequency patterns: sometimes it is important to discover the low frequency patterns. In daily network traffic, some services, for example gopher, account for very low percentages. Yet their patterns still need to be included in the network traffic profile, so that there are representative patterns for each supported service. But if a very low support value is used, an unnecessarily large number of patterns related to the high frequency services, for example smtp, is produced.

System knowledge (3) - continued
• Level-wise approximate mining procedure: the idea is to first find the episodes related to high frequency axis attribute values, then iteratively lower the support threshold to find the episodes related to the low frequency axis values, restricting the participation of the “old” axis values that already have output episodes. More specifically, when an episode is generated, it must contain at least one “new” (low frequency) axis value. The procedure terminates when a very low support value is reached. In practice, this can be the lowest frequency of all axis values.
    1) (service=smtp, src_bytes=200), (service=smtp, src_bytes=200) → (service=smtp, dst_bytes=300)
    2) (service=smtp, src_bytes=200), (service=http, src_bytes=200) → (service=smtp, src_bytes=300)

System knowledge (3) - continued
• Note that for a high frequency axis value, this method in effect omits its very low frequency episodes (generated in the runs with low support values) because they are not as interesting (representative). Hence this procedure is “approximate” mining.
• All the old (high frequency) axis values are still included to form episodes with the new axis values, because it is important to capture the sequential context of the new axis values. For example, although used infrequently, auth normally co-occurs with other services such as smtp and login. It is therefore imperative to include these high frequency services in the episode rules about auth.
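The control flow of the level-wise procedure can be sketched as a loop around an arbitrary miner. Everything below is an assumption made for illustration: `mine` stands for any function that returns the patterns meeting a support threshold, `axis_values_of` extracts the axis values a pattern mentions, and the threshold is halved between runs.

```python
def levelwise_approximate(mine, axis_values_of, start_support, floor_support,
                          shrink=0.5):
    """Keep a pattern only if it mentions an axis value with no output so far."""
    old_axis = set()
    kept = []
    support = start_support
    while True:
        new_this_run = set()
        for pattern in mine(support):
            axis_values = axis_values_of(pattern)
            if axis_values - old_axis:  # at least one "new" axis value
                kept.append(pattern)
                new_this_run |= axis_values
        old_axis |= new_this_run
        if support <= floor_support:
            break
        support = max(support * shrink, floor_support)
    return kept


# Stub miner over made-up patterns: (sequence of services, support).
toy_patterns = [
    (("smtp", "smtp"), 0.5),
    (("smtp", "http"), 0.3),
    (("smtp",), 0.07),        # rare smtp-only pattern
    (("auth", "smtp"), 0.06),
]


def toy_mine(support):
    return [p for p, s in toy_patterns if s >= support]


result = levelwise_approximate(toy_mine, set, start_support=0.4,
                               floor_support=0.05)
```

In this toy run the rare smtp-only pattern is omitted because smtp is already an “old” axis value, while the auth pattern is kept together with its smtp context.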

Application (1) Anomaly detection
• The patterns discovered from the audit data on a protected target (e.g., a network, system program, or user) correspond to the target's behavior. When gathering audit data about the target, the patterns from each new audit data set are computed, and the new rules are merged into the existing aggregate rule set. The added rules represent (new) variations of the normal behavior. When the aggregate rule set stabilizes, i.e., no new rules from the new audit data can be added, the data gathering can stop, since the aggregate audit data set has covered sufficient variations of the normal behavior.

Application (1) - continued
• Merge process (new rule set into the aggregate rule set): two rules are merged if
    1) their right and left sides are exactly the same, or can be combined; and
    2) their support and confidence values are close, i.e., within a user-defined threshold.
  Match_count can be used to control the final rule output.
• This approach to merging rules is based on the fact that even the same type of behavior will have slight differences across audit data sets. Therefore a perfect (exact) match of the mined patterns should not be expected. Instead, similar patterns need to be combined into more generalized ones.
• The patterns discovered from the (extensively gathered) audit data can be used directly for anomaly detection.
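The merge step can be sketched as follows. This is a simplified illustration: it handles only rules whose sides are exactly the same (not the “can be combined” case), it treats match_count as a plain counter of how many data sets produced a matching rule, and the threshold name `eps` is an assumption.

```python
def merge_rules(aggregate, new_rules, eps=0.05):
    """Merge (lhs, rhs, support, confidence) rules into `aggregate` in place.

    Returns the number of rules that were added as new; a return value of 0
    means the aggregate rule set did not change for this data set.
    """
    added = 0
    for lhs, rhs, sup, conf in new_rules:
        for rule in aggregate:
            if (rule["lhs"] == lhs and rule["rhs"] == rhs
                    and abs(rule["support"] - sup) <= eps
                    and abs(rule["confidence"] - conf) <= eps):
                rule["match_count"] += 1
                break
        else:
            aggregate.append({"lhs": lhs, "rhs": rhs, "support": sup,
                              "confidence": conf, "match_count": 1})
            added += 1
    return added


smtp = frozenset({"service=smtp"})
ok = frozenset({"flag=SF"})
aggregate = []
added_day1 = merge_rules(aggregate, [(smtp, ok, 0.30, 0.90)])
added_day2 = merge_rules(aggregate, [(smtp, ok, 0.32, 0.88)])
```

After the second made-up data set nothing new is added, which is the stabilization signal described on the previous slide.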

Application (2) Signature recognition
• “Intrusion only” patterns: given the normal patterns and the patterns from an intrusion dataset, mined with the same choices of axis attribute(s), reference attribute(s), support, confidence, and window requirements, the intrusion only patterns can be identified:
    1) For each pattern from the intrusion dataset, calculate a difference score with each normal pattern, and keep the lowest score as the “intrusion” score for this pattern.
    2) Output all patterns that have non-zero “intrusion” scores, or a user-specified percentage of the patterns with the highest “intrusion” scores.
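The two steps can be sketched as below, assuming each pattern has already been encoded as an equal-length tuple of digits (the encoding slides later in this deck describe one such scheme) and that the difference score is the digit-wise absolute difference. Comparing score tuples lexicographically is an assumption that mirrors comparing the scores as numbers.

```python
def difference(a, b):
    """Digit-wise absolute difference between two encoded patterns."""
    return tuple(abs(p - q) for p, q in zip(a, b))


def intrusion_only(intrusion_patterns, normal_patterns):
    """Return (pattern, intrusion_score) pairs with non-zero scores, highest first."""
    scored = []
    for pattern in intrusion_patterns:
        # Step 1: the lowest difference against any normal pattern.
        score = min(difference(pattern, normal) for normal in normal_patterns)
        # Step 2: keep only patterns that differ from every normal pattern.
        if any(score):
            scored.append((pattern, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


normal = [(1, 1, 1, 1, 1, 2)]
mined_from_attack = [(2, 2, 2, 1, 1, 1), (1, 1, 1, 1, 1, 2)]
signatures = intrusion_only(mined_from_attack, normal)
```

The second pattern also appears in the normal set, so its score is zero and it is not reported.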

Application (3) Feature selection
• An important use of the mined patterns is as the basis for feature selection. When the axis attribute is used as the class label attribute, the features (attributes) in the association rules should be included in the classification models.
• The time windowing information and the features in the frequent episodes suggest that their statistical measures, e.g., the average, the count, etc., should also be considered as additional features.
• An example: a large number of “rejected” network connections in a very short time span is strong evidence of some intrusions.

Application (3) - continued
• Each of the intrusion only patterns is used for constructing additional features:
    1) When the same value of an attribute is repeated several times in a frequent episode rule, it suggests a corresponding count feature.
    2) When an attribute (with different values) is repeated several times in the rule, add a corresponding average feature.
    3) When the same value appears in all the itemsets of an episode, there is a large percentage of records that have the same value.
• These statistical and temporal features are used in constructing classifiers with a rule generation algorithm such as Ripper, and are claimed to improve the classification greatly.
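As an illustration of count and average features, the sketch below adds three time-window features to each connection record. The feature names, the 2-second default window and the sample data are assumptions chosen for the example, not features taken verbatim from the original work.

```python
def add_window_features(connections, window=2.0):
    """Add per-connection statistics over the preceding `window` seconds.

    connections: list of dicts with 'time', 'service', 'flag', 'src_bytes',
    sorted by time. Returns new dicts; the input is left unchanged.
    """
    enriched = []
    start = 0
    for i, conn in enumerate(connections):
        # Slide the window start forward past connections that are too old.
        while connections[start]["time"] < conn["time"] - window:
            start += 1
        same = [c for c in connections[start:i + 1]
                if c["service"] == conn["service"]]
        enriched.append({
            **conn,
            "same_srv_count": len(same),
            "same_srv_s0_count": sum(1 for c in same if c["flag"] == "S0"),
            "same_srv_avg_src_bytes": sum(c["src_bytes"] for c in same) / len(same),
        })
    return enriched


log = [
    {"time": 0.0, "service": "http", "flag": "S0", "src_bytes": 0},
    {"time": 0.5, "service": "http", "flag": "S0", "src_bytes": 0},
    {"time": 1.0, "service": "http", "flag": "S0", "src_bytes": 0},
    {"time": 1.2, "service": "smtp", "flag": "SF", "src_bytes": 250},
    {"time": 4.0, "service": "http", "flag": "SF", "src_bytes": 200},
]
features = add_window_features(log)
```

The third record ends up with three same-service half-open connections inside the window, which is the kind of count a classifier can threshold on.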

Window value
• Window size: experience shows that when the number of patterns generated is plotted against the window value w, it tends to stabilize after an initial jump. The smallest value in the stable region is called w0. Experiments also showed that the accuracy of the classifiers that use the temporal and statistical features calculated with different w stabilizes for w >= w0. Intuitively, a requirement for a good window size is that its set of sequential patterns is stable, that is, sufficient patterns are captured and noise is small.

Encoding of association rules (1)
• “Complete and ordered” associations: if the records have n attributes, an association (A1=v1, A2=v2, …, Ak=vk) is “complete and ordered” if k=n and the attributes A1, A2, …, Ak are in a user-defined decreasing “order of importance”. It can then be recorded as a number e_v1 e_v2 … e_vn, where e_vi is 0 if vi is null, or otherwise the order of appearance of vi among all the values of Ai processed thus far in the encoding process.
• An episode rule X, Y → Z is mapped into a 3-d data point (encoding of X, encoding of Y, encoding of Z).
• For pattern comparison, this 3-d encoding is converted into a 1-d value x1 z1 y1 x2 z2 y2 … xn zn yn. This representation preserves the “order of importance” of the attributes and considers the rule structure of an episode.

Encoding of association rules (2)
• Two episodes that have similar “body” (i.e., X) and “head” (i.e., Z) will be mapped to closer numbers.
• Example:
    “syn flood”: (service=http, flag=S0), (service=http, flag=S0) → (service=http, flag=S0) is encoded as 222111
    “normal”: (flag=SF, service=http), (flag=SF, service=icmp_echo) → (flag=SF, service=http) is encoded as 111112
  The difference score, defined as the absolute value of the difference in corresponding digits, is therefore 111001.
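The example can be reproduced with a short sketch. It assumes the attribute order (flag, service) and that the “normal” episode is encoded first, so that SF and http receive code 1; the class and method names are illustrative. Digits are kept as a list so that codes above 9 would not be ambiguous.

```python
class EpisodeEncoder:
    """Encode episodes X, Y -> Z as the digit sequence x1 z1 y1 x2 z2 y2 ..."""

    def __init__(self, ordered_attributes):
        self.attributes = ordered_attributes  # decreasing order of importance
        self.codes = {a: {} for a in ordered_attributes}

    def encode_association(self, association):
        digits = []
        for attr in self.attributes:
            value = association.get(attr)
            if value is None:
                digits.append(0)  # null value
            else:
                table = self.codes[attr]
                # Code = order of first appearance of this value (1, 2, ...).
                digits.append(table.setdefault(value, len(table) + 1))
        return digits

    def encode_episode(self, x, y, z):
        ex, ey, ez = (self.encode_association(a) for a in (x, y, z))
        digits = []
        for dx, dz, dy in zip(ex, ez, ey):
            digits += [dx, dz, dy]
        return digits


def difference_score(a, b):
    return [abs(p - q) for p, q in zip(a, b)]


encoder = EpisodeEncoder(["flag", "service"])
normal = encoder.encode_episode(
    {"flag": "SF", "service": "http"},
    {"flag": "SF", "service": "icmp_echo"},
    {"flag": "SF", "service": "http"},
)
half_open = {"flag": "S0", "service": "http"}
syn_flood = encoder.encode_episode(half_open, half_open, half_open)
score = difference_score(syn_flood, normal)
```

Under these assumptions the encodings are 111112 and 222111 and the score is 111001, matching the slide.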

Real-time detection (1)
• Low cost “necessary” conditions: features have 3 “cost” levels. Level 1 features can be computed from the first packet; level 2 features can be computed at the end of the connection, using only information of the current connection; level 3 features can be computed at the end of the connection, but require access to data of other prior connections. Ideally, a few tests involving the low cost features eliminate the majority of the rules that need to be checked, thus eliminating the need to compute some high cost features.
• An example: port_scan → src_bytes = 0. Here src_bytes = 0 is a necessary association for the port scan intrusion. If this condition fails, the features of the rules for this intrusion need not be computed, unless they are needed for other rules.

Real-time detection (2)
• Rule filtering:
    1) For n Ripper rules: an n-bit “remaining” vector indicates which rules still need to be checked; initially all bits are 1's.
    2) For each rule: an “invalidating” n-bit vector, where only the bit corresponding to the rule is 0 and all other bits are 1's.
    3) For each high cost feature: a “computing” n-bit vector, where only the bits corresponding to the rules that require this feature are 1's.
• When examining a packet or a connection, if a “necessary” condition of an intrusion is violated, the “invalidating” bit vectors of the Ripper rules of that intrusion are ANDed with the “remaining” vector and with all the “computing” vectors of the high cost features. After all necessary conditions are checked, only the features with non-zero “computing” vectors still need to be computed.
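The bit-vector bookkeeping can be sketched with Python integers used as n-bit vectors. The rule numbering, intrusion names and feature names below are made up for illustration.

```python
def filter_rules(n_rules, rules_of_intrusion, rules_needing_feature, violated):
    """Return (remaining_vector, features_still_needed).

    rules_of_intrusion: {intrusion: [rule indices]}
    rules_needing_feature: {high-cost feature: [rule indices that use it]}
    violated: intrusions whose low-cost necessary condition failed
    """
    full = (1 << n_rules) - 1
    remaining = full  # initially every rule still needs to be checked
    computing = {}
    for feature, rule_ids in rules_needing_feature.items():
        bits = 0
        for r in rule_ids:
            bits |= 1 << r
        computing[feature] = bits
    for intrusion in violated:
        for r in rules_of_intrusion[intrusion]:
            invalidating = full & ~(1 << r)  # all 1's except bit r
            remaining &= invalidating
            for feature in computing:
                computing[feature] &= invalidating
    needed = [f for f, bits in computing.items() if bits]
    return remaining, needed


remaining, needed = filter_rules(
    n_rules=4,
    rules_of_intrusion={"port_scan": [0, 1], "syn_flood": [2], "other": [3]},
    rules_needing_feature={
        "same_host_count": [0, 1],
        "same_srv_s0_count": [1, 2],
    },
    violated=["port_scan"],
)
```

Once the port scan condition fails, rules 0 and 1 are cleared, same_host_count no longer serves any remaining rule, and only same_srv_s0_count still has to be computed for rule 2.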

Experiments
This data mining method has been applied to different data sets. In its application to the 1998 DARPA Intrusion Detection Evaluation Program, three models were constructed based on different feature sets of the data:
• Content model - for suspicious behavior in the data portion, based on domain knowledge rather than association rule analysis.
• Traffic model - time-based “traffic” features of the connection records in the past 2 seconds, including “same host” and “same service” features.
• Host traffic model - a mirror set of the “traffic” features, computed over a “connection” window of 100 connections, for “slow” probing attacks.
The three base models and the meta-level classifier are built using Ripper, a rule induction algorithm. Very good performance is claimed.

Summary
Contributions
• An automatic feature construction method based on frequent patterns.
• A simple and useful pattern encoding and comparison technique to assist model construction.
• A strategy for minimizing the cost of model execution.
Disadvantages
• Only handles nominal variables in the data.
• A lot of domain knowledge is used to improve performance.

Questions?
